| 1 |
Impact of Task Phrasing on Presumptions in Large Language Models |
Shows that task phrasing shapes the presumptions of large language models and reduces their adaptability |
large language model |
|
|
| 2 |
ML-Bench&Guard: Policy-Grounded Multilingual Safety Benchmark and Guardrail for Large Language Models |
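Proposes ML-Bench and ML-Guard to address the challenge of aligning large language model safety with regional regulations and cultural differences in multilingual settings. |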
large language model |
|
|
| 3 |
SC-Taxo: Hierarchical Taxonomy Generation under Semantic Consistency Constraints using Large Language Models |
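Proposes the SC-Taxo framework, which uses large language models to generate semantically consistent hierarchical scientific taxonomies. |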
large language model |
|
|
| 4 |
FollowTable: A Benchmark for Instruction-Following Table Retrieval |
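Proposes the FollowTable benchmark for evaluating table retrieval under instruction constraints, addressing existing methods' limited understanding of fine-grained instructions. |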
instruction following |
|
|
| 5 |
Prompt-Induced Score Variance in Zero-Shot Binary Vision-Language Safety Classification |
Reveals prompt-induced score variance in zero-shot vision-language safety classification and proposes a mean-ensemble method. |
multimodal |
|
|
| 6 |
Adaptive Querying with AI Persona Priors |
Proposes an adaptive querying method based on AI persona priors to address user-dependent interest learning |
large language model |
|
|
| 7 |
Escaping Mode Collapse in LLM Generation via Geometric Regulation |
Proposes RMR, a geometric regulation method that addresses mode collapse in LLM generation and improves generation quality. |
large language model |
|
|
| 8 |
When LLMs Stop Following Steps: A Diagnostic Study of Procedural Execution in Language Models |
A diagnostic study revealing step-following failures in LLM procedural execution |
large language model |
|
|
| 9 |
Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs |
Builds the MathArena platform for continuously evaluating the mathematical reasoning ability of LLMs |
large language model |
|
|
| 10 |
AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs |
AGoQ: memory-efficient distributed training of LLMs through activation and gradient quantization |
large language model |
|
|
| 11 |
ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost? |
ReLay: personalized LLM-generated plain-language summaries that improve understanding but must be weighed against safety |
large language model |
|
|
| 12 |
Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning |
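Proposes TokenUnlearn, which uses token-level attribution to achieve precise language model unlearning, improving both forgetting effectiveness and utility retention. |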
large language model |
|
|
| 13 |
Block-wise Codeword Embedding for Reliable Multi-bit Text Watermarking |
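Proposes the BREW framework, which addresses the high false-positive rate in multi-bit text watermarking and enables reliable designated verification. |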
large language model |
|
|