| 1 |
Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring |
Proposes V-Skip, which uses dual-path anchoring to address visual amnesia in multimodal CoT reasoning and enable efficient compression. |
large language model multimodal chain-of-thought |
|
|
| 2 |
FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs |
FutureOmni: the first benchmark for evaluating how well multimodal LLMs forecast the future from omni-modal context |
large language model multimodal |
✅ |
|
| 3 |
Pro-AI Bias in Large Language Models |
Reveals a pro-AI bias in large language models that may influence decision-making. |
large language model |
|
|
| 4 |
RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models |
RECAP: a resource-efficient adversarial prompting method for LLMs that lowers compute cost through retrieval and reuse |
large language model |
|
|
| 5 |
Domain-Adaptation through Synthetic Data: Fine-Tuning Large Language Models for German Law |
Fine-tunes large language models on synthetic data to improve question answering in the German legal domain |
large language model |
|
|
| 6 |
Towards robust long-context understanding of large language model via active recap learning |
Proposes the Active Recap Learning (ARL) framework to strengthen LLM understanding of long contexts. |
large language model |
|
|
| 7 |
No Reliable Evidence of Self-Reported Sentience in Small Large Language Models |
Small LLMs self-report a lack of sentience, as checked with classifiers on internal activations |
large language model |
|
|
| 8 |
Large Language Models for Large-Scale, Rigorous Qualitative Analysis in Applied Health Services Research |
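Proposes a human-AI collaborative framework that uses large language models to conduct large-scale qualitative health services research efficiently and rigorously |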
large language model |
|
|
| 9 |
BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models |
BACH-V: bridging abstract and concrete human values in large language models |
large language model |
|
|
| 10 |
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models |
Proposes a "Locate, Steer, and Improve" framework for actionable mechanistic interpretability in large language models |
large language model |
✅ |
|
| 11 |
OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models |
OpenLearnLM: a unified benchmark for evaluating the knowledge, skills, and attitudes of educational large language models |
large language model |
|
|
| 12 |
Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models |
Proposes the AAAC framework, which uses activation-space anchoring for multi-class permission control in large language models |
large language model |
|
|
| 13 |
Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants |
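Examines the ten major challenges facing diffusion language models and explores new directions for AI beyond the autoregressive paradigm |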
large language model multimodal |
|
|
| 14 |
NewsRECON: News article REtrieval for image CONtextualization |
NewsRECON: proposes a news article retrieval method for image contextualization that addresses cases where reverse image search fails. |
large language model multimodal |
|
|
| 15 |
Dimension-First Evaluation of Speech-to-Speech Models with Structured Acoustic Cues |
Proposes the TRACE framework for efficient, human-aligned speech evaluation |
large language model chain-of-thought |
|
|
| 16 |
CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks |
Proposes CommunityBench for evaluating LLMs' community-level value alignment across diverse groups and tasks |
large language model foundation model |
|
|
| 17 |
Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning |
Proposes GRADFILTERING, which uses uncertainty to guide data selection for instruction tuning and improve LLM efficiency. |
large language model |
|
|
| 18 |
XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs |
Proposes the XCR-Bench benchmark for evaluating cultural reasoning in large language models |
large language model |
|
|
| 19 |
OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents |
Proposes the OP-Bench benchmark for evaluating over-personalization in memory-augmented conversational agents |
large language model |
|
|
| 20 |
Simulated Ignorance Fails: A Systematic Study of LLM Behaviors on Forecasting Problems Before Model Knowledge Cutoff |
Reveals the limitations of "simulated ignorance" in LLM forecasting and advises against using it for retrospective benchmarking. |
chain-of-thought |
|
|
| 21 |
Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge |
Finds significant language bias in pairwise LLM judges and analyzes its relationship to perplexity |
large language model |
|
|
| 22 |
TREX: Tokenizer Regression for Optimal Data Mixture |
TREX: optimizes data mixture ratios through tokenizer regression to improve multilingual LLM tokenizer efficiency |
large language model |
|
|
| 23 |
When Wording Steers the Evaluation: Framing Bias in LLM judges |
Reveals framing bias in LLM judging: how a prompt is framed affects the LLM's verdict |
large language model |
|
|
| 24 |
Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks |
Fine-tuning LLMs improves their statistical reasoning, with applications in education and automated assessment |
large language model |
|
|
| 25 |
HALT: Hallucination Assessment via Latent Testing |
HALT: assesses hallucination in large language models via latent-space testing |
large language model |
|
|
| 26 |
From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs |
Uses ensemble language models for axial coding of political debates, moving from quotes to concepts |
large language model |
|
|
| 27 |
GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark |
Proposes GerAV, a new benchmark for German authorship verification, and reaches new heights with fine-tuned LLMs |
large language model |
|
|
| 28 |
Beyond Known Facts: Generating Unseen Temporal Knowledge to Address Data Contamination in LLM Evaluation |
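Proposes an evaluation method based on generating future knowledge to address data contamination when evaluating LLMs on temporal knowledge graph extraction. |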
large language model |
|
|