| 1 | CRAM: Centroid-Routing and Adaptive MoE for Multimodal Continual Instruction Tuning | CRAM: centroid routing and adaptive MoE for multimodal continual instruction tuning | large language model multimodal | |
| 2 | Unveiling the Entropy Dynamics of Chain-of-Thought Reasoning | Reveals the entropy dynamics of CoT reasoning and proposes a training-free, CUSUM-based framework for real-time reasoning control | chain-of-thought | |
| 3 | Multilinguality of Large Language Models From a Structural Perspective | Reveals the multilingual capabilities of large language models through structural analysis | large language model | |
| 4 | Unveiling the Limits of Large Language Models in Inferring Pragmatic Meaning from Non-Verbal Responses | Evaluates the limits of large language models in inferring pragmatic meaning from non-verbal responses alone | large language model | |
| 5 | THRD: A Training-Free Multi-Turn Defense Framework for Jailbreak Attacks on Large Language Models | Proposes THRD, a training-free multi-turn dialogue defense framework against jailbreak attacks on large language models. | large language model | |
| 6 | SentGuard: Sentence-Level Streaming Guardrails for Large Language Models | Proposes SentGuard, a sentence-level streaming guardrail that safeguards the real-time output of large language models. | large language model | |
| 7 | PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning | PaSBench-Video: a streaming video benchmark for proactive safety warning | large language model multimodal | |
| 8 | Easier to Mislead Than to Correct: Harmful and Beneficial Revision in LLM Conformity | Shows that LLMs in group decision-making are easier to mislead than to correct, so group answers should be treated with caution. | large language model chain-of-thought | |
| 9 | K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts | Proposes K-BrowseComp, a web browsing agent benchmark grounded in Korean contexts, for evaluating and diagnosing the agentic capabilities of LLMs. | foundation model instruction following | |
| 10 | Geometric Latent Reasoning Induces Shorter Generations in LLMs | Proposes Geometric Latent Reasoning (GLR), which shortens LLM generation length by approximating paths in latent space. | large language model chain-of-thought | |
| 11 | Better with Experience: Self-Evolving LLM Agents for Evidence-Grounded Health Community Notes | EvoNote: an LLM agent that self-evolves from experience to generate evidence-grounded health community notes | large language model multimodal | |
| 12 | What to Format and How: A Benchmark and Workflow Approach for Document Formatting | Proposes DocFormBench and DocFormFlow to address content-aware document formatting. | large language model multimodal | |
| 13 | FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes | FigSIM: a suicide meme dataset for fine-grained suicide severity and figurative language | multimodal | |
| 14 | Investigating and Alleviating Harm Amplification in LLM Interactions | Proposes the HarmAmp benchmark and the TrajSafe proactive defense framework to mitigate harm amplification in LLM interactions | large language model | |
| 15 | MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? | MMG2Skill: distilling web guides into self-evolving agent skills | multimodal | |
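| 16 | Resonant Context Anchoring: Decoupling Attention Routing and Signal Gain at Inference Time | Proposes Resonant Context Anchoring (RCA), which decouples attention routing from signal gain at inference time to improve the factual consistency of LLMs. | large language model | |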
| 17 | Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning | Proposes Chunk-Level Guided Generation, which uses off-the-shelf LLMs as process scorers to improve mathematical reasoning without any training. | large language model | |
| 18 | From Layers to Submodules: Rethinking Granularity in Replacement-Based LLM Compression | SubFit: a submodule-granularity LLM compression method that improves compression efficiency and accuracy. | large language model | ✅ |
| 19 | SimSD: Simple Speculative Decoding in Diffusion Language Models | Proposes SimSD, an efficient inference-time decoding algorithm for diffusion language models that significantly speeds up generation. | large language model | |
| 20 | Identifying High-Confidence Social Biases in LLMs for Trustworthy Conversational Tutoring Agents | Evaluates high-confidence social biases of LLMs in conversational tutoring to improve trustworthiness in educational settings | large language model | |
| 21 | Not What, But How: A Communicative Audit of LLM Response Framing | Proposes the FRANZ framework for evaluating how LLMs communicate when answering subjective questions | large language model | |
| 22 | TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation | Proposes TVIR: building deep research agents that generate reports interleaving text and images | multimodal | |
| 23 | Beyond Isolated Behaviors: Hierarchical User Modeling for LLM Personalization | Proposes the PHF framework to address LLM personalization | large language model | |
| 24 | Do Gender Cues Affect LLM Value Trade-offs? Evidence from a Controlled Decision Benchmark | Builds RVDB, a controlled decision benchmark, and reveals the systematic effect of gender cues on LLM value trade-offs | large language model | |
| 25 | Cross-Environment Neural Reranking for Sample-Efficient Action Selection in Text-Based Agents | Proposes a cross-environment neural reranking method that improves the sample efficiency of text-based agents in multi-task settings. | large language model | |
| 26 | CARTE: A Benchmark for Mapping Language Model Knowledge Across France | CARTE: a benchmark for evaluating how well LLMs reason over regional knowledge of France | large language model | |
| 27 | Training Prompt Matters: State-Adaptive Optimization for Robust Fine-Tuning | Proposes State-Adaptive Prompt Optimization (SAPO) to improve the generalization and robustness of fine-tuned LLMs | large language model | ✅ |
| 28 | Mitigating Bias in Locally Constrained Decoding via Tractable Proposals | Proposes globally constrained decoding based on tractable proposals to mitigate the bias problem | large language model | |
| 29 | Cost-Aware Diffusion Draft Trees for Speculative Decoding | Proposes CaDDTree, which achieves more efficient speculative decoding by optimizing token throughput. | instruction following | |
| 30 | Encoded but Not Routed: Explaining the Table-Chart Gap in Scientific Claim Verification | Reveals the table-chart gap in scientific claim verification: the information is encoded but not effectively routed | multimodal | |
| 31 | When Meaning Travels: A Granular Lens on Hybrid-MoE's Role in Idiomatic Understanding for Language Models | Proposes Varnika, a Hybrid-MoE framework that improves language models' understanding of idioms across multiple languages. | multimodal | |
| 32 | Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation | Proposes LongJudgeBench to address the reliability of long-form output evaluation | large language model | |