| 1 |
Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models |
Proposes the LDKE framework to address generalization and locality in knowledge editing for multimodal large language models |
large language model multimodal |
|
|
| 2 |
LLMSurgeon: Diagnosing Data Mixture of Large Language Models |
LLMSurgeon: diagnoses the pretraining data mixture proportions of large language models, enabling post-hoc provenance tracing. |
large language model foundation model |
|
|
| 3 |
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation |
Proposes Ptah, a multi-agent framework for generating verifiable multimodal deep research reports |
large language model multimodal |
|
|
| 4 |
Unlocking the Working Memory of Large Language Models for Latent Reasoning |
Proposes Reasoning in Memory (RiM), which uses the working memory of large language models for latent reasoning |
large language model |
|
|
| 5 |
Latent Performance Profiling of Large Language Models |
Proposes the Latent Performance Profiling (LPP) framework for evaluating large language models from their internal states. |
large language model |
|
|
| 6 |
ActTraitBench: Quantifying the Knowledge-Decision Gap in Large Language Models via Human-Grounded Behavioral Validation |
ActTraitBench: quantifies the knowledge-decision gap in large language models through behavioral validation |
large language model |
|
|
| 7 |
Spurious Prompts: Can Irrelevant Prompts Steer Large Language Models? |
Finds that LLMs are sensitive to irrelevant prompts: irrelevant prompts can effectively steer model behavior |
large language model |
✅ |
|
| 8 |
CCS: Clinical Consensus Selection for Radiology Report Generation |
Proposes the Clinical Consensus Selection (CCS) framework to improve report quality at inference time in radiology report generation. |
large language model multimodal |
|
|
| 9 |
How LoRA Remembers? A Parametric Memory Law for LLM Finetuning |
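Proposes a parametric memory law that quantifies the memorization capacity of LLMs under LoRA finetuning, along with the MemFT optimization strategy. |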
large language model |
✅ |
|
| 10 |
Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning |
PPC: strengthens LLM mathematical reasoning through preplanning |
large language model |
|
|
| 11 |
Causal Interventions on Continuous Variables: A Case Study on Verb Bias in Steering Vectors for In-Context Learning |
Proposes a causal intervention method for continuous variables to study how verb bias affects in-context learning in language models |
large language model |
|
|
| 12 |
Does The Way You Plan Matter? An Empirical Study of Planning Representations for LLM Web Agents |
The PlanAhead framework evaluates the effect of planning representations in LLM web agents and improves task success rate. |
multimodal |
|
|
| 13 |
Nine Judges, Two Effective Votes: Correlated Errors Undermine LLM Evaluation Panels |
LLM judge panels exhibit correlated errors, so the effective number of votes is far lower than expected |
chain-of-thought |
|
|
| 14 |
DySem: Uncovering Dynamic Semantic Components via Multilingual Consensus for Calculating Semantic Textual Similarity |
DySem: discovers dynamic semantic components through multilingual consensus for computing semantic textual similarity |
large language model |
✅ |
|
| 15 |
User-Aware Active Knowledge Acquisition for Emotional Support Dialogue |
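Proposes a user-aware active knowledge acquisition framework that improves the performance of emotional support dialogue systems. |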
large language model |
|
|
| 16 |
Teaching Language Models to Check Grounded Claim Factuality with Human Test-Taking Strategies |
Proposes language models based on human test-taking strategies for checking the factuality of generated text |
large language model |
|
|
| 17 |
Beyond English and Evasion: A Human-Annotated Multi-Domain Benchmark for High-Stakes LLM Safety Evaluation in Chinese |
Proposes ChiSafe-PAS, a Chinese multi-domain adversarial prompt benchmark for evaluating the safety of large language models. |
large language model |
|
|
| 18 |
Evaluating Cross-lingual Knowledge Consistency in Code-Mixed vis-a-vis Indian Languages using IndicKLAR |
IndicKLAR shows the role of code-mixed input in improving knowledge consistency for Indian languages |
large language model |
|
|
| 19 |
Predicting Causal Effects from Natural Language Queries using Structured Representations |
Proposes the Query2Effect benchmark and a two-stage framework for predicting causal effects from natural language queries. |
large language model |
|
|
| 20 |
CONCAT: Consensus- and Confidence-Driven Ad Hoc Teaming for Efficient LLM-Based Multi-Agent Systems |
Proposes CONCAT, a consensus- and confidence-based framework for efficient collaboration in LLM multi-agent systems |
large language model |
|
|
| 21 |
From Blind Guess to Informed Judgment: Teaching LLMs to Evaluate Materials by Building Knowledge-Augmented Preference Signals |
MaterEval: builds knowledge-augmented preference signals to guide LLMs in materials evaluation |
large language model |
|
|
| 22 |
COFT: Counterfactual-Conformal Decoding for Fair Chain-of-Thought Reasoning in Large Language Models |
COFT: a training-free counterfactual-conformal decoding method for fair chain-of-thought reasoning in large language models |
large language model chain-of-thought |
|
|
| 23 |
Latent Performance Profiling of Large Language Models |
Proposes Latent Performance Profiling (LPP) for evaluating large language models from latent space. |
large language model |
|
|
| 24 |
When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models |
Reveals the dominance of English narratives in large language models, using Bengali cultural knowledge as a case study |
large language model |
|
|
| 25 |
Your Multimodal Speech Model Says I Have a Face for Radio |
Evaluates face-related bias in multimodal speech recognition models, revealing significant gender and racial disparities. |
multimodal |
|
|
| 26 |
DySem: Uncovering Dynamic Semantic Components of Large Language Models for Calculating Semantic Textual Similarity |
DySem: improves semantic textual similarity computation in large language models by mining dynamic semantic components |
large language model |
✅ |
|
| 27 |
Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting |
Proposes the Mask the Target regularization method to address catastrophic forgetting in LoRA finetuning |
large language model |
|
|
| 28 |
Exploring Autonomous Agentic Data Engineering for Model Specialization |
Proposes autonomous agentic data engineering to address model specialization |
large language model |
✅ |
|
| 29 |
Kronecker Embeddings: Byte-Level Structured Token Representations for Parameter-Efficient Language Models |
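Proposes Kronecker embeddings, which significantly reduce language model parameter counts through byte-level structured representations. |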
large language model |
|
|
| 30 |
Implicit Identity Technologies for LLMs: Fingerprinting and Watermarking across Datasets, Models, and Generated Content |
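Proposes a framework of implicit identity technologies for LLMs, covering fingerprinting and watermarking, to trace the provenance of datasets, models, and generated content. |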
large language model |
|
|
| 31 |
EUDAIMONIA: Evaluating Undesirable Dynamics in AI |
Proposes design specifications for social AI to evaluate problematic social dynamics in language models |
large language model |
|
|
| 32 |
MOOSE-Copilot: A Web-Based Interactive Assistant for Unified Exploratory and Fine-Grained Scientific Hypothesis Discovery |
MOOSE-Copilot: an interactive web assistant for unified exploratory and fine-grained scientific hypothesis discovery |
large language model |
|
|
| 33 |
Adaptive Interviewing for Persona Simulation in LLMs: Evidence-Grounded Reasoning Improves Decision Alignment |
Proposes an adaptive interviewing framework that improves evidence consistency in LLM simulation of individual decision-making |
large language model |
|
|
| 34 |
Beyond Bilingual Transfer: Multilingual Code-Switching in Instruction Tuning |
In multilingual instruction tuning, multilingual code-switching outperforms bilingual transfer |
large language model |
|
|
| 35 |
DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents |
DynSess: a dynamic session-level evaluation and optimization framework for role-playing agents |
large language model |
|
|
| 36 |
Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents |
AgentREVEAL shows how web retrieval degrades the safety alignment of LLM agents and introduces the HarmURLBench benchmark. |
large language model |
|
|
| 37 |
Configurable Reward Model for Balanced Safety Alignment |
Proposes a configurable reward model (CSRM) for balanced safety alignment of large language models |
large language model |
|
|
| 38 |
Can LLM Teams Play What? Where? When? |
LLM team collaboration improves performance on a trivia quiz game, by up to 20 percentage points |
large language model |
|
|
| 39 |
Cross-Lingual Steering for Figurative Language Generation |
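Proposes a cross-lingual activation steering method to explore and exploit universal signals for figurative language generation in multilingual LLMs. |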
large language model |
|
|