| 1 |
Protecting multimodal large language models against misleading visualizations |
Proposes six methods to improve the robustness of multimodal large language models against misleading visualizations |
large language model multimodal |
|
|
| 2 |
A Thousand Words or An Image: Studying the Influence of Persona Modality in Multimodal LLMs |
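Studies how persona modality affects the expressive ability of multimodal large language models, revealing limitations of the image modality. |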
large language model multimodal |
✅ |
|
| 3 |
Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge |
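Proposes Layer-Aware Task Arithmetic (LATA), which disentangles task-specific knowledge from instruction-following knowledge to improve model merging and editing. |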
large language model instruction following |
|
|
| 4 |
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking |
Studies the state-tracking ability of Transformer+CoT on finite state automata and reveals its internal mechanisms. |
large language model chain-of-thought |
✅ |
|
| 5 |
Self-Training Elicits Concise Reasoning in Large Language Models |
Self-training guides large language models toward more concise reasoning, reducing computational cost |
large language model chain-of-thought |
✅ |
|
| 6 |
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge |
Proposes MMKE-Bench, a comprehensive benchmark for evaluating the visual knowledge editing capabilities of multimodal models. |
large language model multimodal |
|
|
| 7 |
Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation |
Chitranuvad: multimodal translation by adapting multilingual LLMs |
multimodal |
|
|
| 8 |
NANOGPT: A Query-Driven Large Language Model Retrieval-Augmented Generation System for Nanotechnology Research |
Proposes NANOGPT, a query-driven LLM-RAG system for accelerating nanotechnology research. |
large language model |
|
|
| 9 |
Re-evaluating Open-ended Evaluation of Large Language Models |
Proposes an open-ended LLM evaluation method based on a three-player game, improving robustness to redundant data |
large language model |
|
|
| 10 |
Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models |
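Proposes PerMU, which uses probability perturbation to achieve more generalized implicit knowledge forgetting in large language models. |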
large language model |
✅ |
|
| 11 |
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models |
Reveals how abstract reasoning emerges in LLMs: an emergent symbolic mechanism |
large language model |
|
|
| 12 |
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents |
Proposes the Collab-Overcooked benchmark for evaluating LLM agent capabilities in collaborative environments |
large language model |
✅ |
|
| 13 |
Collaborative Stance Detection via Small-Large Language Model Consistency Verification |
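Proposes the CoVer framework, which improves the efficiency of social media stance detection through consistency verification between small and large language models. |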
large language model |
|
|
| 14 |
KEDRec-LM: A Knowledge-distilled Explainable Drug Recommendation Large Language Model |
Proposes KEDRec-LM, a knowledge-distilled explainable drug recommendation large language model, and constructs the expRxRec dataset. |
large language model |
|
|
| 15 |
LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder |
LinguaLens: interpreting the linguistic mechanisms of large language models via sparse auto-encoders |
large language model |
|
|
| 16 |
ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models |
Proposes ChineseEcomQA, a scalable e-commerce concept evaluation benchmark for assessing large language models in the e-commerce domain. |
large language model |
|
|
| 17 |
Mapping Trustworthiness in Large Language Models: A Bibliometric Analysis Bridging Theory to Practice |
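Uses bibliometric analysis to reveal the gap between theory and practice in large language model trustworthiness, along with strategies for improvement. |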
large language model |
|
|
| 18 |
GeoEdit: Geometric Knowledge Editing for Large Language Models |
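Proposes GeoEdit, a geometric knowledge editing method for large language models that improves knowledge updating while preserving general capabilities. |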
large language model |
|
|
| 19 |
HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture |
Proposes HaLoRA, a hardware-aware low-rank adaptation method that improves LLM robustness on hybrid compute-in-memory architectures. |
large language model |
|
|
| 20 |
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents |
ViSA: a visual-centric data selection method based on agent collaboration that improves multimodal large model performance |
large language model multimodal |
✅ |
|
| 21 |
PolyPrompt: Automating Knowledge Extraction from Multilingual Language Models with Dynamic Prompt Generation |
PolyPrompt: automating knowledge extraction from multilingual models through dynamic prompt generation. |
large language model |
|
|
| 22 |
LLM as a Broken Telephone: Iterative Generation Distorts Information |
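Shows that iterative LLM generation distorts information, much like the "broken telephone" game, and that prompt engineering can mitigate the effect. |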
large language model |
|
|
| 23 |
Deterministic or probabilistic? The psychology of LLMs as random number generators |
Reveals deterministic bias in LLM-generated random numbers, stemming from human cognitive biases in the training data. |
large language model |
|
|
| 24 |
Order Doesn't Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation |
Proposes an LLM training method based on order-centric augmentation that improves logical reasoning. |
large language model |
✅ |
|
| 25 |
Foot-In-The-Door: A Multi-turn Jailbreak for LLMs |
Proposes FITD, a multi-turn jailbreak method that uses psychological principles to raise attack success rates against LLMs |
large language model |
✅ |
|
| 26 |
KunlunBaize: LLM with Multi-Scale Convolution and Multi-Token Prediction Under TransformerX Framework |
KunlunBaize: a large language model with multi-scale convolution and multi-token prediction under the TransformerX framework |
large language model |
|
|
| 27 |
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing |
Proposes the Multi2 framework, which uses test-time scaling to improve multi-document summarization quality and explores its limits. |
large language model |
|
|
| 28 |
OmniRouter: Budget and Performance Controllable Multi-LLM Routing |
OmniRouter: a multi-LLM routing framework with controllable budget and performance that optimizes resource allocation. |
large language model |
✅ |
|
| 29 |
HuAMR: A Hungarian AMR Parser and Dataset |
Proposes HuAMR, the first Hungarian AMR dataset and parser, filling a gap in non-English semantic resources. |
large language model |
|
|
| 30 |
Supervised Fine-Tuning LLMs to Behave as Pedagogical Agents in Programming Education |
Proposes GuideLM: supervised fine-tuning of LLMs to act as teaching assistants in programming education |
large language model |
|
|
| 31 |
RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding |
RAPID: retrieval-augmented speculative decoding that accelerates long-context LLM inference and improves generation quality |
large language model |
|
|
| 32 |
Beyond Natural Language Perplexity: Detecting Dead Code Poisoning in Code Generation Datasets |
Proposes DePA, which detects dead code poisoning in code generation datasets through line-level perplexity analysis |
large language model |
|
|
| 33 |
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving |
FINEREASON: evaluating and improving LLMs' deliberate reasoning through reflective puzzle solving |
large language model |
|
|
| 34 |
LongRoPE2: Near-Lossless LLM Context Window Scaling |
LongRoPE2: near-lossless LLM context window extension through evolutionary search and mixed training |
large language model |
✅ |
|
| 35 |
The Lookahead Limitation: Why Multi-Operand Addition is Hard for LLMs |
Reveals a limit on LLM arithmetic: single-step prediction constrains multi-operand addition |
large language model |
|
|
| 36 |
What's Not Said Still Hurts: A Description-Based Evaluation Framework for Measuring Social Bias in LLMs |
Proposes DBB, a description-based bias benchmark for evaluating social bias in LLMs in subtle contexts |
large language model |
✅ |
|
| 37 |
Unsupervised Concept Vector Extraction for Bias Control in LLMs |
Proposes an unsupervised concept vector extraction method for controlling bias in large language models. |
large language model |
✅ |
|