| 1 |
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs |
Reveals an internal chain-of-thought in LLMs: an empirical study of layer-wise subtask scheduling |
large language model chain-of-thought |
|
|
| 2 |
ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models |
Proposes ABBA-Adapters, an efficient and expressive fine-tuning method that improves foundation model performance. |
large language model foundation model |
✅ |
|
| 3 |
EfficientLLM: Efficiency in Large Language Models |
EfficientLLM: a comprehensive study of efficiency evaluation benchmarks and optimization techniques for large language models |
large language model foundation model |
|
|
| 4 |
ModRWKV: Transformer Multimodality in Linear Time |
Proposes ModRWKV, an RWKV7-based multimodal Transformer framework with linear time complexity. |
large language model multimodal |
|
|
| 5 |
Enhanced Multimodal Aspect-Based Sentiment Analysis by LLM-Generated Rationales |
Proposes the LRSA framework, which uses LLM-generated rationales to improve SLM performance on multimodal sentiment analysis. |
large language model multimodal |
|
|
| 6 |
CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring |
Proposes the CAFES framework to address the limitations of multimodal automated scoring |
large language model multimodal |
|
|
| 7 |
DecIF: Improving Instruction-Following through Meta-Decomposition |
DecIF: improving the instruction-following ability of large language models through meta-decomposition |
large language model instruction following |
|
|
| 8 |
Large Language Models Implicitly Learn to See and Hear Just By Reading |
Large language models implicitly learn to see and hear solely by reading text |
large language model |
|
|
| 9 |
Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models |
Proposes Saten, a sparse augmented tensor network for post-training compression of large language models. |
large language model |
|
|
| 10 |
Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning |
Proposes the N-rep consistency method, achieving low-cost, highly robust Text-to-SQL without CoT or fine-tuning |
chain-of-thought |
|
|
| 11 |
Scaling Laws for State Dynamics in Large Language Models |
Reveals the challenges large language models face in modeling state dynamics and investigates their internal state-tracking mechanisms. |
large language model |
|
|
| 12 |
Toward Reliable Scientific Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models |
Proposes the TruthHypo benchmark and the KnowHD detector to evaluate truthfulness and hallucination in LLM-generated scientific hypotheses. |
large language model |
✅ |
|
| 13 |
Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations |
Reveals attributional safety failures of large language models under code-mixed perturbations and proposes mitigation strategies. |
large language model |
|
|
| 14 |
Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models |
Reveals the neural incompatibility problem in cross-scale parametric knowledge transfer between large language models |
large language model |
✅ |
|
| 15 |
DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models |
DiagnosisArena: a diagnostic reasoning benchmark for evaluating large language models in medical diagnosis. |
large language model |
✅ |
|
| 16 |
Development and Validation of Engagement and Rapport Scales for Evaluating User Experience in Multimodal Dialogue Systems |
Develops and validates engagement and rapport scales for evaluating user experience in multimodal dialogue systems |
multimodal |
|
|
| 17 |
Multimodal Cultural Safety: Evaluation Framework and Alignment Strategies |
Proposes the CROSS benchmark and the CROSS-Eval framework to improve the cultural safety awareness and compliance of LVLMs |
multimodal |
|
|
| 18 |
DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis |
Proposes the DECASTE framework to uncover caste bias in large language models |
large language model |
|
|
| 19 |
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples |
Proposes the LISTEN method, which mitigates hallucinations in audio large language models through synthesized negative samples |
large language model |
|
|
| 20 |
S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models |
S2SBench: a benchmark for quantifying intelligence degradation in speech-to-speech large language models |
large language model |
✅ |
|
| 21 |
OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking |
OmniGenBench: a modular platform for reproducible benchmarking of genomic foundation models |
foundation model |
|
|
| 22 |
QA-prompting: Improving Summarization with Large Language Models using Question-Answering |
Proposes QA-prompting, which uses question answering to improve long-text summarization with large language models |
large language model |
|
|
| 23 |
Cross-Lingual Optimization for Language Transfer in Large Language Models |
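Proposes Cross-Lingual Optimization (CLO), which improves cross-lingual transfer in large language models while preserving English performance |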
large language model |
|
|
| 24 |
Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification |
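Builds a unified framework to explore the roles of large language models in authorship privacy: obfuscation, mimicking, and verification |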
large language model |
|
|
| 25 |
Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering |
Proposes the PDRR framework to bridge the gap between large language models and knowledge bases in complex question answering |
large language model |
|
|
| 26 |
ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs |
Proposes ShieldVLM, which uses deliberative reasoning to make LVLMs safer at detecting multimodal implicit toxicity. |
multimodal |
|
|
| 27 |
AUTOLAW: Enhancing Legal Compliance in Large Language Models via Case Law Generation and Jury-Inspired Deliberation |
AutoLaw: enhancing legal compliance in large language models through case law generation and jury-inspired deliberation |
large language model |
|
|
| 28 |
Activation-Guided Consensus Merging for Large Language Models |
Proposes ACM, an activation-guided consensus merging method that improves the merging of large language models. |
large language model |
|
|
| 29 |
Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection |
Studies model prediction disagreement in multimodal empathy detection and reveals latent ambiguity under modality conflict. |
multimodal |
|
|
| 30 |
Informatics for Food Processing |
Proposes FoodProX and multimodal AI models to make food-processing assessment more objective and scalable |
large language model multimodal |
|
|
| 31 |
Amadeus-Verbo Technical Report: The powerful Qwen2.5 family models trained in Portuguese |
Amadeus-Verbo: fine-tuning and open-sourcing Qwen2.5-family large language models for Brazilian Portuguese |
large language model foundation model |
✅ |
|
| 32 |
PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs |
PersonaTAB: predicting personality traits from textual, acoustic, and behavioral cues in full-duplex speech dialogs |
large language model TAMP |
|
|
| 33 |
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst |
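Proposes Self-Reasoning Language Models (SRLM), which iteratively improve complex reasoning using a small number of reasoning catalysts. |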
large language model chain-of-thought |
|
|
| 34 |
Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM |
Proposes a graph-based analysis framework to improve understanding of reasoning large language models |
large language model chain-of-thought |
|
|
| 35 |
Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels |
Proposes the TLDM benchmark, showing that LLM performance on long-form novel understanding drops significantly beyond 64k tokens |
large language model |
|
|
| 36 |
EasyMath: A 0-shot Math Benchmark for SLMs |
EasyMath: a zero-shot mathematical reasoning benchmark for small language models |
chain-of-thought |
|
|
| 37 |
Automated Journalistic Questions: A New Method for Extracting 5W1H in French |
Proposes an automated 5W1H extraction pipeline for French news, with performance comparable to GPT-4o. |
large language model |
|
|
| 38 |
UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models |
UltraEdit: a training-free, subject-free, and memory-free method for lifelong editing of language models |
large language model |
✅ |
|
| 39 |
WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications |
WirelessMathBench: a benchmark for evaluating the mathematical modeling ability of large language models in wireless communications |
large language model |
|
|
| 40 |
Temporal Alignment of Time Sensitive Facts with Activation Engineering |
Uses activation engineering to align LLMs on time-sensitive facts, improving temporal awareness without training. |
large language model |
|
|
| 41 |
Through a Compressed Lens: Investigating The Impact of Quantization on Factual Knowledge Recall |
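Studies the impact of quantization on factual knowledge recall in large language models and reveals the information loss that quantization introduces. |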
large language model |
|
|
| 42 |
Mechanistic Interpretability of GPT-like Models on Summarization Tasks |
Proposes an interpretability analysis framework for GPT-like models on summarization tasks and achieves performance gains. |
large language model |
|
|
| 43 |
WebNovelBench: Placing LLM Novelists on the Web Novel Distribution |
Proposes WebNovelBench to evaluate LLMs on long-form novel generation, comparing them against the distribution of real web novels. |
large language model |
|
|
| 44 |
Creative Preference Optimization |
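Proposes Creative Preference Optimization (CrPO) to improve the novelty, diversity, and quality of content generated by large language models. |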
large language model |
|
|
| 45 |
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language |
MUG-Eval: a language-agnostic proxy evaluation framework for assessing the generation capabilities of large language models in any language |
large language model |
|
|
| 46 |
GemMaroc: Unlocking Darija Proficiency in LLMs with Minimal Data |
GemMaroc: improving LLM proficiency in Moroccan Arabic (Darija) with minimal data |
large language model |
|
|
| 47 |
Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits |
Reveals how tokenization constraints limit symbolic and arithmetic reasoning in LLMs and introduces the concept of Token Awareness. |
chain-of-thought |
|
|
| 48 |
A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations |
PersonaConvBench: a large-scale personalized conversation benchmark for evaluating LLM reasoning and generation in multi-turn conversations. |
large language model |
|
|
| 49 |
GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace |
GloSS: suppressing toxic generation in LLMs via a global toxic subspace. |
large language model |
|
|
| 50 |
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora |
Uses multi-way parallel corpora to improve the cross-lingual semantic understanding of multilingual large language models |
large language model |
|
|
| 51 |
FlashThink: An Early Exit Method For Efficient Reasoning |
FlashThink: an early-exit method for efficient reasoning |
large language model |
|
|
| 52 |
EEG-to-Text Translation: A Model for Deciphering Human Brain Activity |
Proposes the R1 Translator model to improve the decoding of EEG signals into text |
large language model |
✅ |
|
| 53 |
ConspEmoLLM-v2: A robust and stable model to detect sentiment-transformed conspiracy theories |
ConspEmoLLM-v2: a robust and stable model for detecting sentiment-transformed conspiracy theories. |
large language model |
✅ |
|
| 54 |
Concept Incongruence: An Exploration of Time and Death in Role Playing |
Explores concept incongruence involving time and death in role playing, revealing potential problems in LLMs |
large language model |
|
|
| 55 |
Incorporating Token Usage into Prompting Strategy Evaluation |
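Proposes the Big-$O_{tok}$ framework to evaluate the token-usage efficiency of prompting strategies and optimize large language model applications. |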
large language model |
|
|
| 56 |
SEPS: A Separability Measure for Robust Unlearning in LLMs |
Proposes the SEPS evaluation framework and MP, a mixed-prompt learning strategy, to improve LLM unlearning in mixed-query scenarios |
large language model |
|
|
| 57 |
Tracing Multilingual Factual Knowledge Acquisition in Pretraining |
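Traces the acquisition of multilingual factual knowledge during pretraining, revealing two mechanisms: frequency-driven learning and cross-lingual transfer. |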
large language model |
✅ |
|
| 58 |
Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes |
Systematically studies language mixing in reasoning language models, revealing its patterns, impact, and internal causes. |
chain-of-thought |
|
|
| 59 |
sudoLLM: On Multi-role Alignment of Language Models |
sudoLLM: a multi-role alignment framework that improves LLM safety under user permission control. |
large language model |
|
|
| 60 |
TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring |
Proposes the TRATES framework, which uses LLMs and rubrics for trait-specific cross-prompt essay scoring |
large language model |
|
|
| 61 |
Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders |
Breaking bad tokens: detoxifying LLMs using sparse autoencoders |
large language model |
|
|
| 62 |
MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance |
Proposes the MoMoE framework for explainable, cross-community AI-assisted online content moderation. |
large language model |
|
|
| 63 |
Rank-K: Test-Time Reasoning for Listwise Reranking |
Rank-K: a test-time reasoning method for listwise reranking that improves results on hard queries. |
large language model |
|
|
| 64 |
From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning |
Studies the challenges instruction-tuned LLMs face in generalizing from templates to natural language in spatial reasoning |
large language model |
|
|
| 65 |
Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis |
Proposes the PhantomCircuit framework, which addresses knowledge overshadowing in LLMs through knowledge circuit analysis |
large language model |
|
|
| 66 |
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs |
A study of prompt injection attacks against open-source LLMs, including new attack methods |
large language model |
|
|
| 67 |
Dual Decomposition of Weights and Singular Value Low Rank Adaptation |
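DuDe: a low-rank adaptation method based on weight decomposition and singular value decomposition that improves the stability and knowledge-transfer efficiency of LLM fine-tuning. |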
large language model |
|
|
| 68 |
OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation |
OSoRA: an output-dimension and singular-value initialized low-rank adaptation method for efficient fine-tuning of large language models. |
large language model |
|
|
| 69 |
Teaching Small Language Models to Learn Logic through Meta-Learning |
Trains small language models to learn logical reasoning through meta-learning |
large language model |
|
|
| 70 |
JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling |
JOLT-SQL: jointly optimizing the Text-to-SQL loss with confusion-aware noisy schema sampling. |
large language model |
✅ |
|
| 71 |
Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs |
Proposes universal acoustic adversarial attacks on speech LLMs that enable flexible control |
large language model |
|
|
| 72 |
ThinkSwitcher: When to Think Hard, When to Think Fast |
Proposes ThinkSwitcher, which dynamically switches CoT reasoning modes to improve large language model efficiency |
chain-of-thought |
|
|
| 73 |
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation |
Proposes the SlangDIT benchmark and the SlangOWL model to improve LLM performance on interpretative slang translation. |
large language model |
|
|
| 74 |
The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models |
Proposes a lightweight architectural modification to address character-level understanding |
large language model |
|
|
| 75 |
Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents |
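Proposes the legal rule induction task and a benchmark dataset to improve the ability of LLMs to discover legal principles from judicial precedents |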
large language model |
|
|
| 76 |
MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations |
MultiHal: a multilingual dataset for knowledge-graph grounded evaluation of LLM hallucinations |
large language model |
|
|
| 77 |
BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks |
Proposes BAR to address reasoning problems in complex Minecraft tasks |
large language model |
|
|
| 78 |
Enhancing LLMs via High-Knowledge Data Selection |
Proposes the High-Knowledge Scorer (HKS), improving LLM performance on knowledge-intensive and general understanding tasks. |
large language model |
|
|
| 79 |
Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation |
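Reveals new privacy vulnerabilities in multimodal retrieval-augmented generation systems and proposes a compositional structured prompt attack. |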
multimodal |
|
|
| 80 |
Cross-Linguistic Transfer in Multilingual NLP: The Role of Language Families and Morphology |
Studies how language families and morphology affect cross-lingual transfer in multilingual NLP |
zero-shot transfer |
|
|
| 81 |
Let's Verify Math Questions Step by Step |
Proposes MathQ-Verify, which verifies the validity of math questions to improve the quality of math QA data. |
large language model |
✅ |
|
| 82 |
PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks |
PandaGuard: a systematic evaluation of large language model safety defenses against jailbreaking attacks |
large language model |
|
|
| 83 |
Improve Language Model and Brain Alignment via Associative Memory |
Improves the alignment between language models and the human brain by incorporating associative memory |
large language model |
|
|