| 1 |
PDDL-Mind: Large Language Models are Capable on Belief Reasoning with Reliable State Tracking |
PDDL-Mind: belief reasoning with large language models via reliable state tracking |
large language model chain-of-thought |
|
|
| 2 |
Retrieval-Augmented Multimodal Model for Fake News Detection |
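Proposes RAMM, a retrieval-augmented multimodal model that addresses cross-instance narrative consistency and the lack of domain knowledge in fake news detection. |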
large language model multimodal |
✅ |
|
| 3 |
How Creative Are Large Language Models in Generating Molecules? |
First systematic study evaluating the creativity of large language models in molecule generation |
large language model |
|
|
| 4 |
TLoRA: Task-aware Low Rank Adaptation of Large Language Models |
TLoRA: a task-aware low-rank adaptation framework for large language models |
large language model |
|
|
| 5 |
MHSafeEval: Role-Aware Interaction-Level Evaluation of Mental Health Safety in Large Language Models |
Proposes the MHSafeEval framework for evaluating the role-aware, interaction-level safety of large language models in mental health counseling |
large language model |
|
|
| 6 |
River-LLM: Large Language Model Seamless Exit Based on KV Share |
River-LLM: a seamless exit framework for large language models based on KV sharing, improving inference speed. |
large language model |
|
|
| 7 |
Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval |
Proposes Omni-Embed-Audio, which uses multimodal LLMs to improve the robustness of audio-text retrieval |
multimodal |
|
|
| 8 |
Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs |
Proposes a multimodal multiplication benchmark that reveals capability bottlenecks of multimodal LLMs in arithmetic computation. |
multimodal |
|
|
| 9 |
FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings |
Proposes the FLiP model for understanding and interpreting multimodal multilingual sentence embeddings |
multimodal |
✅ |
|
| 10 |
JudgeMeNot: Personalizing Large Language Models to Emulate Judicial Reasoning in Hebrew |
Proposes JudgeMeNot, which personalizes large language models to emulate judicial reasoning in Hebrew |
large language model |
|
|
| 11 |
Employing General-Purpose and Biomedical Large Language Models with Advanced Prompt Engineering for Pharmacoepidemiologic Study Design |
Uses general-purpose and biomedical large language models with advanced prompt engineering to improve pharmacoepidemiologic study design |
large language model |
|
|
| 12 |
Learning to Seek Help: Dynamic Collaboration Between Small and Large Language Models |
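Proposes a dynamic collaboration framework in which small and large language models cooperate to solve multi-step reasoning problems |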
large language model |
|
|
| 13 |
DeInfer: Efficient Parallel Inferencing for Decomposed Large Language Models |
Proposes DeInfer, which accelerates parallel inference for decomposed large language models. |
large language model |
|
|
| 14 |
MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge |
Proposes the MM-JudgeBias benchmark for evaluating compositional biases of multimodal large language models acting as judges. |
large language model multimodal |
|
|
| 15 |
Heterogeneity in Formal Linguistic Competence of Language Models: Is Data the Real Bottleneck? |
Injecting a small amount of targeted data significantly improves small language models' understanding of specific linguistic phenomena. |
large language model |
✅ |
|
| 16 |
Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens |
Proposes entropy-guided token weighting (ETW) for selective unlearning of informative tokens in large language models, improving model utility. |
large language model |
|
|
| 17 |
Multilingual Training and Evaluation Resources for Vision-Language Models |
Builds multilingual training and evaluation resources for vision-language models, improving performance in non-English settings. |
multimodal |
|
|
| 18 |
Beyond Reproduction: A Paired-Task Framework for Assessing LLM Comprehension and Creativity in Literary Translation |
Proposes a paired-task framework for assessing LLM comprehension and creativity in literary translation |
large language model |
|
|
| 19 |
Latent Abstraction for Retrieval-Augmented Generation |
Proposes LAnR, a new framework for retrieval-augmented generation within the LLM latent space |
large language model |
|
|
| 20 |
Bridging the Reasoning Gap in Vietnamese with Small Language Models via Test-Time Scaling |
Bridges the Vietnamese reasoning gap with small language models via test-time scaling |
chain-of-thought |
|
|
| 21 |
SPENCE: A Syntactic Probe for Detecting Contamination in NL2SQL Benchmarks |
SPENCE: a syntactic probe for detecting contamination in NL2SQL benchmarks |
large language model |
|
|
| 22 |
Dual Alignment Between Language Model Layers and Human Sentence Processing |
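Improves modeling of cognitive effort in syntactically complex settings through dual alignment between language model layers and human sentence processing. |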
large language model |
|
|
| 23 |
MASS-RAG: Multi-Agent Synthesis Retrieval-Augmented Generation |
Proposes MASS-RAG, a multi-agent synthesis approach to retrieval-augmented generation that improves knowledge integration in noisy settings. |
large language model |
|
|
| 24 |
BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources |
BhashaSutra: a task-centric unified survey of Indian NLP datasets, corpora, and resources |
multimodal |
|
|
| 25 |
Understanding the Prompt Sensitivity |
Analyzes the relationship between LLM gradients and probabilities to reveal the underlying causes of prompt sensitivity |
large language model |
✅ |
|
| 26 |
ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation |
ArbGraph: a conflict-aware evidence arbitration framework for long-form RAG that improves generation reliability. |
large language model |
✅ |
|
| 27 |
HiGMem: A Hierarchical and LLM-Guided Memory System for Long-Term Conversational Agents |
Proposes HiGMem, a hierarchical, LLM-guided memory system for long-term conversational agents. |
large language model |
✅ |
|
| 28 |
Beyond Pattern Matching: Seven Cross-Domain Techniques for Prompt Injection Detection |
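Proposes seven cross-domain techniques for prompt injection detection, going beyond the limits of traditional pattern matching and fine-tuned classifiers. |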
large language model |
|
|
| 29 |
Decisive: Guiding User Decisions with Optimal Preference Elicitation from Unstructured Documents |
Decisive: guides user decisions through optimal preference elicitation from unstructured documents. |
large language model |
|
|
| 30 |
Efficient Low-Resource Language Adaptation via Multi-Source Dynamic Logit Fusion |
Proposes TriMix, which achieves efficient low-resource language model adaptation via multi-source dynamic logit fusion |
large language model |
|
|
| 31 |
Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards |
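Proposes a method for generating process reward model datasets based on the Planning Domain Definition Language (PDDL), improving LLM reasoning. |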
large language model |
|
|
| 32 |
Automatic Slide Updating with User-Defined Dynamic Templates and Natural Language Instructions |
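Proposes the DynaSlide benchmark and the SlideAgent framework for automatic slide updating based on natural language instructions and user-defined templates. |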
multimodal |
✅ |
|
| 33 |
HiRAS: A Hierarchical Multi-Agent Framework for Paper-to-Code Generation and Execution |
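Proposes HiRAS, a hierarchical multi-agent framework for paper-to-code generation and execution that improves the robustness and performance of experimental result reproduction. |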
large language model |
✅ |
|
| 34 |
Do LLMs Use Cultural Knowledge Without Being Told? A Multilingual Evaluation of Implicit Pragmatic Adaptation |
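Evaluates the pragmatic adaptation of large language models in implicit cultural contexts, revealing the extent to which they draw on cultural knowledge. |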
large language model |
|
|