| 1 |
SciIF: Benchmarking Scientific Instruction Following Towards Rigorous Scientific Intelligence |
SciIF: a scientific instruction-following benchmark that evaluates the rigor of LLMs in scientific reasoning |
large language model instruction following |
|
|
| 2 |
Bridging Temporal and Textual Modalities: A Multimodal Framework for Automated Cloud Failure Root Cause Analysis |
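Proposes a multimodal framework for automated cloud failure root cause analysis that bridges the gap between time-series and textual modalities. |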
large language model multimodal |
|
|
| 3 |
Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning |
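Proposes InstruCoT, which strengthens LLM defenses against prompt injection attacks through diverse data synthesis and instruction-level CoT learning |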
large language model chain-of-thought |
|
|
| 4 |
Challenges and Research Directions for Large Language Model Inference Hardware |
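Addresses challenges in large language model inference hardware and proposes architectural directions such as high-bandwidth flash and near-memory computing |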
large language model |
|
|
| 5 |
Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop |
Studies large language model bias in self-consuming performative loops and proposes mitigation strategies. |
large language model |
|
|
| 6 |
Large language models can effectively convince people to believe conspiracies |
Explores the dual impact of large language models on the spread of conspiracy theories |
large language model |
|
|
| 7 |
An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions |
Reveals the fragility of large language models under tabular distortions and investigates methods for improving robustness |
large language model |
|
|
| 8 |
DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation |
Proposes DVD to address variant contamination in large language model evaluation |
large language model |
|
|
| 9 |
AECV-Bench: Benchmarking Multimodal Models on Architectural and Engineering Drawings Understanding |
AECV-Bench: a benchmark for multimodal models on understanding architectural and engineering drawings |
multimodal |
|
|
| 10 |
Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment |
Proposes CIEA, which improves multimodal retrieval through complementary information extraction and alignment |
multimodal |
✅ |
|
| 11 |
AdaptEval: A Benchmark for Evaluating Large Language Models on Code Snippet Adaptation |
AdaptEval: a benchmark for evaluating large language models on code snippet adaptation |
large language model |
|
|
| 12 |
Token-Level LLM Collaboration via FusionRoute |
FusionRoute: a multi-LLM collaboration framework based on token-level fusion routing |
large language model instruction following |
|
|
| 13 |
BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents |
BackdoorAgent: a unified framework for backdoor attacks on LLM agents that reveals cross-stage trigger propagation |
large language model multimodal |
|
|
| 14 |
CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts |
CircuitLM: a multi-agent LLM-aided circuit design framework that generates circuit schematics from natural language |
large language model chain-of-thought |
|
|
| 15 |
CAOS: Conformal Aggregation of One-Shot Predictors |
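Proposes CAOS, which aggregates one-shot predictors and combines them with conformal prediction for fast adaptation and uncertainty quantification. |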
foundation model |
|
|
| 16 |
Higher-Order Knowledge Representations for Agentic Scientific Reasoning |
Proposes a hypergraph-based knowledge representation for agentic scientific reasoning to accelerate the discovery of new materials. |
large language model |
|
|
| 17 |
Neurosymbolic Retrievers for Retrieval-augmented Generation |
Proposes neurosymbolic retrievers that improve the interpretability and performance of retrieval-augmented generation |
large language model |
|
|
| 18 |
Internal Representations as Indicators of Hallucinations in Agent Tool Selection |
Uses LLM internal representations to detect hallucinations in agent tool selection in real time |
large language model |
|
|
| 19 |
Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Large Reasoning Models |
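Proposes ReasonMark, a semantically guided watermark for large reasoning models that improves watermark performance while preserving reasoning coherence. |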
large language model |
|
|
| 20 |
Arabic Prompts with English Tools: A Benchmark |
Proposes an Arabic Tool-Calling benchmark that evaluates LLM tool-calling ability under Arabic prompts |
large language model |
|
|
| 21 |
Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models |
Proposes a privacy-first reasoning method to address leakage of personal information |
chain-of-thought |
|
|
| 22 |
Publishing FAIR and Machine-actionable Reviews in Materials Science: The Case for Symbolic Knowledge in Neuro-symbolic Artificial Intelligence |
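Publishing FAIR and machine-actionable reviews in materials science: the case for symbolic knowledge in neuro-symbolic artificial intelligence |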
large language model |
|
|
| 23 |
T-Retriever: Tree-based Hierarchical Retrieval Augmented Generation for Textual Graphs |
T-Retriever: a tree-based hierarchical retrieval-augmented generation framework for information retrieval over textual graphs. |
large language model |
|
|
| 24 |
CurricuLLM: Designing Personalized and Workforce-Aligned Cybersecurity Curricula Using Fine-Tuned LLMs |
CurricuLLM: uses fine-tuned LLMs to design personalized cybersecurity curricula aligned with industry needs |
large language model |
|
|
| 25 |
Orchestrating Intelligence: Confidence-Aware Routing for Efficient Multi-Agent Collaboration across Multi-Scale Models |
Proposes the OI-MAS framework, which uses confidence-aware routing for efficient multi-agent collaboration across multi-scale models. |
large language model |
|
|
| 26 |
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation |
DR-LoRA: dynamically adjusts the LoRA rank of experts to improve fine-tuning efficiency for MoE models |
large language model |
|
|
| 27 |
Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning |
Proposes CompassMem, which uses event-centric memory as a logic map to improve agent searching and reasoning |
large language model |
|
|
| 28 |
Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search |
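Proposes the M-ASK framework, which decouples search behavior from knowledge management in agentic search to improve multi-hop question answering. |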
large language model |
✅ |
|
| 29 |
LLM-Guided Quantified SMT Solving over Uninterpreted Functions |
AquaForte: LLM-guided quantified SMT solving over uninterpreted functions |
large language model |
|
|
| 30 |
LAMB: LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence |
LAMB: an LLM-based audio captioning framework that bridges the modality gap via Cauchy-Schwarz divergence |
large language model |
|
|
| 31 |
Vibe Coding an LLM-powered Theorem Prover |
Isabellm: an LLM-based theorem prover for Isabelle/HOL that performs fully automatic proof synthesis |
large language model |
✅ |
|
| 32 |
Beyond the "Truth": Investigating Election Rumors on Truth Social During the 2024 Election |
Proposes an LLM-based multi-stage rumor detection agent for analyzing 2024 election rumors on Truth Social |
large language model |
|
|
| 33 |
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks |
Proposes Constitutional Classifiers++, an efficient defense against universal jailbreaks that lowers computational cost and refusal rates. |
large language model |
|
|
| 34 |
GUITester: Enabling GUI Agents for Exploratory Defect Discovery |
Proposes GUITester, a multi-agent framework for autonomous exploratory defect discovery in GUI applications |
large language model |
✅ |
|