| 1 |
SciIF: Benchmarking Scientific Instruction Following Towards Rigorous Scientific Intelligence |
SciIF: a scientific instruction-following benchmark that evaluates the rigor of LLMs in scientific reasoning |
large language model instruction following |
|
|
| 2 |
Bridging Temporal and Textual Modalities: A Multimodal Framework for Automated Cloud Failure Root Cause Analysis |
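Proposes a multimodal framework for automated cloud failure root cause analysis that bridges the gap between time-series and textual modalities. |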
large language model multimodal |
|
|
| 3 |
Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning |
InstruCoT: strengthens LLM defenses against prompt injection attacks through diverse data synthesis and instruction-level CoT learning |
large language model chain-of-thought |
|
|
| 4 |
Challenges and Research Directions for Large Language Model Inference Hardware |
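Addresses the challenges of large language model inference hardware and proposes architectural optimization directions such as high-bandwidth flash and near-memory computing |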
large language model |
|
|
| 5 |
Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop |
Studies large language model bias in the self-consuming performative loop and proposes corresponding mitigation strategies. |
large language model |
|
|
| 6 |
Large language models can effectively convince people to believe conspiracies |
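Large language models can effectively persuade people to believe conspiracy theories, but corrective measures can mitigate the effect |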
large language model |
|
|
| 7 |
An Empirical Investigation of Robustness in Large Language Models under Tabular Distortions |
Shows that large language models lack robustness when tabular data is distorted and need explicit prompting to partially correct for it. |
large language model |
|
|
| 8 |
DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation |
Proposes the DVD method to address variant contamination in large language model evaluation |
large language model |
|
|
| 9 |
AECV-Bench: Benchmarking Multimodal Models on Architectural and Engineering Drawings Understanding |
AECV-Bench: a benchmark for multimodal models on understanding architectural and engineering drawings |
multimodal |
|
|
| 10 |
Enhancing Multimodal Retrieval via Complementary Information Extraction and Alignment |
Proposes CIEA, which improves multimodal retrieval through complementary information extraction and alignment |
multimodal |
✅ |
|
| 11 |
AdaptEval: A Benchmark for Evaluating Large Language Models on Code Snippet Adaptation |
AdaptEval: a benchmark for evaluating large language models on code snippet adaptation. |
large language model |
|
|
| 12 |
Token-Level LLM Collaboration via FusionRoute |
FusionRoute: a routing and fusion framework based on token-level LLM collaboration |
large language model instruction following |
|
|
| 13 |
BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents |
BackdoorAgent: a unified framework for backdoor attacks on LLM agents |
large language model multimodal |
|
|
| 14 |
CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts |
CircuitLM: a multi-agent LLM-aided circuit design framework that generates circuit schematics from natural language |
large language model chain-of-thought |
|
|
| 15 |
CAOS: Conformal Aggregation of One-Shot Predictors |
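Proposes the CAOS framework, which aggregates one-shot predictors and combines them with conformal prediction to achieve fast adaptation and reliable uncertainty quantification. |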
foundation model |
|
|
| 16 |
Higher-Order Knowledge Representations for Agentic Scientific Reasoning |
Proposes a hypergraph-based knowledge representation for agentic scientific reasoning to accelerate the discovery of new materials. |
large language model |
|
|
| 17 |
Neurosymbolic Retrievers for Retrieval-augmented Generation |
Proposes neurosymbolic retrievers that improve the interpretability and performance of retrieval-augmented generation |
large language model |
|
|
| 18 |
Internal Representations as Indicators of Hallucinations in Agent Tool Selection |
Uses LLM internal representations to detect hallucinations in agent tool selection in real time |
large language model |
|
|
| 19 |
Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Large Reasoning Models |
ReasonMark: a semantically guided watermarking method targeting the reasoning process of large language models |
large language model |
|
|
| 20 |
Arabic Prompts with English Tools: A Benchmark |
Proposes the Arabic Prompts with English Tools benchmark, which evaluates LLM tool-calling ability under Arabic prompts. |
large language model |
|
|
| 21 |
Chain-of-Sanitized-Thoughts: Plugging PII Leakage in CoT of Large Reasoning Models |
Proposes PII-CoT-Bench and improves the privacy of large-model CoT reasoning through prompting and fine-tuning, reducing PII leakage. |
chain-of-thought |
|
|
| 22 |
Publishing FAIR and Machine-actionable Reviews in Materials Science: The Case for Symbolic Knowledge in Neuro-symbolic Artificial Intelligence |
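Publishing FAIR and machine-actionable reviews in materials science: the case for symbolic knowledge in neuro-symbolic artificial intelligence |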
large language model |
|
|
| 23 |
T-Retriever: Tree-based Hierarchical Retrieval Augmented Generation for Textual Graphs |
T-Retriever: proposes a tree-based hierarchical retrieval-augmented generation framework for reasoning tasks on textual graphs. |
large language model |
|
|
| 24 |
CurricuLLM: Designing Personalized and Workforce-Aligned Cybersecurity Curricula Using Fine-Tuned LLMs |
CurricuLLM: uses fine-tuned LLMs to automatically design personalized, workforce-aligned cybersecurity curricula |
large language model |
|
|
| 25 |
Orchestrating Intelligence: Confidence-Aware Routing for Efficient Multi-Agent Collaboration across Multi-Scale Models |
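Proposes the OI-MAS framework, which uses confidence-aware routing to achieve efficient multi-agent collaboration across multi-scale models |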
large language model |
|
|
| 26 |
DR-LoRA: Dynamic Rank LoRA for Mixture-of-Experts Adaptation |
Proposes DR-LoRA to address resource mismatch in Mixture-of-Experts adaptation |
large language model |
|
|
| 27 |
Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning |
Proposes CompassMem, which uses event-centric memory as a logic map to improve agent searching and reasoning |
large language model |
|
|
| 28 |
Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search |
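Proposes the M-ASK framework, which decouples search behavior from knowledge management in agentic search and improves multi-hop question answering performance. |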
large language model |
✅ |
|
| 29 |
LLM-Guided Quantified SMT Solving over Uninterpreted Functions |
AquaForte: uses LLM-guided quantified SMT solving to handle problems over uninterpreted functions |
large language model |
|
|
| 30 |
LAMB: LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence |
LAMB: an LLM-based audio captioning framework that bridges the modality gap via Cauchy-Schwarz divergence |
large language model |
|
|
| 31 |
Vibe Coding an LLM-powered Theorem Prover |
Isabellm: an LLM-based theorem prover for Isabelle/HOL that performs fully automatic proof synthesis |
large language model |
✅ |
|
| 32 |
Beyond the "Truth": Investigating Election Rumors on Truth Social During the 2024 Election |
Proposes an LLM-based multi-stage rumor detection agent for analyzing 2024 election rumors on the Truth Social platform. |
large language model |
|
|
| 33 |
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks |
Proposes Constitutional Classifiers++, which efficiently defends against universal jailbreaks while reducing computational cost and refusal rates. |
large language model |
|
|
| 34 |
GUITester: Enabling GUI Agents for Exploratory Defect Discovery |
Proposes GUITester to address the autonomous discovery of GUI defects |
large language model |
✅ |
|