| 1 | From Mirage to Grounding: Towards Reliable Multimodal Circuit-to-Verilog Code Generation | Reveals the "mirage" phenomenon in MLLM circuit-diagram-to-Verilog code generation and proposes the VeriGround model to improve reliability. | large language model multimodal visual grounding | | |
| 2 | Heterogeneous Scientific Foundation Model Collaboration | Eywa: a collaboration framework for heterogeneous scientific foundation models that extends agentic LLMs to scientific domains. | large language model foundation model | | |
| 3 | InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? | InteractWeb-Bench: evaluates whether multimodal agents can avoid blind execution in interactive website generation. | large language model multimodal | | |
| 4 | Design Structure Matrix Modularization with Large Language Models | Uses large language models to modularize design structure matrices without specialized optimization code. | large language model | | |
| 5 | Iterative Multimodal Retrieval-Augmented Generation for Medical Question Answering | Proposes MED-VRAG, an iterative multimodal retrieval-augmented generation framework for medical question answering. | multimodal | | |
| 6 | Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations | Proposes the Veroic framework, which uses verifiable observations to achieve risk-aware inference control in LLM services. | large language model | | |
| 7 | SpecVQA: A Benchmark for Spectral Understanding and Visual Question Answering in Scientific Images | Proposes SpecVQA, a specialized benchmark for spectral understanding and visual question answering in scientific images. | large language model multimodal | | |
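| 8 | The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models | Studies how visual priming affects cooperative behavior in vision-language models, using the iterated Prisoner's Dilemma as the test setting. | chain-of-thought | | |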
| 9 | Exploring Interaction Paradigms for LLM Agents in Scientific Visualization | Explores interaction paradigms for LLM agents in scientific visualization, weighing performance, robustness, and flexibility. | large language model | | |
| 10 | Math Education Digital Shadows for facilitating learning with LLMs: Math performance, anxiety and confidence in simulated students and AIs | Proposes the MEDS dataset for evaluating LLMs' abilities, biases, and psychological traits in math education. | large language model | | |
| 11 | What Makes a Good Terminal-Agent Benchmark Task: A Guideline for Adversarial, Difficult, and Legible Evaluation Design | Provides guidelines for designing terminal-agent benchmark tasks, emphasizing adversarial design, difficulty, and legibility. | large language model | | |
| 12 | Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents | CARE: a three-party collaborative methodology for engineering AI agents that improves the efficiency of developing LLM agents for scientific domains. | large language model | | |
| 13 | LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning | Proposes an LLM+ASP framework that uses self-correction to enable task-agnostic nonmonotonic reasoning. | large language model | | |
| 14 | MM-StanceDet: Retrieval-Augmented Multi-modal Multi-agent Stance Detection | Proposes MM-StanceDet, a retrieval-augmented multi-agent framework that addresses the fusion challenge in multimodal stance detection. | multimodal | | |
| 15 | MCPHunt: An Evaluation Framework for Cross-Boundary Data Propagation in Multi-Server MCP Agents | MCPHunt: an evaluation framework for cross-boundary data propagation in multi-server MCP agents. | instruction following | | |
| 16 | Post-Optimization Adaptive Rank Allocation for LoRA | Proposes PARA, a post-optimization adaptive rank allocation method for LoRA that improves parameter efficiency. | foundation model | | |
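| 17 | Test Before You Deploy: Governing Updates in the LLM Supply Chain | Proposes a governance framework for the LLM supply chain that safeguards the compatibility and safety of LLM updates on the deployer side. | large language model | | |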
| 18 | RuC: HDL-Agnostic Rule Completion Benchmark Generation | RuC: a hardware-description-language-agnostic, rule-based framework for generating controllable code completion benchmarks. | large language model | | |
| 19 | Intent2Tx: Benchmarking LLMs for Translating Natural Language Intents into Ethereum Transactions | Intent2Tx: builds a benchmark to evaluate LLMs' ability to translate natural language intents into Ethereum transactions. | large language model | | |
| 20 | Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation | PAD-Rec: position-aware drafting to accelerate inference in LLM-based generative list-wise recommendation. | large language model | | |
| 21 | AgentEconomist: An End-to-end Agentic System Translating Economic Intuitions into Executable Computational Experiments | AgentEconomist: an end-to-end agentic system that translates economic intuitions into executable computational experiments. | large language model | | |
| 22 | Bridging Values and Behavior: A Hierarchical Framework for Proactive Embodied Agents | Proposes ValuePlanner to address long-term autonomous behavior decision-making for embodied agents. | instruction following | | |
| 23 | When Agents Evolve, Institutions Follow | Proposes a multi-agent architecture based on historical political institutions to improve the collective intelligence of LLMs. | large language model | ✅ | |
| 24 | HAVEN: Hybrid Automated Verification ENgine for UVM Testbench Synthesis with LLMs | HAVEN: a hybrid automated verification engine that uses LLMs for UVM testbench synthesis. | large language model | | |
| 25 | Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading | Reveals a pitfall in LLM evaluation: unoptimized prompts can distort model rankings. | large language model | | |
| 26 | Political Bias Audits of LLMs Capture Sycophancy to the Inferred Auditor | Reveals the relationship between LLMs' political bias and sycophancy toward the auditor. | large language model | | |
| 27 | In-Context Examples Suppress Scientific Knowledge Recall in LLMs | In LLMs, in-context examples suppress the recall of scientific knowledge, pushing the model toward empirical pattern fitting. | large language model | | |
| 28 | Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study | A survey offering layered analysis and defense strategies for the security risks of autonomous agent frameworks. | large language model | | |
| 29 | Toward Autonomous SOC Operations: End-to-End LLM Framework for Threat Detection, Query Generation, and Resolution in Security Operations | Proposes an end-to-end LLM framework that automates SOC threat detection, query generation, and incident response. | large language model | | |
| 30 | METASYMBO: Multi-Agent Language-Guided Metamaterial Discovery via Symbolic Latent Evolution | Proposes MetaSymbO, which enables language-guided multi-agent metamaterial discovery through symbol-driven latent evolution. | large language model | | |
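| 31 | Evaluating Epistemic Guardrails in AI Reading Assistants: A Behavioral Audit of a Minimal Prototype | Proposes a protocol for evaluating the epistemic guardrails of AI reading assistants, revealing interaction behavior dynamics and boundary functions. | large language model | | |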
| 32 | ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts | Proposes ARMOR 2025, a benchmark for evaluating large language model safety in military scenarios. | large language model | | |
| 33 | Minimal, Local, Causal Explanations for Jailbreak Success in Large Language Models | Proposes the LOCA method, which provides minimal, local, and causal explanations for jailbreak attacks on large language models. | large language model | | |
| 34 | DeGenTWeb: A First Look at LLM-dominant Websites | DeGenTWeb: the first systematic identification and analysis of LLM-dominant websites, revealing their prevalence and evolution trends. | large language model | | |
| 35 | A Survey of Reasoning-Intensive Retrieval: Progress and Challenges | A survey of reasoning-intensive retrieval that systematically analyzes existing methods and outlines future research directions. | large language model | | |
| 36 | TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data | TADI: tool-augmented drilling intelligence achieved through agentic LLM orchestration over heterogeneous wellsite data. | large language model | | |