| 1 |
Efficient Data Selection for Multimodal Models via Incremental Optimization Utility |
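Proposes the One-Step-Train (OST) framework, which enables efficient data selection for multimodal models by estimating incremental optimization utility |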
multimodal |
|
|
| 2 |
Vaporizer: Breaking Watermarking Schemes for Large Language Model Outputs |
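Proposes Vaporizer, an attack framework that breaks watermarking schemes for large language model outputs through semantics-preserving text modifications |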
large language model |
|
|
| 3 |
Abductive Reasoning with Probabilistic Commonsense |
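Proposes PACS, a probabilistic abductive commonsense reasoning framework that integrates LLMs with formal solvers to handle differences in commonsense beliefs |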
large language model chain-of-thought |
|
|
| 4 |
Do Joint Audio-Video Generation Models Understand Physics? |
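Introduces the AV-Phys Bench benchmark and the AV-Phys Agent evaluation framework, revealing the limitations of joint audio-video generation models in understanding physical commonsense |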
multimodal |
|
|
| 5 |
MathlibPR: Pull Request Merge-Readiness Benchmark for Formal Mathematical Libraries |
Introduces the MathlibPR benchmark, aimed at using large language models to assist pull request review for formal mathematical libraries |
large language model |
|
|
| 6 |
TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples |
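Proposes TraceFix, a framework that uses TLA+ model checking and counterexample feedback to automatically repair and verify multi-agent coordination protocols |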
large language model |
|
|
| 7 |
CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios |
Introduces the CyBiasBench benchmark, revealing selective biases and behavioral patterns of LLM agents in cyber-attack scenarios |
large language model |
✅ |
|
| 8 |
RuleSafe-VL: Evaluating Rule-Conditioned Decision Reasoning in Vision-Language Content Moderation |
Introduces the RuleSafe-VL benchmark, which evaluates multimodal content moderation through rule-conditioned decision reasoning |
multimodal |
|
|
| 9 |
LLM hallucinations in the wild: Large-scale evidence from non-existent citations |
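A large-scale empirical study showing that LLM hallucinations are driving a surge of non-existent citations in the scientific literature |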
large language model |
|
|
| 10 |
The AI-Native Large-Scale Agile Software Development Manifesto |
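Proposes a manifesto for AI-native large-scale agile software development, reshaping organization-level development practices around AI agents |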
large language model |
|
|
| 11 |
Hierarchical Task Network Planning with LLM-Generated Heuristics |
Uses LLM-generated heuristic functions to improve the efficiency of hierarchical task network (HTN) planning |
large language model |
|
|
| 12 |
GASim: A Graph-Accelerated Hybrid Framework for Social Simulation |
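Proposes GASim, a graph-accelerated hybrid framework that uses graph neural networks to improve the computational efficiency of large-scale social simulation |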
large language model |
✅ |
|
| 13 |
LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation |
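Proposes LARAG, a retrieval strategy that exploits hyperlink topology to improve the retrieval accuracy and efficiency of RAG systems over technical documentation |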
large language model |
|
|
| 14 |
Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study |
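Evaluates prompt engineering strategies for LLM-based qualitative coding of psychological safety in software engineering communities, revealing patterns in model stability and bias |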
large language model |
|
|
| 15 |
GraphReAct: Reasoning and Acting for Multi-step Graph Inference |
Proposes GraphReAct, a framework that interleaves reasoning and acting to perform multi-step inference over graph-structured data |
large language model |
|
|
| 16 |
SOM: Structured Opponent Modeling for LLM-based Agents via Structural Causal Model |
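Proposes the Structured Opponent Modeling (SOM) framework, which uses structural causal models to improve how well LLM agents predict other agents in multi-agent environments |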
large language model |
|
|
| 17 |
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization |
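Proposes HMACE, a heterogeneous multi-agent collaborative evolution framework for the automatic design of heuristic algorithms in combinatorial optimization |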
large language model |
|
|
| 18 |
ARMOR: An Agentic Framework for Reaction Feasibility Prediction via Adaptive Utility-aware Multi-tool Reasoning |
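Proposes ARMOR, an agentic framework that tackles chemical reaction feasibility prediction through adaptive utility-aware multi-tool reasoning |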
large language model |
|
|
| 19 |
2.5-D Decomposition for LLM-Based Spatial Construction |
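Proposes a 2.5-D decomposition method that decouples spatial dimensions to improve the spatial reasoning accuracy of LLMs in autonomous construction tasks |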
large language model |
|
|