| 1 |
Adaptive Multimodal Agents-Based Framework for Automatic Workflow Execution |
Proposes an adaptive multimodal agent framework for automatic workflow execution that improves reliability in non-stationary scenarios. |
multimodal |
|
|
| 2 |
A Conflict-Aware Penalty and Statistical Loss Framework for Balancing Modalities and Enhancing Stability in Multimodal Sentiment Analysis |
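Proposes a conflict-aware penalty and statistical loss framework that balances multimodal information and improves the stability of multimodal sentiment analysis. |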
multimodal |
|
|
| 3 |
GS-FUSE: Granger-Supervised Gated Fusion and Multi-Granularity Alignment for Event-Driven Financial Forecasting |
GS-FUSE: an event-driven financial forecasting framework based on Granger-causality-supervised gated fusion and multi-granularity alignment. |
large language model foundation model multimodal |
|
|
| 4 |
Diffusion Large Language Models for Visual Speech Recognition |
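Proposes DLLM-VSR, a visual speech recognition framework based on diffusion large language models, to address the limitations of conventional autoregressive decoding. |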
large language model |
|
|
| 5 |
Explaining is Harder Than Predicting Alone: Evaluating Concept-based Explanations of MLLMs as ICL Visual Classifiers |
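Evaluates the concept-based explanation ability of MLLMs used as ICL visual classifiers, showing that explaining is harder than predicting. |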
large language model multimodal chain-of-thought |
|
|
| 6 |
CyberJurors: A Multi-Agent Simulation Task for E-Commerce Disputes Verdict |
Proposes CyberJurors, a multi-agent framework for the e-commerce dispute verdict task that simulates crowd-jury decision-making. |
multimodal chain-of-thought |
✅ |
|
| 7 |
Revealing Algorithmic Deductive Circuits for Logical Reasoning |
Reveals the algorithmic deductive circuits behind LLM logical reasoning and localizes the key attention heads. |
large language model chain-of-thought |
|
|
| 8 |
Do Agents Need Semantic Metadata? A Comparative Study in Agentic Data Retrieval |
A comparative study showing that semantic metadata is critical to ensuring data quality in agentic data retrieval. |
large language model |
|
|
| 9 |
Multi-Adapter Representation Interventions via Energy Calibration |
Proposes MARI, multi-adapter representation interventions via energy calibration, to improve the alignment of large language models. |
large language model |
✅ |
|
| 10 |
The Importance of Being Statistically Earnest: A Critical Re-evaluation of GSM-Symbolic |
Re-evaluates the GSM-Symbolic benchmark, exposing statistical pitfalls in the evaluation of LLM reasoning ability. |
large language model |
|
|
| 11 |
An LLM-Based Assistance System for Intuitive and Flexible Capability-Based Planning |
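Proposes an LLM-based assistance system that improves the accessibility and adaptability of capability-based planning in industrial automation. |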
large language model |
|
|
| 12 |
Blind PRNG Hijacking: An Undetectable Integrity-Preserving Attack Against LLM Watermarking |
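Proposes the SeedHijack attack, a blind tampering of the PRNG supply chain used by LLM watermarking, achieving integrity preservation and orthogonal detection. |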
large language model |
|
|
| 13 |
Satisfiability Solving with LLMs: A Matched-Pair Evaluation of Reasoning Capability |
Proposes an evaluation method based on matched-pair formulas and ADR to assess LLM reasoning ability on SAT problems more reliably. |
large language model |
|
|
| 14 |
MUSE: Benchmarking Manufacturable, Functional, and Assemblable Text-to-CAD Generation |
MUSE: a benchmark for manufacturable, functional, and assemblable text-to-CAD generation. |
large language model |
✅ |
|
| 15 |
Tree of Thoughts as a Classical Heuristic Search Problem: Formal Foundations and Design Patterns |
Formalizes the Tree of Thoughts (ToT) framework as a classical heuristic search problem and proposes design patterns. |
large language model |
|
|
| 16 |
Let Relations Speak: An End-to-End LLM-GNN Soft Prompt Framework for Fraud Detection |
Proposes LGSPF, an LLM-GNN soft prompt framework, to address multi-relational complexity and missing textual information in fraud detection. |
large language model |
|
|
| 17 |
Do LLMs Favor Their Providers? Measuring Vertical Integration Bias in Code Generation |
VIBench: evaluates the vertical integration bias of large language models toward their providers in code generation. |
large language model |
|
|
| 18 |
Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets |
Proposes HYBRIDSOURCETRACKER for efficient, scalable provenance tracking of LLM-generated code snippets. |
large language model |
|
|
| 19 |
From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints |
Proposes a tagging method based on LLMs and graph constraints that automatically aligns learning resources to competency models. |
large language model |
|
|
| 20 |
HRBench: Benchmarking and Understanding Thinking-Mode Switch Strategies in Hybrid-Reasoning LLMs |
HRBench: benchmarking and understanding thinking-mode switch strategies in hybrid-reasoning LLMs. |
large language model |
✅ |
|
| 21 |
A Fixed-Budget, Cluster-Aware Standard for LLM-as-a-Judge Evaluation: A Multi-Hop RAG Stress Test |
Proposes a fixed-budget, cluster-aware LLM-as-a-judge evaluation standard for stress-testing multi-hop RAG systems. |
large language model |
|
|