| 1 |
SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding |
Proposes SONIC-O1, a real-world benchmark for evaluating the audio-video understanding of multimodal large language models |
large language model multimodal |
✅ |
|
| 2 |
Chain Of Thought Compression: A Theoretical Analysis |
Proposes the ALiCoT framework, which aligns implicit reasoning states to improve LLM reasoning efficiency while preserving performance |
large language model chain-of-thought |
|
|
| 3 |
Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization |
Proposes PLaT, a latent chain-of-thought planning framework that decouples reasoning from verbalization |
large language model chain-of-thought |
|
|
| 4 |
ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models |
ToolWeaver: scalable tool use in large language models by weaving collaborative semantics |
large language model |
|
|
| 5 |
Looking Beyond Accuracy: A Holistic Benchmark of ECG Foundation Models |
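Proposes a holistic benchmarking framework for electrocardiogram (ECG) foundation models that goes beyond conventional accuracy evaluation |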
foundation model |
|
|
| 6 |
CORE: Toward Ubiquitous 6G Intelligence Through Collaborative Orchestration of Large Language Model Agents Over Hierarchical Edge |
CORE: ubiquitous 6G intelligence through collaborative orchestration of LLM agents over a hierarchical edge |
large language model |
|
|
| 7 |
Moral Outrage Shapes Commitments Beyond Attention: Multimodal Moral Emotions on YouTube in Korea and the US |
Proposes a multimodal moral emotion classifier and shows how moral outrage drives user engagement with YouTube news |
multimodal |
|
|
| 8 |
Assessing the Business Process Modeling Competences of Large Language Models |
Proposes the BEF4LLM framework for assessing the business process modeling competences of large language models |
large language model |
|
|
| 9 |
The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation |
Steering large language models with style vectors: a human evaluation study |
large language model |
|
|
| 10 |
LION: A Clifford Neural Paradigm for Multimodal-Attributed Graph Learning |
Proposes LION, a Clifford-algebra-based model for alignment and fusion in multimodal-attributed graph learning |
multimodal |
|
|
| 11 |
TeachBench: A Syllabus-Grounded Framework for Evaluating Teaching Ability in Large Language Models |
Proposes TeachBench, a syllabus-grounded framework for evaluating the teaching ability of large language models |
large language model |
|
|
| 12 |
EHR-RAG: Bridging Long-Horizon Structured Electronic Health Records and Large Language Models via Enhanced Retrieval-Augmented Generation |
EHR-RAG: bridges long-horizon structured electronic health records and large language models via enhanced retrieval-augmented generation |
large language model |
|
|
| 13 |
Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks |
Analyzes how prompt choice and model choice affect the output variance of LLMs on creative tasks |
large language model |
|
|
| 14 |
Industrialized Deception: The Collateral Effects of LLM-Generated Misinformation on Digital Ecosystems |
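Proposes the JudgeGPT and RogueGPT platforms to study the effects of LLM-generated misinformation on digital ecosystems and strategies for countering it |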
large language model multimodal |
|
|
| 15 |
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities |
Proposes the DeR2 benchmark, which decouples retrieval from reasoning to evaluate LLM reasoning over scientific information |
large language model foundation model |
|
|
| 16 |
TCAP: Tri-Component Attention Profiling for Unsupervised Backdoor Detection in MLLM Fine-Tuning |
Proposes TCAP for unsupervised detection of backdoor attacks in multimodal large language model fine-tuning |
large language model multimodal |
|
|
| 17 |
Ostrakon-VL: Towards Domain-Expert MLLM for Food-Service and Retail Stores |
Proposes Ostrakon-VL, a domain-expert multimodal large language model for food-service and retail scenarios |
large language model multimodal |
|
|
| 18 |
Bridging the Arithmetic Gap: The Cognitive Complexity Benchmark and Financial-PoT for Robust Financial Reasoning |
Proposes the Cognitive Complexity Benchmark (CCB) and the Financial-PoT framework to make LLM quantitative financial reasoning more robust |
large language model chain-of-thought |
|
|
| 19 |
Learning to Communicate Across Modalities: Perceptual Heterogeneity in Multi-Agent Systems |
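Studies cross-modal communication in heterogeneous multi-agent systems, addressing information transfer under perceptual differences |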
multimodal |
|
|
| 20 |
SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents |
SWE-Replay: an efficient test-time scaling method for software engineering agents |
large language model |
|
|
| 21 |
RedSage: A Cybersecurity Generalist LLM |
RedSage: a generalist LLM for cybersecurity that achieves strong performance through domain-adaptive pretraining and agentic augmentation |
instruction following |
|
|
| 22 |
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty |
CAR-bench: evaluates the reliability and capability limits of LLM agents under real-world uncertainty |
large language model |
|
|
| 23 |
AgenticSimLaw: A Juvenile Courtroom Multi-Agent Debate Simulation for Explainable High-Stakes Tabular Decision Making |
AgenticSimLaw: a juvenile courtroom multi-agent debate simulation for explainable high-stakes tabular decision making |
chain-of-thought |
|
|
| 24 |
astra-langchain4j: Experiences Combining LLMs and Agent Programming |
Explores combining LLMs with agent programming: experiences integrating Langchain4j with the ASTRA language |
large language model |
|
|
| 25 |
KnowBias: Mitigating Social Bias in LLMs via Know-Bias Neuron Enhancement |
KnowBias: mitigates social bias in large language models by enhancing bias-knowledge neurons |
large language model |
✅ |
|
| 26 |
EWSJF: An Adaptive Scheduler with Hybrid Partitioning for Mixed-Workload LLM Inference |
EWSJF: an adaptive scheduler with hybrid partitioning for mixed-workload LLM inference |
large language model |
|
|
| 27 |
E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory |
Proposes E-mem, which strengthens LLM agent memory through multi-agent episodic reconstruction and improves complex reasoning performance |
large language model |
|
|
| 28 |
FBS: Modeling Native Parallel Reading inside a Transformer |
Proposes the FBS Transformer, which improves LLM inference efficiency by mimicking human reading mechanisms |
large language model |
|
|
| 29 |
CORE: Collaborative Reasoning via Cross Teaching |
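Proposes CORE, which achieves collaborative reasoning via cross teaching to improve the problem-solving ability of large language models |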
large language model |
|
|
| 30 |
Meta Context Engineering via Agentic Skill Evolution |
Proposes Meta Context Engineering, which optimizes context engineering for large language models through agentic skill evolution |
large language model |
|
|
| 31 |
ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory |
Proposes ShardMemo to address the memory bottleneck of large language models |
large language model |
|
|
| 32 |
LLaMEA-SAGE: Guiding Automated Algorithm Design with Structural Feedback from Explainable AI |
LLaMEA-SAGE: guides automated algorithm design with structural feedback from explainable AI |
large language model |
|
|
| 33 |
The Path of Least Resistance: Guiding LLM Reasoning Trajectories with Prefix Consensus |
PoLR: guides LLM reasoning with prefix consensus, improving efficiency while maintaining accuracy |
large language model |
|
|
| 34 |
Adaptive Confidence Gating in Multi-Agent Collaboration for Efficient and Optimized Code Generation |
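Proposes DebateCoder, which uses multi-agent collaboration and adaptive confidence gating to improve the code generation ability of small models |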
large language model |
|
|
| 35 |
ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design |
ChipBench: a new benchmark for evaluating LLM performance in AI-aided chip design |
large language model |
✅ |
|
| 36 |
NEMO: Execution-Aware Optimization Modeling via Autonomous Coding Agents |
NEMO: execution-aware optimization modeling via autonomous coding agents |
large language model |
|
|
| 37 |
More Code, Less Reuse: Investigating Code Quality and Reviewer Sentiment towards AI-generated Pull Requests |
Finds that AI-generated pull requests have lower code quality, yet reviewer sentiment toward them is more positive |
large language model |
|
|