| # | Title | Summary | Topic | |
|---|-------|---------|-------|---|
| 1 | Diagnosing Pathological Chain-of-Thought in Reasoning Models | Proposes a set of evaluation metrics for diagnosing pathological chain-of-thought (CoT) behavior in reasoning models | chain-of-thought | |
| 2 | StackingNet: Collective Inference Across Independent AI Foundation Models | StackingNet: improves performance through collective inference across independent AI foundation models | foundation model | |
| 3 | From What to How: Bridging User Requirements with Software Development Using Large Language Models | Proposes the DesBench benchmark for evaluating large language models on software design tasks | large language model | |
| 4 | Attention in Constant Time: Vashista Sparse Attention for Long-Context Decoding with Exponential Guarantees | Proposes Vashista sparse attention to address the efficiency of long-context decoding | large language model | |
| 5 | RDBLearn: Simple In-Context Prediction Over Relational Databases | RDBLearn: simple in-context prediction over relational databases | foundation model | ✅ |
| 6 | Multi-Modal Sensing and Fusion in mmWave Beamforming for Connected Vehicles: A Transformer Based Framework | Proposes a Transformer-based multi-modal fusion framework for mmWave beamforming that reduces beam-training overhead in connected-vehicle settings | multimodal | |
| 7 | Evaluating LLM-Generated ACSL Annotations for Formal Verification | Evaluates the effectiveness of LLM-generated ACSL annotations for formal verification | large language model | |
| 8 | DTBench: A Synthetic Benchmark for Document-to-Table Extraction | DTBench: a synthetic benchmark for document-to-table extraction, focused on evaluating LLMs' structured-data generation ability | large language model | ✅ |
| 9 | OneLatent: Single-Token Compression for Visual Latent Reasoning | OneLatent: compresses visual latent reasoning into a single token, reducing the cost of CoT inference | chain-of-thought | |
| 10 | Can a Lightweight Automated AI Pipeline Solve Research-Level Mathematical Problems? | A lightweight automated AI pipeline for solving research-level mathematical problems, refined via citation verification | large language model | |
| 11 | PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning | Proposes PhGPO, which uses pheromone-guided policy optimization to tackle long-horizon tool planning | large language model | |
| 12 | AllMem: A Memory-centric Recipe for Efficient Long-context Modeling | AllMem: a memory-centric recipe for efficient long-context modeling | large language model | |
| 13 | MAS-on-the-Fly: Dynamic Adaptation of LLM-based Multi-Agent Systems at Test Time | MASFly: a framework for test-time dynamic adaptation of LLM-based multi-agent systems | large language model | |
| 14 | Guided Collaboration in Heterogeneous LLM-Based Multi-Agent Systems via Entropy-Based Understanding Assessment and Experience Retrieval | Proposes an entropy-based multi-agent collaboration framework that addresses cognitive mismatch in heterogeneous LLM systems | large language model | |
| 15 | Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges | Reveals stealthy rubric-based preference-drift attacks on LLM judges and identifies the RIPD risk | large language model | ✅ |
| 16 | Who Do LLMs Trust? Human Experts Matter More Than Other LLMs | Who do LLMs trust? Human experts matter more than other LLMs | large language model | |