| 1 |
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision |
Proposes MM-PRM, which enhances multimodal mathematical reasoning through scalable step-level supervision |
large language model multimodal |
✅ |
|
| 2 |
Aneumo: A Large-Scale Multimodal Aneurysm Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks |
Aneumo: a large-scale multimodal cerebral aneurysm dataset with CFD simulations and deep learning benchmarks |
multimodal |
✅ |
|
| 3 |
Survey: Multi-Armed Bandits Meet Large Language Models |
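Explores the synergy between multi-armed bandits and large language models: optimizing decision-making and natural language processing |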
large language model |
|
|
| 4 |
Advancing Software Quality: A Standards-Focused Review of LLM-Based Assurance Techniques |
Using LLMs to improve software quality: a standards-focused review of software quality assurance techniques |
large language model multimodal |
|
|
| 5 |
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers |
Proposes the causal head gating (CHG) framework for interpreting the functional roles of attention heads in Transformer models |
large language model instruction following |
|
|
| 6 |
Unified Cross-modal Translation of Score Images, Symbolic Music, and Performance Audio |
Proposes a unified cross-modal translation framework for converting between score images, symbolic music, and performance audio |
multimodal |
|
|
| 7 |
Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference |
CausalPitfalls: a benchmark evaluating how well LLMs handle statistical pitfalls in causal inference |
large language model |
|
|
| 8 |
Security Degradation in Iterative AI Code Generation -- A Systematic Analysis of the Paradox |
Reveals the security degradation paradox in iterative AI code generation and stresses the importance of human intervention |
large language model |
|
|
| 9 |
Safety Alignment Can Be Not Superficial With Explicit Safety Signals |
Introduces explicit safety signals to improve the robustness of large language models against adversarial attacks |
large language model |
|
|
| 10 |
Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models |
Proposes a counterfactual intervention framework to evaluate the faithfulness of thinking drafts in large reasoning models |
chain-of-thought |
|
|
| 11 |
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations |
Proposes a neurofeedback paradigm to quantify the metacognitive abilities of language models |
large language model |
|
|
| 12 |
CoT-Kinetics: A Theoretical Modeling Assessing LRM Reasoning Process |
Proposes the CoT-Kinetics energy equation to assess the soundness of the reasoning process in large reasoning models (LRMs) |
large language model |
|
|
| 13 |
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database |
AutoMathKG: proposes a method for automatically constructing a mathematical knowledge graph based on LLMs and a vector database |
large language model |
|
|
| 14 |
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition |
Proposes CompeteSMoE to address training efficiency problems in sparse mixture-of-experts models |
large language model |
✅ |
|
| 15 |
LLM-KG-Bench 3.0: A Compass for SemanticTechnology Capabilities in the Ocean of LLMs |
LLM-KG-Bench 3.0: a benchmark framework for evaluating LLM capabilities in semantic technologies and knowledge graph engineering |
large language model |
|
|