| 22 |
Query-based Cross-Modal Projector Bolstering Mamba Multimodal LLM |
Proposes a query-based cross-modal projector to improve the efficiency of Mamba multimodal LLMs |
Mamba large language model multimodal |
|
|
| 23 |
VCIFBench: Evaluating Complex Instruction Following for Video Understanding |
Proposes VCIFBench to evaluate complex instruction following in video understanding |
DPO large language model multimodal |
|
|
| 24 |
Imbuing Large Language Models with Bidirectional Logic for Robust Chain Repair |
Proposes Teleological Reasoning Infilling to address errors in LLM reasoning chains |
DPO large language model chain-of-thought |
|
|
| 25 |
Read the Trace, Steer the Path: Trajectory-Aware Reinforcement Learning for Diffusion Language Models |
Proposes the CAPR algorithm to optimize reinforcement learning for diffusion language models |
reinforcement learning PPO large language model |
|
|
| 26 |
GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards |
Proposes GRAIL to address uneven reward allocation in reinforcement learning |
reinforcement learning large language model |
|
|
| 27 |
LDARNet: DNA Adaptive Representation Network with Learnable Tokenization for Genomic Modeling |
Proposes LDARNet to address the fixed-tokenization problem in genomic modeling |
Mamba large language model foundation model |
|
|
| 28 |
Self-Evaluation Is Already There: Eliciting Latent Judge Calibration in Base LLMs with Minimal Data |
Proposes a self-evaluation elicitation method to improve output-quality prediction in large language models |
reinforcement learning distillation large language model |
|
|
| 29 |
GARL: Game-Theoretic Reinforcement Learning for Multi-Agent Strategic Prioritisation |
Proposes the GARL framework to address multi-agent strategic prioritisation |
reinforcement learning reward design |
|
|
| 30 |
DuDi: Dual-Signal Distillation with Cross-Lingual Verbalizer |
Proposes the DuDi framework to improve the performance of small language models on Southeast Asian languages |
teacher-student distillation |
|
|
| 31 |
Self-Evolving Deep Research via Joint Generation and Evaluation |
Proposes a self-evolving co-evolution framework to address generation and evaluation in deep research |
reinforcement learning reward design large language model |
|
|
| 32 |
Arithmetic Pedagogy for Language Models |
Proposes a training method for arithmetic reasoning in language models based on human mathematics pedagogy |
reinforcement learning chain-of-thought |
|
|
| 33 |
Rethinking Continual Experience Internalization for Self-Evolving LLM Agents |
Proposes a new experience-internalization method to address capability collapse in LLMs |
distillation large language model |
|
|
| 34 |
When Clients Stop Following: A Cognitive Conceptualization Diagram-driven Framework for Strategic Counseling |
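Proposes a resistance-aware framework based on cognitive behavioral therapy to address evaluation mismatch in psychological counseling |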
reinforcement learning large language model |
|
|