cs.LG (2025-07-03)

📊 25 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (13 🔗2) · Pillar 2: RL & Architecture (11 🔗2) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (13 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Automated Grading of Students' Handwritten Graphs: A Comparison of Meta-Learning and Vision-Large Language Models | Compares meta-learning and vision-large language models for automated grading of students' handwritten graphs | large language model, multimodal | |
| 2 | How Overconfidence in Initial Choices and Underconfidence Under Criticism Modulate Change of Mind in Large Language Models | Shows how overconfidence in initial choices and underconfidence under criticism modulate change of mind in LLMs | large language model | |
| 3 | Adopting a human developmental visual diet yields robust, shape-based AI vision | Proposes an AI vision training regime based on human visual development, improving shape-based perception and robustness | foundation model | |
| 4 | DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing | DistZO2: high-throughput, memory-efficient zeroth-order fine-tuning of LLMs via distributed parallel computing | large language model | |
| 5 | Beyond SEO: A Transformer-Based Approach for Reinventing Web Content Optimisation | Proposes a Transformer-based GEO method to improve web content visibility in generative AI search | large language model | |
| 6 | HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference | HGCA: a hybrid GPU-CPU attention mechanism for long-context LLM inference | large language model | |
| 7 | Fast and Simplex: 2-Simplicial Attention in Triton | Proposes a Triton-accelerated 2-simplicial Transformer that improves the token efficiency of Transformers | large language model | |
| 8 | From 2:4 to 8:16 sparsity patterns in LLMs for Outliers and Weights with Variance Correction | Proposes 8:16 sparsity patterns with variance correction for LLMs, improving the handling of outlier weights | large language model | |
| 9 | Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability | Transformers can drop LayerNorm at inference time; scales removal to GPT-2 XL and applies it to interpretability research | large language model | |
| 10 | Continual Gradient Low-Rank Projection Fine-Tuning for LLMs | Proposes GORP, gradient low-rank projection fine-tuning for LLMs, addressing the efficiency-expressivity trade-off in continual learning | large language model | |
| 11 | Improving Consistency in Vehicle Trajectory Prediction Through Preference Optimization | Improves consistency in vehicle trajectory prediction via preference optimization | large language model | |
| 12 | Optimas: Optimizing Compound AI Systems with Globally Aligned Local Rewards | Optimas: optimizes compound AI systems with globally aligned local rewards | large language model | |
| 13 | NLP4Neuro: Sequence-to-sequence learning for neural population decoding | NLP4Neuro: applies sequence-to-sequence LLMs to neural population decoding, improving behavior prediction accuracy | large language model | |

🔬 Pillar 2: RL & Architecture (11 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Offline Reinforcement Learning with Penalized Action Noise Injection | Proposes PANI: penalized action noise injection to improve offline RL performance | reinforcement learning, offline RL, offline reinforcement learning | |
| 15 | Uncertainty-aware Reward Design Process | Proposes URDP, an uncertainty-aware reward design process that improves the efficiency and quality of RL reward function design | reinforcement learning, reward design, large language model | |
| 16 | A Forget-and-Grow Strategy for Deep Reinforcement Learning Scaling in Continuous Control | Proposes Forget and Grow, which forgets early experience and dynamically grows the network to address primacy bias in deep RL | reinforcement learning, deep reinforcement learning | |
| 17 | Deep Reinforcement Learning-Based DRAM Equalizer Parameter Optimization Using Latent Representations | Proposes deep-RL-based DRAM equalizer parameter optimization that improves signal integrity | reinforcement learning, deep reinforcement learning | |
| 18 | Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms | HIPPO: a hierarchical multi-label contrastive learning framework for cross-organism protein-protein interaction prediction | contrastive learning, zero-shot transfer | |
| 19 | ExPO: Unlocking Hard Reasoning with Self-Explanation-Guided Reinforcement Learning | ExPO: unlocks hard reasoning via self-explanation-guided reinforcement learning | reinforcement learning, DPO | |
| 20 | Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks | Examines how data scientists construct target variables for predictive models through bricolage to operationalize ambiguous concepts | predictive model | |
| 21 | Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions | Proposes multi-agent RL for dynamic pricing, optimizing supply chain strategies | reinforcement learning | |
| 22 | RLHGNN: Reinforcement Learning-driven Heterogeneous Graph Neural Network for Next Activity Prediction in Business Processes | Proposes RLHGNN, an RL-driven heterogeneous graph neural network for next-activity prediction in business processes | reinforcement learning | |
| 23 | On Efficient Bayesian Exploration in Model-Based Reinforcement Learning | Proposes PTS-BE, predictive trajectory sampling with Bayesian exploration, improving data efficiency in model-based RL | reinforcement learning | |
| 24 | Understanding and Improving Length Generalization in Recurrent Models | Addresses poor length generalization in recurrent models with a state-coverage-based training intervention | state space model, linear attention | |

🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 25 | Mitigating Goal Misgeneralization via Minimax Regret | Proposes a minimax-regret-based RL approach to mitigate goal misgeneralization | domain randomization, reinforcement learning | |
