| 17 | Multimodal Latent Reasoning via Predictive Embeddings | Proposes Pearl, which performs multimodal latent-space reasoning via predictive embedding alignment, without explicit tool calls. | JEPA depth estimation multimodal |
| 18 | Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning | Proposes VGM$^2$P, which uses value-guided MeanFlow to improve policy-learning efficiency in offline multi-agent reinforcement learning. | reinforcement learning policy learning behavior cloning |
| 19 | CausalVAE as a Plug-in for World Models: Towards Reliable Counterfactual Dynamics | Proposes a plug-in CausalVAE module that makes world models' counterfactual dynamics predictions more reliable. | world model world models |
| 20 | Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization | MolReAct: reinforcement learning for drug lead optimization, guided by an LLM and constrained by reaction templates. | reinforcement learning large language model |
| 21 | Less Approximates More: Harmonizing Performance and Confidence Faithfulness via Hybrid Post-Training for High-Stakes Tasks | Proposes the HyTuning framework, which uses hybrid post-training to improve the confidence faithfulness of large models on high-stakes tasks. | reinforcement learning distillation large language model |
| 22 | TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis | Proposes TTVS, which boosts self-exploring reinforcement learning via test-time variational synthesis, addressing the scarcity of supervised data in specialized domains. | reinforcement learning |
| 23 | QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training--Inference Mismatch | QaRL: proposes rollout-aligned quantization-aware reinforcement learning that speeds up LLM training and improves its stability. | reinforcement learning large language model |
| 24 | Structured Distillation of Web Agent Capabilities Enables Generalization | Proposes the Agent-as-Annotators framework, which uses structured distillation to improve web agents' generalization in complex environments. | distillation |
| 25 | MIPT-SSM: Scaling Language Models with $O(1)$ Inference Cache via Phase Transitions | Proposes MIPT-SSM, which improves language-model inference efficiency by scaling with an $O(1)$ inference cache via phase transitions. | SSM |
| 26 | An Imperfect Verifier is Good Enough: Learning with Noisy Rewards | Shows that reinforcement learning with noisy rewards is robust in LLM training. | reinforcement learning large language model |
| 27 | Alleviating Community Fear in Disasters via Multi-Agent Actor-Critic Reinforcement Learning | Proposes a multi-agent actor-critic reinforcement learning method for alleviating community fear during disasters. | reinforcement learning |
| 28 | Wireless Communication Enhanced Value Decomposition for Multi-Agent Reinforcement Learning | Proposes the CLOVER framework, which uses wireless communication graphs to enhance value decomposition in multi-agent reinforcement learning. | reinforcement learning |
| 29 | StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning | Proposes the StructRL framework, which recovers dynamic-programming structure from the learning dynamics of distributional reinforcement learning. | reinforcement learning |
| 30 | From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity | Proposes FEAT, a federated geometry-aware correction method that makes exemplar replay work better in federated continual learning under dynamic heterogeneity. | distillation geometric consistency |