cs.LG (2024-10-11)

📊 41 papers in total | 🔗 13 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (18 🔗8) · Pillar 9: Embodied Foundation Models (15 🔗4) · Pillar 8: Physics-based Animation (4) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL & Architecture (18 papers)

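#	Title	One-line summary	Tags	🔗	⭐
1	Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient	Drama: a Mamba-based state space model improves the sample and parameter efficiency of model-based reinforcement learning.	reinforcement learning world model model-based RL	✅
2	Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both	Proposes DRDO, which performs reward distillation and preference learning simultaneously to improve language model performance.	preference learning RLHF DPO
3	When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning	Proposes the MAGB benchmark dataset to systematically evaluate GNN and VLM methods for multimodal attributed graph learning.	representation learning multimodal	✅
4	Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization	Proposes DQO: improves the multi-step reasoning ability of language models through direct Q-function optimization.	reinforcement learning PPO SAC
5	On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning	Proposes a self-supervised representation learning method based on discriminative probabilistic modeling, improving contrastive learning performance.	representation learning multimodal	✅
6	Parameter-Efficient Fine-Tuning of State Space Models	Proposes Sparse Dimension Tuning (SDT) for efficient fine-tuning of state space models (SSMs), improving performance.	Mamba SSM state space model
7	Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization	Identifies the likelihood displacement phenomenon in DPO and proposes the CHES score to mitigate unintentional unalignment.	DPO direct preference optimization
8	M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation	M$^3$-Impute: mask-guided representation learning for missing value imputation.	representation learning MAE
9	Zero-Shot Offline Imitation Learning via Optimal Transport	Proposes a zero-shot offline imitation learning method based on optimal transport, addressing the myopia of conventional methods.	imitation learning world model	✅
10	DFM: Interpolant-free Dual Flow Matching	Proposes interpolant-free Dual Flow Matching (DFM), improving unsupervised anomaly detection performance.	flow matching
11	AI Learning Algorithms: Deep Learning, Hybrid Models, and Large-Scale Model Integration	Surveys AI learning algorithms: deep learning, hybrid models, and large-scale model integration.	reinforcement learning large language model
12	Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control	Proposes Sequence Reinforcement Learning (SRL) to handle continuous control at low decision frequencies.	reinforcement learning
13	MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL	MAD-TD: model-augmented data stabilizes high update ratio reinforcement learning and improves sample efficiency.	reinforcement learning deep reinforcement learning world model
14	Distillation of Discrete Diffusion through Dimensional Correlations	Proposes mixture models to address slow sampling in discrete diffusion models.	distillation	✅
15	DistDD: Distributed Data Distillation Aggregation through Gradient Matching	DistDD: distributed data distillation aggregation through gradient matching, reducing repeated communication in federated learning.	distillation
16	CYCLE: Cross-Year Contrastive Learning in Entity-Linking	Proposes CYCLE to address temporal performance degradation in entity linking.	contrastive learning	✅
17	Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning	Proposes Kaleidoscope to address policy homogeneity in multi-agent reinforcement learning.	reinforcement learning	✅
18	NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs	NextLocLLM: uses LLMs for location semantics modeling and coordinate-based next-location prediction.	predictive model spatiotemporal	✅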

🔬 Pillar 9: Embodied Foundation Models (15 papers)

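#	Title	One-line summary	Tags	🔗	⭐
19	A Systematic Survey on Large Language Models for Algorithm Design	A systematic survey of the use of large language models for algorithm design.	large language model	✅
20	Transformers Provably Solve Parity Efficiently with Chain of Thought	Provides a theoretical analysis of Transformers with chain of thought, proving that they can solve the parity problem efficiently.	chain-of-thought
21	MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models	MergePrint: merge-resistant fingerprints for black-box ownership verification of large language models.	large language model
22	Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs	Uses LLMs to generate code transforms rather than rewriting code directly, improving the precision of code rewriting.	large language model chain-of-thought
23	DeepOSets: Non-Autoregressive In-Context Learning with Permutation-Invariance Inductive Bias	Proposes DeepOSets for non-autoregressive in-context learning.	large language model
24	Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts	Proposes the Retro-Holdouts method, revealing LLM benchmark inflation on TruthfulQA.	large language model
25	Automated Rewards via LLM-Generated Progress Functions	Uses LLM-generated progress functions to automate reward engineering, improving dexterous robotic hand manipulation.	large language model
26	Zeroth-Order Fine-Tuning of LLMs in Random Subspaces	Proposes SubZero, a random-subspace zeroth-order optimization method for efficient fine-tuning of large language models.	large language model	✅
27	On the Adversarial Transferability of Generalized "Skip Connections"	Proposes the Skip Gradient Method (SGM), improving the transferability of adversarial examples for models with skip connections.	large language model	✅
28	Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory	Uses random matrix theory to improve synthetic data quality and binary classifier performance.	large language model
29	Do Unlearning Methods Remove Information from Language Model Weights?	Proposes an adversarial evaluation method that reveals how limited existing language model unlearning techniques are at removing information.	large language model
30	Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models	Superpipeline: a general GPU memory optimization approach for large models, applicable to both training and inference.	large language model	✅
31	Preferential Normalizing Flows	Proposes a normalizing flow method based on preference information for modeling probability distributions of expert knowledge.	large language model
32	DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization	DeltaDQ: ultra-high delta compression for fine-tuned LLMs via group-wise dropout and separate quantization.	large language model
33	Retraining-Free Merging of Sparse MoE via Hierarchical Clustering	Proposes HC-SMoE to address parameter merging in sparse mixture-of-experts models.	large language model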

🔬 Pillar 8: Physics-based Animation (4 papers)

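#	Title	One-line summary	Tags	🔗	⭐
34	Encoding Agent Trajectories as Representations with Sequence Transformers	Proposes the STARE model, which encodes agent trajectories with a Transformer to address spatiotemporal trajectory representation.	spatiotemporal
35	Meta-Transfer Learning Empowered Temporal Graph Networks for Cross-City Real Estate Appraisal	Proposes MetaTransfer, which uses meta-transfer learning to enhance temporal graph networks for cross-city real estate appraisal.	spatiotemporal
36	Edge AI Collaborative Learning: Bayesian Approaches to Uncertainty Estimation	Proposes an edge AI collaborative learning method based on Bayesian neural networks for uncertainty estimation.	spatiotemporal
37	Establishing Nationwide Power System Vulnerability Index across US Counties Using Interpretable Machine Learning	Uses interpretable machine learning to build a nationwide, county-level power system vulnerability index for the US.	spatiotemporal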

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
38	SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels	Proposes SOLD: a slot-attention-based, object-centric latent dynamics model for learning relational manipulation from pixels.	manipulation reinforcement learning world model	✅
39	Can we hop in general? A discussion of benchmark selection and design using the Hopper environment	Discusses benchmark selection and design using the Hopper environment, revealing potential problems in RL evaluation.	legged robot reinforcement learning

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
40	Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing	Proposes pose- and video-conditioned editing to increase motion variation in text-to-motion models.	text-to-motion

🔬 Pillar 5: Interaction & Reaction (1 paper)

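#	Title	One-line summary	Tags	🔗	⭐
41	The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses	Formalizes the trade-offs among watermarks, transferable attacks, and adversarial defenses, showing that at least one of the three must exist.	OMOMO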
