cs.LG (2024-10-11)

📊 41 papers total | 🔗 13 with code

🎯 Interest-Area Navigation

Pillar 2: RL Algorithms & Architecture (18 🔗8) · Pillar 9: Embodied Foundation Models (15 🔗4) · Pillar 8: Physics-based Animation (4) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL Algorithms & Architecture (18 papers)

# | Title | One-line takeaway | Tags
1 | Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient | Drama: a Mamba-based state-space model improves the sample and parameter efficiency of model-based RL | reinforcement learning, world model, model-based RL
2 | Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both | Proposes DRDO, which performs reward distillation and preference learning simultaneously to improve language-model performance | preference learning, RLHF, DPO
3 | When Graph meets Multimodal: Benchmarking and Meditating on Multimodal Attributed Graphs Learning | Proposes the MAGB benchmark to systematically evaluate GNN and VLM approaches to multimodal attributed-graph learning | representation learning, multimodal
4 | Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Proposes DQO: direct Q-function optimization to strengthen multi-step reasoning in language models | reinforcement learning, PPO, SAC
5 | On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning | Proposes a discriminative probabilistic modeling approach to self-supervised representation learning, improving contrastive learning | representation learning, multimodal
6 | Parameter-Efficient Fine-Tuning of State Space Models | Proposes Sparse Dimension Tuning (SDT) for efficient fine-tuning of state space models (SSMs) | Mamba, SSM, state space model
7 | Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization | Identifies likelihood displacement in DPO and proposes the CHES score to mitigate unintentional unalignment | DPO, direct preference optimization
8 | M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation | M$^3$-Impute: mask-guided representation learning for missing-value imputation | representation learning, MAE
9 | Zero-Shot Offline Imitation Learning via Optimal Transport | Proposes an optimal-transport-based zero-shot offline imitation learning method that avoids the myopia of prior approaches | imitation learning, world model
10 | DFM: Interpolant-free Dual Flow Matching | Proposes interpolant-free Dual Flow Matching (DFM), improving unsupervised anomaly detection | flow matching
11 | AI Learning Algorithms: Deep Learning, Hybrid Models, and Large-Scale Model Integration | A survey of AI learning algorithms: deep learning, hybrid models, and large-scale model integration | reinforcement learning, large language model
12 | Overcoming Slow Decision Frequencies in Continuous Control: Model-Based Sequence Reinforcement Learning for Model-Free Control | Proposes Sequence Reinforcement Learning (SRL) to handle continuous control under slow decision frequencies | reinforcement learning
13 | MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL | MAD-TD: model-augmented data stabilizes high-update-ratio RL and improves sample efficiency | reinforcement learning, deep reinforcement learning, world model
14 | Distillation of Discrete Diffusion through Dimensional Correlations | Proposes a mixture model to address the slow sampling of discrete diffusion models | distillation
15 | DistDD: Distributed Data Distillation Aggregation through Gradient Matching | DistDD: distributed data distillation aggregation via gradient matching, cutting redundant communication in federated learning | distillation
16 | CYCLE: Cross-Year Contrastive Learning in Entity-Linking | Proposes CYCLE to address temporal performance degradation in entity linking | contrastive learning
17 | Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning | Proposes Kaleidoscope to address policy homogeneity in multi-agent reinforcement learning | reinforcement learning
18 | NextLocLLM: Location Semantics Modeling and Coordinate-Based Next Location Prediction with LLMs | NextLocLLM: location-semantics modeling and coordinate-based next-location prediction with LLMs | predictive model, spatiotemporal

🔬 Pillar 9: Embodied Foundation Models (15 papers)

# | Title | One-line takeaway | Tags
19 | A Systematic Survey on Large Language Models for Algorithm Design | A systematic survey of large language models for algorithm design | large language model
20 | Transformers Provably Solve Parity Efficiently with Chain of Thought | A theoretical analysis proving that Transformers with chain of thought solve parity efficiently | chain-of-thought
21 | MergePrint: Merge-Resistant Fingerprints for Robust Black-box Ownership Verification of Large Language Models | MergePrint: merge-resistant fingerprints for black-box ownership verification of large language models | large language model
22 | Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs | Uses LLMs to generate code transforms rather than rewriting code directly, improving rewriting precision | large language model, chain-of-thought
23 | DeepOSets: Non-Autoregressive In-Context Learning with Permutation-Invariance Inductive Bias | Proposes DeepOSets for non-autoregressive in-context learning | large language model
24 | Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | Proposes the Retro-Holdouts method, revealing benchmark inflation of LLMs on TruthfulQA | large language model
25 | Automated Rewards via LLM-Generated Progress Functions | Uses LLM-generated progress functions to automate reward engineering, improving dexterous robotic manipulation | large language model
26 | Zeroth-Order Fine-Tuning of LLMs in Random Subspaces | Proposes SubZero, a random-subspace zeroth-order optimization method for efficient LLM fine-tuning | large language model
27 | On the Adversarial Transferability of Generalized "Skip Connections" | Proposes the Skip Gradient Method (SGM), improving the transferability of adversarial examples for models with skip connections | large language model
28 | Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory | Uses random matrix theory to improve synthetic-data quality and binary-classifier performance | large language model
29 | Do Unlearning Methods Remove Information from Language Model Weights? | Proposes adversarial evaluations revealing the limits of current "unlearning" methods at removing information from LM weights | large language model
30 | Superpipeline: A Universal Approach for Reducing GPU Memory Usage in Large Models | Superpipeline: a universal scheme for reducing the GPU memory usage of large models in both training and inference | large language model
31 | Preferential Normalizing Flows | Proposes normalizing flows fit from preference information, for modeling expert knowledge as probability distributions | large language model
32 | DeltaDQ: Ultra-High Delta Compression for Fine-Tuned LLMs via Group-wise Dropout and Separate Quantization | DeltaDQ: ultra-high delta compression for fine-tuned LLMs via group-wise dropout and separate quantization | large language model
33 | Retraining-Free Merging of Sparse MoE via Hierarchical Clustering | Proposes HC-SMoE to merge the parameters of sparse mixture-of-experts models without retraining | large language model

🔬 Pillar 8: Physics-based Animation (4 papers)

# | Title | One-line takeaway | Tags
34 | Encoding Agent Trajectories as Representations with Sequence Transformers | Proposes STARE, which encodes agent trajectories with Transformers for spatiotemporal trajectory representation | spatiotemporal
35 | Meta-Transfer Learning Empowered Temporal Graph Networks for Cross-City Real Estate Appraisal | Proposes MetaTransfer, meta-transfer learning for temporal graph networks applied to cross-city real-estate appraisal | spatiotemporal
36 | Edge AI Collaborative Learning: Bayesian Approaches to Uncertainty Estimation | Proposes Bayesian-neural-network-based collaborative learning for uncertainty estimation in edge AI | spatiotemporal
37 | Establishing Nationwide Power System Vulnerability Index across US Counties Using Interpretable Machine Learning | Uses interpretable machine learning to build a nationwide power-system vulnerability index across US counties | spatiotemporal

🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-line takeaway | Tags
38 | SOLD: Slot Object-Centric Latent Dynamics Models for Relational Manipulation Learning from Pixels | Proposes SOLD, slot-attention-based object-centric latent dynamics models for relational manipulation learning from pixels | manipulation, reinforcement learning, world model
39 | Can we hop in general? A discussion of benchmark selection and design using the Hopper environment | A discussion of benchmark selection and design using the Hopper environment, exposing pitfalls in RL evaluation | legged robot, reinforcement learning

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line takeaway | Tags
40 | Enhancing Motion Variation in Text-to-Motion Models via Pose and Video Conditioned Editing | Proposes pose- and video-conditioned editing to increase motion variation in text-to-motion models | text-to-motion

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | One-line takeaway | Tags
41 | The Good, the Bad and the Ugly: Meta-Analysis of Watermarks, Transferable Attacks and Adversarial Defenses | A formal meta-analysis of the trade-offs among watermarks, transferable attacks, and adversarial defenses, showing that at least one of the three must exist | OMOMO
