cs.LG (2024-06-21)

📊 27 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (14) · Pillar 9: Embodied Foundation Models (10 🔗2) · Pillar 7: Motion Retargeting (2) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 2: RL Algorithms & Architecture (14 papers)

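| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 1 | SAIL: Self-Improving Efficient Online Alignment of Large Language Models | SAIL: improves large language model performance through self-iterative online alignment. | reinforcement learning, RLHF, DPO | |
| 2 | Robust Reinforcement Learning from Corrupted Human Feedback | Proposes R³M, which models sparse outliers to make RLHF more robust to noisy human feedback. | reinforcement learning, RLHF, DPO | |
| 3 | KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty | KalMamba: an efficient probabilistic state space model for reinforcement learning under uncertainty. | reinforcement learning, Mamba, SSM | |
| 4 | MU-Bench: A Multitask Multimodal Benchmark for Machine Unlearning | MU-Bench: a comprehensive multitask, multimodal benchmark for machine unlearning. | curriculum learning, multimodal | |
| 5 | Towards Dynamic Resource Allocation and Client Scheduling in Hierarchical Federated Learning: A Two-Phase Deep Reinforcement Learning Approach | Proposes a two-phase deep reinforcement learning framework for dynamic resource allocation and client scheduling in energy-harvesting-powered hierarchical federated learning. | reinforcement learning, deep reinforcement learning | |
| 6 | Pareto-Optimal Learning from Preferences with Hidden Context | Proposes the POPL algorithm for reinforcement learning alignment under the preferences of multiple groups, achieving Pareto optimality. | reinforcement learning, preference learning, RLHF | |
| 7 | Investigating the Transferability of Code Repair for Low-Resource Programming Languages | Studies how code repair ability transfers to low-resource programming languages, revealing a weak correlation between reasoning ability and code repair ability. | distillation, large language model, chain-of-thought | |
| 8 | Behaviour Distillation | Proposes HaDES, a behaviour distillation method that trains reinforcement learning policies from only a small amount of synthetic data. | reinforcement learning, distillation | |
| 9 | Open Problem: Order Optimal Regret Bounds for Kernel-Based Reinforcement Learning | Poses the open problem of order-optimal regret bounds for kernel-based reinforcement learning. | reinforcement learning | |
| 10 | Towards General Negotiation Strategies with End-to-End Reinforcement Learning | Proposes an end-to-end reinforcement learning method based on graph neural networks for learning general negotiation strategies. | reinforcement learning | |
| 11 | From Overfitting to Robustness: Quantity, Quality, and Variety Oriented Negative Sample Selection in Graph Contrastive Learning | Proposes the NegAmplify framework, which addresses overfitting in graph contrastive learning through cumulative sample selection. | contrastive learning | |
| 12 | SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning | Proposes Symmetry-Invariant Transformers (SiT) to improve reinforcement learning generalisation in environments such as MiniGrid and Procgen. | reinforcement learning | |
| 13 | An Idiosyncrasy of Time-discretization in Reinforcement Learning | Addresses time-discretization in reinforcement learning, proposing a modification that aligns the continuous-time and discrete-time definitions of return. | reinforcement learning | |
| 14 | DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning | Proposes DN-CL, which uses contrastive learning to improve deep symbolic regression under noise. | contrastive learning | |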

🔬 Pillar 9: Embodied Foundation Models (10 papers)

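| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 15 | LatentExplainer: Explaining Latent Representations in Deep Generative Models with Multimodal Large Language Models | LatentExplainer: uses multimodal large language models to explain latent variables in deep generative models. | large language model, multimodal | |
| 16 | Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research | Geneverse: a collection of open-source multimodal large language models for genomic and proteomic research. | large language model, multimodal | |
| 17 | Anime Popularity Prediction Before Huge Investments: a Multimodal Approach Using Deep Learning | Proposes a deep-learning-based multimodal method for predicting the popularity of anime. | multimodal | |
| 18 | How Intermodal Interaction Affects the Performance of Deep Multimodal Fusion for Mixed-Type Time Series | Studies how intermodal interaction affects the performance of deep multimodal fusion on mixed-type time series. | multimodal | |
| 19 | Data Efficient Evaluation of Large Language Models and Text-to-Image Models via Adaptive Sampling | SubLIME: data-efficient evaluation of large language models and text-to-image models via adaptive sampling. | large language model | |
| 20 | AdaGrad under Anisotropic Smoothness | Provides accelerated convergence guarantees for AdaGrad under anisotropic smoothness. | foundation model, instruction following | |
| 21 | GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis | GenoTEX: an LLM agent benchmark for automated gene expression data analysis. | large language model | |
| 22 | Specify What? Enhancing Neural Specification Synthesis by Symbolic Methods | Uses symbolic methods to enhance neural program specification synthesis, improving the quality of specifications for C programs. | large language model | |
| 23 | Unlocking the Global Synergies in Low-Rank Adapters | HeteroLoRA: uses zero-cost proxy search to optimise LoRA parameter allocation and improve large-model fine-tuning performance. | large language model | |
| 24 | Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths | Proposes Mixture of Attention Spans (MoA) to optimise LLM inference efficiency in long-context scenarios. | large language model | |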

🔬 Pillar 7: Motion Retargeting (2 papers)

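| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 25 | Root Cause Analysis of Anomalies in 5G RAN Using Graph Neural Network and Transformer | Simba: root cause analysis of 5G RAN anomalies using graph neural networks and a Transformer. | spatial relationship | |
| 26 | FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection | FT-AED: the first large-scale benchmark dataset for early detection of anomalous freeway events. | spatial relationship | |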

🔬 Pillar 5: Interaction & Reaction (1 paper)

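| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 27 | Tempora-Fusion: Time-Lock Puzzle with Efficient Verifiable Homomorphic Linear Combination | Proposes Tempora-Fusion, a time-lock puzzle with efficient verifiable homomorphic linear combination. | OMOMO | |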
