cs.LG (2025-05-12)

📊 30 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (14 🔗1) Pillar 9: Embodied Foundation Models (10) Pillar 8: Physics-based Animation (4) Pillar 5: Interaction & Reaction (2)

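🔬 Pillar 2: RL & Architecture (14 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
1	Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains	Proposes a cache-efficient posterior sampling framework that accelerates reinforcement learning with LLM-derived priors in discrete and continuous domains	reinforcement learning offline RL CQL
2	RLSR: Reinforcement Learning from Self Reward	Proposes RLSR, reinforcement learning from self-reward, to improve LLM performance on complex problem solving	reinforcement learning large language model
3	Combining Bayesian Inference and Reinforcement Learning for Agent Decision Making: A Review	A review of agent decision-making methods that combine Bayesian inference with reinforcement learning	reinforcement learning policy learning model-based RL
4	Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization	Proposes Dual-Head Optimization (DHO) for efficient semi-supervised learning via knowledge distillation from vision-language models	distillation	✅
5	An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits	Shows that a single extra RMSNorm is enough to fine-tune to 1.58-bit quantization	distillation large language model
6	A Theoretical Framework for Explaining Reinforcement Learning with Shapley Values	Proposes the SVERL framework, which uses Shapley values to explain the behavior, outcomes, and predictions of reinforcement learning agents	reinforcement learning
7	MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering	MLE-Dojo: interactive environments that empower LLM agents to carry out machine learning engineering	reinforcement learning large language model
8	Self-Supervised Transformer-based Contrastive Learning for Intrusion Detection Systems	Proposes a Transformer-based self-supervised contrastive learning intrusion detection system with improved generalization	contrastive learning
9	EAGLE: Contrastive Learning for Efficient Graph Anomaly Detection	EAGLE: an efficient contrastive-learning-based graph anomaly detection model suited to heterogeneous graphs	contrastive learning
10	Online Episodic Convex Reinforcement Learning	Proposes online episodic convex reinforcement learning algorithms for online learning in MDPs with convex objective functions	reinforcement learning
11	INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning	INTELLECT-2: a 32-billion-parameter reasoning model trained through globally decentralized reinforcement learning	reinforcement learning
12	REMEDI: Relative Feature Enhanced Meta-Learning with Distillation for Imbalanced Prediction	REMEDI: combines relative-feature-enhanced meta-learning with distillation to handle extremely imbalanced prediction	distillation
13	Representation Learning with Mutual Influence of Modalities for Node Classification in Multi-Modal Heterogeneous Networks	Proposes the HGNN-IMA model, which improves node classification in multi-modal heterogeneous networks by learning the mutual influence of modalities	representation learning
14	VoI-Driven Joint Optimization of Control and Communication in Vehicular Digital Twin Network	Proposes a value-of-information (VoI)-driven framework for jointly optimizing control and communication in vehicular digital twin networks	reinforcement learning deep reinforcement learning DRL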

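🔬 Pillar 9: Embodied Foundation Models (10 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
15	Symbolic Regression with Multimodal Large Language Models and Kolmogorov Arnold Networks	Proposes a symbolic regression method based on multimodal large language models and Kolmogorov Arnold Networks	large language model multimodal
16	Multimodal Cancer Modeling in the Age of Foundation Model Embeddings	Proposes a multimodal cancer modeling approach based on foundation model embeddings that improves cancer survival prediction	foundation model multimodal
17	Assessing the Chemical Intelligence of Large Language Models	ChemIQ: a new benchmark for assessing the organic chemistry reasoning abilities of large language models	large language model
18	SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models	SpecRouter: an adaptive routing framework for multi-level speculative decoding in large language models	large language model
19	Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models	Proposes Direct Density Ratio Optimization (DDRO) for more reliable alignment of large language models with human preferences	large language model
20	Injecting Knowledge Graphs into Large Language Models	Proposes a method for injecting knowledge graphs into large language models to improve symbolic reasoning	large language model
21	Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders	Proposes Gradient Sparse Autoencoders (GradSAE), which use gradient information to identify influential latents in large language models	large language model
22	TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining	TACOS: a temporally aligned audio caption dataset for language-audio pretraining	large language model
23	LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning	LEAD: an efficient iterative data selection framework for LLM instruction tuning that requires no additional model inference	large language model
24	Uncertainty Profiles for LLMs: Uncertainty Source Decomposition and Adaptive Model-Metric Selection	Proposes an uncertainty decomposition framework for LLMs that enables task-adaptive model and metric selection, improving reliability	large language model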

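🔬 Pillar 8: Physics-based Animation (4 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
25	The Geography of Transportation Cybersecurity: Visitor Flows, Industry Clusters, and Spatial Dynamics	Proposes the BiTransGCN framework to predict visitor flows and spatial dynamics of transportation cybersecurity industry clusters	spatiotemporal
26	Self-cross Feature based Spiking Neural Networks for Efficient Few-shot Learning	Proposes spiking neural networks based on self-cross features for efficient few-shot learning	spatiotemporal
27	Joint Graph Convolution and Sequential Modeling for Scalable Network Traffic Estimation	Proposes a traffic prediction method based on graph convolution and sequential modeling that improves prediction accuracy in complex network environments	spatiotemporal
28	EnvCDiff: Joint Refinement of Environmental Information and Channel Fingerprints via Conditional Generative Diffusion Model	EnvCDiff: jointly refines environmental information and channel fingerprints using a conditional generative diffusion model	diff-sim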

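🔬 Pillar 5: Interaction & Reaction (2 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
29	Private LoRA Fine-tuning of Open-Source LLMs with Homomorphic Encryption	Proposes a private LoRA fine-tuning scheme based on homomorphic encryption to protect the privacy of LLM training data	OMOMO large language model
30	Latent Behavior Diffusion for Sequential Reaction Generation in Dyadic Setting	Proposes a latent behavior diffusion model for generating more natural facial reactions in conversational settings	reaction synthesis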
