cs.CL (2025-10-31)

📊 23 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (18 🔗2) · Pillar 2: RL & Architecture (5 🔗2)

🔬 Pillar 9: Embodied Foundation Models (18 papers)

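| # | Title | One-sentence takeaway | Tags |
|---|---|---|---|
| 1 | MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models | MemeArena: automated, context-aware, unbiased evaluation of how multimodal large language models understand harmful content. | large language model, multimodal |
| 2 | Self-HarmLLM: Can Large Language Model Harm Itself? | Proposes Self-HarmLLM to explore the risk that a large language model's own outputs can be used as adversarial attacks against it. | large language model |
| 3 | AgentBnB: A Browser-Based Cybersecurity Tabletop Exercise with Large Language Model Support and Retrieval-Aligned Scaffolding | AgentBnB: a browser-based cybersecurity tabletop exercise system with large language model support. | large language model |
| 4 | A Unified Representation Underlying the Judgment of Large Language Models | Finds a unified Valence-Assent axis in large language models and reveals its potential influence on judgment and reasoning. | large language model |
| 5 | Characterizing Selective Refusal Bias in Large Language Models | Reveals selective refusal bias in large language models and highlights the fairness of safety guardrails. | large language model |
| 6 | Continuous Autoregressive Language Models | Proposes CALM, which predicts continuous vectors to significantly improve the efficiency and performance of large language models. | large language model |
| 7 | SpecAttn: Speculating Sparse Attention | SpecAttn: uses speculative decoding to achieve efficient sparse attention and accelerate LLM inference. | large language model |
| 8 | Languages are Modalities: Cross-Lingual Alignment via Encoder Injection | Proposes LLINK, which achieves cross-lingual alignment of LLMs for low-resource languages via encoder injection. | large language model |
| 9 | Training LLMs Beyond Next Token Prediction -- Filling the Mutual Information Gap | Proposes an LLM training method based on filling the mutual information gap to improve model performance. | large language model |
| 10 | EncouRAGe: Evaluating RAG Local, Fast, and Reliable | EncouRAGe: a Python framework for local, fast, and reliable evaluation of RAG systems. | large language model |
| 11 | DialectalArabicMMLU: Benchmarking Dialectal Capabilities in Arabic and Multilingual Language Models | DialectalArabicMMLU: builds a benchmark for Arabic dialect capabilities to evaluate LLMs' dialect understanding. | large language model |
| 12 | Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs | Proposes IntAttn-Edit, which balances updates to the MLP and attention modules to improve LLM knowledge editing. | large language model |
| 13 | Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs | Proposes the LIGHT framework to strengthen LLM memory in long conversations and builds the BEAM benchmark to evaluate model performance. | large language model |
| 14 | What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency Via Adversarial Nudge | HAUNT: proposes an adversarial nudge framework to evaluate the self-consistency of large language models in fictional domains. | large language model |
| 15 | Diffuse Thinking: Exploring Diffusion Language Models as Efficient Thought Proposers for Reasoning | Proposes the Diffuse Thinking framework, which uses diffusion language models to efficiently generate intermediate reasoning steps and improve performance on complex reasoning tasks. | large language model |
| 16 | Dynamic Affective Memory Management for Personalized LLM Agents | Proposes Bayesian dynamic affective memory management to improve the performance of personalized LLM agents. | large language model |
| 17 | ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations | ThoughtProbe: probes representations and uses a classifier to guide LLM exploration of the thought space, improving reasoning performance. | large language model |
| 18 | Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks | Reveals a self-consistency problem in LLM-as-a-judge frameworks: ratings are inconsistent across runs. | large language model |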

🔬 Pillar 2: RL & Architecture (5 papers)

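| # | Title | One-sentence takeaway | Tags |
|---|---|---|---|
| 19 | VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision | VCORE: a variance-controlled, optimization-based reweighting method for improving chain-of-thought supervision. | reinforcement learning, large language model, chain-of-thought |
| 20 | MedCalc-Eval and MedCalc-Env: Advancing Medical Calculation Capabilities of Large Language Models | Proposes MedCalc-Eval and MedCalc-Env to improve large language models' capabilities on medical calculation tasks. | reinforcement learning, large language model |
| 21 | Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models | Proposes a contrastive knowledge transfer and robust optimization method to improve the safety alignment of large language models. | distillation, large language model |
| 22 | Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning | Proposes a multi-turn reinforcement learning framework to improve LLM consistency in simulated persona dialogue. | reinforcement learning, large language model |
| 23 | MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval | Proposes MARAG-R1, which uses reinforcement-learned multi-tool agentic retrieval to improve LLM information acquisition in complex reasoning tasks. | reinforcement learning, large language model |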
