cs.CL (2025-05-30)

📊 73 papers in total | 🔗 14 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (53 🔗10) · Pillar 2: RL & Architecture (19 🔗4) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (53 papers)

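| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | PersianMedQA: Evaluating Large Language Models on a Persian-English Bilingual Medical Question Answering Benchmark | Proposes PersianMedQA for evaluating large language models on Persian-English bilingual medical question answering | large language model, instruction following, chain-of-thought |
| 2 | Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models | Reveals and mitigates spurious correlations and shortcut learning in LLM-based SDOH extraction | large language model, chain-of-thought |
| 3 | Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings | Proposes a taxonomy of multimodal perturbations for toxic Chinese content detection and builds a benchmark to evaluate LLMs | large language model, multimodal |
| 4 | Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities | Evaluates large language models on cryptanalysis and side-channel vulnerabilities | large language model, chain-of-thought |
| 5 | When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways | Proposes the EVOKE benchmark to evaluate the capabilities and challenges of large multimodal models in evolving knowledge injection | multimodal, instruction following |
| 6 | MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs | Proposes MMAFFBen, a multilingual and multimodal affective analysis benchmark for evaluating LLMs and VLMs | large language model, multimodal |
| 7 | Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation | Proposes GSTransform, which achieves efficient instruction-following text embedding through guided space transformation | instruction following |
| 8 | Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation | Reveals position bias in multimodal RAG, proposes a position sensitivity metric, and analyzes its impact on performance | multimodal |
| 9 | Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks | Finds that multilinguality is not the key to better zero-shot transfer on sense-aware tasks; data and evaluation matter more | zero-shot transfer |
| 10 | Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty? | Shows that the epistemic markers of large language models fail to accurately reflect their uncertainty in out-of-distribution settings | large language model |
| 11 | HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America | Proposes the HESEIA dataset for evaluating social biases of large language models in Latin American school settings | large language model |
| 12 | Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration | Proposes the Soft Reasoning framework, which improves complex reasoning in large language models through controlled embedding exploration | large language model |
| 13 | TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | TRIDENT: enhances large language model safety through tri-dimensional diversified red-teaming data synthesis | large language model |
| 14 | Disentangling Language and Culture for Evaluating Multilingual Large Language Models | Proposes a dual evaluation framework that disentangles language and culture factors to evaluate multilingual large models more comprehensively | large language model |
| 15 | Harnessing Large Language Models for Scientific Novelty Detection | Uses large language models for scientific novelty detection and builds related benchmark datasets | large language model |
| 16 | CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation | CaMMT: builds a benchmark dataset for culturally aware multimodal machine translation | multimodal |
| 17 | Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts | Compares donated and created data to evaluate data collection strategies for emotion recognition on multimodal social media content | multimodal |
| 18 | Multilingual Gloss-free Sign Language Translation: Towards Building a Sign Language Foundation Model | Proposes a multilingual gloss-free sign language translation model that supports translation across multiple sign languages | foundation model |
| 19 | Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research | AGORA: a unified framework for language agent algorithms built on a graph-based orchestration engine, promoting reproducible research | large language model, multimodal, chain-of-thought |
| 20 | Advantageous Parameter Expansion Training Makes Better Large Language Models | APEX: improves large language model performance through advantageous parameter expansion training | large language model |
| 21 | Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models | Revisits error accumulation in LLMs, focusing on key tokens to break the long-sequence performance bottleneck | large language model |
| 22 | Effects of Theory of Mind and Prosocial Beliefs on Steering Human-Aligned Behaviors of LLMs in Ultimatum Games | Uses theory of mind and prosocial beliefs to steer LLMs toward human-aligned behavior in ultimatum games | large language model, chain-of-thought |
| 23 | FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation | FinMME: a benchmark dataset for financial multimodal reasoning evaluation, filling the gap in multimodal evaluation for the financial domain | large language model, multimodal |
| 24 | LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text | LegalEval-Q: proposes a benchmark for evaluating the quality of LLM-generated legal text, focusing on clarity, coherence, and terminological accuracy | large language model |
| 25 | Lossless Token Sequence Compression via Meta-Tokens | Proposes a lossless compression method based on meta-tokens that reduces LLM input sequence length and speeds up encoding | large language model |
| 26 | Model Unlearning via Sparse Autoencoder Subspace Guided Projections | Proposes SSPU, which uses sparse autoencoder subspace projections to achieve interpretable and robust knowledge unlearning in large models | large language model |
| 27 | HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs | Proposes HD-NDEs, which use neural differential equations to detect hallucinations in LLMs | large language model |
| 28 | An evaluation of LLMs for generating movie reviews: GPT-4o, Gemini-2.0 and DeepSeek-V3 | Evaluates large language models on movie review generation: GPT-4o, Gemini-2.0, and DeepSeek-V3 | large language model |
| 29 | Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings | Proposes a hierarchical news article clustering method based on multilingual Matryoshka embeddings, improving scalability and interpretability | large language model |
| 30 | Multiple LLM Agents Debate for Equitable Cultural Alignment | Proposes a multi-agent debate framework that improves LLM adaptability and fairness across cultural contexts | large language model |
| 31 | Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX | Proposes POLLUX, a comprehensive open-source benchmark for evaluating the generative capabilities of Russian-language LLMs | large language model |
| 32 | Bench4KE: Benchmarking Automated Competency Question Generation | Bench4KE: a benchmarking system for automated competency question generation | large language model |
| 33 | Cross-Attention Speculative Decoding | Proposes Beagle, a cross-attention-based speculative decoding model that simplifies the architecture and improves training efficiency | large language model |
| 34 | Localizing Persona Representations in LLMs | Studies where and how persona representations are localized and encoded in large language models | large language model |
| 35 | Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations | Proposes an NLI explanation framework that combines LLMs with theorem provers, improving faithfulness and robustness | large language model |
| 36 | COSMIC: Generalized Refusal Direction Identification in LLM Activations | COSMIC: a generalized method for identifying refusal directions in the LLM activation space | large language model |
| 37 | LKD-KGC: Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing | Proposes the LKD-KGC framework, which builds domain-specific knowledge graphs through LLM-driven knowledge dependency parsing | large language model |
| 38 | CASPER: A Large Scale Spontaneous Speech Dataset | CASPER: a large-scale spontaneous speech dataset aimed at addressing the scarcity of high-quality spontaneous speech data | large language model |
| 39 | MultiHoax: A Dataset of Multi-hop False-Premise Questions | Proposes the MultiHoax dataset for evaluating LLMs' ability to detect false premises in multi-hop reasoning | large language model |
| 40 | The Impact of Disability Disclosure on Fairness and Bias in LLM-Driven Candidate Selection | Reveals the impact of disability disclosure on fairness and bias in LLM-driven candidate selection | large language model |
| 41 | Guiding Generative Storytelling with Knowledge Graphs | Proposes a knowledge-graph-assisted generative storytelling framework that improves long-text coherence and user control | large language model |
| 42 | From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning | Probes dataset diversity in language model fine-tuning with an analysis framework that runs from macro to micro | large language model |
| 43 | BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization | Proposes SCRIPT encoding, which makes BPE more robust in multilingual pretokenization and avoids penalizing non-Western scripts | large language model |
| 44 | A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | Proposes A*-Thought to address reasoning efficiency in low-resource settings | chain-of-thought |
| 45 | Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections | Proposes an AI tool for detecting harmful language in cultural heritage data and providing contextual information | large language model |
| 46 | ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation | Proposes ClueAnchor, which enhances retrieval-augmented generation through clue-anchored knowledge reasoning exploration and optimization | large language model |
| 47 | LLM Inference Enhanced by External Knowledge: A Survey | Survey: enhancing large language model inference with external knowledge | large language model |
| 48 | HiCaM: A Hierarchical-Causal Modification Framework for Long-Form Text Modification | Proposes the HiCaM framework, which improves long-form text modification through hierarchical causal modeling | large language model |
| 49 | Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | Proposes and evaluates several data leakage detection methods to improve the reliability of LLM evaluation benchmarks | large language model |
| 50 | Semi-structured LLM Reasoners Can Be Rigorously Audited | Proposes semi-structured reasoning models to address the auditability of large language models | large language model |
| 51 | CLaSp: In-Context Layer Skip for Self-Speculative Decoding | CLaSp: proposes an in-context layer-skipping method for self-speculative decoding that accelerates LLM inference | large language model |
| 52 | CrossICL: Cross-Task In-Context Learning via Unsupervised Demonstration Transfer | Proposes CrossICL, which achieves cross-task in-context learning through unsupervised demonstration transfer | large language model |
| 53 | R-KV: Redundancy-aware KV Cache Compression for Reasoning Models | R-KV: proposes a redundancy-aware KV cache compression method for reasoning models | chain-of-thought |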

🔬 Pillar 2: RL & Architecture (19 papers)

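| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 54 | Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Proposes the Mixed-R1 framework, which unifies the reward perspective for reasoning capability in multimodal large language models | reinforcement learning, reward design, large language model |
| 55 | Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling | Proposes an LLM data annotation method based on intuitionistic fuzzy sets that improves preference labeling quality | reinforcement learning, RLHF, DPO |
| 56 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | ProRL: expands the reasoning boundaries of large language models through prolonged reinforcement learning | reinforcement learning, large language model |
| 57 | Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation | Studies how English data affects the emergent abilities of large language models during continued pretraining and proposes improvements | curriculum learning, large language model |
| 58 | A Simple Linear Patch Revives Layer-Pruned Large Language Models | Proposes LinearPatch to address the performance degradation of layer-pruned models | distillation, large language model |
| 59 | Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion | LLM representations encode information about reasoning success before the CoT completes, which can be used for early prediction | reinforcement learning, chain-of-thought |
| 60 | Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models | Proposes FSPO, which reduces hallucinations in large reasoning models through factuality-aware reinforcement learning | reinforcement learning, large language model |
| 61 | TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence | Proposes temporal-aware hierarchical cognitive reinforcement learning (TimeHC-RL) to improve LLMs' social intelligence | reinforcement learning, large language model |
| 62 | Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | Proposes an LLM self-improvement method based on self-reflection and reinforcement learning, targeting complex tasks where synthetic data is infeasible | reinforcement learning, large language model |
| 63 | DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning | DeepDiver: adaptively scales search intensity through open-web reinforcement learning, improving LLM open-domain question answering | reinforcement learning, large language model |
| 64 | Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning | EHRMIND: trains LLMs for electronic health record reasoning tasks using reinforcement learning | reinforcement learning, large language model |
| 65 | Mamba Knockout for Unraveling Factual Information Flow | Uses the Mamba Knockout method to unravel factual information flow in Mamba models | Mamba, SSM |
| 66 | Proactive Guidance of Multi-Turn Conversation in Industrial Search | Proposes a two-stage framework for proactive guidance of multi-turn conversations in industrial search, improving the user interaction experience | reinforcement learning, distillation, large language model |
| 67 | Efficient Text Encoders for Labor Market Analysis | Proposes ConTeXT-match, an efficient text encoder for skill extraction in labor market analysis | contrastive learning, large language model |
| 68 | CREFT: Sequential Multi-Agent LLM for Character Relation Extraction | CREFT: a sequential multi-agent LLM framework for character relation extraction | distillation, large language model |
| 69 | Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | Proposes a phoneme-augmented CoT method for speech-to-text translation, improving cross-lingual transfer in low-resource scenarios | curriculum learning, chain-of-thought |
| 70 | Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards | Proposes Writing-Zero, which bridges the gap between non-verifiable tasks and verifiable rewards to improve LLM writing ability | reinforcement learning, large language model |
| 71 | GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training | GATE: a general text embedding for Arabic that improves semantic textual similarity through Matryoshka representation learning and hybrid loss training | representation learning |
| 72 | HardTests: Synthesizing High-Quality Test Cases for LLM Coding | HARDTESTGEN: uses LLMs to synthesize high-quality test cases, improving verification precision for code generation models | reinforcement learning, large language model |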

🔬 Pillar 1: Robot Control (1 paper)

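| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 73 | Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors | Proposes a style-transfer-based adversarial attack method for evaluating and improving the robustness of machine-generated text detectors | manipulation, DPO, direct preference optimization |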
