cs.CL (2025-05-30)

📊 73 papers in total | 🔗 14 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (53 🔗10) · Pillar 2: RL & Architecture (19 🔗4) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (53 papers)

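| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | PersianMedQA: Evaluating Large Language Models on a Persian-English Bilingual Medical Question Answering Benchmark | Proposes PersianMedQA to evaluate large language models on bilingual medical question answering | large language model, instruction following, chain-of-thought | |
| 2 | Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models | Proposes a new method to address shortcut learning in social determinants of health (SDOH) extraction | large language model, chain-of-thought | |
| 3 | Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings | Proposes a taxonomy of multimodal challenges to improve toxic Chinese detection | large language model, multimodal | |
| 4 | Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities | Evaluates the potential of large language models for cryptanalysis and side-channel vulnerabilities | large language model, chain-of-thought | |
| 5 | When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways | Proposes the EVOKE benchmark to address evolving knowledge in multimodal models | multimodal, instruction following | |
| 6 | MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs | Proposes the MMAFFBen benchmark to address the evaluation of multilingual, multimodal affective analysis | large language model, multimodal | |
| 7 | Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation | Proposes GSTransform to address the computational overhead of instruction-following text embeddings | instruction following | |
| 8 | Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation | Proposes a position sensitivity index to address bias in multimodal RAG systems | multimodal | |
| 9 | Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks | Investigates the effect of multilinguality on zero-shot transfer and offers new insights | zero-shot transfer | |
| 10 | Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty? | Proposes a marker-based confidence estimation approach to address LLM uncertainty | large language model | |
| 11 | HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America | Proposes the HESEIA dataset to evaluate social biases in language models | large language model | |
| 12 | Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration | Proposes Soft Reasoning to address limited reasoning ability in large language models | large language model | |
| 13 | TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis | Proposes TRIDENT to enhance the safety of large language models | large language model | |
| 14 | Disentangling Language and Culture for Evaluating Multilingual Large Language Models | Proposes a dual evaluation framework to assess the capabilities of multilingual large language models | large language model | |
| 15 | Harnessing Large Language Models for Scientific Novelty Detection | Uses large language models to address scientific novelty detection | large language model | |
| 16 | CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation | Proposes the CaMMT benchmark to address multimodal challenges in translating cultural content | multimodal | |
| 17 | Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts | Compares data collection strategies to improve emotion-labeled multimodal social media posts | multimodal | |
| 18 | Multilingual Gloss-free Sign Language Translation: Towards Building a Sign Language Foundation Model | Proposes a multilingual gloss-free sign language translation model to address low-resource settings | foundation model | |
| 19 | Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research | Proposes the AGORA framework to address standardization and evaluation in language agent development | large language model, multimodal, chain-of-thought | |
| 20 | Advantageous Parameter Expansion Training Makes Better Large Language Models | Proposes Advantageous Parameter Expansion Training to improve large language model performance | large language model | |
| 21 | Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models | Proposes a new framework to rethink error accumulation in large language models | large language model | |
| 22 | Effects of Theory of Mind and Prosocial Beliefs on Steering Human-Aligned Behaviors of LLMs in Ultimatum Games | Investigates the effect of theory of mind and prosocial beliefs on aligning LLM behavior with humans | large language model, chain-of-thought | |
| 23 | FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation | Proposes the FinMME dataset to address the lack of multimodal evaluation in finance | large language model, multimodal | |
| 24 | LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text | Proposes LegalEval-Q to address quality evaluation of generated legal text | large language model | |
| 25 | Lossless Token Sequence Compression via Meta-Tokens | Proposes a lossless token sequence compression method to optimize large language model performance | large language model | |
| 26 | Model Unlearning via Sparse Autoencoder Subspace Guided Projections | Proposes an SAE-guided subspace projection unlearning method to address privacy concerns | large language model | |
| 27 | HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs | Proposes HD-NDEs to address hallucination detection in large language models | large language model | |
| 28 | An evaluation of LLMs for generating movie reviews: GPT-4o, Gemini-2.0 and DeepSeek-V3 | Proposes a framework to evaluate the effectiveness of LLM-generated movie reviews | large language model | |
| 29 | Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings | Proposes multilingual Matryoshka embeddings to address news article clustering | large language model | |
| 30 | Multiple LLM Agents Debate for Equitable Cultural Alignment | Proposes a multi-agent debate framework to promote cultural adaptability | large language model | |
| 31 | Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX | Proposes POLLUX to evaluate the generative capabilities of Russian-language LLMs | large language model | |
| 32 | Bench4KE: Benchmarking Automated Competency Question Generation | Proposes Bench4KE to address standardized evaluation of knowledge engineering automation | large language model | |
| 33 | Cross-Attention Speculative Decoding | Proposes cross-attention speculative decoding to simplify large language model inference | large language model | |
| 34 | Localizing Persona Representations in LLMs | Studies how to localize persona representations in large language models | large language model | |
| 35 | Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations | Proposes an LLM-driven theorem proving method to improve the reliability and robustness of NLI explanations | large language model | |
| 36 | COSMIC: Generalized Refusal Direction Identification in LLM Activations | Proposes the COSMIC framework to automatically identify refusal behavior in large language models | large language model | |
| 37 | LKD-KGC: Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing | Proposes LKD-KGC to address the efficiency of domain-specific knowledge graph construction | large language model | |
| 38 | CASPER: A Large Scale Spontaneous Speech Dataset | Proposes the CASPER dataset to address the scarcity of spontaneous speech data | large language model | |
| 39 | MultiHoax: A Dataset of Multi-hop False-Premise Questions | Proposes the MultiHoax dataset to address multi-hop false-premise questions | large language model | |
| 40 | The Impact of Disability Disclosure on Fairness and Bias in LLM-Driven Candidate Selection | Investigates the effect of disability disclosure on fairness in LLM-driven candidate selection | large language model | |
| 41 | Guiding Generative Storytelling with Knowledge Graphs | Proposes a knowledge-graph-assisted story generation method to improve narrative quality | large language model | |
| 42 | From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning | Proposes dataset diversity control strategies to improve language model fine-tuning | large language model | |
| 43 | BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization | Proposes SCRIPT to address challenges in multilingual pretokenization | large language model | |
| 44 | A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | Proposes A*-Thought to address reasoning efficiency in low-resource settings | chain-of-thought | |
| 45 | Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections | Proposes an AI tool to detect and contextualize harmful language in cultural heritage collections | large language model | |
| 46 | ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation | Proposes ClueAnchor to address insufficient knowledge extraction in RAG systems | large language model | |
| 47 | LLM Inference Enhanced by External Knowledge: A Survey | Enhances LLM inference with external knowledge to address inference accuracy | large language model | |
| 48 | HiCaM: A Hierarchical-Causal Modification Framework for Long-Form Text Modification | Proposes the HiCaM framework to address content inconsistency in long-form text modification | large language model | |
| 49 | Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | Proposes a data leakage simulation method to improve the transparency of LLM evaluation | large language model | |
| 50 | Semi-structured LLM Reasoners Can Be Rigorously Audited | Proposes semi-structured reasoning models to address the auditability of large language models | large language model | |
| 51 | CLaSp: In-Context Layer Skip for Self-Speculative Decoding | Proposes CLaSp to address layer skipping in self-speculative decoding | large language model | |
| 52 | CrossICL: Cross-Task In-Context Learning via Unsupervised Demonstration Transfer | Proposes CrossICL to address cross-task learning via unsupervised demonstration transfer | large language model | |
| 53 | R-KV: Redundancy-aware KV Cache Compression for Reasoning Models | Proposes R-KV to address KV cache redundancy in reasoning models | chain-of-thought | |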

🔬 Pillar 2: RL & Architecture (19 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 54 | Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Proposes the Mixed-R1 framework to address reasoning capability in multimodal large language models | reinforcement learning, reward design, large language model | |
| 55 | Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling | Proposes a preference labeling method based on intuitionistic fuzzy sets to address uncertainty in human preference data | reinforcement learning, RLHF, DPO | |
| 56 | ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | Proposes ProRL to expand the reasoning capabilities of large language models | reinforcement learning, large language model | |
| 57 | Emergent Abilities of Large Language Models under Continued Pretraining for Language Adaptation | Proposes a continued pretraining approach to strengthen language adaptation | curriculum learning, large language model | |
| 58 | A Simple Linear Patch Revives Layer-Pruned Large Language Models | Proposes LinearPatch to address performance degradation in layer-pruned models | distillation, large language model | |
| 59 | Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion | Proposes a classifier over LLM representations to predict chain-of-thought success | reinforcement learning, chain-of-thought | |
| 60 | Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models | Proposes factuality-aware step-wise policy optimization to address hallucination in reasoning models | reinforcement learning, large language model | |
| 61 | TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence | Proposes TimeHC-RL to improve the social intelligence of large language models | reinforcement learning, large language model | |
| 62 | Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | Proposes self-reflection combined with reinforcement learning to improve large language model performance | reinforcement learning, large language model | |
| 63 | DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning | Proposes DeepDiver to address information seeking in open-web question answering | reinforcement learning, large language model | |
| 64 | Training LLMs for EHR-Based Reasoning Tasks via Reinforcement Learning | Proposes EHRMIND to address knowledge application in electronic health record reasoning tasks | reinforcement learning, large language model | |
| 65 | Mamba Knockout for Unraveling Factual Information Flow | Proposes Mamba Knockout to reveal how factual information flows | Mamba, SSM | |
| 66 | Proactive Guidance of Multi-Turn Conversation in Industrial Search | Proposes a two-stage framework to proactively guide multi-turn conversation in industrial search | reinforcement learning, distillation, large language model | |
| 67 | Efficient Text Encoders for Labor Market Analysis | Proposes ConTeXT-match to improve the efficiency of skill extraction for labor market analysis | contrastive learning, large language model | |
| 68 | CREFT: Sequential Multi-Agent LLM for Character Relation Extraction | Proposes CREFT to address complex character relation extraction | distillation, large language model | |
| 69 | Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | Proposes phoneme-augmented chain-of-thought to address low-resource language translation | curriculum learning, chain-of-thought | |
| 70 | Writing-Zero: Bridge the Gap Between Non-verifiable Tasks and Verifiable Rewards | Proposes Writing-Zero to bridge the gap between non-verifiable tasks and verifiable rewards | reinforcement learning, large language model | |
| 71 | GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training | Proposes the GATE model to improve Arabic semantic textual similarity | representation learning | |
| 72 | HardTests: Synthesizing High-Quality Test Cases for LLM Coding | Proposes HARDTESTGEN to synthesize high-quality test cases for LLM coding problems | reinforcement learning, large language model | |

🔬 Pillar 1: Robot Control (1 paper)

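| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 73 | Stress-testing Machine Generated Text Detection: Shifting Language Models Writing Style to Fool Detectors | Proposes a new method to improve the robustness of machine-generated text detection | manipulation, DPO, direct preference optimization | |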
