cs.CL (2025-05-19)

📊 38 papers in total | 🔗 7 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (31 🔗5) · Pillar 2: RL Algorithms and Architecture (7 🔗2)

🔬 Pillar 9: Embodied Foundation Models (31 papers)

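| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning | RBF++: quantifies and optimizes reasoning boundaries for measurable and unmeasurable capabilities in CoT reasoning. | large language model, multimodal, chain-of-thought |
| 2 | KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025 | KIT proposes an LLM-enhanced offline speech translation and instruction-following system that improves performance. | large language model, instruction following |
| 3 | FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models | FlightGPT: generalizable and interpretable UAV vision-and-language navigation built on vision-language models. | VLN, multimodal, chain-of-thought |
| 4 | Are Large Language Models Good at Detecting Propaganda? | Evaluates large language models on propaganda detection in news; the results show they do not outperform a RoBERTa-CRF baseline. | large language model |
| 5 | Krikri: Advancing Open Large Language Models for Greek | Krikri: an open large language model for Greek that significantly improves Greek understanding and generation. | large language model |
| 6 | Simulation Agent: A Framework for Integrating Simulation and Large Language Models for Enhanced Decision-Making | Proposes the Simulation Agent framework, which integrates simulation with large language models to enhance decision-making. | large language model |
| 7 | From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery | Survey: large language models empowering scientific discovery, from automation tools to autonomous research agents. | large language model |
| 8 | SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science | SeedBench: a multi-task benchmark for evaluating large language models in seed science. | large language model |
| 9 | ToolSpectrum : Towards Personalized Tool Utilization for Large Language Models | ToolSpectrum: a benchmark for personalized tool utilization by large language models. | large language model |
| 10 | Role-Playing Evaluation for Large Language Models | Proposes the RPEval benchmark for evaluating the role-playing ability of large language models. | large language model |
| 11 | The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation | Uses controlled experiments to show how language diversity affects LLM fine-tuning for translation, and finds that moderate diversity improves translation quality. | large language model |
| 12 | Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset | Suicide risk assessment using multimodal speech features, based on the SW1 Challenge dataset. | multimodal |
| 13 | An Empirical Study of Many-to-Many Summarization with Large Language Models | Systematic study of large language models on multilingual document summarization, revealing the advantages of instruction tuning and challenges with factuality. | large language model |
| 14 | I'll believe it when I see it: Images increase misinformation sharing in Vision-Language Models | Images increase misinformation sharing in vision-language models: a study of the influence of images. | large language model, multimodal |
| 15 | SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information | SAKURA: evaluates the multi-hop reasoning of large audio-language models based on speech and audio information. | large language model, multimodal |
| 16 | Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | Proposes the TREA dataset, evaluates the temporal reasoning of LALMs, and proposes an uncertainty measure. | large language model, multimodal |
| 17 | MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix | Proposes MMAR, a challenging benchmark for evaluating the deep reasoning ability of audio-language models. | large language model, chain-of-thought |
| 18 | SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs | SQLForge: synthesizes reliable and diverse data to enhance Text-to-SQL reasoning in LLMs. | large language model |
| 19 | Assessing GPT Performance in a Proof-Based University-Level Course Under Blind Grading | Assesses GPT's performance in a proof-based university course under blind grading. | large language model |
| 20 | Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents | Proposes guided search strategies for solving software engineering problems in non-serializable environments. | large language model |
| 21 | What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts | Addresses underspecification in LLM prompts with a requirement-aware optimization method that improves model stability and performance. | instruction following |
| 22 | Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning | Reveals the influence of semantic recall on long-context code reasoning and proposes the SemTrace benchmark to test the semantic understanding of LLMs. | large language model |
| 23 | GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection | Proposes GUARD, a generation-time LLM unlearning framework based on adaptive restriction and detection. | large language model |
| 24 | Rank, Chunk and Expand: Lineage-Oriented Reasoning for Taxonomy Expansion | LORex: proposes a lineage-oriented reasoning framework for efficient taxonomy expansion. | PaLM-E |
| 25 | What's in a prompt? Language models encode literary style in prompt embeddings | Language model prompt embeddings encode literary style information that can be used for authorship attribution. | large language model |
| 26 | RAR: Setting Knowledge Tripwires for Retrieval Augmented Rejection | RAR: sets knowledge tripwires for large language models through retrieval-augmented rejection to achieve content moderation. | large language model |
| 27 | HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding | HeteroSpec: leverages contextual heterogeneity for efficient speculative decoding, significantly speeding up LLM inference. | large language model |
| 28 | Are LLMs Better Formalizers than Solvers on Complex Problems? | On complex constraint satisfaction problems, LLMs acting as formalizers perform worse than LLMs acting as direct solvers. | large language model |
| 29 | Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks | Reveals positional fragility in LLMs: how offset effects change the understanding of memorization risks. | large language model |
| 30 | What if Deception Cannot be Detected? A Cross-Linguistic Study on the Limits of Deception Detection from Text | Questions the reliability of text-based deception detection: a cross-linguistic study reveals the limits of linguistic cues. | large language model |
| 31 | Language-Specific Latent Process Hinders Cross-Lingual Performance | Reveals that language-specific latent processes hinder cross-lingual performance, and proposes a guidance method that improves the cross-lingual reasoning of small models. | large language model |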

🔬 Pillar 2: RL Algorithms and Architecture (7 papers)

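| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 32 | MR. Judge: Multimodal Reasoner as a Judge | Proposes MR. Judge, which strengthens the reasoning ability of multimodal large language models acting as judges. | RLHF, large language model, multimodal |
| 33 | SMOTExT: SMOTE meets Large Language Models | SMOTExT: combines SMOTE with large language models to address data scarcity and class imbalance in NLP model training. | distillation, large language model |
| 34 | Picturized and Recited with Dialects: A Multimodal Chinese Representation Framework for Sentiment Analysis of Classical Chinese Poetry | Proposes a dialect-enhanced multimodal framework to improve the accuracy of sentiment analysis of classical Chinese poetry. | representation learning, multimodal |
| 35 | CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning | Proposes CSC-SQL, which uses reinforcement learning to optimize self-consistency and self-correction, improving Text-to-SQL accuracy. | reinforcement learning, large language model |
| 36 | Thinkless: LLM Learns When to Think | Thinkless: an LLM learns when to think, adaptively choosing a reasoning mode to improve efficiency. | reinforcement learning, chain-of-thought |
| 37 | J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization | Proposes the EIS-GRPO algorithm to train the J4R model, improving the automatic evaluation ability of LLMs in reasoning scenarios. | reinforcement learning, large language model |
| 38 | Transparent and Robust RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability | Proposes the ARENA framework, which achieves transparent and robust RAG generation through adaptive-reward reinforcement learning. | reinforcement learning |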
