cs.CL (2025-08-28)

📊 34 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (29 🔗5) · Pillar 2: RL & Architecture (5 🔗2)

🔬 Pillar 9: Embodied Foundation Models (29 papers)

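| # | Title | One-line summary | Tags |
|---|---|---|---|
| 1 | The Percept-V Challenge: Can Multimodal LLMs Crack Simple Perception Problems? | Proposes the Percept-V dataset to evaluate multimodal large language models on basic visual perception tasks. | large language model, multimodal |
| 2 | A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers | Surveys scientific large language models, from data foundations to agent frontiers. | large language model, multimodal |
| 3 | Exploring Machine Learning and Language Models for Multimodal Depression Detection | Explores machine learning and language models for multimodal depression detection. | large language model, multimodal |
| 4 | How Does Cognitive Bias Affect Large Language Models? A Case Study on the Anchoring Effect in Price Negotiation Simulations | Shows that large language models are affected by the anchoring effect in price negotiation. | large language model, chain-of-thought |
| 5 | Leveraging Large Language Models for Generating Research Topic Ontologies: A Multi-Disciplinary Study | Uses large language models to generate research topic ontologies, addressing the challenge of cross-disciplinary knowledge organization. | large language model, chain-of-thought |
| 6 | Quantifying Label-Induced Bias in Large Language Model Self- and Cross-Evaluations | Reveals label-induced bias in LLM evaluations and stresses the importance of blind evaluation. | large language model |
| 7 | Lethe: Purifying Backdoored Large Language Models with Knowledge Dilution | LETHE: purifies backdoored large language models via knowledge dilution. | large language model |
| 8 | GDLLM: A Global Distance-aware Modeling Approach Based on Large Language Models for Event Temporal Relation Extraction | Proposes GDLLM, which uses global distance-aware modeling to improve LLM performance on event temporal relation extraction. | large language model |
| 9 | Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models | Proposes stepwise verification and rollback methods to address tokenization inconsistency in LLM-based steganography and watermarking. | large language model |
| 10 | ConspirED: A Dataset for Cognitive Traits of Conspiracy Theories and Large Language Model Safety | ConspirED: builds a dataset of cognitive traits of conspiracy theories to evaluate large language model safety. | large language model |
| 11 | CAPE: Context-Aware Personality Evaluation Framework for Large Language Models | CAPE: proposes a context-aware personality evaluation framework for LLMs, addressing existing methods' neglect of conversation history. | large language model |
| 12 | Benchmarking GPT-5 for biomedical natural language processing | Evaluates GPT-5 on biomedical natural language processing tasks, revealing its strengths and limitations. | multimodal, chain-of-thought |
| 13 | A Graph Talks, But Who's Listening? Rethinking Evaluations for Graph-Language Models | Reveals an evaluation dilemma for graph-language models: existing benchmarks are insufficient to assess multimodal reasoning. | large language model, multimodal |
| 14 | GUARD: Guideline Upholding Test through Adaptive Role-play and Jailbreak Diagnostics for LLMs | GUARD: improves compliance testing of LLMs through adaptive role-play and jailbreak diagnostics. | large language model |
| 15 | On the Theoretical Limitations of Embedding-Based Retrieval | Reveals theoretical limitations of embedding-based retrieval: even simple queries can fail. | instruction following |
| 16 | Turning the Spell Around: Lightweight Alignment Amplification via Rank-One Safety Injection | Proposes Rank-One Safety Injection (ROSI), which strengthens LLM safety alignment via rank-one weight modification. | large language model |
| 17 | Decoding Memories: An Efficient Pipeline for Self-Consistency Hallucination Detection | Proposes the Decoding Memory Pipeline (DMP), which speeds up self-consistency hallucination detection and reduces compute cost. | large language model |
| 18 | BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design | Proposes BED-LLM to improve the information-gathering ability of large language models. | large language model |
| 19 | ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents | ProactiveEval: a unified evaluation framework for proactive dialogue agents. | large language model |
| 20 | CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection | Proposes the CoCoNUTS benchmark and the CoCoDet detector for identifying AI-generated content in peer reviews, focusing on content rather than style. | large language model |
| 21 | Measuring Reasoning Utility in LLMs via Conditional Entropy Reduction | Evaluates the utility of LLM reasoning via conditional entropy reduction, with the aim of optimizing the reasoning process. | large language model |
| 22 | An Agile Method for Implementing Retrieval Augmented Generation Tools in Industrial SMEs | EASI-RAG: an agile method for deploying retrieval-augmented generation tools in industrial SMEs. | large language model |
| 23 | How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $τ$-bench | Proposes the IRMA framework, which significantly improves LLM tool-use accuracy in dynamic environments through input reformulation. | large language model |
| 24 | Feel the Difference? A Comparative Analysis of Emotional Arcs in Real and LLM-Generated CBT Sessions | Compares emotional arcs in real and LLM-generated CBT dialogues, revealing the limitations of LLMs in emotional expression. | large language model |
| 25 | SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM | SciTopic: uses large language models to enhance topic discovery in scientific literature and improve the efficiency of scientific information retrieval. | large language model |
| 26 | From Post To Personality: Harnessing LLMs for MBTI Prediction in Social Media | Proposes the PostToPersonality framework, which uses LLMs to predict MBTI personality types from social media, mitigating hallucination and addressing data imbalance. | large language model |
| 27 | MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers | MCP-Bench: evaluates the tool-use ability of LLM agents on complex real-world tasks via MCP servers. | large language model |
| 28 | CAMB: A comprehensive industrial LLM benchmark on civil aviation maintenance | Proposes CAMB, a comprehensive industrial LLM benchmark for civil aviation maintenance. | large language model |
| 29 | Joint Enhancement of Relational Reasoning for Long-Context LLMs | Proposes the JERR framework, which strengthens relational reasoning in long-context LLMs through graph reasoning. | large language model |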

🔬 Pillar 2: RL & Architecture (5 papers)

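| # | Title | One-line summary | Tags |
|---|---|---|---|
| 30 | SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement | SageLM: a multi-aspect, explainable large language model for speech judgement. | reinforcement learning, large language model |
| 31 | Prediction of mortality and resource utilization in critical care: a deep learning approach using multimodal electronic health records with natural language processing techniques | Proposes a deep learning framework based on multimodal EHR and NLP for predicting mortality and resource utilization in critical care. | MAE, multimodal |
| 32 | Graph-R1: Unleashing LLM Reasoning with NP-Hard Graph Problems | Graph-R1: uses NP-hard graph problems to improve LLM reasoning. | reinforcement learning, reward design, large language model |
| 33 | Improving Aviation Safety Analysis: Automated HFACS Classification Using Reinforcement Learning with Group Relative Policy Optimization | Proposes a reinforcement-learning-based framework for automated HFACS classification, improving the efficiency and accuracy of aviation safety analysis. | reinforcement learning, large language model |
| 34 | Adaptive Federated Distillation for Multi-Domain Non-IID Textual Data | Proposes AdaFD, an adaptive federated distillation framework that addresses the challenges of multi-domain non-IID textual data. | distillation |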
