cs.CL (2025-02-26)

📊 55 papers in total | 🔗 14 with code

🎯 Browse by interest area

Pillar 9: Embodied Foundation Models (44, 🔗 10) · Pillar 2: RL & Architecture (10, 🔗 4) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (44 papers)

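| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | DeltaBench: evaluates how well LLMs detect errors in long chain-of-thought reasoning. | large language model, chain-of-thought | |
| 2 | DataMan: Data Manager for Pre-training Large Language Models | DataMan: a data manager for LLM pre-training that improves data quality and domain mixing. | large language model, instruction following | |
| 3 | Medical Hallucinations in Foundation Models and Their Impact on Healthcare | Exposes hallucination in medical-domain foundation models: general-purpose models outperform medical-specialized ones, and CoT reasoning substantially reduces hallucination. | foundation model, chain-of-thought | |
| 4 | Do Large Language Models Know How Much They Know? | Evaluates whether LLMs are aware of the scope of their own knowledge, and proposes a benchmark to test it. | large language model | |
| 5 | Detecting Linguistic Indicators for Stereotype Assessment with Large Language Models | Proposes a stereotype assessment method based on linguistic indicators, using LLMs to detect stereotypes in text. | large language model | |
| 6 | Binary Neural Networks for Large Language Model: A Survey | Survey of binary neural network techniques for LLMs. | large language model | |
| 7 | JailBench: A Comprehensive Chinese Security Assessment Benchmark for Large Language Models | JailBench: the first comprehensive Chinese security assessment benchmark, designed to probe deep vulnerabilities in LLMs. | large language model | |
| 8 | When Large Language Models Meet Speech: A Survey on Integration Approaches | Survey of the three main approaches to integrating LLMs with speech. | large language model | |
| 9 | Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique | Using the Consensual Assessment Technique, shows that LLMs outperform non-experts in evaluating poetry. | large language model | |
| 10 | MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering | MEBench: a benchmark for LLMs on cross-document multi-entity question answering. | large language model | |
| 11 | Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models | Proposes PETAL, a label-only membership inference attack against pre-trained LLMs. | large language model | |
| 12 | Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance | Proposes the Plutus-ben benchmark and the Plutus-8B model, filling the gap in LLM research for low-resource Greek finance. | large language model | |
| 13 | Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Proposes MoO, which uses a mixture of opinions from weaker LLMs to improve a stronger LLM's mathematical reasoning. | large language model, chain-of-thought | |
| 14 | A Causal Lens for Evaluating Faithfulness Metrics | Proposes a causal diagnostic framework for assessing the validity of faithfulness metrics for natural-language explanations. | large language model, chain-of-thought | |
| 15 | Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs | Proposes LLMEvalDB, which uses LLMs to speed up literature analysis and surface insights into the performance of frontier LLMs. | multimodal, chain-of-thought | |
| 16 | Stay Focused: Problem Drift in Multi-Agent Debate | Proposes DRIFTJudge and DRIFTPolicy to address problem drift in multi-agent debate. | large language model, instruction following | |
| 17 | Random Forest-of-Thoughts: Uncertainty-aware Reasoning for Computational Social Science | Proposes Random Forest-of-Thoughts (RFoT) to improve LLMs' uncertainty-aware reasoning in social survey analysis. | large language model, chain-of-thought | |
| 18 | Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs | Survey of the interplay between code-enhanced reasoning and reasoning-driven code intelligence in LLMs. | large language model | |
| 19 | Shh, don't say that! Domain Certification in LLMs | Proposes VALID, which certifies that an LLM deployed for a specific domain keeps its outputs within that domain, improving model safety. | large language model | |
| 20 | TestNUC: Enhancing Test-Time Computing Approaches and Scaling through Neighboring Unlabeled Data Consistency | TestNUC: uses the consistency of neighboring unlabeled data to improve test-time computing methods and make them scale linearly. | large language model | |
| 21 | Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs | Amulet: test-time realignment for adapting LLMs to personalized preferences. | large language model | |
| 22 | Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review | Builds a benchmark for detecting AI-written peer reviews, revealing the limitations of existing AI text detectors in the review setting. | large language model | |
| 23 | Cognitive networks highlight differences and similarities in the STEM mindsets of human and LLM-simulated trainees, experts and academics | Uses cognitive networks to reveal similarities and differences between the STEM mindsets of humans and LLMs. | large language model | |
| 24 | Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing | Reveals norm-growth and stability challenges in localized sequential knowledge editing of LLMs. | large language model | |
| 25 | BEYONDWORDS is All You Need: Agentic Generative AI based Social Media Themes Extractor | Proposes an agentic generative-AI method for extracting themes from social media, improving the depth and accuracy of thematic analysis. | chain-of-thought | |
| 26 | Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning | Proposes the Low-Confidence Gold (LCG) framework to efficiently filter instruction-tuning datasets and improve LLM performance. | large language model | |
| 27 | Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework | Proposes ARJudge, which aligns LLMs' evaluation ability through multi-faceted evaluation and improves robustness. | large language model | |
| 28 | Revisiting Word Embeddings in the LLM Era | Compares LLM-based and classical word embeddings, revealing their respective strengths and limitations in the LLM era. | large language model | |
| 29 | Where Are We? Evaluating LLM Performance on African Languages | Evaluates LLM performance on African languages, revealing how data bias affects model quality. | large language model | |
| 30 | Learning Code-Edit Embedding to Model Student Debugging Behavior | Proposes a code-edit embedding model that captures students' debugging behavior and provides personalized code suggestions. | large language model | |
| 31 | Negation-Induced Forgetting in LLMs | Finds that some LLMs exhibit negation-induced forgetting. | large language model | |
| 32 | Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation | Proposes Bi'an, a bilingual benchmark and model for hallucination detection in retrieval-augmented generation. | large language model | |
| 33 | BIG-Bench Extra Hard | Proposes the BIG-Bench Extra Hard (BBEH) benchmark for evaluating more advanced general reasoning in LLMs. | large language model | |
| 34 | Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval | Proposes the PseudoEval benchmark, which evaluates LLMs' language-coding ability separately from their problem-solving ability. | large language model | |
| 35 | Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization | Proposes PKUE, which mitigates factual hallucination in LLMs by improving precise knowledge utilization. | large language model | |
| 36 | LongEval: A Comprehensive Analysis of Long-Text Generation Through a Plan-based Paradigm | LongEval: a comprehensive benchmark for evaluating long-text generation under a plan-based paradigm. | large language model | |
| 37 | Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs | Proposes the CLADA framework to address efficiency bottlenecks in LLMs. | large language model | |
| 38 | IndicEval-XL: Bridging Linguistic Diversity in Code Generation Across Indic Languages | IndicEval-XL: a multilingual benchmark for code generation across Indic languages. | large language model | |
| 39 | MathClean: A Benchmark for Synthetic Mathematical Data Cleaning | Proposes the MathClean benchmark for evaluating the effectiveness of mathematical data-cleaning models. | large language model | |
| 40 | GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation | GenTool: improves tool generalization in language models through zero-to-one and weak-to-strong simulation. | large language model | |
| 41 | TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation | TokenSwift: a lossless acceleration framework for ultra-long sequence generation that makes LLM generation more efficient. | large language model | |
| 42 | Active Few-Shot Learning for Text Classification | Proposes an active-learning-based few-shot text classification method that improves LLM performance when labeled data is scarce. | large language model | |
| 43 | Towards Optimal Multi-draft Speculative Decoding | Analyzes and optimizes the efficiency of multi-draft speculative decoding using optimal transport theory. | large language model | |
| 44 | A Survey of Automatic Prompt Optimization with Instruction-focused Heuristic-based Search Algorithm | Survey of automatic prompt optimization methods built on instruction-focused heuristic search algorithms. | large language model | |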

🔬 Pillar 2: RL & Architecture (10 papers)

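| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 45 | When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning | Proposes a multi-faceted evaluation framework that analyzes the effectiveness, fairness, and safety of personalized preference learning in LLMs. | reinforcement learning, preference learning, RLHF | |
| 46 | PEToolLLM: Towards Personalized Tool Learning in Large Language Models | PEToolLLM: a personalized tool-learning framework that improves LLMs' tool use in personalized scenarios. | direct preference optimization, large language model | |
| 47 | Sliding Window Attention Training for Efficient Large Language Models | Proposes SWAT, which makes LLMs more efficient through sliding window attention training. | state space model, large language model | |
| 48 | Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models | ConsJudge: uses the judge consistency of LLMs to improve the evaluation of retrieval-augmented generation models. | DPO, large language model | |
| 49 | Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents | Compares knowledge distillation and self-training for reducing hallucination in product QA agents built on small models. | distillation, large language model | |
| 50 | Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | Proposes agentic reward modeling, which combines human preferences with verifiable correctness signals to make reward systems more reliable. | DPO, large language model, instruction following | |
| 51 | Learning to Generate Structured Output with Schema Reinforcement Learning | Proposes schema reinforcement learning to improve LLMs' ability to generate structured output that conforms to a JSON Schema. | reinforcement learning, large language model | |
| 52 | Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | Proposes a knowledge distillation method based on self-supervised reward learning that lets smaller models surpass larger ones. | reinforcement learning, distillation, large language model | |
| 53 | Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time | Proposes DARS, a dual-model reflective scoring framework that improves the accuracy and interpretability of automated student-answer scoring. | reinforcement learning, large language model | |
| 54 | Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles | Proposes the USP framework, which models human-like user simulators via implicit user profiles to make dialogues more realistic and diverse. | reinforcement learning, large language model | |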

🔬 Pillar 1: Robot Control (1 paper)

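| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 55 | Evaluation of Hate Speech Detection Using Large Language Models and Geographical Contextualization | Evaluates how well LLMs detect hate speech across multilingual and geographical contexts. | manipulation, large language model | |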
