cs.CL (2025-02-20)

📊 84 papers in total | 🔗 19 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (71 🔗16) · Pillar 2: RL & Architecture (10 🔗3) · Pillar 1: Robot Control (3)

🔬 Pillar 9: Embodied Foundation Models (71 papers)

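# | Title | One-line summary | Tags | 🔗
1 | Reducing Hallucinations of Medical Multimodal Large Language Models with Visual Retrieval-Augmented Generation | Proposes a Visual Retrieval-Augmented Generation (V-RAG) framework to reduce hallucinations in medical multimodal large language models | large language model, multimodal
2 | A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics | Survey: feedback-based multi-step reasoning for improving the mathematical ability of large language models | large language model, chain-of-thought
3 | Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models | Proposes U-SafeBench to evaluate how large language models perform under user-specific safety standards | large language model, chain-of-thought
4 | AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | AlphaMaze: uses GRPO to improve the spatial intelligence of large language models in maze navigation | large language model, chain-of-thought
5 | StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following | Proposes StructFlowBench to evaluate how well LLMs understand structured flow in multi-turn instruction following | large language model, instruction following
6 | InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback | Proposes the InterFeedback framework to evaluate the intelligence of large multimodal models when interacting with humans | multimodal
7 | Harnessing PDF Data for Improving Japanese Large Multimodal Models | Uses PDF data to enhance Japanese large multimodal models, improving their understanding of Japanese cultural knowledge | multimodal
8 | Obliviate: Efficient Unmemorization for Protecting Intellectual Property in Large Language Models | Obliviate: an efficient unmemorization method for protecting intellectual property in large language models | large language model
9 | Hallucination Detection in Large Language Models with Metamorphic Relations | Proposes MetaQA, which detects hallucinations in large language models using metamorphic relations and prompt mutation | large language model
10 | TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators | TritonBench: the first comprehensive benchmark for LLM-generated Triton operators, revealing the shortcomings of existing models in high-performance code generation | large language model
11 | Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models | Reveals the inherent limitations of large language model benchmarks and questions how reliably they assess generalization | large language model
12 | MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | MedHallu: a comprehensive benchmark for detecting medical hallucinations in large language models | large language model
13 | Fact or Guesswork? Evaluating Large Language Models' Medical Knowledge with Structured One-Hop Judgments | Proposes the MKJ dataset to evaluate the accuracy and calibration of large language models on medical knowledge | large language model
14 | Effects of Prompt Length on Domain-specific Tasks for Large Language Models | Studies how prompt length affects large language model performance on domain-specific tasks | large language model
15 | Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease | RareScale: combines expert systems with large language models to improve rare disease diagnosis accuracy | large language model
16 | From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Proposes HippoRAG 2, improving the non-parametric continual learning of LLMs on factual, reasoning, and associative memory tasks | large language model
17 | Explanations of Large Language Models Explain Language Representations in the Brain | Uses explainable AI to reveal the link between large language models and language representations in the brain | large language model
18 | Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models | Shows that soft token attacks cannot reliably audit unlearning in large language models | large language model
19 | CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models | Proposes CORBA: a contagious recursive blocking attack on LLM-based multi-agent systems | large language model
20 | Enhancing Smart Environments with Context-Aware Chatbots using Large Language Models | Proposes an LLM-based context-aware chatbot to enhance the user experience in smart environments | large language model
21 | Judging It, Washing It: Scoring and Greenwashing Corporate Climate Disclosures using Large Language Models | Uses large language models to assess corporate climate disclosures and identify greenwashing | large language model
22 | Optimizing Singular Spectrum for Large Language Model Compression | Proposes the SoCo framework, which compresses large language models efficiently by optimizing the singular spectrum | large language model
23 | Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps | Proposes the FUR framework, which evaluates the parametric faithfulness of CoT reasoning by unlearning reasoning-step information | chain-of-thought
24 | SurveyX: Academic Survey Automation via Large Language Models | SurveyX: automates academic surveys with large language models, significantly improving content and citation quality | large language model
25 | Large Language Models Struggle to Describe the Haystack without Human Help: Human-in-the-loop Evaluation of Topic Models | Large language models struggle to describe the topics of large corpora without human intervention; topic models need human-in-the-loop evaluation | large language model
26 | LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning | LIFT: improves the long-context understanding of large language models through long input fine-tuning | large language model
27 | Behavioral Analysis of Information Salience in Large Language Models | Proposes an explainable framework that analyzes the information salience preferences of large language models through their summarization behavior | large language model
28 | Optimal word order for non-causal text generation with Large Language Models: the Spanish case | For Spanish, proposes a Viterbi-based maximum likelihood estimation method to optimize the text generation order of non-causal language models | large language model
29 | Token-Level Density-Based Uncertainty Quantification Methods for Eliciting Truthfulness of Large Language Models | Proposes token-level density-based uncertainty quantification methods for large language models to improve the truthfulness of generated output | large language model
30 | A Survey on Data Contamination for Large Language Models | Surveys data contamination in large language models and analyzes detection and mitigation methods | large language model
31 | SR-LLM: Rethinking the Structured Representation in Large Language Model | SR-LLM: enhances the reasoning ability of large language models through structured representation | large language model
32 | English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports | First evaluation of machine translation for multilingual bug reports, comparing large language models with traditional translation models | large language model
33 | Transfer-Prompting: Enhancing Cross-Task Adaptation in Large Language Models via Dual-Stage Prompts Optimization | Proposes Transfer-Prompting, which improves cross-task transfer in large language models through dual-stage prompt optimization | large language model
34 | QUAD-LLM-MLTC: Large Language Models Ensemble Learning for Healthcare Text Multi-Label Classification | QUAD-LLM-MLTC: uses LLM ensemble learning for multi-label classification of healthcare text | large language model
35 | Vulnerability of Text-to-Image Models to Prompt Template Stealing: A Differential Evolution Approach | Proposes EvoStealer, which uses differential evolution to steal prompt templates from text-to-image models | large language model, multimodal
36 | ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting | Proposes the ReVision dataset and a baseline VLM for privacy-preserving visual instruction rewriting | multimodal
37 | LUME: LLM Unlearning with Multitask Evaluations | LUME: evaluates LLM unlearning across multiple tasks | large language model
38 | LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | LServe: accelerates long-sequence LLM serving with unified sparse attention | large language model
39 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | Proposes FR-Spec, which accelerates large-vocabulary language models through frequency-ranked speculative sampling | large language model
40 | Rapid Word Learning Through Meta In-Context Learning | Proposes the Minnow meta-learning framework, improving the ability of language models to learn new words rapidly from few examples | large language model
41 | ExpertLens: Activation steering features are highly interpretable | ExpertLens: discovers highly interpretable concept representations in LLMs through activation steering | large language model
42 | Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection | Proposes an uncertainty-based verification method for black-box hallucination detection that improves efficiency while maintaining high performance | large language model
43 | LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers | LLM-Microscope reveals the hidden role of punctuation in the context memory of Transformers | large language model
44 | Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries | Quantifies AI over-refusal and emotional attachment boundaries, proposing an evaluation framework for how LLMs handle emotional boundaries | large language model
45 | Revealing and Mitigating Over-Attention in Knowledge Editing | Proposes the Selective Attention Drift Restriction (SADR) method to mitigate over-attention in knowledge editing | large language model
46 | eC-Tab2Text: Aspect-Based Text Generation from e-Commerce Product Tables | Proposes the eC-Tab2Text dataset for aspect-specific text generation from e-commerce product tables | large language model
47 | Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis | Proposes the Tree-of-Debate framework, which uses multi-persona debate trees to promote comparative analysis of and critical thinking about scientific papers | large language model
48 | Edit Once, Update Everywhere: A Simple Framework for Cross-Lingual Knowledge Synchronization in LLMs | Proposes the X-KDE framework for synchronized cross-lingual knowledge editing and updating in LLMs | large language model
49 | PredictaBoard: Benchmarking LLM Score Predictability | PredictaBoard: benchmarks the predictability of LLM scores to improve AI system safety | large language model
50 | SEA-HELM: Southeast Asian Holistic Evaluation of Language Models | SEA-HELM: a holistic evaluation benchmark of language models for Southeast Asia, filling a gap in multilingual and cultural evaluation | large language model
51 | MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels | MCQA-Eval: evaluates NLG confidence using gold-standard answers, improving evaluation efficiency and reliability | large language model
52 | CoME: An Unlearning-based Approach to Conflict-free Model Editing | CoME: an unlearning-based approach to conflict-free model editing | large language model
53 | Using tournaments to calculate AUROC for zero-shot classification with LLMs | Uses a tournament mechanism to compute AUROC, improving LLM zero-shot classification performance | large language model
54 | Contextualizing Search Queries In-Context Learning for Conversational Rewriting with LLMs | Proposes a prompt-guided in-context learning method for low-resource conversational query rewriting | large language model
55 | CLIPPER: Compression enables long-context synthetic data generation | CLIPPER: uses compression to improve long-context synthetic data generation, boosting narrative claim verification performance | chain-of-thought
56 | GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Proposes the GATE framework to address the efficiency of tool construction across multiple tasks | large language model
57 | HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States | HiddenDetect: detects jailbreak attacks on large vision-language models by monitoring hidden states | multimodal
58 | SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines | SuperGPQA: builds a large-scale LLM evaluation benchmark covering 285 graduate disciplines | large language model
59 | I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search | I-MCTS: enhances agentic AutoML through introspective Monte Carlo tree search | large language model
60 | Bridging the Gap: Transforming Natural Language Questions into SQL Queries via Abstract Query Pattern and Contextual Schema Markup | PAS-SQL improves Text-to-SQL performance on complex questions through abstract query patterns and contextual schema markup | large language model
61 | Can LLMs Predict Citation Intent? An Experimental Analysis of In-context Learning and Fine-tuning on Open LLMs | Uses general-purpose LLMs to predict citation intent via in-context learning and fine-tuning, without domain-specific pretraining | large language model
62 | LoRA-MGPO: Mitigating Double Descent in Low-Rank Adaptation via Momentum-Guided Perturbation Optimization | Proposes LoRA-MGPO, which mitigates double descent in low-rank adaptation through momentum-guided perturbation optimization | large language model
63 | Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases | Uses information-theoretic analysis to assess how well large language models simulate native-language interference biases in L2 English dialogue | large language model
64 | Unshackling Context Length: An Efficient Selective Attention Approach through Query-Key Compression | Proposes ESA: efficient selective attention through query-key compression, overcoming context length limits on long text | large language model
65 | Unstructured Evidence Attribution for Long Context Query Focused Summarization | Proposes the SUnsET dataset and an unstructured evidence extraction method, improving the truthfulness of long-context query-focused summarization | large language model
66 | Leveraging Small LLMs for Argument Mining in Education: Argument Component Identification, Classification, and Assessment | Uses small LLMs for argument mining in education: argument component identification, classification, and assessment | large language model
67 | Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems | Proposes a communication-centric survey framework for LLM-based multi-agent systems, analyzing interaction mechanisms and outlining future directions | large language model
68 | ParallelComp: Parallel Long-Context Compressor for Length Extrapolation | Proposes ParallelComp, a parallel long-context compression method that addresses the memory bottleneck and attention decay in LLM length extrapolation | large language model
69 | Unveiling Cultural Blind Spots: Analyzing the Limitations of mLLMs in Procedural Text Comprehension | The CAPTex benchmark reveals the limitations of mLLMs in understanding culturally specific procedural text, especially in low-resource languages | large language model
70 | Learning to Retrieve and Reason on Knowledge Graph through Active Self-Reflection | Proposes the ARG framework, which enables end-to-end training of knowledge graph reasoning through active self-reflection | large language model
71 | PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant | PaperHelper: a knowledge-based LLM question-answering assistant for reading papers that improves the efficiency of literature comprehension | large language model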

🔬 Pillar 2: RL & Architecture (10 papers)

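# | Title | One-line summary | Tags | 🔗
72 | Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Proposes a temporal-decay-based direct preference optimization method to address length bias | reinforcement learning, RLHF, DPO
73 | Length-Controlled Margin-Based Preference Optimization without Reference Model | Proposes length-controlled margin-based preference optimization to address the limitations of DPO | reinforcement learning, RLHF, DPO
74 | Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling | Proposes a multiscale byte language model that models ultra-long sequences of 5M bytes on a single GPU | Mamba, foundation model, multimodal
75 | Capturing Nuanced Preferences: Preference-Aligned Distillation for Small Language Models | Proposes the Preference-Aligned Distillation (PAD) framework, improving the ability of small language models to capture human preferences | distillation, large language model
76 | Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning | Proposes Full-Step-DPO, which uses step-wise rewards for self-supervised preference optimization in mathematical reasoning | DPO, direct preference optimization
77 | Drift: Decoding-time Personalized Alignments with Implicit User Preferences | Drift: performs personalized alignment at decoding time using implicit user preferences | reinforcement learning, RLHF, large language model
78 | MLGym: A New Framework and Benchmark for Advancing AI Research Agents | Proposes the MLGym framework and benchmark for improving the capabilities of AI research agents | reinforcement learning, large language model
79 | DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model | DeepRTL: a unified representation model that bridges Verilog understanding and generation | curriculum learning, large language model
80 | Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Logic-RL: uses rule-based reinforcement learning to unleash the reasoning ability of LLMs | reinforcement learning
81 | On-the-fly Preference Alignment via Principle-Guided Decoding | Proposes OPAD, which achieves on-the-fly preference alignment through principle-guided decoding, without fine-tuning | reinforcement learning, large language model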

🔬 Pillar 1: Robot Control (3 papers)

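# | Title | One-line summary | Tags | 🔗
82 | NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Proposes NLoRA, which uses the Nyström method to accelerate low-rank adaptation and improve the fine-tuning efficiency of large language models | manipulation, large language model
83 | Enhancing Conversational Agents with Theory of Mind: Aligning Beliefs, Desires, and Intentions for Human-Like Interaction | Uses theory of mind to enhance conversational agents, aligning beliefs, desires, and intentions for human-like interaction | manipulation, large language model
84 | Sentence Smith: Controllable Edits for Evaluating Text Embeddings | Proposes the Sentence Smith framework, which evaluates text embedding models through controllable edits | manipulation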
