cs.CL (2025-02-19)

📊 67 papers in total | 🔗 14 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (57 🔗12) · Pillar 2: RL & Architecture (9 🔗2) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (57 papers)

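# | Title | One-sentence summary | Tags | 🔗
1 | MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification | Proposes MM-Verify, which strengthens multimodal reasoning through chain-of-thought verification and outperforms GPT-4o. | large language model, multimodal, chain-of-thought
2 | ProMedTS: A Self-Supervised, Prompt-Guided Multimodal Approach for Integrating Medical Text and Time Series | ProMedTS: a self-supervised, prompt-guided multimodal approach for integrating medical text and time-series data. | large language model, multimodal
3 | Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method | Time-LlaMA: a parameter-efficient adaptation method for applying large language models to time-series modeling. | large language model, foundation model
4 | Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition | Proposes the LDDU framework, which models uncertainty by decoupling latent emotion distributions to improve multimodal emotion recognition. | multimodal
5 | Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region | Reveals a fragility in LLM safety mechanisms: safety alignment anchored in the template region is vulnerable to attack. | large language model
6 | Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset | Zero-shot commonsense validation and reasoning with large language models, evaluated on the SemEval-2020 Task 4 dataset. | large language model
7 | REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models | Proposes the REFIND framework, which detects hallucinations in large language models through retrieval augmentation and by quantifying context sensitivity. | large language model
8 | Complex Ontology Matching with Large Language Model Embeddings | Proposes a complex ontology matching method that incorporates large language model embeddings, significantly improving the expressiveness of matches. | large language model
9 | What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis | Analyzes LLM internal states to understand the "psychology" of hallucination, enabling hallucination detection without external information. | large language model
10 | Batayan: A Filipino NLP benchmark for evaluating Large Language Models | Batayan: a Filipino NLP benchmark for evaluating large language models on a low-resource language. | large language model
11 | GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | GIMMICK: a globally inclusive multimodal, multitask benchmark for cultural knowledge. | multimodal
12 | PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models | Proposes PRIV-QA, a privacy-preserving question-answering framework for cloud-hosted large language models. | large language model
13 | Detecting Linguistic Bias in Government Documents Using Large language Models | Proposes the DGDB dataset and fine-tunes BERT models to detect linguistic bias in government documents. | large language model
14 | Towards Lightweight, Adaptive and Attribute-Aware Multi-Aspect Controllable Text Generation with Large Language Models | Proposes a lightweight, adaptive, attribute-aware framework for multi-aspect controllable text generation, improving control over large language models. | large language model
15 | Event Segmentation Applications in Large Language Model Enabled Automated Recall Assessments | Uses large language models for automated event segmentation and recall assessment, making cognitive research more efficient. | large language model
16 | MMTEB: Massive Multilingual Text Embedding Benchmark | Proposes MMTEB, a massive multilingual text embedding benchmark for comprehensive evaluation of text embedding models. | large language model, instruction following
17 | OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment | OpenSearch-SQL: improves Text-to-SQL performance through dynamic few-shot and consistency alignment. | large language model, instruction following
18 | STaR-SQL: Self-Taught Reasoner for Text-to-SQL | Proposes STaR-SQL, which improves Text-to-SQL performance through self-taught reasoning. | large language model, chain-of-thought
19 | Transferring Textual Preferences to Vision-Language Understanding through Model Merging | Transfers textual preferences to vision-language understanding models through model merging. | multimodal
20 | RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | RocketKV: accelerates long-context LLM inference through two-stage KV cache compression. | large language model
21 | DataSciBench: An LLM Agent Benchmark for Data Science | DataSciBench: a comprehensive benchmark for evaluating LLM capabilities on data science tasks. | large language model
22 | The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text | Proposes new data-driven membership inference attacks to audit the privacy risks of LLM-generated synthetic text. | large language model
23 | VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare | VITAL: a benchmark dataset for pluralistic alignment in healthcare. | large language model
24 | MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures | Proposes MaskPrune, which uses mask learning to achieve structured LLM pruning with layer-wise uniform structures. | large language model
25 | PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference | Proposes PLDR-LLM, which learns a tensor operator that can replace its own deep neural network at inference. | large language model
26 | ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails | ThinkGuard: more reliable LLM safety guardrails through deliberative slow thinking. | large language model
27 | A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language? | Studies the extent to which large language models capture the fractal complexity of language and identifies the factors that affect it. | large language model
28 | How Do LLMs Perform Two-Hop Reasoning in Context? | Reveals the mechanism of two-hop reasoning in LLM in-context learning: from random guessing to structured querying. | large language model
29 | SIFT: Grounding LLM Reasoning in Contexts via Stickers | SIFT: strengthens LLM reasoning in context through a Sticker mechanism. | large language model
30 | C2T: A Classifier-Based Tree Construction Method in Speculative Decoding | Proposes C2T, which uses a classifier to dynamically construct the token tree, improving decoding efficiency at inference. | large language model
31 | Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Proposes ActQKV to address inference efficiency for long-context LLMs. | large language model
32 | Towards Geo-Culturally Grounded LLM Generations | Retrieval augmentation improves LLM cultural awareness, but carries a risk of stereotyping. | large language model
33 | EvoP: Robust LLM Inference via Evolutionary Pruning | EvoP: robust LLM inference through evolutionary pruning. | large language model
34 | TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | TreeCut: a synthetic dataset of unanswerable math word problems for evaluating LLM hallucination. | large language model
35 | The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? | Crescent: a framework for LLMs to self-improve mathematical reasoning without external supervision signals. | large language model
36 | Benchmarking LLMs for Political Science: A United Nations Perspective | Proposes a United Nations benchmark for evaluating large language models in political decision-making. | large language model
37 | Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach | Proposes Grft, a gated fine-tuning method that improves LLM robustness to context in retrieval-augmented generation. | large language model
38 | Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning | Analyzes the relationship between extractive evidence retrieval and understanding in few-shot LLMs, relating model prediction errors to evidence retrieval errors. | large language model
39 | Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral | UniMoral: builds a multilingual computational pipeline for moral reasoning, bridging gaps in moral understanding across cultures. | large language model
40 | Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences? | Proposes the StripCipher benchmark for evaluating LMMs' temporal and contextual understanding of image sequences. | multimodal
41 | TESS 2: A Large-Scale Generalist Diffusion Language Model | TESS 2: a large-scale generalist diffusion language model with performance comparable to autoregressive models. | instruction following
42 | Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking | Proposes the Inner Thinking Transformer, which uses dynamic depth scaling to improve LLM reasoning on critical tokens. | large language model
43 | From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions | MemoryCode: evaluates LLMs' long-term memory in multi-session coding interactions. | large language model
44 | SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning | Proposes SCALAR, a live benchmark based on scientific citations for evaluating long-context academic reasoning. | large language model
45 | Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding | Proposes In-Context Contrastive Decoding, which strengthens input-label mapping in ICL and improves performance on NLU tasks. | large language model
46 | Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora | Proposes a training-free method for assessing the information value of text, supporting efficient data integration for large language models. | large language model
47 | Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh | Proposes instruction tuning on government and cultural data for Kazakh, a low-resource language. | instruction following
48 | D.Va: Validate Your Demonstration First Before You Use It | Proposes D.Va, a validation-based method for selecting ICL demonstrations that improves LLM performance on NLU and NLG tasks. | large language model
49 | Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts | Qorgau: evaluates LLM safety in Kazakh-Russian bilingual contexts. | large language model
50 | BeamLoRA: Beam-Constraint Low-Rank Adaptation | BeamLoRA: a low-rank adaptation method based on beam search that improves LLM fine-tuning accuracy. | large language model
51 | Shall Your Data Strategy Work? Perform a Swift Study | Proposes a fast method for assessing the effectiveness of instruction-tuning data without retraining the model. | chain-of-thought
52 | Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning | Drowzee: uses temporal logic to improve detection of fact-conflicting hallucinations in LLMs. | large language model
53 | Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study | Proposes a prompt-based weighting mechanism that improves LLM-as-a-judge evaluation of NLG tasks. | large language model
54 | Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval | Proposes the PGMR framework, which uses post-generation memory retrieval to reduce LLM hallucinations in SPARQL query generation. | large language model
55 | Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications | FineEdit: fine-tunes LLMs for precise and targeted text editing. | large language model
56 | Craw4LLM: Efficient Web Crawling for LLM Pretraining | Craw4LLM: an efficient web crawling method for LLM pretraining that significantly reduces crawling waste. | large language model
57 | GneissWeb: Preparing High Quality Data for LLMs at Scale | GneissWeb: builds a large-scale, high-quality LLM training dataset that improves model generalization. | large language model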

🔬 Pillar 2: RL & Architecture (9 papers)

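# | Title | One-sentence summary | Tags | 🔗
58 | Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values | Proposes Direct Value Optimization (DVO), which improves LLM chain-of-thought reasoning through refined value signals. | reinforcement learning, large language model, chain-of-thought
59 | LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization | LongPO: long-context self-evolution of large language models through short-to-long preference optimization. | DPO, large language model
60 | Refining Sentence Embedding Model through Ranking Sentences Generation with Large Language Models | Proposes refining sentence embedding models by using large language models to generate ranked sentences. | contrastive learning, large language model
61 | LLM should think and action as a human | Proposes an LLM interaction method based on built-in chain-of-thought, improving reasoning and planning in multi-turn dialogue. | reinforcement learning, large language model, chain-of-thought
62 | MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads | Proposes MuDAF, which optimizes attention heads through contrastive learning to improve long-context multi-document question answering. | contrastive learning, large language model
63 | RLTHF: Targeted Human Feedback for LLM Alignment | RLTHF: aligns LLMs through targeted human feedback, reducing annotation cost and improving performance. | reinforcement learning, RLHF, large language model
64 | Task-agnostic Prompt Compression with Context-aware Sentence Embedding and Reward-guided Task Descriptor | Proposes TPC, a task-agnostic prompt compression framework that improves LLM generalization on long-context tasks. | reinforcement learning, large language model
65 | MoM: Linear Sequence Modeling with Mixture-of-Memories | Proposes MoM, a linear sequence modeling method with a mixture of memories that improves recall on long sequences. | state space model, linear attention
66 | Efficient Safety Retrofitting Against Jailbreaking for LLMs | Proposes the Egida dataset and DPO fine-tuning to efficiently improve LLM safety against jailbreak attacks. | DPO, direct preference optimization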

🔬 Pillar 1: Robot Control (1 paper)

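# | Title | One-sentence summary | Tags | 🔗
67 | Don't Stop the Multi-Party! On Generating Synthetic Multi-Party Conversations with Constraints | Proposes a constrained multi-party conversation generation method based on instruction-tuned LLMs, addressing the privacy and platform limitations of existing datasets. | MPC, large language model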
