cs.CL(2025-02-25)

📊 51 papers in total | 🔗 16 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (41 🔗14) · Pillar 2: RL Algorithms & Architecture (10 🔗2)

🔬 Pillar 9: Embodied Foundation Models (41 papers)

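| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | What are Foundation Models Cooking in the Post-Soviet World? | Builds the BORSch dataset, revealing the limits of foundation models' knowledge of post-Soviet food culture. | foundation model, multimodal | |
| 2 | Scalable Best-of-N Selection for Large Language Models via Self-Certainty | Proposes a scalable Best-of-N selection method based on self-certainty that improves LLM reasoning performance. | large language model, chain-of-thought | |
| 3 | Assessing Agentic Large Language Models in Multilingual National Bias | Evaluates national bias in multilingual LLMs, revealing cross-lingual reasoning bias. | large language model, chain-of-thought | |
| 4 | RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts | RankCoT: improves knowledge use in retrieval-augmented generation by ranking chains of thought. | large language model, chain-of-thought | |
| 5 | Can Multimodal LLMs Perform Time Series Anomaly Detection? | Proposes the VisualTimeAnomaly benchmark to evaluate multimodal LLMs on time series anomaly detection. | large language model, multimodal | |
| 6 | Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs | SafeCLIP: zero-shot defense against toxic images using the inherent multimodal alignment of LVLMs. | multimodal | |
| 7 | EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models | EnDive: a cross-dialect benchmark for evaluating the fairness and performance of LLMs across dialects. | large language model | |
| 8 | Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data | Proposes discriminative finetuning (DFT), which improves LLM performance without reward models or human preference data. | large language model | |
| 9 | Can Large Language Models Extract Customer Needs as well as Professional Analysts? | Uses fine-tuned LLMs to extract customer needs automatically, with performance comparable to professional analysts. | large language model | |
| 10 | From Small to Large Language Models: Revisiting the Federalist Papers | Revisits the authorship attribution of the Federalist Papers, comparing small and large language models. | large language model | |
| 11 | FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models | FactReasoner: a probabilistic approach to assessing the factuality of long-form LLM-generated text. | large language model | |
| 12 | Uncertainty Modeling in Multimodal Speech Analysis Across the Psychosis Spectrum | Proposes an uncertainty-aware multimodal speech analysis model for assessing symptoms across the psychosis spectrum. | multimodal | |
| 13 | SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | Proposes SECURA, a LoRA variant with Sigmoid-enhanced CUR decomposition that improves LLM fine-tuning performance and mitigates catastrophic forgetting. | large language model | |
| 14 | NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts | NusaAksara: a multimodal, multilingual benchmark dataset for preserving Indonesian indigenous scripts. | multimodal | |
| 15 | Harnessing Multiple Large Language Models: A Survey on LLM Ensemble | First survey of LLM ensembles: systematically reviews ensemble methods, benchmarks, and applications, and outlines future directions. | large language model | |
| 16 | Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference | Proposes a sampling-based inference method to detect the knowledge boundary of vision LLMs, improving the efficiency of retrieval-augmented generation. | large language model | |
| 17 | FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models | Proposes FACT-AUDIT for dynamically evaluating the fact-checking capabilities of LLMs. | large language model | |
| 18 | Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation | Proposes an evaluation framework for identifying implicit suicidal ideation and providing support. | large language model | |
| 19 | LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems | Proposes the LR^2Bench benchmark for evaluating the long-chain reflective reasoning capabilities of LLMs. | large language model | |
| 20 | Chain of Draft: Thinking Faster by Writing Less | Proposes Chain of Draft, which improves LLM efficiency by condensing intermediate reasoning steps. | large language model, chain-of-thought | |
| 21 | TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning | Proposes the TextGames benchmark to evaluate LLM reasoning in text-based games. | large language model, instruction following | |
| 22 | Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs | Proposes LayIE-LLM, an LLM-based design space exploration for information extraction from layout-rich documents. | large language model, multimodal | |
| 23 | Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments | Proposes a multilingual Program-of-Thought framework that improves LLM reasoning in cross-lingual settings. | large language model, chain-of-thought | |
| 24 | URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models | Proposes URO-Bench for comprehensive evaluation of end-to-end spoken dialogue models. | large language model, instruction following | |
| 25 | A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition | Proposes CMAS, a cooperative multi-agent framework that addresses contextual correlation and demonstration use in zero-shot named entity recognition. | large language model | |
| 26 | Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources | Compares single- and dual-prompt methods for synthesizing HR job interview dialogues with LLMs, improving dialogue quality. | large language model | |
| 27 | Steered Generation via Gradient Descent on Sparse Features | Proposes a steered generation method based on gradient descent over sparse features for precise control of LLM output properties. | large language model | |
| 28 | FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response | FRIDA: uses synthetic data to improve LLMs' object-based common sense reasoning in disaster response. | large language model | |
| 29 | Monte Carlo Temperature: a robust sampling strategy for LLM's uncertainty quantification methods | Proposes Monte Carlo Temperature (MCT) sampling, which makes LLM uncertainty quantification methods more robust across temperatures. | large language model | |
| 30 | Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology | Proposes MOSAIC, which applies topic modelling and LLMs to reports of stroboscopic phenomenology to reveal latent patterns of experience. | large language model | |
| 31 | WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging | WiCkeD: makes multiple-choice benchmarks harder by introducing a "None of the above" option. | chain-of-thought | |
| 32 | RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction | Proposes RefuteBench 2.0 for dynamically evaluating how LLMs respond to refutation instructions. | large language model | |
| 33 | Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases | Examines the political leanings of LLMs on U.S. Supreme Court cases: closer to training data or to survey respondents? | large language model | |
| 34 | Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization | Shows that fine-grained CoT data significantly improves language model generalization on complex tasks. | chain-of-thought | |
| 35 | League: Leaderboard Generation on Demand | Proposes the Leaderboard Auto Generation (LAG) framework, which automatically generates leaderboards for AI research topics. | large language model | |
| 36 | Grandes modelos de lenguaje: de la predicción de palabras a la comprensión? | Discusses large language models: the evolution from word prediction to language understanding, and the challenges involved. | large language model | |
| 37 | Can LLMs Explain Themselves Counterfactually? | Shows that LLMs have limitations in generating counterfactual explanations. | large language model | |
| 38 | HyperG: Hypergraph-Enhanced LLMs for Structured Knowledge | HyperG: a hypergraph-enhanced LLM framework for handling structured knowledge. | large language model | |
| 39 | from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors | Proposes the AVATAR framework, which uses adversarial metaphors to jailbreak LLMs. | large language model | |
| 40 | CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation | CaseGen: a benchmark for multi-stage legal document generation in the Chinese legal domain, intended to advance legal AI. | large language model | |
| 41 | Constraining Sequential Model Editing with Editing Anchor Compression | Proposes the Editing Anchor Compression (EAC) framework, which constrains parameter drift in sequential model editing and improves general abilities. | large language model | |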

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

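| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 42 | Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning | Identifies the key factors in chain-of-thought distillation to improve the reasoning of small language models. | distillation, large language model, chain-of-thought | |
| 43 | DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers | DRAMA: uses LLMs to provide diverse training augmentation for smaller dense retrievers. | contrastive learning, large language model | |
| 44 | Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent | Proposes the MADeN framework to optimize LLM decision making in debt collection negotiations. | DPO, large language model | |
| 45 | Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought | Proposes the C$^2$SER model, which improves the stability and accuracy of speech emotion recognition through contextual perception and chain of thought. | distillation, chain-of-thought | |
| 46 | Comparative Analysis Based on DeepSeek, ChatGPT, and Google Gemini: Features, Techniques, Performance, Future Prospects | Compares DeepSeek, ChatGPT, and Gemini: features, techniques, performance, and future prospects. | reinforcement learning, RLHF, large language model | |
| 47 | Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Proposes a thinking-optimal scaling strategy that addresses the performance degradation caused by excessive CoT length in LLM reasoning. | distillation, large language model, chain-of-thought | |
| 48 | AfroXLMR-Comet: Multilingual Knowledge Distillation with Attention Matching for Low-Resource languages | AfroXLMR-Comet: multilingual knowledge distillation with attention matching for low-resource African languages. | distillation, large language model | |
| 49 | Advantage-Guided Distillation for Preference Alignment in Small Language Models | Proposes Advantage-Guided Distillation (ADPA) to improve preference alignment in small language models. | distillation, large language model | |
| 50 | MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment | Proposes the MPO framework, an efficient post-processing approach to preference alignment that mixes diverse preference policies. | reinforcement learning, RLHF, large language model | |
| 51 | Rank1: Test-Time Compute for Reranking in Information Retrieval | Rank1: a reranking model for information retrieval that uses test-time compute. | distillation, instruction following | |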
