cs.CL (2025-04-30)

📊 21 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (17 🔗2) · Pillar 2: RL & Architecture (4 🔗1)

🔬 Pillar 9: Embodied Foundation Models (17 papers)

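| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Meeseeks: A Feedback-Driven, Iterative Self-Correction Benchmark evaluating LLMs' Instruction Following Capability | Meeseeks: a feedback-driven, iterative self-correction benchmark for evaluating LLMs' instruction-following capability | large language model, instruction following, chain-of-thought |
| 2 | On the Failure of Latent State Persistence in Large Language Models | Reveals that large language models fall short at maintaining persistent latent state | large language model |
| 3 | Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models | Uses fine-tuned large language models to analyze literary motifs in ancient and medieval novels | large language model |
| 4 | Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring? | Shows that prompt-based large language models recognize students' demographic information during essay scoring and introduce bias | large language model |
| 5 | Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges | Proposes an LLM evaluation method based on Bayesian inference to address confidence in limited-sample evaluation | large language model |
| 6 | GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling | GDI-Bench: a general document intelligence benchmark that decouples vision and reasoning | large language model, multimodal |
| 7 | Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5 | Proposes an Exaone 3.5-based framework for evaluating the fact consistency of text-to-SQL generation in business intelligence | large language model |
| 8 | Clustering Internet Memes Through Template Matching and Multi-Dimensional Similarity | Proposes an internet meme clustering method based on template matching and multi-dimensional similarity that needs no predefined database and improves clustering quality | multimodal |
| 9 | Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications | Survey: understanding and "humanizing" large language models through psychological measurement tools, datasets, and human-agent applications | large language model |
| 10 | Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs | Shows that LLMs are miscalibrated in reasoning length: they overthink easy problems and underthink hard ones | large language model |
| 11 | Fine-Tuning LLMs for Low-Resource Dialect Translation: The Case of Lebanese | Proposes fine-tuning LLMs on culturally grounded data for low-resource Lebanese dialect translation | large language model |
| 12 | RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations | Proposes an RDF-based framework for assessing the quality of multilingual LLMs under knowledge conflicts | large language model |
| 13 | Memorization and Knowledge Injection in Gated LLMs | MEGa: embeds memories and injects knowledge in gated LLMs to address catastrophic forgetting in continual learning | large language model |
| 14 | AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models | AdaptMI: adaptive, skill-based in-context math instruction for small language models | large language model |
| 15 | A Report on the llms evaluating the high school questions | Evaluates how large language models perform on high school science questions and their potential for educational use | large language model |
| 16 | Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models | A spike-aware mixed-precision quantization strategy for LLaMA models that improves quantization performance | large language model |
| 17 | Who Gets the Callback? Generative AI and Gender Bias | Audits open-source LLMs to reveal gender bias in hiring, with men favored especially for higher-wage positions | large language model |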

🔬 Pillar 2: RL & Architecture (4 papers)

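| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 18 | DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | DeepSeek-Prover-V2: uses reinforcement learning for subgoal decomposition to improve formal mathematical reasoning | reinforcement learning, large language model, chain-of-thought |
| 19 | BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models | BiasGuard: a reasoning-enhanced bias detection tool for large language models | reinforcement learning, large language model |
| 20 | Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math | Phi-4-Mini-Reasoning: exploring the limits of small language models in mathematical reasoning | reinforcement learning, DPO, distillation |
| 21 | WebThinker: Empowering Large Reasoning Models with Deep Research Capability | WebThinker: equips large reasoning models with deep web research capability, improving performance on complex knowledge-intensive tasks | DPO, direct preference optimization |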
