cs.CL (2025-04-30)

📊 21 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (17 🔗2) · Pillar 2: RL & Architecture (4 🔗1)

🔬 Pillar 9: Embodied Foundation Models (17 papers)

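#	Title	One-sentence summary	Tags	🔗	⭐
1	Meeseeks: A Feedback-Driven, Iterative Self-Correction Benchmark evaluating LLMs' Instruction Following Capability	Meeseeks: a feedback-driven, iterative self-correction benchmark for evaluating the instruction-following ability of LLMs	large language model, instruction following, chain-of-thought	✅
2	On the Failure of Latent State Persistence in Large Language Models	Reveals the shortcomings of large language models in maintaining persistent latent state	large language model
3	Investigating Literary Motifs in Ancient and Medieval Novels with Large Language Models	Uses fine-tuned large language models to analyze literary motifs in ancient and medieval novels	large language model
4	Does the Prompt-based Large Language Model Recognize Students' Demographics and Introduce Bias in Essay Scoring?	Finds that prompt-based large language models recognize students' demographic information in essay scoring and introduce bias	large language model
5	Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges	Proposes an LLM evaluation method based on Bayesian inference to address the confidence problem in limited-sample evaluation	large language model
6	GDI-Bench: A Benchmark for General Document Intelligence with Vision and Reasoning Decoupling	GDI-Bench: a general document intelligence benchmark that decouples vision and reasoning	large language model, multimodal	✅
7	Fact-Consistency Evaluation of Text-to-SQL Generation for Business Intelligence Using Exaone 3.5	Proposes a fact-consistency evaluation framework for text-to-SQL generation based on Exaone 3.5, aimed at business intelligence	large language model
8	Clustering Internet Memes Through Template Matching and Multi-Dimensional Similarity	Proposes an internet meme clustering method based on template matching and multi-dimensional similarity that needs no predefined database and improves clustering quality	multimodal
9	Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications	Survey: understanding and "humanizing" large language models through psychological measurement tools, datasets, and human-agent applications	large language model
10	Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs	Finds that LLMs are miscalibrated in reasoning length, overthinking easy problems and underthinking hard ones	large language model
11	Fine-Tuning LLMs for Low-Resource Dialect Translation: The Case of Lebanese	Proposes fine-tuning LLMs on cultural data for low-resource Lebanese dialect translation	large language model
12	RDF-Based Structured Quality Assessment Representation of Multilingual LLM Evaluations	Proposes an RDF-based framework for assessing the quality of multilingual LLMs under knowledge conflicts	large language model
13	Memorization and Knowledge Injection in Gated LLMs	MEGa: embeds memories and injects knowledge in gated LLMs to address catastrophic forgetting in continual learning	large language model
14	AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models	AdaptMI: adaptive, skill-based in-context math instruction for small language models	large language model
15	A Report on the llms evaluating the high school questions	Evaluates how large language models perform on high school science questions and their potential for educational use	large language model
16	Precision Where It Matters: A Novel Spike Aware Mixed-Precision Quantization Strategy for LLaMA-based Language Models	A spike-aware mixed-precision quantization strategy for LLaMA models that improves quantization performance	large language model
17	Who Gets the Callback? Generative AI and Gender Bias	Audits open-source LLMs to reveal gender bias in hiring, with men favored especially for higher-paying positions	large language model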

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
18	DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition	DeepSeek-Prover-V2: uses reinforcement learning for subgoal decomposition to advance formal mathematical reasoning	reinforcement learning, large language model, chain-of-thought
19	BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models	BiasGuard: a reasoning-enhanced bias detection tool for large language models	reinforcement learning, large language model
20	Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math	Phi-4-Mini-Reasoning: explores the limits of small language models in mathematical reasoning	reinforcement learning, DPO, distillation
21	WebThinker: Empowering Large Reasoning Models with Deep Research Capability	WebThinker: equips large reasoning models with deep web research capability, improving performance on complex knowledge-intensive tasks	DPO, direct preference optimization	✅
