cs.CL (2025-04-21)

📊 24 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (22 🔗3) · Pillar 2: RL & Architecture (2)

🔬 Pillar 9: Embodied Foundation Models (22 papers)

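| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Rethinking the Potential of Multimodality in Collaborative Problem Solving Diagnosis with Large Language Models | Explores the potential of multimodal data for diagnosing collaborative problem solving with large language models. | large language model, multimodal |
| 2 | EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models | EasyEdit2: an easy-to-use framework for steering the behavior of large language models. | large language model |
| 3 | The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models | Proposes the AutoNuggetizer framework, which uses large language models to automate fact extraction and evaluation for RAG systems. | large language model |
| 4 | Retrieval Augmented Generation Evaluation in the Era of Large Language Models: A Comprehensive Survey | A comprehensive survey of evaluation methods and frameworks for retrieval-augmented generation (RAG) in the era of large language models. | large language model |
| 5 | Natural Fingerprints of Large Language Models | Reveals the "natural fingerprints" of large language models: even when trained on the same dataset, model outputs remain distinguishable. | large language model |
| 6 | Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | JETTS benchmark: evaluates the effectiveness of LLM judges in test-time compute scaling and reveals their strengths and weaknesses across tasks. | large language model, instruction following |
| 7 | CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation | CRUST-Bench: a comprehensive benchmark for C-to-safe-Rust transpilation that supports the safe migration of legacy C code. | large language model |
| 8 | Fully Bayesian Approaches to Topics over Time | Proposes a fully Bayesian Topics over Time model (WBToT) that improves the stability of modeling topic change over time and the ability to capture events. | TAMP |
| 9 | On Self-improving Token Embeddings | Proposes a self-improving token embedding method for enhancing domain-specific text representations. | large language model |
| 10 | Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges | Compares human and LLM judges in evaluating support for RAG systems to validate the reliability of GPT-4o as an evaluator. | large language model |
| 11 | Kuwain 1.5B: An Arabic SLM via Language Injection | Proposes an Arabic SLM built via language injection, improving performance while retaining existing knowledge. | large language model |
| 12 | Speculative Sampling via Exponential Races | Proposes ERSD, a speculative sampling method based on exponential races, to accelerate large language model inference. | large language model |
| 13 | Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection | Builds definitions for zero-shot sexism detection through expert-LLM collaboration and improves detection performance. | large language model |
| 14 | MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety | MrGuard: proposes a multilingual reasoning guardrail that improves the safety of general-purpose LLMs in multilingual settings. | large language model |
| 15 | EvalAgent: Discovering Implicit Evaluation Criteria from the Web | EvalAgent: mines implicit evaluation criteria from the web to improve the quality of language model generation. | large language model |
| 16 | The Synthetic Imputation Approach: Generating Optimal Synthetic Texts For Underrepresented Categories In Supervised Classification Tasks | Proposes the synthetic imputation approach, which uses generative LLMs to produce optimal synthetic texts for underrepresented categories in supervised classification tasks. | large language model |
| 17 | Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT | Uses ChatGPT to evaluate machine translation quality: first experiments based on an error typology for LSP translation. | large language model |
| 18 | Stay Hungry, Stay Foolish: On the Extended Reading Articles Generation with LLMs | Uses large language models to automatically generate extended reading materials and course recommendations, assisting educational content creation. | large language model |
| 19 | Efficient Pretraining Length Scaling | Proposes PHD-Transformer, which enables efficient length scaling during pretraining while preserving inference efficiency. | large language model |
| 20 | Evaluating LLMs on Chinese Topic Constructions: A Research Proposal Inspired by Tian et al. (2024) | Proposes an evaluation framework for examining large language models' grammatical knowledge of Chinese topic constructions and island constraints. | large language model |
| 21 | CRAVE: A Conflicting Reasoning Approach for Explainable Claim Verification Using LLMs | CRAVE: proposes an explainable claim verification method based on conflicting reasoning, using large language models to improve the accuracy and transparency of complex claim verification. | large language model |
| 22 | Transparentize the Internal and External Knowledge Utilization in LLMs with Trustworthy Citation | Proposes a Context-Prior augmented citation generation task to improve the trustworthiness of LLMs' use of internal and external knowledge. | large language model |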

🔬 Pillar 2: RL & Architecture (2 papers)

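| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 23 | Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models | Uses feature-based predictive models to detect jailbreak attacks in clinical training LLMs. | predictive model, large language model |
| 24 | DistilQwen2.5: Industrial Practices of Training Distilled Open Lightweight Language Models | Proposes DistilQwen2.5, which uses distillation to improve the instruction-following ability of lightweight LLMs in resource-constrained scenarios. | distillation, large language model, instruction following |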
