| 1 |
Multimodal Programming in Computer Science with Interactive Assistance Powered by Large Language Model |
Building an interactive programming assistance system with large language models to improve computer science teaching |
large language model, multimodal |
|
|
| 2 |
Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators |
Large language models can assist human event annotation but cannot independently produce high-quality annotations |
large language model |
|
|
| 3 |
Delusions of Large Language Models |
Reveals a new form of LLM hallucination: high-confidence hallucinations ("delusions") and strategies to mitigate them |
large language model |
|
|
| 4 |
Alignment for Efficient Tool Calling of Large Language Models |
Proposes a multi-objective alignment framework that improves LLM tool-calling efficiency and reduces unnecessary calls |
large language model |
|
|
| 5 |
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models |
InftyThink: breaks the length limits of long-context reasoning in LLMs, enabling unbounded-depth reasoning |
large language model |
|
|
| 6 |
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation |
Uses large language models as encoders to improve the efficiency and generalization of neural machine translation |
large language model |
|
|
| 7 |
WildIFEval: Instruction Following in the Wild |
WildIFEval: a large-scale dataset of real user instructions for evaluating LLM instruction following under complex constraints |
instruction following |
|
|
| 8 |
Effectiveness of Zero-shot-CoT in Japanese Prompts |
Compares the effectiveness of zero-shot CoT prompting in Japanese versus English |
chain-of-thought |
|
|
| 9 |
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts |
PFDial: a structured dialogue instruction fine-tuning method based on UML flowcharts, improving process-driven dialogue systems |
large language model |
✅ |
|
| 10 |
DependEval: Benchmarking LLMs for Repository Dependency Understanding |
DependEval: a hierarchical benchmark for evaluating LLMs' understanding of repository dependencies |
large language model |
|
|
| 11 |
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation |
FEA-Bench: a benchmark for evaluating code LLMs on repository-level code generation for feature implementation |
large language model |
|
|
| 12 |
Enhancing NLP Robustness and Generalization through LLM-Generated Contrast Sets: A Scalable Framework for Systematic Evaluation and Adversarial Training |
Uses LLM-generated contrast sets to improve the robustness and generalization of NLP models |
large language model |
|
|
| 13 |
Evaluating and Aligning Human Economic Risk Preferences in LLMs |
Evaluates and aligns human economic risk preferences in LLMs to improve the soundness of their decisions |
large language model |
|
|
| 14 |
BingoGuard: LLM Content Moderation Tools with Risk Levels |
BingoGuard: builds LLM content moderation tools with risk-level assessment |
large language model |
|
|
| 15 |
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations |
SafeSpeech: a comprehensive, interactive tool for analysing sexist and abusive language in conversations |
large language model |
|
|