cs.CL (2024-07-04)

📊 46 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (41 🔗4) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (41 papers)

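| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | NutriBench: A Dataset for Evaluating Large Language Models on Nutrition Estimation from Meal Descriptions | NutriBench: a dataset for evaluating large language models on nutrition estimation from meal descriptions | large language model, chain-of-thought | |
| 2 | Benchmarking Complex Instruction-Following with Multiple Constraints Composition | Proposes ComplexBench to evaluate how well LLMs follow complex instructions composed of multiple constraints | large language model, instruction following | |
| 3 | M5 -- A Diverse Benchmark to Assess the Performance of Large Multimodal Models Across Multilingual and Multicultural Vision-Language Tasks | M5: a multilingual, multicultural benchmark for evaluating large multimodal models | large language model, multimodal | |
| 4 | Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models | Uses large language models to understand dialogues and generate visual descriptors, improving image selection | large language model, multimodal | |
| 5 | metabench -- A Sparse Benchmark of Reasoning and Knowledge in Large Language Models | MetaBench: a sparse benchmark of reasoning and knowledge abilities in large language models | large language model | |
| 6 | A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations | A systematic review of large language model evaluation: challenges, limitations, and recommendations | large language model | |
| 7 | Convolutional vs Large Language Models for Software Log Classification in Edge-Deployable Cellular Network Testing | Proposes a compact CNN for software log classification in edge-deployable cellular network testing that significantly outperforms LLMs | large language model | |
| 8 | Text2TimeSeries: Enhancing Financial Forecasting through Time Series Prediction Updates with Event-Driven Insights from Large Language Models | Text2TimeSeries: enhances financial forecasting with event-driven insights from large language models | large language model | |
| 9 | Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models | Proposes RelD, a robust LLM hallucination detector that improves answer reliability | large language model | |
| 10 | MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization | Proposes MAPO, a model-adaptive prompt optimization method that improves large language model performance on downstream tasks | large language model | |
| 11 | Deep Content Understanding Toward Entity and Aspect Target Sentiment Analysis on Foundation Models | Proposes the EASTE task, using Transformer models for entity-aspect target sentiment analysis to achieve fine-grained sentiment understanding | foundation model | |
| 12 | MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production | Proposes the MS2SL framework, which uses multimodal information to generate continuous sign language sequences | multimodal | |
| 13 | Integrating Randomness in Large Language Models: A Linear Congruential Generator Approach for Generating Clinically Relevant Content | Uses a linear congruential generator to improve the diversity and quality of clinically relevant content generated by large language models | large language model | |
| 14 | LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking | LLMAEL: uses large language models to augment context for entity linking, significantly improving linking accuracy | large language model | |
| 15 | TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models | Proposes TongGu to address the challenges of understanding classical Chinese | large language model | |
| 16 | Chain-of-Thought Augmentation with Logit Contrast for Enhanced Reasoning in Language Models | Proposes a chain-of-thought augmentation method based on logit contrast to improve language model reasoning | chain-of-thought | |
| 17 | Black-box Model Ensembling for Textual and Visual Question Answering via Information Fusion | Proposes InfoSel, which ensembles black-box models through information fusion to improve textual and visual question answering | large language model, multimodal | |
| 18 | ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents | ChatSOP: an SOP-guided MCTS planning framework for controllable LLM dialogue agents | large language model, chain-of-thought | |
| 19 | STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering | Proposes STOC-TOT to address complex reasoning in multi-hop question answering | large language model, chain-of-thought | |
| 20 | Semantic Graphs for Syntactic Simplification: A Revisit from the Age of LLM | Proposes the AMRS^3 method to improve performance on syntactic simplification | large language model, instruction following | |
| 21 | Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems | Proposes ZPS, a constraint-guided multi-agent system for solving zebra puzzles | large language model, chain-of-thought | |
| 22 | Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs | Uses future events as backdoor triggers to investigate temporal vulnerabilities in LLMs | large language model | |
| 23 | Improving Self Consistency in LLMs through Probabilistic Tokenization | Uses probabilistic tokenization to improve the self-consistency of large language models on reasoning tasks | large language model | |
| 24 | Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms | Evaluates the ability and biases of large language models in applying Wikipedia's neutrality norms | large language model | |
| 25 | Unlocking the Potential of Model Merging for Low-Resource Languages | Proposes a model merging approach to address the limited task capabilities of LLMs in low-resource languages | large language model | |
| 26 | LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs | LLM-jp: a cross-organizational project for the research and development of fully open Japanese LLMs | large language model | |
| 27 | HYBRINFOX at CheckThat! 2024 -- Task 1: Enhancing Language Models with Structured Information for Check-Worthiness Estimation | The HYBRINFOX team proposes enhancing language models with structured information to estimate the check-worthiness of news reports | large language model | |
| 28 | Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction | Proposes the SWiM evaluation framework and the Medoid Voting inference method, improving how long-context language models use information in the middle of the context | large language model | |
| 29 | HAF-RM: A Hybrid Alignment Framework for Reward Model Training | Proposes the hybrid alignment framework HAF-RM to improve reward model training and alignment | large language model | |
| 30 | Defense Against Syntactic Textual Backdoor Attacks with Token Substitution | Proposes an online defense algorithm based on token substitution that effectively counters textual backdoor attacks | large language model | |
| 31 | Unveiling Scoring Processes: Dissecting the Differences between LLMs and Human Graders in Automatic Scoring | Unveils LLM scoring processes by dissecting the differences between LLMs and human graders in automatic scoring | large language model | |
| 32 | A Survey on Natural Language Counterfactual Generation | Surveys natural language counterfactual generation techniques, with a focus on methods based on large language models | large language model | |
| 33 | A framework for annotating and modelling intentions behind metaphor use | Proposes a taxonomy of intentions behind metaphor use, builds a dataset, and evaluates LLMs on the task | large language model | |
| 34 | Automated Progressive Red Teaming | Proposes APRT, an automated progressive red teaming framework that effectively identifies potential risks in large language models | large language model | |
| 35 | Anthropocentric bias in language model evaluation | Reveals and mitigates anthropocentric bias in language model evaluation, improving the objectivity and accuracy of evaluation | large language model | |
| 36 | GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels | Comprehensively evaluates GPT-4's translation quality against human translators across languages, domains, and expertise levels | large language model | |
| 37 | DSLR: Document Refinement with Sentence-Level Re-ranking and Reconstruction to Enhance Retrieval-Augmented Generation | Proposes the DSLR framework, which refines retrieved documents in RAG systems through sentence-level re-ranking and reconstruction | large language model | |
| 38 | Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | Proposes Question-Analysis Prompting (QAP) to improve LLM performance on reasoning tasks | chain-of-thought | |
| 39 | The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model | Proposes the Injectable Realignment Model (IRM), revealing associations between internal neurons of Llama 2 and its alignment behavior | large language model | |
| 40 | Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction | Builds a corpus of 19th-century Latin American Spanish newspapers and proposes an LLM-based OCR correction framework | large language model | |
| 41 | Core: Robust Factual Precision with Informative Sub-Claim Identification | Proposes Core, which makes the evaluation of LLM factual precision more robust through informative sub-claim identification | large language model | |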

🔬 Pillar 2: RL & Architecture (2 papers)

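| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 42 | Improving Sample Efficiency of Reinforcement Learning with Background Knowledge from Large Language Models | Uses background knowledge from large language models to improve the sample efficiency of reinforcement learning | reinforcement learning, policy learning, reward shaping | |
| 43 | DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | DotaMath: a thought decomposition method with code assistance and self-correction that improves mathematical reasoning | imitation learning, large language model | |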

🔬 Pillar 1: Robot Control (2 papers)

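| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 44 | Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks | Proposes an evaluation framework for multimodal models on robotic manipulation tasks, focusing on how instruction variety and task difficulty affect generalization | manipulation, multimodal | |
| 45 | Systematic Task Exploration with LLMs: A Study in Citation Text Generation | Proposes an LLM-based research framework for citation text generation that systematically explores task definition and evaluation methods | manipulation, large language model | |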

🔬 Pillar 6: Video Extraction (1 paper)

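| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 46 | Can Pre-trained Language Models Understand Chinese Humor? | Systematically evaluates how well pre-trained language models understand Chinese humor | HuMoR | |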
