| # | Title | Summary | Keywords | |
|---|---|---|---|---|
| 1 | MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Proposes Multiplex CoT, improving LLM self-reflection through double chain-of-thought reasoning. | large language model, chain-of-thought | |
| 2 | The Value of Nothing: Multimodal Extraction of Human Values Expressed by TikTok Influencers | Proposes a multimodal method for extracting the values expressed by TikTok influencers, for analyzing how social platforms influence adolescents' values. | large language model, multimodal | |
| 3 | Multi-round, Chain-of-thought Post-editing for Unfaithful Summaries | Proposes a multi-round CoT post-editing framework to improve the factual consistency of LLM-generated summaries. | large language model, chain-of-thought | |
| 4 | Generative AI and Large Language Models in Language Preservation: Opportunities and Challenges | Proposes an analytical framework for GenAI-based language preservation grounded in community governance and ethical safeguards. | large language model | |
| 5 | Ontology Matching with Large Language Models and Prioritized Depth-First Search | Proposes MILA, combining LLMs with prioritized depth-first search to tackle ontology matching. | large language model | |
| 6 | PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents | PlotEdit: accessible, natural-language-driven chart editing in PDFs via multimodal LLM agents. | multimodal | |
| 7 | Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks | Proposes Mobile-Agent-E, which improves complex-task handling on mobile devices through a self-evolution mechanism. | foundation model, multimodal | ✅ |
| 8 | Irony in Emojis: A Comparative Study of Human and LLM Interpretation | Studies GPT-4o's ability to interpret the ironic meaning of emojis, compared against human interpretation. | large language model | |
| 9 | Synthetic Data Can Mislead Evaluations: Membership Inference as Machine Text Detection | Shows how synthetic data can mislead membership-inference evaluations of LLMs: MIA behaves as a machine-text detector. | large language model | |
| 10 | Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy | Proposes the Explain-Query-Test self-evaluation framework, assessing LLMs via the discrepancy between explanation and comprehension. | large language model | ✅ |
| 11 | Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness | Studies how multilingual prompts affect task performance and fairness when LLMs design reward functions for restless bandits. | large language model | |
| 12 | Each Graph is a New Language: Graph Learning with LLMs | Proposes the GDL4LLM framework, which translates graph structure into a language for LLM pre-training, improving node classification. | large language model | |
| 13 | Benchmarking LLMs' Mathematical Reasoning with Unseen Random Variables Questions | Proposes RV-Bench for evaluating LLM reasoning on math questions with unseen random variables. | large language model | |
| 14 | Optimizing Pretraining Data Mixtures with LLM-Estimated Utility | Proposes UtiliMax and MEDU to efficiently optimize LLM pretraining data mixtures, accelerating training and reducing compute cost. | large language model | |
| 15 | Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems | Proposes the Conversation Routines framework, using prompt engineering to build task-oriented dialog systems. | large language model | |
| 16 | Zep: A Temporal Knowledge Graph Architecture for Agent Memory | Zep: a temporal knowledge graph architecture for agent memory that significantly improves memory retrieval in complex scenarios. | large language model | |
| 17 | Guided Persona-based AI Surveys: Can we replicate personal mobility preferences at scale using LLMs? | Proposes guided persona-based LLM surveys to address the difficulty of collecting personal mobility preference data at scale. | large language model | |
| 18 | Chat3GPP: An Open-Source Retrieval-Augmented Generation Framework for 3GPP Documents | Proposes Chat3GPP, an open-source retrieval-augmented generation framework for 3GPP documents, improving question answering in the telecom domain. | large language model | |
| 19 | Redundancy Principles for MLLMs Benchmarks | Proposes benchmark-construction principles and optimization strategies to address redundancy in multimodal LLM evaluation. | large language model | ✅ |
| 20 | Can OpenAI o1 Reason Well in Ophthalmology? A 6,990-Question Head-to-Head Evaluation Study | Evaluates OpenAI o1 on ophthalmology question answering: a head-to-head study over 6,990 questions. | large language model | |
| 21 | YouLeQD: Decoding the Cognitive Complexity of Questions and Engagement in Online Educational Videos from Learners' Perspectives | YouLeQD: a dataset for analyzing the cognitive complexity of questions and learner engagement in online educational videos, from learners' perspectives. | large language model | |
| 22 | PIKE-RAG: sPecIalized KnowledgE and Rationale Augmented Generation | PIKE-RAG: a retrieval-augmented generation method with specialized knowledge and rationale augmentation for industrial scenarios. | large language model | |