| 1 |
Thinking with Visual Abstract: Enhancing Multimodal Reasoning via Visual Abstraction |
Proposes Visual Abstract Thinking (VAT), a method that improves multimodal large language models on visual reasoning tasks. |
large language model multimodal chain-of-thought |
|
|
| 2 |
SEMMA: A Semantic Aware Knowledge Graph Foundation Model |
Proposes SEMMA to address the lack of semantic information in knowledge graph reasoning. |
large language model foundation model |
|
|
| 3 |
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback |
WebCoT: improves web agent reasoning in reflection, branching, and rollback by reconstructing chain-of-thought. |
large language model chain-of-thought |
|
|
| 4 |
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs |
Proposes ALAS, an automatic metric for evaluating latent speech-text alignment in multimodal LLMs. |
large language model multimodal |
|
|
| 5 |
MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding |
Proposes the MangaVQA benchmark and the MangaLMM model to improve multimodal manga understanding. |
multimodal |
|
|
| 6 |
MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks |
MedHELM: a holistic evaluation framework for large language models on medical tasks. |
large language model |
|
|
| 7 |
Beyond Keywords: Evaluating Large Language Model Classification of Nuanced Ableism |
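Evaluates how well large language models classify nuanced ableist language, revealing their limitations in recognizing ableism directed at autistic people. |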
large language model |
|
|
| 8 |
Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects |
A survey of multimodal emotion recognition in conversations: methods, trends, challenges, and prospects. |
multimodal |
|
|
| 9 |
Large Language Models for IT Automation Tasks: Are We There Yet? |
The ITAB benchmark reveals the limitations of large language models on IT automation tasks, particularly Ansible script generation. |
large language model |
|
|
| 10 |
WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models |
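WXImpactBench: a benchmark for understanding disruptive weather impacts, used to evaluate large language models' capabilities in climate adaptation. |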
large language model |
|
|
| 11 |
THiNK: Can Large Language Models Think-aloud? |
THiNK: proposes a multi-agent feedback framework based on Bloom's Taxonomy to evaluate and improve the higher-order thinking skills of LLMs. |
large language model |
|
|
| 12 |
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers |
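Proposes EXSEARCH, which uses iterative self-incentivization to improve the search ability of large language models on knowledge-intensive tasks. |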
large language model |
|
|
| 13 |
ResSVD: Residual Compensated SVD for Large Language Model Compression |
ResSVD: a residual-compensated SVD method for compressing large language models. |
large language model |
|
|
| 14 |
Language-Agnostic Suicidal Risk Detection Using Large Language Models |
Proposes a language-agnostic suicidal risk detection framework to address the limitations of existing methods. |
large language model |
|
|
| 15 |
Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities |
A survey of methods and opportunities in combining large language models with knowledge graphs for question answering. |
large language model |
|
|
| 16 |
MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning |
Proposes MA-RAG, a multi-agent framework that handles complex retrieval-augmented generation tasks through collaborative CoT reasoning. |
chain-of-thought |
|
|
| 17 |
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models |
Proposes MiniLongBench, a low-cost benchmark for evaluating the long-context understanding of large language models. |
large language model |
✅ |
|
| 18 |
FoodTaxo: Generating Food Taxonomies with Large Language Models |
FoodTaxo: automatically generates food taxonomies using large language models. |
large language model |
|
|
| 19 |
T^2Agent: A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search |
Proposes T^2Agent, a tool-augmented multimodal misinformation detection agent based on Monte Carlo Tree Search. |
multimodal |
|
|
| 20 |
Reasoning LLMs are Wandering Solution Explorers |
Shows that reasoning LLMs lack systematic exploration ability and characterizes them as wandering problem solvers. |
large language model chain-of-thought |
|
|
| 21 |
MemGuide: Intent-Driven Memory Selection for Goal-Oriented Multi-Session LLM Agents |
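Proposes the MemGuide framework, which uses intent-driven memory selection to improve the task coherence of LLM agents across multi-session dialogues. |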
large language model chain-of-thought |
|
|
| 22 |
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction |
OmniCharacter: proposes a model with seamless speech-language personality interaction for immersive role-playing agents. |
large language model |
✅ |
|
| 23 |
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution? |
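Proposes the SelfReflect metric, which evaluates whether LLMs can effectively communicate the uncertainty in their internal answer distribution. |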
large language model |
|
|
| 24 |
Does quantization affect models' performance on long-context tasks? |
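Systematically evaluates the effect of quantization on long-context LLM performance, showing that it depends on the task, the model, and the quantization method. |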
large language model |
|
|
| 25 |
Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline |
Uncovers the factual recall pipeline of multilingual LLMs and proposes vector interventions to improve cross-lingual consistency. |
large language model |
|
|
| 26 |
Gatsby Without the 'E': Crafting Lipograms with LLMs |
Uses large language models to generate constrained text: explores writing a novel without the letter 'e'. |
large language model |
|
|
| 27 |
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries |
Amulet: uses LLM juries to evaluate complex multi-turn conversations, improving judgment accuracy. |
large language model |
|
|
| 28 |
HAMburger: Accelerating LLM Inference via Token Smashing |
HAMburger: accelerates LLM inference through token compression, achieving sublinear growth in KV cache and computation. |
large language model |
|
|
| 29 |
Enhancing the Comprehensibility of Text Explanations via Unsupervised Concept Discovery |
Proposes the ECO-Concept framework, which automatically discovers comprehensible concepts in text explanations without annotations. |
large language model |
|
|
| 30 |
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models |
FLAME-MoE: an open-source research platform for Mixture-of-Experts language models that supports reproducible research. |
large language model |
✅ |
|
| 31 |
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics |
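Reveals a gap between LLM consistency metrics and human perception, and proposes a logit-based ensemble method to improve their alignment. |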
large language model |
|
|
| 32 |
Improving the OOD Performance of Closed-Source LLMs on NLI Through Strategic Data Selection |
Improves the OOD generalization of closed-source LLMs on NLI tasks through strategic data selection. |
large language model |
|
|
| 33 |
Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations |
Proposes the MedAgent framework and the MHSD dataset to evaluate LLMs in multi-turn mental health conversations. |
large language model |
|
|
| 34 |
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs |
Pangu Light: accelerates and compresses large language models through weight re-initialization. |
large language model |
|
|
| 35 |
UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models |
UORA: a uniform orthogonal reinitialization adaptation method for parameter-efficient fine-tuning of large models. |
large language model |
|
|
| 36 |
Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach |
Proposes Adaptive Greedy Binary Search (AGBS) for semantic-preserving adversarial attacks on LLMs. |
large language model |
✅ |
|
| 37 |
TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent |
TrojanStego: proposes a privacy-leaking attack based on language model steganography. |
large language model |
|
|
| 38 |
Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi's Zibaldone |
Proposes BERT- and LLaMA-based named entity recognition methods for historical Italian. |
large language model |
|
|
| 39 |
Multi-Domain Explainability of Preferences |
Proposes a multi-domain method for explaining preferences, improving LLMs' understanding of and alignment with human preferences. |
large language model |
|
|
| 40 |
Inference-time Alignment in Continuous Space |
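Proposes the SEA algorithm, which achieves inference-time alignment of large language models through gradient-based optimization in continuous space. |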
large language model |
✅ |
|
| 41 |
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks |
Proposes a PCFG-based uncertainty quantification framework for LLMs, improving reliability in automated reasoning tasks. |
large language model |
|
|
| 42 |
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking |
STeP: trains LLM-based agents using synthetic self-reflected trajectories and partial masking. |
large language model |
|
|
| 43 |
Emergent LLM behaviors are observationally equivalent to data leakage |
An explanation for emergent behaviors in large language models: data leakage rather than social norms. |
large language model |
|
|
| 44 |
DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset |
DeepDialogue: a multi-turn, emotionally rich spoken dialogue dataset that supports research on human-like dialogue systems. |
multimodal |
|
|
| 45 |
APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation |
APE: a selective fine-tuning method with acceptance criteria for language model adaptation. |
large language model |
|
|
| 46 |
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages |
Evaluates large language models on the transliteration of Indian languages. |
large language model |
|
|
| 47 |
Improving Multilingual Math Reasoning for African Languages |
Explores methods for improving the multilingual math reasoning of LLMs in African languages. |
large language model |
|
|
| 48 |
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective |
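Interprets LLM reasoning from a meta-learning perspective, treating reasoning trajectories as pseudo-gradient descent updates to the parameters. |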
large language model |
|
|
| 49 |
Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks |
A systematic survey of consciousness in LLMs: theories, implementations, and frontier risks. |
large language model |
✅ |
|
| 50 |
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs |
MOLE: uses large language models to automatically extract and validate metadata from scientific papers. |
large language model |
✅ |
|
| 51 |
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs |
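Reveals the vulnerability of large language models to many-shot attacks in long-context settings, highlighting context length as a key factor. |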
large language model |
|
|