| 1 |
Thinking with Visual Abstract: Enhancing Multimodal Reasoning via Visual Abstraction |
Proposes thinking with visual abstraction to improve multimodal reasoning |
large language model multimodal chain-of-thought |
|
|
| 2 |
SEMMA: A Semantic Aware Knowledge Graph Foundation Model |
Proposes SEMMA to address insufficient use of semantics in knowledge graph reasoning |
large language model foundation model |
|
|
| 3 |
WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback |
Proposes WebCoT to strengthen web agents' reasoning in dynamic environments |
large language model chain-of-thought |
|
|
| 4 |
ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs |
Proposes ALAS to address the speech-text alignment problem in multimodal LLMs |
large language model multimodal |
|
|
| 5 |
MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding |
Proposes MangaVQA and MangaLMM to address multimodal manga understanding |
multimodal |
|
|
| 6 |
MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks |
Proposes the MedHELM framework to holistically evaluate large language models on medical tasks |
large language model |
|
|
| 7 |
Beyond Keywords: Evaluating Large Language Model Classification of Nuanced Ableism |
Evaluates how well large language models classify nuanced ableism |
large language model |
|
|
| 8 |
Multimodal Emotion Recognition in Conversations: A Survey of Methods, Trends, Challenges and Prospects |
Proposes multimodal emotion recognition methods to address insufficient emotion understanding in dialogue systems |
multimodal |
|
|
| 9 |
Large Language Models for IT Automation Tasks: Are We There Yet? |
Proposes the ITAB benchmark to evaluate LLMs on IT automation tasks |
large language model |
|
|
| 10 |
WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models |
Proposes WXImpactBench to evaluate large language models' understanding of extreme weather impacts |
large language model |
|
|
| 11 |
THiNK: Can Large Language Models Think-aloud? |
Proposes the THiNK framework to evaluate the higher-order thinking skills of large language models |
large language model |
|
|
| 12 |
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers |
Proposes the EXSEARCH framework to address LLM information retrieval in complex tasks |
large language model |
|
|
| 13 |
ResSVD: Residual Compensated SVD for Large Language Model Compression |
Proposes ResSVD to address residual loss in large language model compression |
large language model |
|
|
| 14 |
Language-Agnostic Suicidal Risk Detection Using Large Language Models |
Proposes a language-agnostic suicidal risk detection framework to address the limitations of existing methods |
large language model |
|
|
| 15 |
Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities |
Proposes methods that combine knowledge graphs with large language models to address complex question answering |
large language model |
|
|
| 16 |
MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning |
Proposes the MA-RAG framework to address reasoning challenges in complex information retrieval |
chain-of-thought |
|
|
| 17 |
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models |
Proposes MiniLongBench to reduce the evaluation cost of long-context understanding benchmarks |
large language model |
✅ |
|
| 18 |
FoodTaxo: Generating Food Taxonomies with Large Language Models |
Proposes FoodTaxo to address taxonomy generation in the food technology domain |
large language model |
|
|
| 19 |
T^2Agent: A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search |
Proposes T^2Agent to address multimodal misinformation detection |
multimodal |
|
|
| 20 |
Reasoning LLMs are Wandering Solution Explorers |
Proposes a systematic problem-solving framework to improve the exploration ability of reasoning LLMs |
large language model chain-of-thought |
|
|
| 21 |
MemGuide: Intent-Driven Memory Selection for Goal-Oriented Multi-Session LLM Agents |
Proposes MemGuide to address intent-driven memory selection in multi-session dialogue systems |
large language model chain-of-thought |
|
|
| 22 |
OmniCharacter: Towards Immersive Role-Playing Agents with Seamless Speech-Language Personality Interaction |
Proposes OmniCharacter to address speech and language interaction for role-playing agents |
large language model |
✅ |
|
| 23 |
SelfReflect: Can LLMs Communicate Their Internal Answer Distribution? |
Proposes SelfReflect to reveal the uncertainty of large language models |
large language model |
|
|
| 24 |
Does quantization affect models' performance on long-context tasks? |
Systematically evaluates the effect of quantization on long-context tasks |
large language model |
|
|
| 25 |
Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline |
Proposes vector interventions to address inconsistent multilingual factual recall |
large language model |
|
|
| 26 |
Gatsby Without the 'E': Crafting Lipograms with LLMs |
Uses large language models to generate text without the letter 'e' |
large language model |
|
|
| 27 |
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries |
Proposes the Amulet framework to improve LLM evaluation of complex multi-turn conversations |
large language model |
|
|
| 28 |
HAMburger: Accelerating LLM Inference via Token Smashing |
Proposes HAMburger to speed up large language model inference |
large language model |
|
|
| 29 |
Enhancing the Comprehensibility of Text Explanations via Unsupervised Concept Discovery |
Proposes ECO-Concept to address the comprehensibility of text explanations |
large language model |
|
|
| 30 |
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models |
Proposes FLAME-MoE to address the shortcomings of existing research platforms for MoE language models |
large language model |
✅ |
|
| 31 |
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics |
Proposes a logit-based ensemble method to estimate LLM consistency |
large language model |
|
|
| 32 |
How to Improve the Robustness of Closed-Source Models on NLI |
Proposes a data-centric approach to improve the robustness of closed-source models on NLI |
large language model |
|
|
| 33 |
Reasoning Is Not All You Need: Examining LLMs for Multi-Turn Mental Health Conversations |
Proposes the MedAgent framework to address the evaluation of multi-turn mental health conversations |
large language model |
|
|
| 34 |
Pangu Light: Weight Re-Initialization for Pruning and Accelerating LLMs |
Proposes Pangu Light to address the compression and acceleration of large language models |
large language model |
|
|
| 35 |
UORA: Uniform Orthogonal Reinitialization Adaptation in Parameter-Efficient Fine-Tuning of Large Models |
Proposes UORA for parameter-efficient fine-tuning of large models |
large language model |
|
|
| 36 |
Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach |
Proposes an adaptive greedy binary search method to address semantic-preserving adversarial attacks on LLMs |
large language model |
✅ |
|
| 37 |
TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent |
Proposes TrojanStego to address privacy leakage in language models |
large language model |
|
|
| 38 |
Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi's Zibaldone |
Proposes a new dataset to address named entity recognition in historical Italian texts |
large language model |
|
|
| 39 |
Multi-Domain Explainability of Preferences |
Proposes an automated method for multi-domain explanation of preferences |
large language model |
|
|
| 40 |
Inference-time Alignment in Continuous Space |
Proposes a simple energy adaptation algorithm to address inference-time alignment |
large language model |
✅ |
|
| 41 |
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks |
Proposes probabilistic context-free grammars to address LLM uncertainty in automated reasoning |
large language model |
|
|
| 42 |
Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking |
Proposes the STeP method to address performance bottlenecks in training LLM agents |
large language model |
|
|
| 43 |
Emergent LLM behaviors are observationally equivalent to data leakage |
Reveals the relationship between large language model behaviors and data leakage |
large language model |
|
|
| 44 |
DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset |
Proposes DeepDialogue to address insufficient emotional expression in multi-turn dialogue |
multimodal |
|
|
| 45 |
APE: Selective Fine-tuning with Acceptance Criteria for Language Model Adaptation |
Proposes the APE method to address language model adaptation |
large language model |
|
|
| 46 |
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages |
Evaluates large language models on the transliteration of Indian languages |
large language model |
|
|
| 47 |
Improving Multilingual Math Reasoning for African Languages |
Proposes a multi-stage adaptation strategy to improve math reasoning in African languages |
large language model |
|
|
| 48 |
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective |
Proposes a new framework to optimize the reasoning ability of large language models |
large language model |
|
|
| 49 |
Exploring Consciousness in LLMs: A Systematic Survey of Theories, Implementations, and Frontier Risks |
Systematically surveys theories of consciousness in large language models and the associated risks |
large language model |
✅ |
|
| 50 |
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs |
Proposes the MOLE framework to automatically extract metadata from scientific papers |
large language model |
✅ |
|
| 51 |
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs |
Studies long-context vulnerabilities, revealing shortcomings in LLM safety mechanisms |
large language model |
|
|
| 52 |
SGM: A Framework for Building Specification-Guided Moderation Filters |
Proposes the SGM framework to address alignment in content moderation |
large language model |
|
|