| 1 |
MLLM-CL: Continual Learning for Multimodal Large Language Models |
Proposes MLLM-CL to address continual learning in multimodal large language models |
large language model multimodal |
✅ |
|
| 2 |
MMRefine: Unveiling the Obstacles to Robust Refinement in Multimodal Large Language Models |
Proposes MMRefine to address error refinement in multimodal large language models |
large language model multimodal |
✅ |
|
| 3 |
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification |
Proposes the Safe framework to address hallucinations in the mathematical reasoning of large language models |
large language model chain-of-thought |
|
|
| 4 |
Reasoning or Overthinking: Evaluating Large Language Models on Financial Sentiment Analysis |
Evaluates the effectiveness of large language models in financial sentiment analysis |
large language model chain-of-thought |
|
|
| 5 |
RELIC: Evaluating Compositional Instruction Following via Language Recognition |
Proposes the RELIC framework to evaluate instruction following through language recognition |
large language model instruction following |
|
|
| 6 |
Do Large Language Models Judge Error Severity Like Humans? |
Compares how humans and large language models judge error severity |
large language model multimodal |
|
|
| 7 |
LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data |
Proposes the LESS framework to address the challenges of semi-supervised learning for speech models on in-the-wild data |
large language model foundation model |
|
|
| 8 |
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models |
Proposes a constrained entropic unlearning method to address unlearning in large language models |
large language model |
|
|
| 9 |
Does It Make Sense to Speak of Introspection in Large Language Models? |
Examines the concept of introspection in large language models and its limitations |
large language model |
|
|
| 10 |
Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights |
Uses large language models to generate multiple-choice questions for educational assessment |
large language model |
|
|
| 11 |
Fine-Grained Interpretation of Political Opinions in Large Language Models |
Proposes a four-dimensional political learning framework to address the analysis of political opinions in LLMs |
large language model |
|
|
| 12 |
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models |
Proposes Qwen3 Embedding to improve text embedding and reranking |
foundation model |
|
|
| 13 |
SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View |
Proposes SCOP to evaluate the comprehension process of large language models |
large language model |
|
|
| 14 |
MuSciClaims: Multimodal Scientific Claim Verification |
Proposes MuSciClaims as a multimodal benchmark for scientific claim verification |
multimodal |
|
|
| 15 |
Parking, Perception, and Retail: Street-Level Determinants of Community Vitality in Harbin |
Proposes an image-based framework to analyze the commercial vitality of communities in Harbin |
large language model multimodal |
|
|
| 16 |
The NTNU System at the S&I Challenge 2025 SLA Open Track |
Proposes a multimodal fusion system to improve the accuracy of spoken language proficiency assessment |
large language model multimodal |
|
|
| 17 |
MMSU: A Massive Multi-task Spoken Language Understanding and Reasoning Benchmark |
Proposes the MMSU benchmark to address multi-task spoken language understanding and reasoning |
large language model multimodal |
✅ |
|
| 18 |
TaDA: Training-free recipe for Decoding with Adaptive KV Cache Compression and Mean-centering |
Proposes TaDA to address the handling of sparse outliers in KV cache compression |
large language model chain-of-thought |
|
|
| 19 |
From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems |
Proposes compound AI systems to overcome the limitations of standalone models |
large language model multimodal |
|
|
| 20 |
Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation |
Proposes a translation self-correction method to improve translation quality |
large language model chain-of-thought |
|
|
| 21 |
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development |
Proposes ComfyUI-Copilot to address the challenges of using ComfyUI |
large language model |
✅ |
|
| 22 |
Why LLM Safety Guardrails Collapse After Fine-tuning: A Similarity Analysis Between Alignment and Fine-tuning Datasets |
Proposes a new method, based on similarity analysis, to strengthen LLM safety |
large language model |
|
|
| 23 |
Search Arena: Analyzing Search-Augmented LLMs |
Proposes Search Arena to analyze user preferences for search-augmented large language models |
large language model |
✅ |
|
| 24 |
ProRefine: Inference-Time Prompt Refinement with Textual Feedback |
Proposes ProRefine to address inference-time prompt optimization |
chain-of-thought |
|
|
| 25 |
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text |
Releases the Common Pile v0.1 dataset to address copyright issues in LLM training |
large language model |
|
|
| 26 |
Context Is Not Comprehension |
Proposes the Verbose ListOps benchmark to evaluate the reasoning ability of language models |
large language model |
|
|
| 27 |
Natural Language Interaction with Databases on Edge Devices in the Internet of Battlefield Things |
Proposes a natural language interaction framework to address data processing in the Internet of Battlefield Things |
large language model |
|
|
| 28 |
Dynamic Context Tuning for Retrieval-Augmented Generation: Enhancing Multi-Turn Planning and Tool Adaptation |
Proposes Dynamic Context Tuning to address multi-turn dialogue and tool adaptation |
large language model |
|
|
| 29 |
Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective |
Analyzes bias in large language models using mechanistic interpretability methods |
large language model |
|
|
| 30 |
TALL -- A Trainable Architecture for Enhancing LLM Performance in Low-Resource Languages |
Proposes TALL to improve LLM performance in low-resource languages |
large language model |
|
|
| 31 |
From Struggle (06-2024) to Mastery (02-2025) LLMs Conquer Advanced Algorithm Exams and Pave the Way for Editorial Generation |
Evaluates the performance of large language models on advanced algorithm exams and their educational applications |
large language model |
|
|
| 32 |
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback |
Proposes a multilingual LLM tutoring simulation to improve math feedback |
large language model |
|
|
| 33 |
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study |
Proposes the FineLogic framework to address the evaluation of logical reasoning in LLMs |
large language model |
✅ |
|
| 34 |
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models |
Proposes the REWIRE method to address insufficient quality and quantity of pre-training data |
large language model |
✅ |
|
| 35 |
Detection Method for Prompt Injection by Integrating Pre-trained Model and Heuristic Feature Engineering |
Proposes DMPI-PMHFE to address the detection of prompt injection attacks |
large language model |
|
|
| 36 |
Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs |
Proposes an input-dependent soft prompting technique to improve the fine-tuning efficiency of large language models |
large language model |
|
|
| 37 |
OPeRA: A Dataset of Observation, Persona, Rationale, and Action for Evaluating LLMs on Human Online Shopping Behavior Simulation |
Proposes the OPeRA dataset to address the challenge of LLMs simulating users' online shopping behavior |
large language model |
|
|
| 38 |
SoK: Are Watermarks in LLMs Ready for Deployment? |
Proposes a systematization of watermarking methods to address intellectual property risks in LLM deployment |
large language model |
|
|
| 39 |
Improving LLMs with a knowledge from databases |
Proposes a method for improving LLM knowledge based on enhanced association rules |
large language model |
|
|
| 40 |
Just a Scratch: Enhancing LLM Capabilities for Self-harm Detection through Intent Differentiation and Emoji Interpretation |
Improves self-harm detection through intent differentiation and emoji interpretation |
large language model |
|
|
| 41 |
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers |
Proposes AR-Checker to address robustness testing of LLMs |
large language model |
|
|
| 42 |
ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests |
Proposes ICPC-Eval to address the evaluation of LLMs in competitive programming contests |
large language model |
✅ |
|
| 43 |
Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies |
Integrates Universal Dependencies into pretrained language models to improve performance on cross-lingual tasks |
large language model |
|
|
| 44 |
Joint Evaluation of Answer and Reasoning Consistency for Hallucination Detection in Large Reasoning Models |
Proposes the RACE framework to address hallucination detection in large reasoning models |
large language model |
✅ |
|
| 45 |
From Handwriting to Feedback: Evaluating VLMs and LLMs for AI-Powered Assessment in Indonesian Classrooms |
Evaluates the use of VLMs and LLMs for AI-powered assessment in Indonesian classrooms |
large language model |
|
|
| 46 |
Identifying Reliable Evaluation Metrics for Scientific Text Revision |
Proposes a hybrid evaluation approach to address the evaluation of scientific text revision |
instruction following |
|
|
| 47 |
A MISMATCHED Benchmark for Scientific Natural Language Inference |
Proposes the MISMATCHED benchmark to address domain bias in scientific natural language inference |
large language model |
|
|
| 48 |
Selecting Demonstrations for Many-Shot In-Context Learning via Gradient Matching |
Proposes a gradient matching method to optimize demonstration selection for many-shot in-context learning |
large language model |
|
|
| 49 |
Are LLMs Stable Formal Logic Translators in Logical Reasoning Across Linguistically Diversified Texts? |
Proposes SoLT and MenTaL to address symbol inconsistency in LLM logical reasoning |
large language model |
✅ |
|