| 1 |
MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification |
Proposes MM-Verify, which strengthens multimodal reasoning through chain-of-thought verification and surpasses GPT-4o. |
large language model multimodal chain-of-thought |
|
|
| 2 |
ProMedTS: A Self-Supervised, Prompt-Guided Multimodal Approach for Integrating Medical Text and Time Series |
ProMedTS: a self-supervised, prompt-guided multimodal approach for integrating medical text and time-series data |
large language model multimodal |
|
|
| 3 |
Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method |
Time-LlaMA: a parameter-efficient adaptation method for applying large language models to time-series modeling |
large language model foundation model |
|
|
| 4 |
Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition |
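Proposes the LDDU framework, which models uncertainty by decoupling latent emotion distributions and improves multimodal emotion recognition. |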
multimodal |
✅ |
|
| 5 |
Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region |
Reveals a weakness in LLM safety mechanisms: because they are anchored in the template region, safety alignment is easy to attack |
large language model |
|
|
| 6 |
Zero-Shot Commonsense Validation and Reasoning with Large Language Models: An Evaluation on SemEval-2020 Task 4 Dataset |
Zero-shot commonsense validation and reasoning with large language models: an evaluation on the SemEval-2020 Task 4 dataset |
large language model |
|
|
| 7 |
REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models |
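Proposes the REFIND framework, which detects hallucinations in large language models through retrieval augmentation and by quantifying context sensitivity. |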
large language model |
✅ |
|
| 8 |
Complex Ontology Matching with Large Language Model Embeddings |
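Proposes a complex ontology matching method that incorporates large language model embeddings, substantially improving the expressiveness of matching. |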
large language model |
|
|
| 9 |
What are Models Thinking about? Understanding Large Language Model Hallucinations "Psychology" through Model Inner State Analysis |
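Explains the "psychology" of hallucinations by analyzing LLM inner states, enabling hallucination detection without external information. |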
large language model |
|
|
| 10 |
Batayan: A Filipino NLP benchmark for evaluating Large Language Models |
Batayan: a Filipino NLP benchmark for evaluating large language model performance on a low-resource language |
large language model |
|
|
| 11 |
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking |
GIMMICK: a globally inclusive, multimodal, multitask benchmark for evaluating cultural knowledge |
multimodal |
|
|
| 12 |
PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models |
Proposes PRIV-QA, a privacy-preserving question answering framework for cloud-hosted large language models |
large language model |
✅ |
|
| 13 |
Detecting Linguistic Bias in Government Documents Using Large language Models |
Introduces the DGDB dataset and fine-tunes BERT models to detect linguistic bias in government documents. |
large language model |
|
|
| 14 |
Towards Lightweight, Adaptive and Attribute-Aware Multi-Aspect Controllable Text Generation with Large Language Models |
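Proposes a lightweight, adaptive, and attribute-aware framework for multi-aspect controllable text generation, improving the controllability of large language models. |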
large language model |
|
|
| 15 |
Event Segmentation Applications in Large Language Model Enabled Automated Recall Assessments |
Uses large language models to automate event segmentation and recall assessment, making cognitive research more efficient. |
large language model |
|
|
| 16 |
MMTEB: Massive Multilingual Text Embedding Benchmark |
Introduces MMTEB, a massive multilingual text embedding benchmark for comprehensive evaluation of text embedding models. |
large language model instruction following |
|
|
| 17 |
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment |
OpenSearch-SQL: improves Text-to-SQL performance with dynamic few-shot prompting and consistency alignment |
large language model instruction following |
|
|
| 18 |
STaR-SQL: Self-Taught Reasoner for Text-to-SQL |
Proposes STaR-SQL, which improves Text-to-SQL performance through self-taught reasoning |
large language model chain-of-thought |
|
|
| 19 |
Transferring Textual Preferences to Vision-Language Understanding through Model Merging |
Transfers textual preferences to vision-language understanding models through model merging |
multimodal |
|
|
| 20 |
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression |
RocketKV: accelerates long-context LLM inference through two-stage KV cache compression. |
large language model |
✅ |
|
| 21 |
DataSciBench: An LLM Agent Benchmark for Data Science |
DataSciBench: a comprehensive benchmark for evaluating LLM capabilities on data science tasks. |
large language model |
✅ |
|
| 22 |
The Canary's Echo: Auditing Privacy Risks of LLM-Generated Synthetic Text |
Addresses the privacy risks of LLM-generated synthetic text by proposing new data-driven membership inference attacks. |
large language model |
|
|
| 23 |
VITAL: A New Dataset for Benchmarking Pluralistic Alignment in Healthcare |
VITAL: a benchmark dataset for pluralistic alignment in healthcare |
large language model |
|
|
| 24 |
MaskPrune: Mask-based LLM Pruning for Layer-wise Uniform Structures |
Proposes MaskPrune, which uses mask learning to perform structured LLM pruning with uniform per-layer structures |
large language model |
|
|
| 25 |
PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference |
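Shows that PLDR-LLMs learn a generalizable tensor operator that can replace the model's own deep neural network at inference time |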
large language model |
|
|
| 26 |
ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails |
ThinkGuard: makes LLM safety guardrails more reliable through deliberative slow thinking |
large language model |
|
|
| 27 |
A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language? |
Studies how far large language models capture the fractal complexity of language and identifies the factors that affect it. |
large language model |
|
|
| 28 |
How Do LLMs Perform Two-Hop Reasoning in Context? |
Reveals how LLMs perform two-hop reasoning in context: from random guessing to structured queries |
large language model |
|
|
| 29 |
SIFT: Grounding LLM Reasoning in Contexts via Stickers |
SIFT: strengthens LLM reasoning grounded in context through a Sticker mechanism |
large language model |
✅ |
|
| 30 |
C2T: A Classifier-Based Tree Construction Method in Speculative Decoding |
Proposes C2T, which uses a classifier to build token trees dynamically, improving the efficiency of speculative decoding. |
large language model |
|
|
| 31 |
Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference |
Proposes ActQKV, an activation-aware probe-query for key-value retrieval, to make long-context LLM inference more efficient |
large language model |
|
|
| 32 |
Towards Geo-Culturally Grounded LLM Generations |
Retrieval augmentation improves LLM cultural awareness but carries a risk of stereotyping |
large language model |
|
|
| 33 |
EvoP: Robust LLM Inference via Evolutionary Pruning |
EvoP: robust large language model inference through evolutionary pruning |
large language model |
|
|
| 34 |
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation |
TreeCut: a synthetic dataset of unanswerable math word problems for evaluating LLM hallucination |
large language model |
✅ |
|
| 35 |
The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? |
Crescent: a framework that lets LLMs improve their own mathematical reasoning without external supervision signals |
large language model |
|
|
| 36 |
Benchmarking LLMs for Political Science: A United Nations Perspective |
Proposes a United Nations-based benchmark for evaluating large language models in political decision-making |
large language model |
✅ |
|
| 37 |
Towards Context-Robust LLMs: A Gated Representation Fine-tuning Approach |
Proposes Grft, a gated representation fine-tuning method that makes LLMs more robust to context in retrieval-augmented generation |
large language model |
|
|
| 38 |
Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning |
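Analyzes how retrieving extractive evidence relates to understanding it in LLM few-shot learning, and shows how model prediction errors relate to evidence retrieval errors. |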
large language model |
|
|
| 39 |
Are Rules Meant to be Broken? Understanding Multilingual Moral Reasoning as a Computational Pipeline with UniMoral |
UniMoral: frames multilingual moral reasoning as a computational pipeline to bridge gaps in moral understanding across cultures |
large language model |
|
|
| 40 |
Beyond Single Frames: Can LMMs Comprehend Temporal and Contextual Narratives in Image Sequences? |
Proposes the StripCipher benchmark for evaluating how well LMMs understand temporal and contextual narratives in image sequences |
multimodal |
|
|
| 41 |
TESS 2: A Large-Scale Generalist Diffusion Language Model |
TESS 2: a large-scale generalist diffusion language model whose performance rivals autoregressive models. |
instruction following |
✅ |
|
| 42 |
Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking |
Proposes the Inner Thinking Transformer, which uses dynamic depth scaling to improve LLM reasoning on critical tokens. |
large language model |
|
|
| 43 |
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions |
MemoryCode: evaluates the long-term memory of LLMs in multi-session coding interactions |
large language model |
|
|
| 44 |
SCALAR: Scientific Citation-based Live Assessment of Long-context Academic Reasoning |
Proposes SCALAR, a live benchmark built on scientific citations for assessing long-context academic reasoning |
large language model |
|
|
| 45 |
Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding |
Proposes In-Context Contrastive Decoding, which strengthens input-label mapping in ICL and improves performance on NLU tasks. |
large language model |
✅ |
|
| 46 |
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora |
Proposes a training-free method for measuring the information value of text corpora, helping LLMs integrate data efficiently |
large language model |
|
|
| 47 |
Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh |
Proposes instruction tuning on government and cultural data for Kazakh, a low-resource language. |
instruction following |
|
|
| 48 |
D.Va: Validate Your Demonstration First Before You Use It |
Proposes D.Va, a validation-based method for selecting ICL demonstrations that improves LLM performance on NLU/NLG tasks. |
large language model |
|
|
| 49 |
Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts |
Qorgau: evaluates LLM safety in Kazakh-Russian bilingual contexts |
large language model |
|
|
| 50 |
BeamLoRA: Beam-Constraint Low-Rank Adaptation |
BeamLoRA: proposes a beam-search-based low-rank adaptation method that improves LLM fine-tuning accuracy. |
large language model |
|
|
| 51 |
Shall Your Data Strategy Work? Perform a Swift Study |
Proposes a fast way to assess how effective instruction-tuning data is, without retraining the model. |
chain-of-thought |
|
|
| 52 |
Detecting LLM Fact-conflicting Hallucinations Enhanced by Temporal-logic-based Reasoning |
Drowzee: uses temporal logic to improve detection of fact-conflicting hallucinations in LLMs |
large language model |
|
|
| 53 |
Prompting a Weighting Mechanism into LLM-as-a-Judge in Two-Step: A Case Study |
Proposes a prompt-based weighting mechanism that improves LLM-as-a-Judge evaluation of NLG tasks |
large language model |
|
|
| 54 |
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval |
Proposes the PGMR framework, which uses post-generation memory retrieval to reduce LLM hallucinations in SPARQL query generation |
large language model |
|
|
| 55 |
Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications |
FineEdit: fine-tunes LLMs for precise, targeted text editing |
large language model |
✅ |
|
| 56 |
Craw4LLM: Efficient Web Crawling for LLM Pretraining |
Craw4LLM: an efficient web crawling method for LLM pretraining that substantially reduces wasted crawling. |
large language model |
✅ |
|
| 57 |
GneissWeb: Preparing High Quality Data for LLMs at Scale |
GneissWeb: builds a high-quality, large-scale LLM training dataset that improves model generalization |
large language model |
|
|