| 1 | Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning | Proposes the SOMADHAN dataset for solving Bengali math word problems | large language model chain-of-thought | | |
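| 2 | Evaluating and Steering Modality Preferences in Multimodal Large Language Model | Proposes the MC² benchmark to evaluate modality preferences in multimodal large language models and steers these preferences via representation engineering | large language model multimodal | | |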
| 3 | Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities | Identifies instruction-specific neurons and experts within a framework for analyzing LLMs' instruction-following capabilities | large language model instruction following | | |
| 4 | Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective | Proposes MAMMQA, a multi-agent framework that improves the accuracy and interpretability of multimodal question answering | large language model multimodal | | |
| 5 | Explaining Large Language Models with gSMILE | Proposes the gSMILE framework to improve token-level interpretability of large language models | large language model | | |
| 6 | LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions | LayerIF: uses influence functions to estimate the training quality of each layer in a large language model | large language model | | |
| 7 | Test-Time Learning for Large Language Models | Proposes TLM, a test-time learning method for large language models that improves adaptation to domain knowledge | large language model | | |
| 8 | Rethinking the Outlier Distribution in Large Language Models: An In-depth Study | An in-depth study of outlier distributions in large language models, aimed at improving quantization performance | large language model | | |
| 9 | How does Misinformation Affect Large Language Model Behaviors and Preferences? | Proposes the MisBench benchmark to analyze and improve LLMs' ability to discern misinformation | large language model | ✅ | |
| 10 | RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models | Proposes the RelationalFactQA benchmark to evaluate how well LLMs retrieve tabular facts from their parametric knowledge | large language model | | |
| 11 | Who Reasons in the Large Language Models? | Finds that LLMs' reasoning ability stems mainly from the output projection module in the Transformer | large language model | | |
| 12 | Multi-objective Large Language Model Alignment with Hierarchical Experts | Proposes HoE, a lightweight, parameter-efficient, plug-and-play method for multi-objective LLM alignment | large language model | | |
| 13 | Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models | Proposes LLM-AT, a training-free framework that automatically selects the LLM tier to optimize cost and accuracy | large language model | | |
| 14 | DenseLoRA: Dense Low-Rank Adaptation of Large Language Models | DenseLoRA: improves the parameter efficiency and performance of large language model adaptation via dense low-rank matrices | large language model | ✅ | |
| 15 | DLP: Dynamic Layerwise Pruning in Large Language Models | DLP: a dynamic layerwise pruning method for large language models that improves performance at high sparsity | large language model | ✅ | |
| 16 | CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | Proposes CogniBench to assess the cognitive faithfulness of large language models | large language model | ✅ | |
| 17 | From prosthetic memory to prosthetic denial: Auditing whether large language models are prone to mass atrocity denialism | Audits whether large language models tend toward mass atrocity denialism, exposing the potential risk of "prosthetic denial" | large language model | | |
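| 18 | Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives | A comprehensive survey of, and new perspectives on, data mixture methods for optimizing the training-data proportions of large language models | large language model | | |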
| 19 | DecisionFlow: Advancing Large Language Model as Principled Decision Maker | DecisionFlow: improves the rational decision-making ability of large language models in decision scenarios | large language model | ✅ | |
| 20 | Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts | Proposes a framework combining LLMs with traditional ML ensembles to detect ADHD from narrative transcripts | large language model | | |
| 21 | Assessment of L2 Oral Proficiency using Speech Large Language Models | Uses speech large language models to assess L2 oral proficiency, significantly improving assessment performance and generalization | large language model | | |
| 22 | RPM: Reasoning-Level Personalization for Black-Box Large Language Models | RPM: a reasoning-level personalization framework for black-box large language models | large language model | | |
| 23 | Uncertainty Unveiled: Can Exposure to More In-context Examples Mitigate Uncertainty for Large Language Models? | Examines whether adding more in-context examples mitigates uncertainty in large language models | large language model | | |
| 24 | Research Community Perspectives on "Intelligence" and Large Language Models | Surveys researchers' perceptions of, and expectations for, "intelligence" and large language models | large language model | | |
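| 25 | On VLMs for Diverse Tasks in Multimodal Meme Classification | Proposes a new approach combining vision-language models with language models to improve performance on multimodal meme classification tasks | multimodal | | |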
| 26 | Automated Privacy Information Annotation in Large Language Model Interactions | Builds a large-scale privacy-information annotation dataset for assessing privacy leakage risks in LLM interactions | large language model | | |
| 27 | STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models | Proposes the Steer-Bench benchmark to evaluate how well large language models can be steered toward group-specific norms | large language model | | |
| 28 | SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis | SV-TrustEval-C: evaluates the structural and semantic reasoning of large language models in C code vulnerability analysis | large language model | | |
| 29 | Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation | Proposes an acoustic-aware data augmentation method for speech recognition that improves model generalization | foundation model | | |
| 30 | Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning | Proposes Self-Route, which automatically switches reasoning modes based on capability estimation to make LLM reasoning more efficient | large language model chain-of-thought | | |
| 31 | AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | Proposes AutoJudger, an agent-driven framework for efficiently benchmarking multimodal large language models | large language model multimodal | | |
| 32 | Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction | Proposes MM-VAP, which uses visual cues to improve predictive turn-taking in two-party human interaction | multimodal | | |
| 33 | Predicting Implicit Arguments in Procedural Video Instructions | Proposes the Implicit-VidSRL dataset and uses the iSRL-Qwen2-VL model to improve implicit semantic role prediction in video instructions | multimodal | | |
| 34 | LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model | LLMPR: an LLM-driven, transfer-learning-based petition ranking model for streamlining judicial processes | large language model | | |
| 35 | R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | R2R: efficiently navigates divergent reasoning paths by routing tokens between a small and a large model | large language model | ✅ | |
| 36 | Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling | Examines factual self-awareness in large language models: internal representations, robustness, and scaling effects | large language model | | |
| 37 | Exploring the Hidden Capacity of LLMs for One-Step Text Generation | Reveals LLMs' capacity for one-step text generation: hundreds of tokens can be generated from just two embeddings | large language model | | |
| 38 | A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction | Proposes SLG, a lightweight multi-expert generative language model system for engineering information and knowledge extraction | large language model | | |
| 39 | SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | SpecExtend: a drop-in enhancement for speculative decoding of long sequences | large language model | ✅ | |
| 40 | CodeMirage: A Multi-Lingual Benchmark for Detecting AI-Generated and Paraphrased Source Code from Production-Level LLMs | CodeMirage: a multilingual benchmark for detecting source code generated or paraphrased by production-level LLMs | large language model | | |
| 41 | REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | REAL-Prover: a retrieval-augmented Lean theorem prover for mathematical reasoning | large language model | | |
| 42 | Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies | Proposes an LLM calibration method based on sub-clause frequencies to improve confidence estimation in Text-to-SQL parsing | large language model | | |
| 43 | RefTool: Enhancing Model Reasoning with Reference-Guided Tool Creation | RefTool: enhances model reasoning through reference-guided tool creation | large language model | | |
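| 44 | Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science | Proposes a data-augmented approach to LLM research idea generation that improves the feasibility and quality of social science research ideas | large language model | | |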
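| 45 | Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History | Proposes an evaluation framework to study how LLMs adapt to sociodemographic factors conveyed through user profiles versus dialogue history | large language model | | |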
| 46 | Pretrained LLMs Learn Multiple Types of Uncertainty | Shows that pretrained LLMs capture multiple types of uncertainty without being explicitly trained to do so | large language model | | |
| 47 | BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge | Proposes BLUCK, a benchmark dataset for Bengali linguistic understanding and cultural knowledge | large language model | | |
| 48 | MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | Proposes MARS-Bench to evaluate LLM robustness in multi-turn dialogue scenarios about sports events | large language model | | |
| 49 | LLM-Driven E-Commerce Marketing Content Optimization: Balancing Creativity and Conversion | Proposes an LLM-based framework for optimizing e-commerce marketing content that balances creativity and conversion rate | multimodal | | |
| 50 | Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties | Trans-EnV: a framework for evaluating the linguistic robustness of LLMs across English varieties | large language model | ✅ | |
| 51 | Calibrating LLM Confidence by Probing Perturbed Representation Stability | CCPS: calibrates large language model confidence by probing the stability of perturbed representations | large language model | | |
| 52 | Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing | Reveals inconsistencies in knowledge probing of large language models and stresses the need for robust probing frameworks | large language model | | |
| 53 | MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs | MAKIEval: a multilingual, Wikidata-based framework for evaluating the cultural awareness of LLMs | large language model | | |
| 54 | Are Language Models Consequentialist or Deontological Moral Reasoners? | Proposes a taxonomy of moral reasoning for analyzing the ethical judgments of language models | large language model | ✅ | |
| 55 | Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance | Studies how latent-language consistency in LLMs affects downstream task performance and finds that it is not always necessary | large language model | | |
| 56 | PEDANTIC: A Dataset for the Automatic Examination of Definiteness in Patent Claims | Proposes the PEDANTIC dataset for automatically examining indefiniteness issues in patent claims | large language model | | |
| 57 | Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead | Surveys research progress in African natural language processing, analyzing the current state and outlining future directions | large language model | | |
| 58 | rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset | rStar-Coder: builds a large-scale verified dataset to improve LLMs' code reasoning capabilities | large language model | ✅ | |
| 59 | Towards Objective Fine-tuning: How LLMs' Prior Knowledge Causes Potential Poor Calibration? | Proposes CogCalib to address poor calibration in LLM fine-tuning | large language model | | |
| 60 | MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection | MSA proposes high-quality weak labeling combined with LLM ensemble verification for multilingual hallucination detection | large language model | | |
| 61 | Concealment of Intent: A Game-Theoretic Analysis | Proposes intent-concealing adversarial prompt attacks and uses game theory to analyze LLM attack and defense strategies | large language model | | |
| 62 | CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation | Proposes the CHIMERA knowledge base for analyzing scientific idea recombinations and inspiring research ideation | large language model | ✅ | |
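| 63 | Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration | Proposes the DART framework, which dynamically adapts reasoning demonstrations through feasibility-aware exploration to improve the reasoning of small models | large language model | | |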
| 64 | Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration | Proposes the XpandA framework, which improves LLM long-context processing through question-driven multi-agent collaboration | large language model | | |
| 65 | POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization | POLAR: a benchmark dataset for multilingual, multicultural, and multi-event online polarization | large language model | | |