| 1 |
What are Foundation Models Cooking in the Post-Soviet World? |
Builds the BORSch dataset, revealing the limits of foundation models' knowledge of food in post-Soviet cultures |
foundation model multimodal |
✅ |
|
| 2 |
Scalable Best-of-N Selection for Large Language Models via Self-Certainty |
Proposes a scalable Best-of-N selection method based on self-certainty that improves the reasoning performance of large language models. |
large language model chain-of-thought |
✅ |
|
| 3 |
Assessing Agentic Large Language Models in Multilingual National Bias |
Assesses national bias in multilingual large language models, revealing cross-lingual reasoning bias |
large language model chain-of-thought |
✅ |
|
| 4 |
RankCoT: Refining Knowledge for Retrieval-Augmented Generation through Ranking Chain-of-Thoughts |
RankCoT: improves knowledge use in retrieval-augmented generation by ranking chains of thought |
large language model chain-of-thought |
✅ |
|
| 5 |
Can Multimodal LLMs Perform Time Series Anomaly Detection? |
Proposes the VisualTimeAnomaly benchmark to evaluate multimodal LLMs on time series anomaly detection |
large language model multimodal |
✅ |
|
| 6 |
Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs |
SafeCLIP: zero-shot defense against toxic images using the inherent multimodal alignment of LVLMs |
multimodal |
|
|
| 7 |
EnDive: A Cross-Dialect Benchmark for Fairness and Performance in Large Language Models |
EnDive: a cross-dialect benchmark for evaluating the fairness and performance of large language models across dialects |
large language model |
|
|
| 8 |
Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data |
Proposes discriminative fine-tuning (DFT), which improves large language model performance without reward models or human preference data. |
large language model |
✅ |
|
| 9 |
Can Large Language Models Extract Customer Needs as well as Professional Analysts? |
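Uses fine-tuned large language models to extract customer needs automatically, with performance comparable to professional analysts |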
large language model |
|
|
| 10 |
From Small to Large Language Models: Revisiting the Federalist Papers |
Revisits the authorship attribution of the Federalist Papers, comparing small and large language models |
large language model |
|
|
| 11 |
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models |
FactReasoner: a probabilistic method for assessing the factuality of long-form text generated by large language models |
large language model |
✅ |
|
| 12 |
Uncertainty Modeling in Multimodal Speech Analysis Across the Psychosis Spectrum |
Proposes an uncertainty-aware multimodal speech analysis model for assessing symptoms across the psychosis spectrum. |
multimodal |
|
|
| 13 |
SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models |
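Proposes SECURA, a LoRA variant with Sigmoid-enhanced CUR decomposition that improves LLM fine-tuning performance and mitigates catastrophic forgetting. |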
large language model |
|
|
| 14 |
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts |
NusaAksara: a multimodal, multilingual benchmark dataset for preserving Indonesian indigenous scripts |
multimodal |
|
|
| 15 |
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble |
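The first survey on LLM ensembles: systematically reviews ensemble methods, benchmarks, and applications, and outlines future directions |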
large language model |
✅ |
|
| 16 |
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference |
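Proposes a sampling-based inference method to detect the knowledge boundary of vision large language models, improving the efficiency of retrieval-augmented generation. |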
large language model |
✅ |
|
| 17 |
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models |
Proposes FACT-AUDIT for dynamically evaluating the fact-checking capabilities of large language models |
large language model |
|
|
| 18 |
Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation |
Proposes an evaluation framework for identifying implicit suicidal ideation and providing support |
large language model |
|
|
| 19 |
LR^2Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems |
Proposes the LR^2Bench benchmark for evaluating the long-chain reflective reasoning capabilities of large language models. |
large language model |
|
|
| 20 |
Chain of Draft: Thinking Faster by Writing Less |
Proposes Chain of Draft, which improves LLM efficiency by condensing intermediate reasoning steps. |
large language model chain-of-thought |
✅ |
|
| 21 |
TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning |
Proposes the TextGames benchmark to evaluate LLM reasoning in text-based games |
large language model instruction following |
|
|
| 22 |
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs |
Proposes LayIE-LLM, an LLM-based design space exploration method for information extraction from layout-rich documents |
large language model multimodal |
✅ |
|
| 23 |
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments |
Proposes a multilingual Program-of-Thought framework that improves LLM reasoning in cross-lingual settings |
large language model chain-of-thought |
|
|
| 24 |
URO-Bench: Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models |
Proposes URO-Bench for comprehensive evaluation of end-to-end spoken dialogue models |
large language model instruction following |
|
|
| 25 |
A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition |
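Proposes CMAS, a cooperative multi-agent framework that addresses contextual correlation and demonstration use in zero-shot named entity recognition. |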
large language model |
|
|
| 26 |
Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources |
Compares single- and dual-prompt methods for synthesizing HR interview dialogues with LLMs, improving dialogue quality |
large language model |
|
|
| 27 |
Steered Generation via Gradient Descent on Sparse Features |
Proposes a steered generation method based on gradient descent over sparse features for precise control of LLM output properties. |
large language model |
|
|
| 28 |
FRIDA to the Rescue! Analyzing Synthetic Data Effectiveness in Object-Based Common Sense Reasoning for Disaster Response |
FRIDA: uses synthetic data to improve LLMs' object-based common sense reasoning for disaster response |
large language model |
|
|
| 29 |
Monte Carlo Temperature: a robust sampling strategy for LLM's uncertainty quantification methods |
Proposes Monte Carlo Temperature (MCT) sampling, which makes LLM uncertainty quantification methods more robust across temperatures. |
large language model |
|
|
| 30 |
Mapping of Subjective Accounts into Interpreted Clusters (MOSAIC): Topic Modelling and LLM applied to Stroboscopic Phenomenology |
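Proposes the MOSAIC method, which applies topic modelling and LLMs to stroboscopic phenomenology reports to reveal latent patterns of experience. |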
large language model |
|
|
| 31 |
WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging |
WiCkeD: makes multiple-choice benchmarks more challenging by introducing a "None of the above" option |
chain-of-thought |
✅ |
|
| 32 |
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction |
Proposes RefuteBench 2.0 to dynamically evaluate how LLMs respond to refutation instructions |
large language model |
✅ |
|
| 33 |
Better Aligned with Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases |
Unveils the political leanings of LLMs on U.S. Supreme Court cases: training data or survey respondents? |
large language model |
|
|
| 34 |
Beyond In-Distribution Success: Scaling Curves of CoT Granularity for Language Model Generalization |
Shows that fine-grained CoT data can significantly improve language model generalization on complex tasks. |
chain-of-thought |
|
|
| 35 |
League: Leaderboard Generation on Demand |
Proposes the Leaderboard Auto Generation (LAG) framework, which automatically generates leaderboards for AI research topics. |
large language model |
|
|
| 36 |
Grandes modelos de lenguaje: de la predicción de palabras a la comprensión? |
Discusses large language models: the evolution from word prediction to language understanding, and the challenges involved |
large language model |
|
|
| 37 |
Can LLMs Explain Themselves Counterfactually? |
Shows that large language models have limitations in generating counterfactual explanations |
large language model |
|
|
| 38 |
HyperG: Hypergraph-Enhanced LLMs for Structured Knowledge |
HyperG: a hypergraph-enhanced LLM framework for handling structured knowledge |
large language model |
|
|
| 39 |
from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors |
Proposes the AVATAR framework, which uses adversarial metaphors to jailbreak large language models |
large language model |
|
|
| 40 |
CaseGen: A Benchmark for Multi-Stage Legal Case Documents Generation |
CaseGen: builds a benchmark for multi-stage legal case document generation in the Chinese legal domain, advancing legal AI |
large language model |
✅ |
|
| 41 |
Constraining Sequential Model Editing with Editing Anchor Compression |
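Proposes the Editing Anchor Compression (EAC) framework, which constrains parameter drift in sequential model editing, improving general abilities. |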
large language model |
|
|