| 1 |
CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models |
CHRONOBERG: builds a temporal corpus to improve large language models' grasp of language evolution and temporal awareness |
large language model foundation model |
✅ |
|
| 2 |
Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data |
Proposes Composite Reasoning (CR) to improve large language models' complex problem solving in few-sample settings |
large language model chain-of-thought |
|
|
| 3 |
R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning |
Proposes the R-Capsule framework to improve the reasoning efficiency of large language models |
large language model chain-of-thought |
|
|
| 4 |
Why Chain of Thought Fails in Clinical Text Understanding |
Large-scale experiments reveal how and why chain-of-thought (CoT) prompting fails in clinical text understanding |
large language model chain-of-thought |
|
|
| 5 |
Large language models management of medications: three performance analyses |
Evaluates large language models on medication management tasks and reveals their limitations in medication recommendation |
large language model |
|
|
| 6 |
Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models |
Evaluates the effectiveness of uncertainty quantification methods in argumentative large language models |
large language model |
|
|
| 7 |
Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning |
Evaluates the limitations of large language models in multilingual legal reasoning |
large language model |
|
|
| 8 |
Detecting (Un)answerability in Large Language Models with Linear Directions |
Uses linear directions to detect unanswerability in large language models for extractive question answering |
large language model |
|
|
| 9 |
Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models |
Uses large language models for exploratory semantic reliability analysis of wind turbine maintenance logs |
large language model |
|
|
| 10 |
The Outputs of Large Language Models are Meaningless |
Argues that the outputs of large language models are meaningless, challenging existing accounts of semantic understanding |
large language model |
|
|
| 11 |
From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement |
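Proposes the MACC framework, which adaptively compresses CoT through multi-round refinement to improve reasoning efficiency and accuracy |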
chain-of-thought |
✅ |
|
| 12 |
FoodSEM: Large Language Model Specialized in Food Named-Entity Linking |
FoodSEM: a large language model specialized in food named-entity linking |
large language model |
|
|
| 13 |
Evaluating Open-Source Large Language Models for Technical Telecom Question Answering |
Evaluates the performance of open-source large language models on technical telecom question answering |
large language model |
|
|
| 14 |
Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration |
Proposes the ThaiFACTUAL framework to address large language model bias in Thai political stance detection |
large language model |
|
|
| 15 |
SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models |
Proposes SBFA, a single-bit flip attack that breaks large language models, exposing a serious security risk |
large language model |
|
|
| 16 |
Quantifying the Impact of Structured Output Format on Large Language Models through Causal Inference |
Uses causal inference to quantify the impact of structured output formats on large language models |
large language model |
|
|
| 17 |
ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning |
Proposes the ADAM framework for evaluating and enhancing LLMs' biographical reasoning |
large language model multimodal |
|
|
| 18 |
VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing |
Proposes VoiceAssistant-Eval to evaluate the capabilities of multimodal AI assistants |
large language model multimodal |
✅ |
|
| 19 |
RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media |
RedNote-Vibe: a dataset for capturing the temporal dynamics of AI-generated text in social media |
large language model TAMP |
✅ |
|
| 20 |
Human Mobility Datasets Enriched With Contextual and Social Dimensions |
Proposes a framework for building urban human mobility datasets that combines contextual and social dimensions with LLM-generated data |
large language model multimodal |
|
|
| 21 |
AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts |
Presents AI Brown and AI Koditex: LLM-generated English and Czech corpora comparable to traditional corpora |
large language model |
|
|
| 22 |
Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs |
Can prompt engineering make LLMs rewind time? Evaluates the effectiveness of prompted knowledge cutoffs |
large language model |
✅ |
|
| 23 |
The Bias is in the Details: An Assessment of Cognitive Bias in LLMs |
Assesses cognitive bias in LLMs, revealing systematic biases in model decision-making |
large language model |
|
|
| 24 |
Towards Generalizable Implicit In-Context Learning with Attention Routing |
Proposes In-Context Routing (ICR) to improve the generalization of implicit in-context learning |
large language model |
|
|
| 25 |
ArabJobs: A Multinational Corpus of Arabic Job Ads |
ArabJobs: a multinational corpus of Arabic job ads for fair Arabic NLP and labor market research |
large language model |
✅ |
|
| 26 |
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models |
InfiR2: a comprehensive FP8 training recipe for reasoning-enhanced language models |
large language model |
|
|
| 27 |
We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong |
Proposes the Adaptive Multi-Branch Steering (AMBS) framework to improve LLM alignment on HHH (helpful, harmless, honest) objectives |
large language model |
|
|
| 28 |
Representing LLMs in Prompt Semantic Task Space |
Proposes a training-free method that represents LLMs as linear operators in a prompt semantic task space, for use in model selection |
large language model |
|
|
| 29 |
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory |
Proposes the FormalML benchmark to evaluate LLMs on formal subgoal completion in machine learning theory |
large language model |
|
|
| 30 |
What Is The Political Content in LLMs' Pre- and Post-Training Data? |
Analyzes political leanings in LLM training data, revealing a correlation between model bias and data content |
large language model |
|
|
| 31 |
The InviTE Corpus: Annotating Invectives in Tudor English Texts for Computational Modeling |
Builds the InviTE corpus for computational modeling of religious invective in Tudor English texts |
large language model |
|
|
| 32 |
Transformers Can Learn Connectivity in Some Graphs but Not Others |
Shows that Transformers learn connectivity on grid-like graphs but struggle on complex graphs |
large language model |
|
|
| 33 |
Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs |
Proposes SSKG-LLM, which mitigates LLM hallucination through structural graph encoding and adaptive space alignment |
large language model |
✅ |
|
| 34 |
Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance |
Proposes the Safety Compliance framework, which improves LLM safety through the lens of legal compliance |
large language model |
|
|
| 35 |
FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction |
FLEXI: the first benchmark for full-duplex human-machine speech interaction, focusing on interruption in emergency scenarios |
large language model |
|
|
| 36 |
FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding |
FeatBench: a benchmark for evaluating coding agents on feature implementation in vibe coding |
large language model |
|
|
| 37 |
Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation |
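Proposes Recursive Thematic Partitioning (RTP), which uses LLMs to build interpretable thematic trees for text clustering and controllable generation |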
large language model |
|
|
| 38 |
Library Hallucinations in LLMs: Risk Analysis Grounded in Developer Queries |
Systematically analyzes the risk of library hallucinations triggered by developer queries in LLM code generation |
large language model |
|
|
| 39 |
Mixture of Detectors: A Compact View of Machine-Generated Text Detection |
Proposes BMAS English, a mixture-of-detectors framework, for comprehensive evaluation of machine-generated text detection |
large language model |
|
|
| 40 |
Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM |
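Proposes the Uni-LAP framework, which achieves universal legal article prediction through tight collaboration between a supervised classification model and an LLM |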
large language model |
|
|
| 41 |
Think Right, Not More: Test-Time Scaling for Numerical Claim Verification |
Proposes VERIFIERFC, which uses test-time scaling to improve LLM performance on numerical claim verification |
large language model |
✅ |
|
| 42 |
COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning |
Proposes CoSpaDi, which compresses LLMs via calibration-guided sparse dictionary learning to improve compression performance |
large language model |
|
|
| 43 |
Fine-tuning Done Right in Model Editing |
Re-establishes the place of fine-tuning in model editing: proposes LocFT-BF, which substantially outperforms existing methods |
large language model |
|
|
| 44 |
Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias |
Proposes the speech continuation task for probing voice-based bias in speech models |
foundation model |
|
|
| 45 |
Black-Box Hallucination Detection via Consistency Under the Uncertain Expression |
Proposes a black-box method based on consistency under uncertain expression for detecting hallucinations in large language models |
large language model |
|
|
| 46 |
MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation |
Proposes MotivGraph-SoIQ to address bias and insufficient grounding in LLM ideation |
large language model |
|
|
| 47 |
SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation |
SimulSense: efficient simultaneous speech translation through sense-driven interpreting |
large language model |
|
|
| 48 |
A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs |
Presents a Turkish citation intent classification dataset and a DSPy-optimized LLM classification framework |
large language model |
|
|
| 49 |
AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans |
AgentPack: a dataset of code changes co-authored by agents and humans, for improving the performance of code-editing models |
large language model |
|
|
| 50 |
LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals |
LUMINA: detects hallucinations in RAG systems using context-knowledge signals |
large language model |
✅ |
|
| 51 |
Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations |
LoRAN: enhances low-rank adaptation through structured nonlinear transformations |
large language model |
|
|
| 52 |
What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness |
Uses LLM agent simulations to make emergency preparedness plans more effective: an iterative design case study |
large language model |
|
|
| 53 |
Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models |
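Proposes the TRACE framework, which decomposes the task across multi-agent models to improve the quality of empathetic response generation |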
large language model |
|
|
| 54 |
Collaborative and Proactive Management of Task-Oriented Conversations |
Proposes an information-state-based collaborative model for managing task-oriented dialogue, improving dialogue success rates |
large language model |
|
|
| 55 |
Can LLMs Solve and Generate Linguistic Olympiad Puzzles? |
Uses large language models to solve and generate Linguistic Olympiad puzzles |
large language model |
|
|