| 1 |
Training-Free Cultural Alignment of Large Language Models via Persona Disagreement |
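Proposes DISCA, an inference-time alignment method that achieves cross-cultural value alignment of large language models without fine-tuning |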
large language model |
|
|
| 2 |
Can Language Models Analyze Data? Evaluating Large Language Models for Question Answering over Datasets |
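Evaluates the effectiveness of large language models on question answering over datasets, comparing direct inference with SQL generation |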
large language model |
|
|
| 3 |
ANCHOR: Abductive Network Construction with Hierarchical Orchestration for Reliable Probability Inference in Large Language Models |
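Proposes the ANCHOR framework, which builds abductive networks through hierarchical orchestration to achieve reliable probabilistic inference in large language models |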
large language model |
|
|
| 4 |
FERA: Uncertainty-Aware Federated Reasoning for Large Language Models |
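Proposes the FERA framework, a training-free federated reasoning method for large language models that optimizes collaborative reasoning through uncertainty awareness |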
large language model |
|
|
| 5 |
Merlin: Deterministic Byte-Exact Deduplication for Lossless Context Optimization in Large Language Model Inference |
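Proposes Merlin, a high-throughput context optimization engine based on deterministic byte-level deduplication, aimed at improving large language model inference efficiency |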
large language model |
|
|
| 6 |
To Redact, or not to Redact? A Local LLM Approach to Deliberative Process Privilege Classification |
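Proposes an automatic deliberative process privilege classification method based on a locally deployed Qwen3.5 model with chain-of-thought prompting |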
large language model, chain-of-thought |
|
|
| 7 |
GLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extraction |
Proposes GLiNER-Relex, a unified framework for zero-shot joint modeling of named entity recognition and relation extraction |
large language model |
|
|
| 8 |
When Can Digital Personas Reliably Approximate Human Survey Findings? |
Quantitatively evaluates the reliability and applicability limits of LLM-based digital personas in social surveys |
large language model |
|
|
| 9 |
Intrinsic Guardrails: How Semantic Geometry of Personality Interacts with Emergent Misalignment in LLMs |
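Proposes an intrinsic guardrail mechanism based on the semantic geometry of personality that effectively suppresses emergent misalignment during large language model fine-tuning |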
large language model |
|
|
| 10 |
Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings |
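Quantifies embedding sensitivity to style in French literary texts, assessing how well large language model rewritings preserve authorial style features |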
large language model |
|
|
| 11 |
NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding |
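Proposes NCO, a decoding strategy that efficiently handles multiple negative constraints in large language models through online pattern matching |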
large language model |
✅ |
|
| 12 |
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation |
Proposes the WildClawBench benchmark to address the challenge of evaluating long-horizon agents in real-world runtime environments |
multimodal |
|
|
| 13 |
DGPO: Beyond Pairwise Preferences with Directional Consistent Groupwise Optimization |
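Proposes the directional groupwise preference optimization (DGPO) framework, which uses multi-candidate comparison to improve the consistency and diversity of large language model reasoning |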
large language model |
|
|
| 14 |
RUBEN: Rule-Based Explanations for Retrieval-Augmented LLM Systems |
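Proposes RUBEN, an interactive tool that uses rule mining to support explainability and safety evaluation of retrieval-augmented generation (RAG) systems |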
large language model |
|
|
| 15 |
Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding |
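Proposes the ChartCF training framework, which improves the data efficiency of chart understanding through counterfactual learning and multimodal preference optimization |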
multimodal |
|
|
| 16 |
Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis |
Proposes the DPUA framework, which uses uncertainty alignment to address the neglect of human disagreement in subjectivity analysis |
large language model |
|
|
| 17 |
Not All Proofs Are Equal: Evaluating LLM Proof Quality Beyond Correctness |
Proposes the ProofRank benchmark to quantitatively evaluate the quality of LLM mathematical proofs beyond correctness alone |
large language model |
|
|
| 18 |
Toward Multi-Database Query Reasoning for Text2Cypher |
Proposes a multi-database query reasoning framework that addresses the limitations of Text2Cypher in cross-source graph data scenarios |
large language model |
|
|
| 19 |
An Annotation Scheme and Classifier for Personal Facts in Dialogue |
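Proposes an extended annotation scheme for personal facts and a multi-head classifier, significantly improving fact extraction and structuring in dialogue systems |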
large language model |
|
|
| 20 |
Extending Confidence-Based Text2Cypher with Grammar and Schema Aware Filtering |
Proposes a grammar- and schema-aware filtering framework that improves the reliability and execution quality of Text2Cypher generation |
large language model |
|
|
| 21 |
The Impact of Editorial Intervention on Detecting Native Language Traces |
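Quantifies the impact of editorial intervention on native language identification, revealing the robustness of deep linguistic features in non-native text |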
large language model |
|
|
| 22 |
NyayaAI: An AI-Powered Legal Assistant Using Multi-Agent Architecture and Retrieval-Augmented Generation |
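Proposes NyayaAI, a multi-agent legal assistant that uses a RAG architecture to improve the efficiency of retrieving and analyzing Indian legal documents |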
large language model |
✅ |
|
| 23 |
Synthetic Pre-Pre-Training Improves Language Model Robustness to Noisy Pre-Training Data |
Proposes synthetic-data pre-pre-training (PPT), which significantly improves large language model robustness to noisy pre-training data |
large language model |
✅ |
|
| 24 |
SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution |
Proposes the SkillRAE framework, which optimizes retrieval-augmented execution (RAE) through skill-based context compilation |
large language model |
|
|
| 25 |
Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework |
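Proposes the C-BPO framework, which achieves personalized alignment of large language models through preference-calibrated binary feedback |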
large language model |
|
|
| 26 |
Speech-based Psychological Crisis Assessment using LLMs |
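Proposes an LLM-based framework for speech-based psychological crisis assessment that improves classification performance through paralinguistic injection and reasoning enhancement |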
large language model |
|
|
| 27 |
Annotations Mitigate Post-Training Mode Collapse |
Proposes Annotation-Anchored Training to mitigate semantic mode collapse in post-training |
instruction following |
|
|
| 28 |
FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning |
Proposes FocuSFT, which uses bilevel optimization to address attention dilution in long-context fine-tuning |
large language model |
✅ |
|
| 29 |
PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning |
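Proposes PruneTIR, an inference-time tool call pruning framework that improves the accuracy and efficiency of tool-integrated reasoning in large language models |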
large language model |
|
|
| 30 |
Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions |
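Proposes the VALDI evaluation framework and the VIVALDI auditing mechanism, revealing and mitigating "pseudo-deliberation" in large language models |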
large language model |
|
|