| 1 |
Text Knows What, Tables Know When: Clinical Timeline Reconstruction via Retrieval-Augmented Multimodal Alignment |
Proposes a retrieval-augmented multimodal alignment framework for clinical timeline reconstruction that improves timestamp accuracy. |
large language model multimodal |
|
|
| 2 |
AI Knows When It's Being Watched: Functional Strategic Action and Contextual Register Modulation in Large Language Models |
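Large language models exhibit strategic behavior and contextual adaptation under social observation, revealing a capacity for social awareness. |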
large language model |
|
|
| 3 |
Non-linear Interventions on Large Language Models |
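Proposes a non-linear intervention method that overcomes the limitations of linear interventions and improves control over large language models. |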
large language model |
|
|
| 4 |
From Scenes to Elements: Multi-Granularity Evidence Retrieval for Verifiable Multimodal RAG |
Proposes GranuRAG, which uses multi-granularity evidence retrieval to handle fine-grained queries in verifiable multimodal RAG. |
multimodal |
|
|
| 5 |
Tokenizer Fertility and Zero-Shot Performance of Foundation Models on Ukrainian Legal Text: A Comparative Study |
A comparative study of tokenizer efficiency and zero-shot performance of large models on Ukrainian legal text. |
foundation model |
|
|
| 6 |
Dimension-Level Intent Fidelity Evaluation for Large Language Models: Evidence from Structured Prompt Ablation |
Proposes a dimension-level intent fidelity evaluation framework for assessing how large language models perform on user-specific tasks. |
large language model |
|
|
| 7 |
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining |
Proposes the Video2GUI framework, which synthesizes large-scale GUI interaction trajectories from internet videos for GUI agent pretraining. |
large language model multimodal |
|
|
| 8 |
MeMo: Memory as a Model |
MeMo: proposes a memory-based modular framework for enhancing the knowledge-updating ability of LLMs. |
large language model |
|
|
| 9 |
Improving Multi-turn Dialogue Consistency with Self-Recall Thinking |
Proposes the Self-Recall Thinking (SRT) framework, which improves multi-turn dialogue consistency and reduces latency. |
large language model |
|
|
| 10 |
Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation |
Proposes Graphs of Research to address citation relationships in research idea generation. |
large language model |
|
|
| 11 |
Is Grep All You Need? How Agent Harnesses Reshape Agentic Search |
Studies the performance differences between grep and vector retrieval in agentic search and the factors that influence them. |
large language model |
|
|
| 12 |
MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs |
MetaBackdoor: exploits positional encoding in LLMs as a backdoor attack surface. |
large language model |
|
|
| 13 |
EndPrompt: Efficient Long-Context Extension via Terminal Anchoring |
EndPrompt: an efficient long-context extension method based on terminal anchoring. |
large language model |
✅ |
|
| 14 |
Mining Subscenario Refactoring Opportunities in Behaviour-Driven Software Test Suites: ML Classifiers and LLM-Judge Baselines |
Proposes a method based on machine learning and LLMs for mining subscenario refactoring opportunities in BDD test suites. |
large language model |
|
|
| 15 |
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations |
GroupMemBench: a benchmark for evaluating the memory of LLM agents in multi-party conversations. |
large language model |
|
|
| 16 |
Retrieval-Augmented Large Language Models for Schema-Constrained Clinical Information Extraction |
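Proposes retrieval-augmented LLMs for schema-constrained clinical information extraction, improving the efficiency of structuring nurse-patient dialogue records. |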
large language model |
|
|
| 17 |
FINESSE-Bench: A Hierarchical Benchmark Suite for Financial Domain Knowledge and Technical Analysis in Large Language Models |
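FINESSE-Bench: a hierarchical benchmark suite for evaluating the financial domain knowledge and technical analysis capabilities of large language models. |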
large language model |
|
|
| 18 |
Reasoning Models Don't Just Think Longer, They Move Differently |
Uses trajectory geometry analysis to reveal behavioral differences of reasoning models across problems of varying difficulty. |
chain-of-thought |
|
|
| 19 |
Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance |
Analyzes neural activation patterns of six LLM architectures on cognitive tasks, revealing model characteristics. |
large language model |
|
|
| 20 |
Capability Conditioned Scaffolding for Professional Human LLM Collaboration |
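Proposes a capability-conditioned scaffolding framework that improves the reliability of human-AI collaboration in professional domains. |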
large language model |
|
|
| 21 |
Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models |
Studies human language production under vocabulary constraints, comparing greedy and globally optimizing models. |
large language model |
|
|
| 22 |
Fluency and Faithfulness in Human and Machine Literary Translation |
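Shows that literary translation involves a trade-off between fluency and faithfulness, and that the same holds for large language models. |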
large language model |
|
|