| 1 |
GAPMAP: Mapping Scientific Knowledge Gaps in Biomedical Literature Using Large Language Models |
Proposes GAPMAP to identify knowledge gaps in biomedical literature |
large language model |
|
|
| 2 |
A Survey on Unlearning in Large Language Models |
A comprehensive survey of unlearning in large language models, organized by a taxonomy of intervention stages |
large language model |
|
|
| 3 |
The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions |
Multi-user discussions generated by large language models are highly realistic and can be used to simulate online communities. |
large language model |
|
|
| 4 |
FlowMM: Cross-Modal Information Flow Guided KV Cache Merging for Efficient Multimodal Context Inference |
FlowMM: KV cache merging guided by cross-modal information flow, improving the efficiency of multimodal context inference |
multimodal |
|
|
| 5 |
MCP4IFC: IFC-Based Building Design Using Large Language Models |
MCP4IFC: an LLM-driven framework for IFC-based building design |
large language model |
✅ |
|
| 6 |
SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation |
SymCode: a neurosymbolic approach to mathematical reasoning based on verifiable code generation |
large language model chain-of-thought |
|
|
| 7 |
DiagramEval: Evaluating LLM-Generated Diagrams via Graphs |
DiagramEval: proposes a graph-based method for evaluating LLM-generated diagrams |
large language model multimodal |
✅ |
|
| 8 |
TextualVerifier: Verify TextGrad Step-by-Step |
Proposes TextualVerifier, an LLM-based framework for verifying textual reasoning in TextGrad |
large language model chain-of-thought |
|
|
| 9 |
A Critical Study of Automatic Evaluation in Sign Language Translation |
Critically evaluates the limitations of existing automatic evaluation metrics for sign language translation. |
large language model multimodal |
|
|
| 10 |
Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning |
Parrot: a training pipeline that enhances mathematical reasoning with both program CoT and natural language CoT |
large language model chain-of-thought |
|
|
| 11 |
Testing Cross-Lingual Text Comprehension In LLMs Using Next Sentence Prediction |
Uses the next sentence prediction task to evaluate cross-lingual text comprehension in large language models |
large language model chain-of-thought |
|
|
| 12 |
GMTRouter: Personalized LLM Router over Multi-turn User Interactions |
GMTRouter: a personalized LLM routing method based on multi-turn user interactions |
large language model |
✅ |
|
| 13 |
LISTEN to Your Preferences: An LLM Framework for Multi-Objective Selection |
LISTEN: an LLM framework for multi-objective selection that addresses the difficulty of formalizing expert preferences |
large language model |
|
|
| 14 |
TOPol: Capturing and Explaining Multidimensional Semantic Polarity Fields and Vectors |
TOPol: proposes a semi-supervised framework for capturing and explaining multidimensional semantic polarity fields and vectors |
large language model |
|
|
| 15 |
Can LLMs Estimate Cognitive Complexity of Reading Comprehension Items? |
Uses large language models to estimate the cognitive complexity of reading comprehension items |
large language model |
|
|
| 16 |
Revisiting Multilingual Data Mixtures in Language Model Pretraining |
Studies how multilingual data mixtures affect pretrained language models, challenging common assumptions about multilingual learning. |
large language model |
|
|
| 17 |
RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline |
Proposes RECAP, which uses agent collaboration to extract and verify copyrighted data memorized by LLMs |
large language model |
|
|
| 18 |
The Limits of Obliviate: Evaluating Unlearning in LLMs via Stimulus-Knowledge Entanglement-Behavior Framework |
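Proposes the SKeB framework to evaluate LLM unlearning under leading prompts, revealing a relationship between model size and unlearning effectiveness. |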
large language model |
|
|
| 19 |
Knowledge Graph Analysis of Legal Understanding and Violations in LLMs |
Proposes a knowledge-graph-enhanced RAG method to evaluate LLMs on legal understanding and violations. |
large language model |
|
|
| 20 |
Evaluating the Role of Verifiers in Test-Time Scaling for Legal Reasoning Tasks |
Studies the role of verifiers in test-time scaling for legal reasoning tasks, improving the performance of large language models. |
large language model |
|
|
| 21 |
Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry |
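Proposes a communication and verification framework for LLM agent collaboration under information asymmetry, improving task completion and interpretability. |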
large language model |
✅ |
|
| 22 |
TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation |
TwinVoice: a multi-dimensional benchmark for building digital twins via LLM persona simulation |
large language model |
|
|
| 23 |
Depth and Autonomy: A Framework for Evaluating LLM Applications in Social Science Research |
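Proposes a framework based on depth and autonomy for evaluating LLM applications, improving the reliability of social science research. |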
large language model |
|
|
| 24 |
RLMEval: Evaluating Research-Level Neural Theorem Proving |
RLMEval: proposes a new benchmark for evaluating research-level neural theorem proving |
large language model |
|
|
| 25 |
Implicature in Interaction: Understanding Implicature Improves Alignment in Human-LLM Interaction |
Improves LLM alignment in human-LLM interaction through understanding conversational implicature |
large language model |
|
|
| 26 |
Serve Programs, Not Prompts |
Proposes the Symphony system, which improves the efficiency and flexibility of LLM serving by serving LLM inference programs |
large language model |
|
|
| 27 |
BhashaBench V1: A Comprehensive Benchmark for the Quadrant of Indic Domains |
Proposes BhashaBench V1 for evaluating LLM performance in India-specific domains |
large language model |
|
|
| 28 |
Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy |
Uses citation frequency as a proxy for training data redundancy to study LLM hallucinations in bibliographic recommendation. |
large language model |
|
|
| 29 |
Monitoring Transformative Technological Convergence Through LLM-Extracted Semantic Entity Triple Graphs |
Proposes a method for monitoring technological convergence based on LLM-extracted semantic triple graphs. |
large language model |
|
|
| 30 |
Not ready for the bench: LLM legal interpretation is unstable and out of step with human judgments |
LLM legal interpretation is unstable and inconsistent with human judgments, making it unsuitable for legal practice |
large language model |
|
|
| 31 |
Beyond One-Size-Fits-All: Personalized Harmful Content Detection with In-Context Learning |
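Proposes a personalized harmful content detection framework based on in-context learning, improving user customization and privacy protection. |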
foundation model |
|
|
| 32 |
ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation |
ProMediate: a socio-cognitive framework for evaluating proactive agents in multi-party negotiation |
large language model |
|
|
| 33 |
Ideology-Based LLMs for Content Moderation |
Shows that ideology-based role-playing biases LLMs in content moderation. |
large language model |
|
|
| 34 |
Model-Document Protocol for AI Search |
Proposes the Model-Document Protocol (MDP) framework to improve how efficiently LLMs use knowledge in AI search |
large language model |
|
|