| 1 |
H2HTalk: Evaluating Large Language Models as Emotional Companion |
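H2HTalk: builds a benchmark for evaluating emotional-companion large language models, addressing the difficulty of assessing psychological support |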
large language model |
|
|
| 2 |
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset |
Proposes BMMR, a large-scale bilingual, multimodal, multi-discipline reasoning dataset for evaluating and improving large multimodal models. |
multimodal |
|
|
| 3 |
Improving Social Determinants of Health Documentation in French EHRs Using Large Language Models |
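Uses large language models to improve the completeness of social determinants of health documentation in French electronic health records |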
large language model |
|
|
| 4 |
Graph Repairs with Large Language Models: An Empirical Study |
Repairing graph data with large language models: an empirical study |
large language model |
|
|
| 5 |
TACOS: Open Tagging and Comparative Scoring for Instruction Fine-Tuning Data Selection |
TACOS: selects instruction fine-tuning data through open tagging and comparative scoring |
large language model, instruction following |
|
|
| 6 |
STRUCTSENSE: A Task-Agnostic Agentic Framework for Structured Information Extraction with Human-In-The-Loop Evaluation and Benchmarking |
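Proposes the StructSense framework, which uses domain knowledge and human-in-the-loop collaboration to improve LLM performance on structured information extraction. |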
large language model |
|
|
| 7 |
Recon, Answer, Verify: Agents in Search of Truth |
Proposes the RAV framework, which improves LLM accuracy in fact-checking political claims through multi-agent collaboration |
large language model |
|
|
| 8 |
Read Quietly, Think Aloud: Decoupling Comprehension and Reasoning in LLMs |
Improves LLM text-processing ability by decoupling comprehension from reasoning |
large language model |
|
|
| 9 |
MemOS: A Memory OS for AI System |
Proposes MemOS, a memory operating system for AI systems that addresses the challenge of long-term memory management in LLMs. |
large language model |
|
|
| 10 |
Can LLMs Play Ô Ăn Quan Game? A Study of Multi-Step Planning and Decision Making |
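Evaluates the multi-step planning and decision-making abilities of large language models, using the traditional Vietnamese board game Ô Ăn Quan as a case study |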
large language model |
|
|
| 11 |
Four Shades of Life Sciences: A Dataset for Disinformation Detection in the Life Sciences |
Proposes the Four Shades of Life Sciences dataset for detecting disinformation in the life sciences. |
large language model |
✅ |
|