| 1 |
Harnessing Large Language Models for Precision Querying and Retrieval-Augmented Knowledge Extraction in Clinical Data Science |
Uses large language models for precise querying and retrieval-augmented knowledge extraction in clinical data science |
large language model |
|
|
| 2 |
Efficient Multimodal Planning Agent for Visual Question-Answering |
Proposes a multimodal planning agent that efficiently handles multi-stage retrieval-augmented generation in visual question answering |
multimodal |
|
|
| 3 |
AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios |
Proposes the AgentIF-OneDay benchmark to evaluate task-level instruction following by general AI agents in daily scenarios |
instruction following |
|
|
| 4 |
Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models |
Proposes a conditional divergent association task to evaluate the creativity of language models |
large language model |
|
|
| 5 |
MuVaC: A Variational Causal Framework for Multimodal Sarcasm Understanding in Dialogues |
Proposes MuVaC, a variational causal framework for multimodal sarcasm understanding in dialogues |
multimodal |
|
|
| 6 |
SpeechMapper: Speech-to-text Embedding Projector for LLMs |
SpeechMapper: an efficient speech-to-text embedding projection method for connecting speech foundation models and LLMs |
foundation model, instruction following |
|
|
| 7 |
Unit-Based Agent for Semi-Cascaded Full-Duplex Dialogue Systems |
Proposes a unit-based agent for semi-cascaded full-duplex dialogue systems |
large language model, multimodal |
✅ |
|
| 8 |
QueerGen: How LLMs Reflect Societal Norms on Gender and Sexuality in Sentence Completion Tasks |
QueerGen: reveals the societal biases about gender and sexual orientation that LLMs exhibit in sentence completion tasks |
large language model |
|
|
| 9 |
MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment |
MobileBench-OL: a comprehensive Chinese benchmark for evaluating mobile GUI agents in real-world environments |
instruction following |
|
|
| 10 |
AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts |
AgentLongBench: a benchmark that evaluates long-context agents through environment interaction |
large language model |
|
|
| 11 |
ShieldedCode: Learning Robust Representations for Virtual Machine Protected Code |
ShieldedCode: learns robust representations of virtual-machine-protected code to strengthen software defense |
large language model |
|
|
| 12 |
A Dialectic Pipeline for Improving LLM Robustness |
Proposes a dialectic pipeline that improves LLM robustness and output quality through self-dialogue |
chain-of-thought |
|
|
| 13 |
Can We Improve Educational Diagram Generation with In-Context Examples? Not if a Hallucination Spoils the Bunch |
Can in-context examples improve educational diagram generation? Beware of hallucinations |
large language model |
|
|
| 14 |
Beyond Accuracy: A Cognitive Load Framework for Mapping the Capability Boundaries of Tool-use Agents |
Proposes a framework, based on cognitive load theory, for evaluating the capability boundaries of tool-use agents |
large language model |
|
|