| 1 |
OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning |
OctoMed: achieving state-of-the-art multimodal medical reasoning through data recipes |
large language model multimodal |
|
|
| 2 |
TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM |
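Proposes TIM-PRM, which actively verifies multimodal reasoning through tool integration to address hallucination and logical inconsistency. |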
large language model multimodal |
|
|
| 3 |
Chunking Strategies for Multimodal AI Systems |
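Surveys data chunking strategies in multimodal AI systems, providing a technical foundation for designing efficient multimodal systems. |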
multimodal |
|
|
| 4 |
Finetuning Large Language Models for Automated Depression Screening in Nigerian Pidgin English: GENSCORE Pilot Study |
Fine-tunes large language models for automated depression screening in Nigerian Pidgin English: the GENSCORE pilot study |
large language model |
|
|
| 5 |
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability? |
Studies how training incentives affect chain-of-thought monitorability and proposes a new method for evaluating monitorability. |
chain-of-thought |
✅ |
|
| 6 |
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models |
Proposes an MCTS-based knowledge retrieval method that improves LLM reasoning in dialogue. |
large language model |
|
|
| 7 |
AgriCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture |
Proposes AgriCoT, a benchmark for evaluating the reasoning ability of vision-language models in agriculture |
chain-of-thought |
✅ |
|
| 8 |
Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation |
Asm2SrcEval: the first large-scale benchmark for evaluating LLMs on assembly-to-source code translation |
large language model |
|
|
| 9 |
Generating Verifiable Chain of Thoughts from Execution-Traces |
Proposes a method for generating verifiable chains of thought from execution traces, improving code reasoning. |
chain-of-thought |
|
|
| 10 |
SimClinician: A Multimodal Simulation Testbed for Reliable Psychologist AI Collaboration in Mental Health Diagnosis |
SimClinician: a multimodal simulation testbed for reliable psychologist-AI collaboration in mental health diagnosis |
multimodal |
|
|
| 11 |
Efficient Asynchronous Federated Evaluation with Strategy Similarity Awareness for Intent-Based Networking in Industrial Internet of Things |
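Proposes the FEIBN framework, which uses strategy-similarity-aware federated learning to improve the efficiency of intent-based networking in the Industrial Internet of Things. |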
large language model multimodal |
|
|
| 12 |
LegalWebAgent: Empowering Access to Justice via LLM-Based Web Agents |
LegalWebAgent: using LLM-driven web agents to empower judicial services |
large language model multimodal |
|
|
| 13 |
Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems |
LoRAServe: a workload-aware framework for dynamic LoRA adapter placement and routing that addresses performance skew in heterogeneous LoRA serving. |
large language model |
|
|
| 14 |
CodeFlowLM: Incremental Just-In-Time Defect Prediction with Pretrained Language Models and Exploratory Insights into Defect Localization |
CodeFlowLM: incremental just-in-time defect prediction using pretrained language models |
large language model |
|
|
| 15 |
Writing in Symbiosis: Mapping Human Creative Agency in the AI Era |
Analyzes the evolution of human writing styles to reveal creative patterns in the era of human-AI symbiosis |
large language model |
|
|
| 16 |
Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities |
Evaluates large language models on one-shot patching of real and artificial vulnerabilities |
large language model |
|
|
| 17 |
Retrieval-Augmented Few-Shot Prompting Versus Fine-Tuning for Code Vulnerability Detection |
Proposes a retrieval-augmented few-shot prompting method for code vulnerability detection that outperforms fine-tuned models. |
large language model |
|
|
| 18 |
Autonomous QA Agent: A Retrieval-Augmented Framework for Reliable Selenium Script Generation |
Proposes Autonomous QA Agent, which uses RAG to improve the reliability of Selenium script generation |
large language model |
|
|
| 19 |
MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents |
MindPower: enabling theory-of-mind reasoning in VLM-based embodied agents |
multimodal |
|
|
| 20 |
AgentShield: Make MAS more secure and efficient |
AgentShield: an efficient and secure distributed framework for protecting LLM-based multi-agent systems |
large language model |
|
|
| 21 |
InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents |
InsightEval: an expert-curated benchmark for evaluating insight discovery in LLM-driven data agents |
large language model |
|
|