| 1 | LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Proposes the LogicVista benchmark for evaluating the logical reasoning abilities of multimodal LLMs in visual contexts. | large language model, multimodal | ✅ | |
| 2 | Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns? | Proposes the BeyondX benchmark, exposes the limitations of LLMs on math problems with multiple unknowns, and introduces the Formulate-and-Solve strategy. | large language model | | |
| 3 | MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning | MFE-ETP: a comprehensive benchmark for evaluating multimodal foundation models on embodied task planning. | foundation model | | |
| 4 | Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing | Uses LLMs to automate unit test generation for high-performance computing code. | large language model | | |
| 5 | Lucy: Think and Reason to Solve Text-to-SQL | Lucy: combines LLMs with automated reasoning to tackle Text-to-SQL over complex databases. | large language model | | |
| 6 | Are LLMs Correctly Integrated into Software Systems? | Identifies defect patterns in software systems that integrate LLMs and RAG, and proposes guidelines for improving them. | large language model | | |
| 7 | Achieving Tool Calling Functionality in LLMs Using Only Prompt Engineering Without Fine-Tuning | Implements tool-calling functionality in LLMs through prompt engineering alone, without fine-tuning. | large language model | | |
| 8 | Algorithmic Language Models with Neurally Compiled Libraries | Proposes augmenting language models with neurally compiled algorithm libraries to improve LLM reasoning and planning. | large language model | | |