| 1 |
GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning |
GRAFT: proposes a chart and table reasoning benchmark for evaluating LLMs on visual-text alignment and structured instruction following. |
multimodal instruction following |
|
|
| 2 |
Foundation Models for Cross-Domain EEG Analysis Application: A Survey |
The first modality-oriented survey of pretrained models for EEG analysis, filling a gap in the field's research framework. |
foundation model multimodal |
|
|
| 3 |
LLM4Sweat: A Trustworthy Large Language Model for Hyperhidrosis Support |
Proposes LLM4Sweat, a trustworthy large language model framework for hyperhidrosis support. |
large language model foundation model |
|
|
| 4 |
Flexible metadata harvesting for ecology using large language models |
Proposes an LLM-based metadata harvester that addresses metadata heterogeneity in ecological data integration. |
large language model |
|
|
| 5 |
RETAIL: Towards Real-world Travel Planning for Large Language Models |
RETAIL: a real-world travel planning dataset for large language models, with a topic-guided multi-agent framework. |
large language model |
|
|
| 6 |
Invisible Filters: Cultural Bias in Hiring Evaluations Using Large Language Models |
Proposes a systematic analysis to address cultural bias in hiring evaluations made by large language models. |
large language model |
|
|
| 7 |
"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries |
Proposes Geo-Visual Agents, which answer visual-spatial queries by analyzing geospatial imagery. |
multimodal |
|
|
| 8 |
Language-Guided Tuning: Enhancing Numeric Optimization with Textual Feedback |
Proposes Language-Guided Tuning (LGT), which uses large language models and textual feedback to improve numeric optimization. |
large language model |
|
|
| 9 |
Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs |
Uses theoretical computer science to scalably synthesize formal theorem-proving challenges. |
large language model |
|
|
| 10 |
From Bits to Boardrooms: A Cutting-Edge Multi-Agent LLM Framework for Business Excellence |
BusiAgent: a multi-agent LLM framework for enterprise decision-making that improves strategic planning and collaboration efficiency. |
large language model |
|
|
| 11 |
Test-time Corpus Feedback: From Retrieval to RAG |
Survey: explores test-time corpus feedback in retrieval-augmented generation (RAG), bridging the gap between IR and NLP. |
large language model |
|
|
| 12 |
Multi-IaC-Eval: Benchmarking Cloud Infrastructure as Code Across Multiple Formats |
Proposes the Multi-IaC-Eval benchmark for evaluating LLM code generation and modification across multiple IaC formats. |
large language model |
|
|
| 13 |
ASIC-Agent: An Autonomous Multi-Agent System for ASIC Design with Benchmark Evaluation |
ASIC-Agent: an autonomous multi-agent system for ASIC design, with benchmark evaluation. |
large language model |
|
|
| 14 |
Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making |
Proposes a process-based evaluation framework for assessing how faithfully LLMs simulate human behavior in decision-making. |
large language model |
|
|
| 15 |
HyperFlexis: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling |
HyperFlexis: joint algorithm and system design for multi-SLO serving and fast scaling. |
large language model |
|
|
| 16 |
Cybernaut: Towards Reliable Web Automation |
Cybernaut: a reliable web automation framework for enterprise-grade applications. |
large language model |
|
|
| 17 |
Super-additive Cooperation in Language Model Agents |
Proposes a super-additive cooperation game framework built on language model agents to improve multi-agent collaboration. |
large language model |
✅ |
|
| 18 |
IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents |
IPIGuard: a novel defense based on a tool dependency graph against indirect prompt injection attacks in LLM agents. |
large language model |
|
|
| 19 |
Coarse-to-Fine Grounded Memory for LLM Agent Planning |
Proposes a coarse-to-fine grounded memory framework that improves the adaptability of LLM agents in complex planning tasks. |
large language model |
|
|
| 20 |
M-$LLM^3$REC: A Motivation-Aware User-Item Interaction Framework for Enhancing Recommendation Accuracy with LLMs |
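Proposes the M-$LLM^3$REC framework, which uses large language models to extract user motivations and improve recommendation accuracy in cold-start scenarios. |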
large language model |
|
|
| 21 |
R-ConstraintBench: Evaluating LLMs on NP-Complete Scheduling |
Proposes R-ConstraintBench to evaluate LLMs on NP-complete scheduling problems. |
large language model |
|
|
| 22 |
PuzzleClone: An SMT-Powered Framework for Synthesizing Verifiable Data |
PuzzleClone: an SMT-based framework for synthesizing verifiable data to improve LLM reasoning. |
large language model |
✅ |
|