| 1 |
A Modular and Multimodal Generative AI Framework for Urban Building Energy Data: Generating Synthetic Homes |
Proposes a modular, multimodal generative AI framework for generating urban building energy data and synthesizing home information. |
multimodal |
|
|
| 2 |
Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution |
Auras: accelerating embodied AI agents by disaggregating perception from generation and executing the pipeline asynchronously |
embodied AI |
|
|
| 3 |
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering |
LoCoBench: a benchmark for evaluating long-context LLMs on complex software engineering tasks |
large language model |
✅ |
|
| 4 |
Quality Assessment of Tabular Data using Large Language Models and Code Generation |
Proposes a framework for assessing tabular data quality using large language models and code generation |
large language model |
|
|
| 5 |
Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks |
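Studies how an LLM conversational agent's personality expression and its alignment with the user affect user perceptions in goal-oriented tasks |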
large language model |
|
|
| 6 |
LLMs as Agentic Cooperative Players in Multiplayer UNO |
Proposes applying LLM-based cooperative players to the game of UNO |
large language model |
|
|
| 7 |
Towards a Common Framework for Autoformalization |
Proposes a common framework for autoformalization to promote cross-fertilization among AI systems in formal reasoning. |
large language model |
|
|
| 8 |
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs |
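Exposes the illusion of diminishing returns in LLM long-horizon execution: evaluating model performance by isolating execution capability |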
large language model |
|
|
| 9 |
TORSO: Template-Oriented Reasoning Towards General Tasks |
Proposes TORSO, a template-oriented reasoning framework for general task solving that requires no hand-crafted few-shot examples. |
large language model |
|
|
| 10 |
Towards Adaptive ML Benchmarks: Web-Agent-Driven Construction, Domain Expansion, and Metric Optimization |
Proposes TAM Bench, an adaptive machine learning benchmark driven by web agents, for evaluating LLMs on end-to-end ML tasks. |
large language model |
|
|
| 11 |
LightAgent: Production-level Open-source Agentic AI Framework |
Proposes LightAgent, a production-level open-source agentic AI framework designed to simplify multi-agent system deployment. |
large language model |
✅ |
|
| 12 |
Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search |
Jupiter: enhancing LLM data analysis capabilities via notebooks and inference-time value-guided search |
large language model |
✅ |
|