| 1 |
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents |
Game-TARS: a pretrained generalist multimodal game agent that achieves cross-domain scalability |
generalist agent, foundation model, multimodal |
|
|
| 2 |
Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier |
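Proposes an emotional rationale verifier that improves multimodal LLMs' emotion understanding and the consistency of their explanations |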
large language model, multimodal |
|
|
| 3 |
Beyond the Failures: Rethinking Foundation Models in Pathology |
Foundation models in pathology need rethinking; methods built for natural images should not be applied blindly |
foundation model |
|
|
| 4 |
From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production |
IBM presents CUGA, a generalist agent, and applies it to enterprise BPO talent recruitment to validate its business value. |
generalist agent |
✅ |
|
| 5 |
Decentralized Multi-Agent Goal Assignment for Path Planning using Large Language Models |
Uses large language models for decentralized multi-agent goal assignment and path planning |
large language model |
|
|
| 6 |
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents |
ReCAP: recursive context-aware reasoning and planning for large language model agents |
large language model |
|
|
| 7 |
Alita-G: Self-Evolving Generative Agent for Agent Generation |
ALITA-G: a self-evolving generative agent that turns agents into domain experts by generating, abstracting, and managing MCP tools. |
generalist agent, large language model |
|
|
| 8 |
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence |
JanusCoder: building a foundational visual-programmatic interface for code intelligence |
multimodal |
✅ |
|
| 9 |
PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs |
PRO: precise and robust text watermarking for open-source LLMs, strengthening intellectual property protection |
large language model |
|
|
| 10 |
ReCode: Unify Plan and Action for Universal Granularity Control |
ReCode: universal granularity control by unifying plan and action |
large language model |
✅ |
|
| 11 |
AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines |
AutoStreamPipe: uses LLMs to automatically generate data stream processing pipelines, significantly reducing development time and error rates. |
large language model |
|
|
| 12 |
Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards |
Uses small language models with combined process and outcome rewards to improve code generation quality |
large language model |
|
|
| 13 |
Evaluating the effectiveness of LLM-based interoperability |
Evaluates the effectiveness of LLM-based interoperability, enabling systems to interoperate autonomously at runtime. |
large language model |
|
|
| 14 |
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges |
Addresses agentic AI security threats, presenting defenses, evaluation frameworks, and an analysis of open challenges |
large language model |
|
|
| 15 |
Mutual Wanting in Human-AI Interaction: Empirical Evidence from Large-Scale Analysis of GPT Model Transitions |
Proposes M-WAF, a mutual wanting alignment framework, to quantify how well user expectations match AI behavior in human-AI interaction. |
large language model |
|
|
| 16 |
A Survey of Data Agents: Emerging Paradigm or Overstated Hype? |
Builds a hierarchical taxonomy of data agents that clarifies levels of autonomy and supports the growth of the Data+AI ecosystem. |
large language model |
|
|
| 17 |
Policy-Aware Generative AI for Safe, Auditable Data Access Governance |
Proposes a policy-aware generative AI for safe, auditable data access governance. |
large language model |
|
|
| 18 |
Exploring Vulnerability in AI Industry |
Proposes AIVI, an AI vulnerability index, to quantify systemic risk in the upstream value chain of the AI industry |
foundation model |
|
|
| 19 |
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents |
Proposes QueryIPI, a query-agnostic indirect prompt injection attack on coding agents |
instruction following |
|
|
| 20 |
RefleXGen: The unexamined code is not worth using |
RefleXGen: improves the security of LLM code generation through self-reflection |
large language model |
|
|
| 21 |
Quantifying Document Impact in RAG-LLMs |
Proposes the Influence Score (IS) to quantify the impact of individual documents on generated output in RAG. |
large language model |
|
|