| 1 |
MMSkills: Towards Multimodal Skills for General Visual Agents |
MMSkills: a multimodal skill framework for general visual agents that improves decision-making |
multimodal visual grounding |
|
|
| 2 |
(How) Do Large Language Models Understand High-Level Message Sequence Charts? |
Evaluates how well large language models understand the semantics of high-level message sequence charts |
large language model |
|
|
| 3 |
Assessing the Creativity of Large Language Models: Testing, Limits, and New Frontiers |
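Assesses the creativity of large language models (testing, limits, and new directions) and proposes DRAT, which effectively predicts scientific ideation ability |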
large language model |
|
|
| 4 |
AI Harness Engineering: A Runtime Substrate for Foundation-Model Software Agents |
Proposes AI Harness Engineering to improve the reliability of foundation models in software engineering |
foundation model |
|
|
| 5 |
Compact Latent Manifold Translation: A Parameter-Efficient Foundation Model for Cross-Modal and Cross-Frequency Physiological Signal Synthesis |
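Proposes Compact Latent Manifold Translation (CLMT) for cross-modal and cross-frequency physiological signal synthesis, enabling deployment on edge devices |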
foundation model |
|
|
| 6 |
Multimodal Hidden Markov Models for Persistent Emotional State Tracking |
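Proposes a persistent emotional state tracking framework based on multimodal hidden Markov models for understanding emotional arcs in dialogue |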
multimodal |
|
|
| 7 |
Senses Wide Shut: A Representation-Action Gap in Omnimodal LLMs |
Reveals a gap between representation and action in omnimodal large language models and proposes a probe-guided logit adjustment method |
large language model multimodal |
|
|
| 8 |
ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles |
ScioMind: a cognitively grounded social simulation framework that improves the behavioral realism of LLM-driven multi-agent systems |
large language model |
|
|
| 9 |
Identifying AI Web Scrapers Using Canary Tokens |
Proposes a Canary Token-based method for identifying AI web scrapers, addressing the difficulty of tracing the sources of LLM training data |
large language model |
|
|
| 10 |
The Readability Spectrum: Patterns, Issues, and Prompt Effects in LLM-Generated Code |
Builds a code readability assessment model and reveals the readability patterns and influencing factors in LLM-generated code |
large language model |
|
|
| 11 |
It's not the Language Model, it's the Tool: Deterministic Mediation for Scientific Workflows |
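Proposes a deterministic mediation pattern in which language models orchestrate deterministic tools, addressing the irreproducibility of results in scientific workflows |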
foundation model |
|
|
| 12 |
Retrieval-Augmented Tutoring for Algorithm Tracing and Problem-Solving in AI Education |
Proposes KITE, a retrieval-augmented tutoring system for algorithm tracing and problem-solving |
multimodal |
|
|
| 13 |
When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction |
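Proposes the Goal Accessibility Ratio (GAR) to diagnose how LLMs lose context in multi-turn interaction, revealing the information that remains after attention stops reaching the original goal |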
large language model |
|
|
| 14 |
Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents |
Proposes Persona Policies (PPol) to generate more realistic user personas for more robust evaluation of LLM agents |
large language model |
|
|
| 15 |
Quantifying LLM Safety Degradation Under Repeated Attacks Using Survival Analysis |
Proposes a survival-analysis-based framework for evaluating LLM safety that quantifies safety degradation under repeated attacks |
large language model |
|
|