| 1 |
SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models |
SilentDrift: a stealthy backdoor attack on VLA models that exploits action chunking |
vision-language-action VLA |
|
|
| 2 |
Diffusion Large Language Models for Black-Box Optimization |
Proposes dLLM, a black-box optimization method based on diffusion language models that achieves design optimization with few samples. |
large language model |
|
|
| 3 |
Measuring the State of Open Science in Transportation Using Large Language Models |
Uses large language models to assess the current state of open science practices in transportation research |
large language model |
|
|
| 4 |
On the Generalization Gap in LLM Planning: Tests and Verifier-Reward RL |
Reveals the generalization gap in LLM planning; proposes diagnostic interventions and verifier-reward reinforcement learning |
large language model |
|
|
| 5 |
VisTIRA: Closing the Image-Text Modality Gap in Visual Math Reasoning via Structured Tool Integration |
VisTIRA: bridges the image-text modality gap in visual math reasoning through structured tool integration |
chain-of-thought |
|
|
| 6 |
Opportunities in AI/ML for the Rubin LSST Dark Energy Science Collaboration |
Explores the opportunities and challenges of applying AI/ML in the Rubin LSST Dark Energy Science Collaboration |
foundation model |
|
|
| 7 |
Human Simulation Computation: A Human-Inspired Framework for Adaptive AI Systems |
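Proposes HSC, a Human Simulation Computation framework that improves the adaptability and reasoning of AI systems in dynamic environments |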
large language model |
|
|
| 8 |
LifeAgentBench: A Multi-dimensional Benchmark and Agent for Personal Health Assistants in Digital Health |
Proposes LifeAgentBench for evaluating and improving the capabilities of personal health assistants in digital health. |
large language model |
|
|
| 9 |
HardSecBench: Benchmarking the Security Awareness of LLMs for Hardware Code Generation |
HardSecBench: a benchmark for evaluating the security awareness of LLMs in hardware code generation |
large language model |
|
|
| 10 |
ToolCaching: Towards Efficient Caching for LLM Tool-calling |
Proposes ToolCaching to address redundant requests in LLM tool calling and improve caching efficiency. |
large language model |
|
|
| 11 |
Hidden in Plain Text: Measuring LLM Deception Quality Against Human Baselines Using Social Deduction Games |
Uses social deduction games to evaluate LLM performance at natural-language deception, which exceeds human baselines |
large language model |
|
|
| 12 |
Why Does the LLM Stop Computing: An Empirical Study of User-Reported Failures in Open-Source LLMs |
A large-scale empirical study of deployment failures in open-source LLMs, revealing systemic bottlenecks and solutions. |
large language model |
|
|
| 13 |
Foundations of Global Consistency Checking with Noisy LLM Oracles |
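Proposes an LLM-based global consistency checking framework that uses an adaptive divide-and-conquer algorithm to detect and repair inconsistencies efficiently. |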
large language model |
|
|
| 14 |
DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems |
DSAEval: proposes an evaluation benchmark of real-world data science problems for assessing the performance of data science agents. |
multimodal |
|
|
| 15 |
SCRIPTMIND: Crime Script Inference and Cognitive Evaluation for LLM-based Social Engineering Scam Detection System |
ScriptMind: a crime script inference and cognitive evaluation framework for LLM-based social engineering scam detection |
large language model |
|
|
| 16 |
Leveraging ChatGPT and Other NLP Methods for Identifying Risk and Protective Behaviors in MSM: Social Media and Dating apps Text Analysis |
Uses ChatGPT and other NLP methods to identify risk and protective behaviors among MSM: text analysis of social media and dating apps |
large language model |
|
|
| 17 |
CatMaster: An Agentic Autonomous System for Computational Heterogeneous Catalysis Research |
CatMaster: an LLM-based autonomous agent system that accelerates computational heterogeneous catalysis research |
large language model |
|
|