| 1 |
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning |
Proposes NoRD, a data-efficient, reasoning-free end-to-end VLA model for autonomous driving |
vision-language-action VLA |
|
|
| 2 |
Decoding the Hook: A Multimodal LLM Framework for Analyzing the Hooking Period of Video Ads |
Proposes a multimodal-LLM-based framework for analyzing the hooking period of video ads, improving ad effectiveness evaluation and optimization. |
large language model multimodal |
|
|
| 3 |
SPM-Bench: Benchmarking Large Language Models for Scanning Probe Microscopy |
SPM-Bench: an authoritative automated benchmark for evaluating large language models on scanning probe microscopy |
large language model multimodal |
|
|
| 4 |
RAGdb: A Zero-Dependency, Embeddable Architecture for Multimodal Retrieval-Augmented Generation on the Edge |
RAGdb: a zero-dependency, embeddable multimodal RAG architecture for the edge |
large language model multimodal |
|
|
| 5 |
G-reasoner: Foundation Models for Unified Reasoning over Graph-structured Knowledge |
Proposes G-reasoner, a foundation-model framework for unified reasoning over graph-structured knowledge. |
large language model foundation model |
|
|
| 6 |
FM-RME: Foundation Model Empowered Radio Map Estimation |
Proposes FM-RME, which enables multi-dimensional radio map estimation with zero-shot generalization. |
foundation model |
|
|
| 7 |
Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models |
Uses large language models to map the landscape of AI applications in life cycle assessment |
large language model |
|
|
| 8 |
Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models |
Proposes metacognitive behavior tuning (MBT) to improve the stability and accuracy of large language models on complex reasoning |
large language model |
|
|
| 9 |
Multi-Agent Large Language Model Based Emotional Detoxification Through Personalized Intensity Control for Consumer Protection |
Proposes MALLET, a multi-agent LLM emotional detoxification system that protects consumers through personalized intensity control |
large language model |
|
|
| 10 |
Enriching Taxonomies Using Large Language Models |
Taxoria: uses large language models to enrich existing taxonomies and improve knowledge retrieval |
large language model |
|
|
| 11 |
Automating the Detection of Requirement Dependencies Using Large Language Models |
Proposes LEREDD, which uses large language models to automatically detect requirement dependencies |
large language model |
|
|
| 12 |
LLM4AD: A Platform for Algorithm Design with Large Language Model |
LLM4AD: a unified platform for algorithm design based on large language models |
large language model |
|
|
| 13 |
FHIR-RAG-MEDS: Integrating HL7 FHIR with Retrieval-Augmented Large Language Models for Enhanced Medical Decision Support |
Proposes the FHIR-RAG-MEDS system, which integrates HL7 FHIR with RAG to enhance medical decision support. |
large language model |
|
|
| 14 |
Modality-Guided Mixture of Graph Experts with Entropy-Triggered Routing for Multimodal Recommendation |
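Proposes MAGNET, which addresses the fusion challenge in multimodal recommendation with a modality-guided mixture of graph experts and entropy-triggered routing. |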
multimodal |
|
|
| 15 |
SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation |
SC-Arena: a natural language benchmark for single-cell reasoning with knowledge-augmented evaluation |
large language model foundation model |
|
|
| 16 |
EyeLayer: Integrating Human Attention Patterns into LLM-Based Code Summarization |
EyeLayer: integrates human attention patterns into LLM-based code summarization to improve code understanding |
large language model multimodal |
|
|
| 17 |
ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile Devices |
ProactiveMobile: a comprehensive benchmark aimed at boosting proactive intelligence on mobile devices. |
large language model multimodal |
|
|
| 18 |
Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction |
Evaluates zero-shot and one-shot adaptation of small language models in leader-follower interaction |
large language model |
|
|
| 19 |
CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety |
CourtGuard: a model-agnostic framework for zero-shot policy adaptation to improve LLM safety |
large language model |
|
|
| 20 |
AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications |
Proposes AMA-Bench to evaluate long-horizon memory in agentic applications, and AMA-Agent to improve performance. |
large language model |
|
|
| 21 |
Intelligence per Watt: Measuring Intelligence Efficiency of Local AI |
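Proposes the intelligence per watt (IPW) metric to evaluate the energy efficiency of local AI and promote shifting workloads from the cloud to local devices. |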
large language model |
|
|
| 22 |
LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories) |
LeanCat: a benchmark suite for formal category theory in Lean that reveals the shortcomings of current models in abstract reasoning. |
large language model |
|
|
| 23 |
GPT-4o Lacks Core Features of Theory of Mind |
GPT-4o lacks core theory-of-mind capabilities and cannot build a consistent model of mental states |
large language model |
|
|
| 24 |
Large-scale online deanonymization with LLMs |
Uses large language models for large-scale online deanonymization |
large language model |
|
|
| 25 |
A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines |
Proposes an Evaluation Agent (EA) framework for decision-centric assessment of the decision quality of AutoML agents. |
large language model |
|
|
| 26 |
ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization |
ConstraintBench: a benchmark for evaluating LLM constraint reasoning in direct optimization |
large language model |
|
|
| 27 |
Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents |
Uses cognitive models and AI algorithms to design modular language agents |
large language model |
|
|
| 28 |
Requesting Expert Reasoning: Augmenting LLM Agents with Learned Collaborative Intervention |
Proposes the AHCE framework, which learns a policy for requesting expert knowledge to improve LLM agent performance on complex tasks |
large language model |
|
|
| 29 |
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios |
MobilityBench: a benchmark for evaluating route-planning agents in real-world mobility scenarios |
large language model |
|
|
| 30 |
Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions |
A survey of personalized LLM-powered agents, focusing on user adaptation and continuity in long-term interaction |
large language model |
|
|
| 31 |
MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks |
MiroFlow: a high-performance, robust open-source agent framework for general deep research tasks |
large language model |
|
|
| 32 |
Obscure but Effective: Classical Chinese Jailbreak Prompt Optimization via Bio-Inspired Search |
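Proposes the CC-BOS framework, which uses classical Chinese and the fruit fly optimization algorithm to carry out black-box jailbreak attacks on large language models. |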
large language model |
|
|
| 33 |
Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design |
Proposes AILS-AHD, an algorithm built on LLM-driven automatic heuristic design, to improve CVRP solving performance |
large language model |
|
|
| 34 |
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring |
Proposes a decision-theoretic view of steganography for monitoring large language models |
large language model |
|
|
| 35 |
Mitigating Legibility Tax with Decoupled Prover-Verifier Games |
Proposes decoupled prover-verifier games to mitigate the legibility tax of large language models. |
large language model |
|
|
| 36 |
Comparative Analysis of Neural Retriever-Reranker Pipelines for Retrieval-Augmented Generation over Knowledge Graphs in E-commerce Applications |
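For e-commerce knowledge graphs, proposes and compares neural retriever-reranker RAG pipelines, significantly improving question-answering performance. |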
large language model |
|
|
| 37 |
Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews |
Evaluates the risk of misinformation exposure on the Chinese web across search engines, LLMs, and AI Overviews |
large language model |
|
|
| 38 |
From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation |
Evaluates LLM performance on task-based parallel code generation and explores how prompt engineering affects code quality |
large language model |
|
|
| 39 |
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks |
Analyzes the vulnerabilities and defense mechanisms of several open-source LLMs against prompt injection and jailbreak attacks |
large language model |
|
|
| 40 |
Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents |
Proposes Contextual Memory Virtualisation (CMV) for DAG-based state management and structurally lossless trimming in LLM agents. |
large language model |
|
|
| 41 |
HubScan: Detecting Hubness Poisoning in Retrieval-Augmented Generation Systems |
HubScan: detects hubness poisoning attacks in retrieval-augmented generation systems |
large language model |
|
|
| 42 |
Silent Egress: When Implicit Prompt Injection Makes LLM Agents Leak Without a Trace |
Presents the Silent Egress attack, revealing the risk of sensitive information leakage caused by implicit prompt injection in LLM agents |
large language model |
|
|
| 43 |
Generative Agents Navigating Digital Libraries |
Agent4DL: uses generative agents to simulate the search behavior of digital library users |
large language model |
|
|
| 44 |
Addressing Climate Action Misperceptions with Generative AI |
Uses generative AI to address misperceptions about climate action and increase willingness to act pro-environmentally |
large language model |
|
|
| 45 |
IMMACULATE: A Practical LLM Auditing Framework via Verifiable Computation |
Proposes the IMMACULATE framework to address the problem of LLM auditing |
large language model |
|
|
| 46 |
AgentSentry: Mitigating Indirect Prompt Injection in LLM Agents via Temporal Causal Diagnostics and Context Purification |
AgentSentry: mitigates indirect prompt injection attacks in LLM agents via temporal causal diagnostics and context purification |
large language model |
|
|
| 47 |
Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study |
Proposes a method for distributed LLM pretraining during renewable energy curtailment windows |
large language model |
|
|
| 48 |
Discovery of Interpretable Physical Laws in Materials via Language-Model-Guided Symbolic Regression |
Proposes language-model-guided symbolic regression for discovering interpretable physical laws in materials science |
large language model |
|
|
| 49 |
LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure |
LLMServingSim 2.0: a unified simulator for heterogeneous and disaggregated LLM serving infrastructure |
large language model |
|
|
| 50 |
Utilizing LLMs for Industrial Process Automation |
Uses large language models to accelerate software development for industrial process automation |
large language model |
|
|
| 51 |
Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning |
Proposes the Compositional-ARC dataset and uses meta-learning to improve models' systematic generalization in abstract spatial reasoning. |
large language model |
|
|
| 52 |
Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs |
Proposes the ReasoningMath-Plus benchmark for evaluating the process-level ability of LLMs in structural mathematical reasoning. |
large language model |
|
|
| 53 |
Toward Automated Validation of Language Model Synthesized Test Cases using Semantic Entropy |
VALTEST: uses semantic entropy to automatically validate test cases generated by language models, improving code generation quality |
large language model |
|
|