| 1 |
SafeSteer: A Decoding-level Defense Mechanism for Multimodal Large Language Models |
SafeSteer: a decoding-level defense mechanism for multimodal large language models |
large language model multimodal |
|
|
| 2 |
MM-OptBench: A Solver-Grounded Benchmark for Multimodal Optimization Modeling |
Proposes MM-OptBench for evaluating how well language models generate optimization models and code in multimodal optimization modeling. |
large language model multimodal |
|
|
| 3 |
On the Limitations of Large Language Models for Conceptual Database Modeling |
Analyzes the limitations of large language models in conceptual database modeling |
large language model chain-of-thought |
|
|
| 4 |
Measuring What Matters Beyond Text: Evaluating Multimodal Summaries by Quality, Alignment, and Diversity |
Proposes MM-Eval: a unified framework for comprehensively evaluating the quality, alignment, and diversity of multimodal summaries |
large language model multimodal |
|
|
| 5 |
OOM-Free Alpamayo via CPU-GPU Memory Swapping for Vision-Language-Action Models |
Proposes the OOM-Free Alpamayo framework, which uses CPU-GPU memory swapping to enable efficient inference of VLA models on low-VRAM GPUs. |
vision-language-action VLA |
|
|
| 6 |
Very Efficient Listwise Multimodal Reranking for Long Documents |
ZipRerank: efficient listwise multimodal reranking that accelerates long-document retrieval and generation. |
multimodal |
✅ |
|
| 7 |
$δ$-mem: Efficient Online Memory for Large Language Models |
Proposes $δ$-mem, which uses efficient online memory to strengthen large language models on tasks with long-range dependencies. |
large language model |
|
|
| 8 |
Towards Automated Air Traffic Safety Assessment Around Non-Towered Airports Using Large Language Models |
Proposes an LLM-based framework for automated airspace safety assessment around non-towered airports |
large language model |
|
|
| 9 |
Large Language Models as Amortized Pareto-Front Generators for Constrained Bi-Objective Convex Optimization |
DIPS: using large language models to generate Pareto fronts for constrained bi-objective convex optimization |
large language model |
|
|
| 10 |
OmniRefine: Alignment-Aware Cooperative Compression for Efficient Omnimodal Large Language Models |
Proposes OmniRefine for efficiently compressing audio-visual tokens in Omni-LLMs, improving inference efficiency. |
large language model |
|
|
| 11 |
Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models |
Proposes AutoREM, which uses memory-augmented large language models to automatically reformulate robust optimization problems. |
large language model |
|
|
| 12 |
When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel |
Reveals inconsistencies in chain-of-thought reasoning: reasoning traces are not a perfect oversight channel |
chain-of-thought |
|
|
| 13 |
A CAP-like Trilemma for Large Language Models: Correctness, Non-bias, and Utility under Semantic Underdetermination |
Proposes a trilemma for large language models among correctness, non-bias, and utility under semantic underdetermination |
large language model |
|
|
| 14 |
Allegory of the Cave: Measurement-Grounded Vision-Language Learning |
Proposes PRISM-VL, which improves multimodal reasoning through measurement-domain vision-language learning |
multimodal |
|
|
| 15 |
SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces |
SkillSafetyBench: evaluating agent safety under skill-driven attacks |
large language model |
|
|
| 16 |
Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers |
Formalize, don't optimize: the heuristic trap in LLM-generated combinatorial solvers |
large language model |
|
|
| 17 |
LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management |
LISA: LLM-based cognitive arbitration for signal-free autonomous intersection management |
large language model |
|
|
| 18 |
Iterative Audit Convergence in LLM-Managed Multi-Agent Systems: A Case Study in Prompt Engineering Quality Assurance |
Proposes an LLM-driven multi-agent iterative auditing method for improving prompt engineering quality assurance. |
large language model |
|
|
| 19 |
How Useful Is Cross-Domain Generalization for Training LLM Monitors? |
Studies how useful cross-domain generalization is for training LLM monitors |
instruction following |
|
|
| 20 |
Reconnecting Fragmented Citation Networks with Semantic Augmentation |
Proposes a hybrid framework based on semantic augmentation for completing fragmented citation networks. |
large language model |
|
|
| 21 |
No Action Without a NOD: A Heterogeneous Multi-Agent Architecture for Reliable Service Agents |
Proposes NOD, a heterogeneous multi-agent architecture that improves the reliability of service agents on long-horizon tasks |
large language model |
|
|
| 22 |
Uncertainty Quantification for LLM-based Code Generation |
RisCoSet: uses multiple hypothesis testing to construct risk-controlled prediction sets for LLM code generation |
large language model |
|
|
| 23 |
BoolXLLM: LLM-Assisted Explainability for Boolean Models |
BoolXLLM: proposes an LLM-assisted explainability framework for Boolean models |
large language model |
|
|
| 24 |
Intermediate Artifacts as First-Class Citizens: A Data Model for Durable Intermediate Artifacts in Agentic Systems |
Proposes a data model for persisting intermediate artifacts in agentic systems, improving maintainability. |
chain-of-thought |
|
|
| 25 |
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory |
SAGE: a self-evolving agentic graph-memory engine for structured associative memory |
foundation model |
|
|
| 26 |
LLMs and the ZPD |
Drawing on Vygotsky's ZPD theory, explores the "primitive thinking" mode of LLMs and the importance of interaction |
large language model |
|
|
| 27 |
LegalCheck: Retrieval- and Context-Augmented Generation for Drafting Municipal Legal Advice Letters |
LegalCheck: combines retrieval- and context-augmented generation to assist in drafting municipal legal advice letters |
large language model |
|
|
| 28 |
CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference |
Proposes the CR^2 framework to address cost-aware, risk-controlled routing for LLM inference on wireless edge devices |
large language model |
|
|
| 29 |
BadSKP: Backdoor Attacks on Knowledge Graph-Enhanced LLMs with Soft Prompts |
Proposes BadSKP, a soft-prompt backdoor attack method targeting knowledge graph-enhanced LLMs |
large language model |
|
|
| 30 |
Counterfactual Trace Auditing of LLM Agent Skills |
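Proposes the CTA framework, which assesses the impact of LLM agent skills through counterfactual trace auditing and reveals the limitations of existing evaluation methods. |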
large language model |
|
|
| 31 |
Domain Restriction via Multi SAE Layer Transitions |
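Uses multi-layer sparse autoencoder transitions for domain restriction, addressing out-of-domain interactions with large language models. |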
large language model |
|
|
| 32 |
Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving |
Proposes segment-level supervision to address training-data construction for LLM-based theorem proving |
large language model |
✅ |
|
| 33 |
From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP |
Uses LRP to interpret EEG Transformers, revealing hidden behavioral patterns and biological hypotheses in EEG signals |
foundation model |
|
|
| 34 |
Why Users Go There: World Knowledge-Augmented Generative Next POI Recommendation |
Proposes AWARE, which uses LLMs to enhance POI recommendation and addresses the problem of static model knowledge. |
large language model |
|
|
| 35 |
OptArgus: A Multi-Agent System to Detect Hallucinations in LLM-based Optimization Modeling |
Proposes OptArgus to address hallucination detection in LLM-based optimization modeling |
large language model |
|
|
| 36 |
Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance |
Proposes SVGT, which achieves stable value alignment for large language models through independent value modules |
large language model |
✅ |
|
| 37 |
Explaining and Breaking the Safety-Helpfulness Ceiling via Preference Dimensional Expansion |
Proposes MORA, which breaks the safety-helpfulness bottleneck of large language models by expanding the reward dimensions |
large language model |
|
|
| 38 |
Every Bit, Everywhere, All at Once: A Binomial Multibit LLM Watermark |
Proposes a multibit LLM watermarking scheme based on binomial encoding, improving message accuracy and robustness. |
TAMP |
|
|
| 39 |
GAR: Carbon-Aware Routing for LLM Inference via Constrained Optimization |
Proposes GAR: carbon-aware routing for LLM inference via constrained optimization |
large language model |
|
|
| 40 |
Controllable User Simulation |
Proposes a causally consistent, controllable user simulator to address bias in offline evaluation of conversational agents. |
large language model |
|
|
| 41 |
AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive |
Proposes AutoLLMResearch to address the configuration of high-cost LLM experiments |
large language model |
|
|