| 1 |
Unveiling Intrinsic Text Bias in Multimodal Large Language Models through Attention Key-Space Analysis |
Reveals the intrinsic text bias of multimodal large language models through attention key-space analysis |
large language model multimodal |
|
|
| 2 |
Normative Reasoning in Large Language Models: A Comparative Benchmark from Logical and Modal Perspectives |
Proposes the NeuBAROCO benchmark to comparatively evaluate LLMs' normative reasoning from logical and modal perspectives |
large language model |
✅ |
|
| 3 |
Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling |
Proposes an LLM-based smart home energy management system for optimized residential load scheduling |
large language model |
|
|
| 4 |
SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning |
SecureReviewer: enhances large language models for secure code review through security-aware fine-tuning |
large language model |
|
|
| 5 |
CausalGuard: A Smart System for Detecting and Preventing False Information in Large Language Models |
CausalGuard: uses causal reasoning and symbolic logic to detect and prevent false information in large language models |
large language model |
|
|
| 6 |
Chain-of-Thought Hijacking |
Proposes the CoT Hijacking attack, revealing safety vulnerabilities of large language models in chain-of-thought reasoning |
chain-of-thought |
|
|
| 7 |
Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures |
Reveals trust vulnerabilities in multi-stage LLM pipelines and proposes the zero-trust architecture Countermind |
large language model |
|
|
| 8 |
Urban-MAS: Human-Centered Urban Prediction with LLM-Based Multi-Agent System |
Proposes the Urban-MAS framework to address human-centered urban prediction |
large language model multimodal |
✅ |
|
| 9 |
LLMs are Overconfident: Evaluating Confidence Interval Calibration with FermiEval |
FermiEval evaluates LLM confidence interval calibration, revealing overconfidence and proposing correction methods |
large language model |
|
|
| 10 |
Autograder+: A Multi-Faceted AI Framework for Rich Pedagogical Feedback in Programming Education |
Autograder+: a multi-faceted AI framework for providing rich pedagogical feedback in programming education |
large language model |
|
|
| 11 |
Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings |
Scales++: uses cognitive scales embeddings for compute-efficient evaluation subset selection |
large language model |
|
|
| 12 |
QuantumBench: A Benchmark for Quantum Problem Solving |
Proposes QuantumBench to evaluate large language models in the quantum domain |
large language model |
|
|
| 13 |
Beyond Synthetic Benchmarks: Evaluating LLM Performance on Real-World Class-Level Code Generation |
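Proposes a real-world class-level code generation benchmark to evaluate LLM performance bottlenecks and improvement strategies in practical settings |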
large language model |
|
|
| 14 |
LLM-based Multi-class Attack Analysis and Mitigation Framework in IoT/IIoT Networks |
Proposes an LLM-based multi-class attack analysis and mitigation framework for IoT/IIoT networks |
large language model |
|
|
| 15 |
Artificial Intelligence in Elementary STEM Education: A Systematic Review of Current Applications and Future Challenges |
Systematically reviews AI applications in elementary STEM education, identifying challenges and outlining future directions |
multimodal |
|
|
| 16 |
Cognition Envelopes for Bounded AI Reasoning in Autonomous UAS Operations |
Proposes cognition envelopes that constrain the decision boundaries of AI reasoning in autonomous unmanned aircraft systems |
large language model |
|
|
| 17 |
How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison |
Compares Grokipedia and Wikipedia: a multi-dimensional textual and structural analysis reveals potential biases in AI-generated encyclopedias |
large language model |
|
|
| 18 |
ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference |
ExpertFlow: adaptive expert scheduling and memory coordination to improve MoE inference efficiency |
large language model |
|
|
| 19 |
Delegated Authorization for Agents Constrained to Semantic Task-to-Scope Matching |
Proposes an agent authorization model based on semantic task-to-scope matching to address the security of LLM tool calls |
large language model |
|
|
| 20 |
CATArena: Evaluating Evolutionary Capabilities of Code Agents via Iterative Tournaments |
CATArena: evaluates the evolutionary capabilities of code agents through iterative tournaments |
large language model |
|
|
| 21 |
Who Has The Final Say? Conformity Dynamics in ChatGPT's Selections |
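Reveals that ChatGPT is susceptible to social influence in hiring decisions, highlighting risks to the independence of AI decision-making |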
large language model |
|
|
| 22 |
Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token |
Proposes CPT-Filtering, which filters obfuscated prompts by counting characters per token to defend against LLM jailbreak attacks |
large language model |
|
|
| 23 |
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning |
Proposes the BOTS framework to address task selection in LLM reinforcement finetuning |
large language model |
✅ |
|
| 24 |
GraphCompliance: Aligning Policy and Context Graphs for LLM-Based Regulatory Compliance |
GraphCompliance: aligns policy graphs and context graphs for LLM-based regulatory compliance |
large language model |
|
|
| 25 |
SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detection |
Proposes SynBullying, a multi-LLM synthetic conversational dataset for cyberbullying detection |
large language model |
|
|
| 26 |
Linking Heterogeneous Data with Coordinated Agent Flows for Social Media Analysis |
SIA: links heterogeneous data with coordinated agent flows for social media analysis |
large language model |
|
|
| 27 |
ToolRM: Towards Agentic Tool-Use Reward Modeling |
Proposes ToolRM to improve reward modeling for agents in tool-use scenarios |
large language model |
|
|
| 28 |
The FM Agent |
Proposes FM Agent to solve complex scientific and engineering problems |
large language model |
|
|
| 29 |
Beyond Benchmarks: The Economics of AI Inference |
Builds an economic framework for LLM inference, revealing the relationships among cost, scale, and quality |
large language model |
|
|