| 1 |
Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding |
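Reveals a cognitive mismatch in how multimodal large language models understand discrete symbols, and proposes an evaluation benchmark. |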
large language model multimodal |
|
|
| 2 |
dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models |
dTRPO: optimizing the policy of diffusion large language models through trajectory reduction |
large language model instruction following |
|
|
| 3 |
How Uncertainty Estimation Scales with Sampling in Reasoning Models |
Studies sampling-based uncertainty estimation in reasoning models and proposes a hybrid estimator. |
chain-of-thought |
|
|
| 4 |
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity |
Cross-domain mappings boost human and LLM creativity: serendipity by design |
large language model |
|
|
| 5 |
Evaluating 5W3H Structured Prompting for Intent Alignment in Human-AI Interaction |
Proposes PPS, a framework built on 5W3H structured prompting, to improve intent alignment in human-AI interaction |
large language model |
|
|
| 6 |
I Can't Believe It's Corrupt: Evaluating Corruption in Multi-Agent Governance Systems |
Evaluates corruption in multi-agent governance systems and highlights the importance of institutional design |
large language model |
|
|
| 7 |
Quantitative Introspection in Language Models: Tracking Internal States Across Conversation |
Proposes numerical self-reports for tracking a language model's internal states |
large language model |
|
|
| 8 |
Geography According to ChatGPT -- How Generative AI Represents and Reasons about Geography |
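Evaluates how ChatGPT represents and reasons about geographic knowledge, revealing the limits of generative AI's geographic understanding |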
foundation model |
|
|
| 9 |
Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code Review |
Studies confirmation bias in LLM-assisted code review and exposes the resulting software supply-chain attack risks |
large language model |
|
|
| 10 |
Analysis Of Linguistic Stereotypes in Single and Multi-Agent Generative AI Architectures |
Analyzes linguistic stereotypes in single-agent and multi-agent generative AI architectures |
chain-of-thought |
|
|
| 11 |
Accurate and Efficient Multi-Channel Time Series Forecasting via Sparse Attention Mechanism |
Proposes Li-Net, which uses a sparse attention mechanism for accurate and efficient multi-channel time series forecasting. |
multimodal |
|
|
| 12 |
An Onto-Relational-Sophic Framework for Governing Synthetic Minds |
Proposes the Onto-Relational-Sophic framework as a philosophical foundation for governing artificial general intelligence. |
foundation model |
|
|
| 13 |
ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs |
ZebraArena: a diagnostic simulation environment for studying reasoning-action coupling in tool-augmented LLMs |
large language model |
|
|
| 14 |
The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation |
Proposes policy instruments that target corporate AI rinsing in order to promote green innovation |
large language model |
|
|
| 15 |
PlanTwin: Privacy-Preserving Planning Abstractions for Cloud-Assisted LLM Agents |
Proposes PlanTwin to address privacy leakage in cloud-based planning |
large language model |
|
|
| 16 |
ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models |
ItinBench: a benchmark for evaluating large language model planning across multiple cognitive dimensions |
large language model |
✅ |
|
| 17 |
Learning to Disprove: Formal Counterexample Generation with Large Language Models |
Proposes an LLM-based method for generating formal counterexamples, improving mathematical reasoning |
large language model |
|
|
| 18 |
The Autonomy Tax: Defense Training Breaks LLM Agents |
Identifies an "autonomy tax," in which defense training degrades LLM agent capabilities, and analyzes its root causes. |
large language model |
|
|
| 19 |
POET: Power-Oriented Evolutionary Tuning for LLM-Based RTL PPA Optimization |
POET: an LLM-driven evolutionary tuning framework for power-oriented optimization of RTL code |
large language model |
|
|