| 1 |
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy |
Optimus-2: a multimodal Minecraft agent built on a goal-observation-action conditioned policy |
large language model multimodal |
✅ |
|
| 2 |
LLMs Have Rhythm: Fingerprinting Large Language Models Using Inter-Token Times and Network Traffic Analysis |
Proposes an LLM fingerprinting method based on inter-token times and network traffic analysis, improving model security and trustworthiness. |
large language model |
|
|
| 3 |
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models |
Meta-Reasoner: dynamically guides large language models to optimize inference-time reasoning |
large language model |
|
|
| 4 |
ACE, Action and Control via Explanations: A Proposal for LLMs to Provide Human-Centered Explainability for Multimodal AI Assistants |
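Proposes the ACE framework, which uses LLM explanations to enable human-AI collaboration and improve the performance of multimodal AI assistants in manufacturing |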
multimodal |
|
|
| 5 |
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory |
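Proposes a framework based on behavioral game theory for evaluating LLM strategic reasoning, revealing models' decision-making mechanisms and biases. |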
large language model chain-of-thought |
|
|
| 6 |
An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs |
Evaluates LLM capabilities in understanding and generating PDDL, revealing their potential and limitations in automated planning tasks |
large language model chain-of-thought |
|
|
| 7 |
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts |
COMET: fine-grained computation-communication overlap optimization for mixture-of-experts models. |
large language model |
|
|
| 8 |
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers |
Proposes Multi-Agent Verification (MAV), which improves LLM test-time performance by scaling the number of verifiers. |
large language model |
|
|
| 9 |
EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants |
Proposes the EAIRA methodology for comprehensively evaluating the capabilities of AI models as scientific research assistants |
large language model |
|
|
| 10 |
Evaluating Human Trust in LLM-Based Planners: A Preliminary Study |
A preliminary study evaluating human trust in LLM-based planners |
large language model |
|
|
| 11 |
AI Will Always Love You: Studying Implicit Biases in Romantic AI Companions |
Studies implicit biases in romantic AI companions, revealing the stereotyping effect of gendered personas on LLM responses |
large language model |
|
|
| 12 |
Will AI replace Software Engineers? Do not hold your breath |
Will AI replace software engineers? Not in the near term; software maintenance capability is the key barrier |
large language model |
|
|
| 13 |
Societal Alignment Frameworks Can Improve LLM Alignment |
Introduces societal alignment frameworks to improve the alignment of large language models |
large language model |
|
|
| 14 |
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models |
DiffCSS: diverse and expressive conversational speech synthesis using diffusion models |
multimodal |
|
|
| 15 |
LLM-driven Effective Knowledge Tracing by Integrating Dual-channel Difficulty |
Proposes the DDKT framework, which uses LLMs and RAG to improve the accuracy and interpretability of knowledge tracing. |
large language model |
|
|
| 16 |
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments |
Proposes CONVCODEWORLD to address the problem of evaluating multi-turn interactive code generation |
large language model |
✅ |
|
| 17 |
HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration |
HALO: a hardware-aware weight quantization method with low critical-path delay for accelerating LLM inference. |
large language model |
|
|