| 1 |
An analysis of AI Decision under Risk: Prospect theory emerges in Large Language Models |
First validation that large language models exhibit prospect-theory biases in decision-making under risk |
large language model chain-of-thought |
|
|
| 2 |
Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach |
Aligns large language model agents with rational and moral preferences through supervised fine-tuning |
large language model |
|
|
| 3 |
Enhancing Large Multimodal Models with Adaptive Sparsity and KV Cache Compression |
Proposes adaptive sparsification and KV cache compression to improve the deployment efficiency of large models on edge devices. |
multimodal |
|
|
| 4 |
Pareto-Grid-Guided Large Language Models for Fast and High-Quality Heuristics Design in Multi-Objective Combinatorial Optimization |
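Proposes a Pareto-grid-guided, LLM-based evolutionary algorithm for fast, high-quality heuristic design in multi-objective combinatorial optimization. |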
large language model |
✅ |
|
| 5 |
MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs |
Proposes MMGraphRAG, which uses multimodal knowledge graphs to enhance vision-language retrieval-augmented generation tasks |
multimodal |
|
|
| 6 |
How Chain-of-Thought Works? Tracing Information Flow from Decoding, Projection, and Activation |
Reveals how chain-of-thought (CoT) prompting works by tracing information flow |
chain-of-thought |
|
|
| 7 |
Agentic Web: Weaving the Next Web with AI Agents |
Building the Agentic Web: using AI agents for autonomous, goal-driven interaction with the internet |
large language model |
✅ |
|
| 8 |
Prescriptive Agents based on RAG for Automated Maintenance (PARAM) |
PARAM: a RAG-based agent for predictive maintenance of industrial equipment that automates fault diagnosis and maintenance recommendations. |
large language model |
|
|
| 9 |
MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection |
Proposes MIMII-Agent to address the problem of evaluation without real anomalous sound data |
large language model |
|
|
| 10 |
Teaching Language Models To Gather Information Proactively |
Proposes a proactive information-gathering framework that improves LLMs' ability to act as collaborative partners on complex tasks. |
large language model |
|
|
| 11 |
MAAD: Automate Software Architecture Design through Knowledge-Driven Multi-Agent Collaboration |
MAAD: automating software architecture design through knowledge-driven multi-agent collaboration |
large language model |
|
|
| 12 |
Curiosity by Design: An LLM-based Coding Assistant Asking Clarification Questions |
Curiosity by design: an LLM-based coding assistant that asks clarification questions |
large language model |
|
|
| 13 |
LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems |
LeMix: a unified scheduling system for LLM training and inference on multi-GPU systems |
large language model |
|
|
| 14 |
CompoST: A Benchmark for Analyzing the Ability of LLMs To Compositionally Interpret Questions in a QALD Setting |
CompoST: a benchmark for evaluating LLMs' ability to compositionally interpret questions in a QALD setting |
large language model |
|
|
| 15 |
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence |
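The first survey of self-evolving agents: a systematic study of the design elements and future directions of self-evolving agents on the path to artificial general intelligence. |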
large language model |
|
|
| 16 |
MIRAGE-Bench: LLM Agent is Hallucinating and Where to Find Them |
MIRAGE-Bench: the first unified benchmark for evaluating hallucinations in interactive LLM agents |
large language model |
|
|
| 17 |
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories |
TypyBench: evaluating LLMs' type inference ability on untyped Python repositories |
large language model |
✅ |
|
| 18 |
The Xeno Sutra: Can Meaning and Value be Ascribed to an AI-Generated "Sacred" Text? |
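Uses a large language model to generate a Buddhist "sutra" and explores the philosophical and social implications of AI in the domain of meaning-making |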
large language model |
|
|