| 1 |
Towards Considerate Embodied AI: Co-Designing Situated Multi-Site Healthcare Robots from Abstract Concepts to High-Fidelity Prototypes |
Building more considerate embodied AI for multi-site healthcare robots through co-design |
embodied AI |
✅ |
|
| 2 |
Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis |
Proposes the Agentic Proposing framework, which synthesizes high-quality data by composing skills to improve large language model reasoning |
large language model |
|
|
| 3 |
An Empirical Study of Collective Behaviors and Social Dynamics in Large Language Model Agents |
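Studies the collective behaviors and social dynamics of large language model agents and proposes the CoST method to suppress the posting of harmful content |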
large language model |
|
|
| 4 |
VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models |
Proposes VALUEFLOW to address value alignment in large language models |
large language model |
|
|
| 5 |
De-conflating Preference and Qualification: Constrained Dual-Perspective Reasoning for Job Recommendation with Large Language Models |
JobRec: decouples preference from qualification through constrained dual-perspective reasoning for LLM-driven job recommendation |
large language model |
|
|
| 6 |
Large Language Models Can Take False First Steps at Inference-time Planning |
Shows that large language models can take false first steps in inference-time planning |
large language model |
|
|
| 7 |
CSR-Bench: A Benchmark for Evaluating the Cross-modal Safety and Reliability of MLLMs |
Proposes CSR-Bench for evaluating the cross-modal safety and reliability of multimodal large language models |
large language model multimodal |
|
|
| 8 |
Are LLMs Biased Like Humans? Causal Reasoning as a Function of Prior Knowledge, Irrelevant Information, and Reasoning Budget |
Evaluates how causal reasoning in large language models relates to human biases |
large language model chain-of-thought |
|
|
| 9 |
Conformal Thinking: Risk Control for Reasoning on a Compute Budget |
Proposes a risk-control-based adaptive reasoning framework that optimizes large language model reasoning under a compute budget |
large language model |
|
|
| 10 |
DiscoverLLM: From Executing Intents to Discovering Them |
Proposes the DiscoverLLM framework, which improves LLM interaction performance on open-ended tasks through intent discovery |
large language model |
|
|
| 11 |
Methods and Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and Alignment |
Surveys differentiable social choice methods (learning mechanisms, decisions, and alignment) and poses open problems |
large language model |
|
|
| 12 |
Persona Generators: Generating Diverse Synthetic Personas at Scale |
Proposes Persona Generators, which use evolutionary algorithms to generate diverse synthetic personas for evaluating AI systems |
large language model |
|
|
| 13 |
When Routing Collapses: On the Degenerate Convergence of LLM Routers |
Proposes EquiRouter to address degenerate convergence in LLM routing and improve cost-effectiveness |
multimodal |
✅ |
|
| 14 |
Ontology-to-tools compilation for executable semantic constraint enforcement in LLM agents |
Proposes an ontology-to-tools compilation method for executable semantic constraint enforcement in LLM agents |
large language model |
|
|
| 15 |
Precision in Practice: Knowledge Guided Code Summarizing Grounded in Industrial Expectations |
ExpSum: a knowledge-guided code summarization method grounded in industrial expectations |
large language model |
|
|
| 16 |
The Necessity of a Unified Framework for LLM-Based Agent Evaluation |
Proposes a unified evaluation framework for LLM agents to address inconsistent evaluation standards |
large language model |
|
|
| 17 |
Beyond Quantity: Trajectory Diversity Scaling for Code Agents |
TDScaling: improves code agent performance through trajectory diversity, breaking through the quantity-scaling bottleneck |
large language model |
|
|
| 18 |
Internet of Agentic AI: Incentive-Compatible Distributed Teaming and Workflow |
Proposes the Internet of Agentic AI framework, enabling scalable distributed collaboration and workflows for agentic AI |
large language model |
|
|
| 19 |
Understanding Multi-Agent LLM Frameworks: A Unified Benchmark and Experimental Analysis |
Proposes MAFBench for systematically evaluating how the architecture of multi-agent LLM frameworks affects performance |
large language model |
|
|
| 20 |
Digital Lifelong Learning in the Age of AI: Trends and Insights |
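Analyzes trends in digital lifelong learning in the age of AI, revealing learner motivations and platform optimization strategies |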
large language model |
|
|
| 21 |
Risky-Bench: Probing Agentic Safety Risks under Real-World Deployment |
Proposes Risky-Bench to address the assessment of agent safety risks in real-world environments |
large language model |
|
|
| 22 |
MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems |
MAS-ProVe: a systematic study of the effectiveness and challenges of process verification in multi-agent systems |
large language model |
✅ |
|
| 23 |
RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents |
Proposes RC-GRPO, which improves the performance of multi-turn tool-calling agents through reward conditioning |
large language model |
|
|