| 1 |
Automated urban waterlogging assessment and early warning through a mixture of foundation models |
Proposes UWAssess, which uses a mixture of foundation models to automatically assess urban waterlogging and issue early warnings |
foundation model chain-of-thought |
|
|
| 2 |
CytoNet: A Foundation Model for the Human Cerebral Cortex |
CytoNet: a foundation model for analyzing the human cerebral cortex, enabling cell-level understanding of microstructure |
foundation model |
|
|
| 3 |
Counterfactual Reasoning for Steerable Pluralistic Value Alignment of Large Language Models |
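Proposes the COUPLE framework, which uses counterfactual reasoning to achieve steerable alignment of large language models with pluralistic values |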
large language model |
|
|
| 4 |
Unifying Inductive, Cross-Domain, and Multimodal Learning for Robust and Generalizable Recommendation |
MICRec: a robust, general-purpose recommendation framework that unifies inductive, cross-domain, and multimodal learning |
multimodal |
|
|
| 5 |
Illusions of reflection: open-ended task reveals systematic failures in Large Language Models' reflective reasoning |
Reveals systematic failures in the reflective reasoning of large language models: constraint violations in open-ended tasks |
large language model |
✅ |
|
| 6 |
The MUSE Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMS |
MUSE benchmark: evaluates the music perception and auditory relational reasoning abilities of audio LLMs |
large language model multimodal chain-of-thought |
|
|
| 7 |
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability |
Proposes a CoT obfuscation stress-testing method to evaluate the monitorability of reasoning models in adversarial settings |
chain-of-thought |
|
|
| 8 |
HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models |
HarmNet: an adaptive multi-turn jailbreak attack framework for large language models |
large language model |
|
|
| 9 |
Exploring Membership Inference Vulnerabilities in Clinical Large Language Models |
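Explores membership inference vulnerabilities in clinical large language models and assesses the risk of patient privacy leakage |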
large language model |
|
|
| 10 |
StarBench: A Turn-Based RPG Benchmark for Agentic Multimodal Decision-Making and Information Seeking |
StarBench: a turn-based RPG benchmark for agentic multimodal decision-making and information seeking |
multimodal |
|
|
| 11 |
PlanU: Large Language Model Reasoning through Planning under Uncertainty |
Proposes PlanU, which strengthens large language model reasoning through planning under uncertainty |
large language model |
|
|
| 12 |
Earth AI: Unlocking Geospatial Insights with Foundation Models and Cross-Modal Reasoning |
Earth AI: unlocking geospatial insights with foundation models and cross-modal reasoning |
foundation model |
|
|
| 13 |
A Justice Lens on Fairness and Ethics Courses in Computing Education: LLM-Assisted Multi-Perspective and Thematic Evaluation |
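Uses LLM-assisted multi-perspective evaluation to improve the instructional design of fairness and ethics courses in computing education |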
large language model |
|
|
| 14 |
Cultural Alien Sampler: Open-ended art generation balancing originality and coherence |
Proposes the Cultural Alien Sampler (CAS), which balances originality and coherence in open-ended art generation |
large language model |
|
|
| 15 |
Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality |
Test-time verification based on optimal transport: reveals the relationships among coverage, ROC, and sub-optimality |
large language model |
|
|
| 16 |
Prompt Decorators: A Declarative and Composable Syntax for Reasoning, Formatting, and Control in LLMs |
Proposes Prompt Decorators to address insufficient control over LLMs |
large language model |
|
|
| 17 |
LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources |
LAFA: an agentic LLM-based framework for federated analytics over decentralized data |
large language model |
|
|
| 18 |
Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents |
Proposes an LLM agent framework based on probabilistic intention modeling to improve intelligence in social dialogue |
large language model |
|
|
| 19 |
CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs |
CircuitSeer: mines high-quality data by probing mathematical reasoning circuits in LLMs |
large language model |
|
|
| 20 |
Genesis: Evolving Attack Strategies for LLM Web Agent Red-Teaming |
Genesis: evolves attack strategies for red-teaming LLM web agents |
large language model |
|
|
| 21 |
Prospects for Using Artificial Intelligence to Understand Intrinsic Kinetics of Heterogeneous Catalytic Reactions |
Uses artificial intelligence to understand the intrinsic kinetics of heterogeneous catalytic reactions |
multimodal |
|
|
| 22 |
EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs |
EdgeReasoning: characterizes the deployment of reasoning LLMs on edge GPUs and optimizes the latency-accuracy trade-off |
large language model |
|
|
| 23 |
ssToken: Self-modulated and Semantic-aware Token Selection for LLM Fine-tuning |
Proposes ssToken, which improves LLM fine-tuning through self-modulated, semantic-aware token selection |
large language model |
|
|