| 1 |
LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring |
LLMs can covertly sandbag on capability evaluations under CoT monitoring |
chain-of-thought |
|
|
| 2 |
Automated Feedback on Student-Generated UML and ER Diagrams Using Large Language Models |
DUET: automated feedback on student-generated UML and ER diagrams using large language models |
large language model |
|
|
| 3 |
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks |
Proposes CoT-Self-Instruct, which improves LLM performance on reasoning and non-reasoning tasks through high-quality synthetic data. |
instruction following, chain-of-thought |
|
|
| 4 |
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks |
MECAT: a multi-expert-constructed benchmark for improving performance on fine-grained audio understanding tasks |
large language model, chain-of-thought |
✅ |
|
| 5 |
Causal Reasoning in Pieces: Modular In-Context Learning for Causal Discovery |
Proposes a modular in-context learning framework that improves the causal discovery ability of large language models |
large language model, chain-of-thought |
|
|
| 6 |
Self-Foveate: Enhancing Diversity and Difficulty of Synthesized Instructions from Unsupervised Text via Multi-Level Foveation |
Proposes Self-Foveate, which uses a multi-level foveation mechanism to increase the diversity and difficulty of synthesized instruction data. |
large language model, instruction following |
✅ |
|
| 7 |
LLM-Based Identification of Infostealer Infection Vectors from Screenshots: The Case of Aurora |
Uses LLMs to identify infection vectors from infostealer infection screenshots, with Aurora as the case study. |
large language model |
|
|
| 8 |
Accessibility Scout: Personalized Accessibility Scans of Built Environments |
Accessibility Scout: an LLM-based system for personalized accessibility scans of built environments |
large language model |
|
|
| 9 |
A Survey on Code Generation with LLM-based Agents |
Surveys LLM-based agents for code generation, covering techniques, applications, evaluation, and challenges. |
large language model |
|
|
| 10 |
DeformTune: A Deformable XAI Music Prototype for Non-Musicians |
DeformTune: a deformable XAI music prototype system for non-musicians |
multimodal |
|
|
| 11 |
A survey of multi-agent geosimulation methodologies: from ABM to LLM |
Surveys multi-agent geosimulation methodologies: the evolution and convergence from ABM to LLM |
large language model |
|
|
| 12 |
MemoCue: Empowering LLM-Based Agents for Human Memory Recall via Strategy-Guided Querying |
Proposes MemoCue, which uses strategy-guided querying to improve LLM performance in assisting human memory recall |
large language model |
|
|
| 13 |
DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer |
DICE: dynamic in-context example selection in LLM agents via efficient knowledge transfer |
large language model |
|
|
| 14 |
Chatting with your ERP: A Recipe |
Proposes a dual-agent architecture that uses LLMs to enable natural-language querying of industrial ERP systems. |
large language model |
|
|
| 15 |
LLM4Rail: An LLM-Augmented Railway Service Consulting Platform |
LLM4Rail: an LLM-based railway service consulting platform that provides personalized services. |
large language model |
|
|
| 16 |
Trae Agent: An LLM-based Agent for Software Engineering with Test-time Scaling |
Trae Agent: an LLM-based software engineering agent with test-time scaling for resolving code defects. |
large language model |
✅ |
|
| 17 |
"I made this (sort of)": Negotiating authorship, confronting fraudulence, and exploring new musical spaces with prompt-based AI music generation |
Uses prompt-based AI music generation to explore authorship, fraudulence, and new musical spaces |
large language model |
|
|
| 18 |
DSBC : Data Science task Benchmarking with Context engineering |
DSBC: benchmarks data science tasks through context engineering, evaluating LLM performance in practical applications. |
large language model |
|
|
| 19 |
How Far Are AI Scientists from Changing the World? |
Surveys AI Scientist systems, discussing their potential and bottlenecks in changing research paradigms and solving major challenges |
large language model |
|
|
| 20 |
AutoBridge: Automating Smart Device Integration with Centralized Platform |
AutoBridge: automates the integration of smart devices with a centralized platform, without human intervention. |
multimodal |
|
|