| 1 | Towards Explainable Conversational AI for Early Diagnosis with Large Language Models | Proposes an LLM-based conversational AI for early diagnosis with improved explainability | large language model, chain-of-thought | | |
| 2 | Quantifying Laziness, Decoding Suboptimality, and Context Degradation in Large Language Models | Quantifies laziness, suboptimal decoding, and context degradation in large language models | large language model, instruction following | | |
| 3 | Attention Distance: A Novel Metric for Directed Fuzzing with Large Language Models | Proposes attention distance to address the missing logical relationships in existing fuzzing approaches | large language model | ✅ | |
| 4 | SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories | SWE-Bench++: a scalable software engineering benchmark framework that automatically generates test cases from open-source repositories | large language model | | |
| 5 | Holistic Evaluation of State-of-the-Art LLMs for Code Generation | Comprehensively evaluates the performance of large language models on code generation tasks | large language model | | |
| 6 | Rethinking Multi-Agent Intelligence Through the Lens of Small-World Networks | Uses small-world networks to optimize the communication topology of multi-agent systems, improving consensus stability | large language model | | |
| 7 | Specification and Detection of LLM Code Smells | Defines and detects LLM code smells to improve software system quality | large language model | | |
| 8 | LLM-based Behaviour Driven Development for Hardware Design | Proposes an LLM-based behaviour-driven development method for hardware design, improving testing and verification efficiency | large language model | | |
| 9 | Eidoku: A Neuro-Symbolic Verification Gate for LLM Reasoning via Structural Constraint Satisfaction | Eidoku: a neuro-symbolic verification gate for LLM reasoning via structural constraint satisfaction | large language model | | |
| 10 | UmniBench: Unified Understand and Generation Model Oriented Omni-dimensional Benchmark | Proposes UmniBench for comprehensively evaluating the understanding, generation, and editing capabilities of unified multimodal models | multimodal | | |
| 11 | PILAR: Personalizing Augmented Reality Interactions with LLM-based Human-Centric and Trustworthy Explanations for Daily Use Cases | PILAR: uses LLMs to provide personalized explanations of AR interactions, improving user experience and trust in daily use cases | large language model | | |
| 12 | QMBench: A Research Level Benchmark for Quantum Materials Research | QMBench: a benchmark for evaluating the capabilities of large language models in quantum materials research | large language model | | |