| 1 |
Moonbeam: A MIDI Foundation Model Using Both Absolute and Relative Music Attributes |
Moonbeam: a MIDI foundation model that uses both absolute and relative music attributes |
foundation model |
|
|
| 2 |
An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology |
CHROMA: a general-purpose foundation model for chromosome karyotype analysis in precision oncology |
foundation model |
|
|
| 3 |
Leveraging Large Language Models for Command Injection Vulnerability Analysis in Python: An Empirical Study on Popular Open-Source Projects |
Uses large language models to detect command injection vulnerabilities in open-source Python projects |
large language model |
|
|
| 4 |
Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization |
Proposes a density-driven, swarm-intelligence-enhanced reasoning framework that improves LLM optimization in complex reasoning scenarios |
large language model chain-of-thought |
|
|
| 5 |
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval |
Proposes Safety Context Retrieval (SCR), a method that strengthens LLM defenses against malicious jailbreaking attacks |
large language model |
|
|
| 6 |
CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution |
CRAKEN: a knowledge-based LLM cybersecurity agent that improves vulnerability detection and exploitation |
large language model |
✅ |
|
| 7 |
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges |
Proposes ModelingAgent, which bridges LLMs and mathematical modeling to solve complex real-world problems |
large language model |
|
|
| 8 |
How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior |
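Studies how memory management affects the long-term performance of LLM agents, revealing experience-following behavior and strategies to address it |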
large language model |
|
|
| 9 |
SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution |
SPhyR: proposes a spatial-physical reasoning benchmark based on material distribution to evaluate LLM reasoning ability |
large language model |
|
|
| 10 |
NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction |
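NEXT-EVAL: proposes a comprehensive evaluation framework for Web data record extraction that supports fair comparison of traditional algorithms and LLMs |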
large language model |
|
|
| 11 |
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey |
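Builds an evaluation system for large audio-language models, proposing a comprehensive evaluation framework and a systematic taxonomy |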
large language model |
|
|
| 12 |
PhyX: Does Your Model Have the "Wits" for Physical Reasoning? |
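Proposes PhyX, a large-scale physical reasoning benchmark that reveals the shortcomings of existing models in understanding physical scenarios |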
multimodal |
✅ |
|
| 13 |
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement |
HybridProver: a theorem-proving framework that combines LLM-driven proof synthesis and refinement |
large language model |
|
|
| 14 |
A Qualitative Investigation into LLM-Generated Multilingual Code Comments and Automatic Evaluation Metrics |
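Evaluates the quality of LLM-generated multilingual code comments and reveals the limitations of existing automatic evaluation metrics |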
large language model |
|
|
| 15 |
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries |
Proposes IKEA, an implicit knowledge extraction attack that extracts knowledge from RAG systems through benign queries |
large language model |
|
|
| 16 |
ClickSight: Interpreting Student Clickstreams to Reveal Insights on Learning Strategies via LLMs |
ClickSight: uses LLMs to interpret student clickstream data and reveal learning strategies |
large language model |
|
|
| 17 |
Adaptive Plan-Execute Framework for Smart Contract Security Auditing |
Proposes SmartAuditFlow, an adaptive plan-execute framework that improves smart contract security auditing |
large language model |
|
|
| 18 |
ThinkRec: Thinking-based recommendation via LLM |
ThinkRec: thinking-based recommendation via large language models, improving recommendation accuracy and explainability |
large language model |
✅ |
|