| 1 |
Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities |
Proposes the MOSAIC benchmark for fine-grained evaluation of large language models' instruction-following ability |
large language model instruction following |
|
|
| 2 |
RareAlert: Aligning heterogeneous large language model reasoning for early rare disease risk screening |
RareAlert: aligns heterogeneous large language model reasoning for early rare disease risk screening |
large language model |
|
|
| 3 |
Mitigating the OWASP Top 10 For Large Language Models Applications using Intelligent Agents |
Uses intelligent agents to mitigate the OWASP Top 10 security risks in large language model applications |
large language model |
|
|
| 4 |
Think-Augmented Function Calling: Improving LLM Parameter Accuracy Through Embedded Reasoning |
Proposes Think-Augmented Function Calling, which improves LLM function-call parameter accuracy through embedded reasoning |
large language model chain-of-thought |
|
|
| 5 |
TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models |
TSRBench: a comprehensive multi-task, multi-modal time series reasoning benchmark for evaluating generalist models |
multimodal |
✅ |
|
| 6 |
A Balanced Neuro-Symbolic Approach for Commonsense Abductive Logic |
Proposes a balanced neuro-symbolic approach to abductive logic that improves commonsense reasoning |
large language model |
|
|
| 7 |
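Stability as a Liability: Systematic Breakdown of Linguistic Structure in LLMs |
Reveals a tension between LLM training stability and generation quality: stable training leads to degradation of linguistic structure |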
large language model |
|
|
| 8 |
Design Techniques for LLM-Powered Interactive Storytelling: A Case Study of the Dramamancer System |
Dramamancer: a study of design techniques for LLM-based interactive storytelling |
large language model |
|
|
| 9 |
FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory |
Proposes FadeMem, a biologically inspired memory architecture that improves agent memory efficiency and reduces storage footprint |
large language model |
|
|
| 10 |
FastInsight: Fast and Insightful Retrieval via Fusion Operators for Graph RAG |
FastInsight: fast and insightful retrieval for Graph RAG via fusion operators |
large language model |
|
|
| 11 |
A Generative AI-Driven Reliability Layer for Action-Oriented Disaster Resilience |
Climate RADAR: a generative AI-based reliability layer for action-oriented disaster resilience |
large language model |
|
|
| 12 |
TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance |
TAM-Eval: a framework and benchmark for evaluating LLM performance in automated unit test maintenance |
large language model |
✅ |
|
| 13 |
Generative AI in Saudi Arabia: A National Survey of Adoption, Risks, and Public Perceptions |
A survey of generative AI use in Saudi Arabia: reveals adoption status, risk perceptions, and public expectations |
multimodal |
|
|
| 14 |
EvolVE: Evolutionary Search for LLM-based Verilog Generation and Optimization |
Proposes the EvolVE framework to optimize LLM-driven Verilog generation |
large language model |
✅ |
|