| 1 |
FMint-SDE: A Multimodal Foundation Model for Accelerating Numerical Simulation of SDEs via Error Correction |
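FMint-SDE: a multimodal foundation model that accelerates numerical simulation of stochastic differential equations via error correction |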
foundation model multimodal |
|
|
| 2 |
What a diff makes: automating code migration with large language models |
Automates code migration using large language models and code diffs, improving software compatibility |
large language model |
|
|
| 3 |
Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models |
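Proposes the BioRiskEval framework to evaluate the potential biorisk of open-weight bio-foundation models and the effectiveness of data filtering |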
foundation model |
|
|
| 4 |
Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation |
Proposes a RAG-based framework that improves the adaptability and reliability of LLMs in the cybersecurity domain |
large language model |
|
|
| 5 |
ConnectomeBench: Can LLMs Proofread the Connectome? |
ConnectomeBench: evaluates the ability of LLMs to proofread neural connectomes, exploring new avenues for AI-assisted neuroscience |
large language model multimodal |
✅ |
|
| 6 |
Validity Is What You Need |
The key to deploying agentic AI applications is validity verification, not over-reliance on large language models |
large language model foundation model |
|
|
| 7 |
CodeAlignBench: Assessing Code Generation Models on Developer-Preferred Code Adjustments |
CodeAlignBench: evaluates code generation models on developer-preferred code adjustments |
large language model instruction following |
|
|
| 8 |
ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use |
Proposes the ToolScope framework to address the difficulty multimodal LLMs have with tool use in long-horizon visual question answering |
large language model multimodal |
|
|
| 9 |
LongCat-Flash-Omni Technical Report |
Meituan presents LongCat-Flash-Omni, an open-source omni-modal model with 560 billion parameters for real-time audio-visual interaction |
multimodal |
|
|
| 10 |
Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits |
Proposes a CXL-based PNM architecture that accelerates 1M-token LLM inference beyond GPU memory limits |
large language model |
|
|
| 11 |
Advancing Cognitive Science with LLMs |
Uses large language models (LLMs) to advance knowledge integration and theory formalization in cognitive science |
large language model |
|
|
| 12 |
Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories |
Analyzes code agent trajectories to understand agent behavior, revealing patterns of success and failure |
large language model |
|
|
| 13 |
Simulating Misinformation Vulnerabilities With Agent Personas |
Uses agent personas to simulate vulnerability to misinformation, evaluating how different groups respond to false information |
large language model |
|
|
| 14 |
VeriMoA: A Mixture-of-Agents Framework for Spec-to-HDL Generation |
Proposes the VeriMoA framework to address noise propagation and a limited reasoning space in HDL generation |
large language model |
|
|
| 15 |
Mechanics of Learned Reasoning 1: TempoBench, A Benchmark for Interpretable Deconstruction of Reasoning System Performance |
TempoBench: a benchmark for interpretable deconstruction of reasoning system performance |
large language model |
✅ |
|
| 16 |
GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language |
GeoFM: improves the geometric reasoning of multimodal large language models with synthetic data generated through a formal language |
large language model |
|
|
| 17 |
Thinking Like a Student: AI-Supported Reflective Planning in a Theory-Intensive Computer Science Course |
Uses LLMs to simulate the student perspective and improve instructional design for a theory-intensive computer science course |
large language model |
|
|
| 18 |
An In-depth Study of LLM Contributions to the Bin Packing Problem |
An in-depth study of LLM contributions to the bin packing problem: an analysis of effectiveness and limitations |
large language model |
|
|
| 19 |
Inferring multiple helper Dafny assertions with LLMs |
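Proposes DAISY, which uses LLMs to automatically infer multiple missing helper assertions in Dafny programs, improving the efficiency of formal verification |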
large language model |
|
|
| 20 |
Vintage Code, Modern Judges: Meta-Validation in Low Data Regimes |
Proposes the SparseAlign framework to address meta-validation of LaaJ (LLM-as-a-Judge) in low-data regimes |
large language model |
|
|
| 21 |
Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering |
Fints: efficient inference-time personalization for LLMs through fine-grained, instance-tailored steering |
large language model |
✅ |
|
| 22 |
Glia: A Human-Inspired AI for Automated Systems Design and Optimization |
Glia: a human-inspired AI for automated systems design and optimization |
large language model |
|
|
| 23 |
Expressive Range Characterization of Open Text-to-Audio Models |
Proposes an ERA-based framework for evaluating the expressive range of open text-to-audio models |
multimodal |
|
|