| 1 | From Literal to Liberal: A Meta-Prompting Framework for Eliciting Human-Aligned Exception Handling in Large Language Models | Proposes the RID framework, which uses meta-prompting to improve how well LLMs align with human intent when handling exceptions. | large language model, instruction following, chain-of-thought | | |
| 2 | Evolution of Meta's LLaMA Models and Parameter-Efficient Fine-Tuning of Large Language Models: A Survey | Surveys the evolution of Meta's LLaMA models and parameter-efficient fine-tuning methods, providing a one-stop resource for LLM researchers. | large language model, foundation model, multimodal | | |
| 3 | MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science | MatSciBench: a benchmark for evaluating the reasoning ability of LLMs in materials science. | large language model, multimodal, chain-of-thought | | |
| 4 | GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents | GenCellAgent: generalizable, training-free cellular image segmentation driven by LLM agents. | large language model | | |
| 5 | From Narratives to Probabilistic Reasoning: Predicting and Interpreting Drivers' Hazardous Actions in Crashes Using Large Language Model | Uses a large language model to predict and interpret drivers' hazardous actions from crash narratives. | large language model | | |
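| 6 | Developing and Validating the Arabic Version of the Attitudes Toward Large Language Models Scale | Develops and validates an Arabic version of the Attitudes Toward Large Language Models scale, addressing the lack of research on LLM perceptions in non-Western cultural contexts. | large language model | | |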
| 7 | Beyond Postconditions: Can Large Language Models infer Formal Contracts for Automatic Software Verification? | Proposes NL2Contract, which uses LLMs to infer formal contracts and so improve automated software verification. | large language model | | |
| 8 | A Survey of Vibe Coding with Large Language Models | A comprehensive survey of the LLM-based "Vibe Coding" paradigm, outlining its challenges and opportunities. | large language model | | |
| 9 | Evaluating the Quality of Randomness and Entropy in Tasks Supported by Large Language Models | Evaluates the quality of randomness and the entropy of LLM outputs in randomness-related tasks. | large language model | | |
| 10 | HiCoTraj: Zero-Shot Demographic Reasoning via Hierarchical Chain-of-Thought Prompting from Trajectory | HiCoTraj: zero-shot demographic reasoning from trajectories using hierarchical chain-of-thought prompting. | chain-of-thought | | |
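| 11 | Benefits and Limitations of Communication in Multi-Agent Reasoning | Proposes a theoretical framework for multi-agent reasoning and analyzes the benefits and limitations of communication for solving complex tasks. | large language model, chain-of-thought | | |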
| 12 | Toward Reasoning-Centric Time-Series Analysis | Proposes a reasoning-centric approach to time-series analysis that uses LLMs to improve interpretability in complex scenarios. | large language model, multimodal | | |
| 13 | Artificial Intelligence Virtual Cells: From Measurements to Decisions across Modality, Scale, Dynamics, and Evaluation | Proposes an AI virtual cell framework built on a Cell-State Latent, improving cell-state modeling across modalities, scales, and interventions. | foundation model, multimodal | | |
| 14 | RAG-Anything: All-in-One RAG Framework | Proposes RAG-Anything, a unified framework for comprehensive retrieval and augmented generation over cross-modal knowledge. | large language model, multimodal | ✅ | |
| 15 | EmboMatrix: A Scalable Training-Ground for Embodied Decision-Making | Proposes EmboMatrix, a scalable training platform for embodied decision-making that improves LLMs' understanding of the physical world. | large language model | | |
| 16 | AI Agents as Universal Task Solvers | Treats AI agents as universal task solvers, focusing on the key role of time in learning to reason. | chain-of-thought | | |
| 17 | Deliberate Lab: A Platform for Real-Time Human-AI Social Experiments | Deliberate Lab: an open-source platform for real-time human-AI social experiments that supports LLM agents at scale. | large language model | | |
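| 18 | Development and Benchmarking of a Blended Human-AI Qualitative Research Assistant | Develops and benchmarks Muse, a blended human-AI qualitative research assistant that improves the efficiency and consistency of qualitative research. | large language model | | |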
| 19 | SENTINEL: A Multi-Level Formal Framework for Safety Evaluation of LLM-based Embodied Agents | SENTINEL: a multi-level formal framework for evaluating the safety of LLM-based embodied agents. | large language model | | |
| 20 | InferA: A Smart Assistant for Cosmological Ensemble Data | Proposes InferA, which uses a multi-agent system to help analyze large-scale cosmological simulation data. | large language model | | |
| 21 | KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems | KVCOMM: efficient online cross-context KV-cache communication for LLM-based multi-agent systems. | large language model | | |
| 22 | Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics | Ax-Prover: a deep-reasoning agentic framework for theorem proving in mathematics and quantum physics. | large language model | | |
| 23 | Multi-Agent Debate for LLM Judges with Adaptive Stability Detection | Proposes an LLM-as-judge framework based on multi-agent debate that improves judging accuracy and efficiency. | large language model | | |
| 24 | Adaptive Generation of Bias-Eliciting Questions for LLMs | Proposes CAB, a framework that adaptively generates bias-eliciting questions to evaluate bias in large language models. | large language model | | |
| 25 | MTOS: A LLM-Driven Multi-topic Opinion Simulation Framework for Exploring Echo Chamber Dynamics | Proposes the MTOS framework, which uses LLMs to simulate multi-topic opinion evolution and explore echo chamber effects. | large language model | | |
| 26 | (R)evolution of Programming: Vibe Coding as a Post-Coding Paradigm | Explores Vibe Coding, an emotion-driven post-coding paradigm that reshapes how developers interact with AI. | large language model | | |
| 27 | PromptLocate: Localizing Prompt Injection Attacks | PromptLocate: the first method for localizing prompt injection attacks. | large language model | | |
| 28 | GOAT: A Training Framework for Goal-Oriented Agent with Tools | GOAT: a framework for training goal-oriented agents that can use tools. | large language model | | |
| 29 | ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization | ThinkPilot: steers reasoning models by automatically optimizing think-prefixes. | instruction following | ✅ | |
| 30 | Empowering LLM Agents with Geospatial Awareness: Toward Grounded Reasoning for Wildfire Response | Proposes the GAL framework, which gives LLMs geospatial awareness for grounded reasoning in wildfire response. | large language model | | |