| 1 |
Data Language Models: A New Foundation Model Class for Tabular Data |
Proposes Data Language Models (DLMs), which natively understand tabular data without preprocessing. |
large language model, foundation model |
|
|
| 2 |
Multimodal Deep Generative Model for Semi-Supervised Learning under Class Imbalance |
Proposes a multimodal deep generative model for semi-supervised learning under class imbalance. |
multimodal |
|
|
| 3 |
Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models |
Proposes a knowledge graph foundation model built on a graphlet-based structural vocabulary, improving zero-shot transfer. |
foundation model |
|
|
| 4 |
NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research |
NeuroAgent: an LLM-based agent framework for multimodal neuroimaging analysis. |
multimodal |
|
|
| 5 |
Debiased Multimodal Personality Understanding through Dual Causal Intervention |
Proposes DCAN, a dual causal intervention network that addresses bias in multimodal personality understanding. |
multimodal |
✅ |
|
| 6 |
Mind the Gap? A Distributional Comparison of Real and Synthetic Priors for Tabular Foundation Models |
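Compares the distributions of real and synthetic tabular priors and evaluates how the differences affect the performance of pretrained tabular models. |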
foundation model |
|
|
| 7 |
CoupleEvo: Evolving Heuristics for Coupled Optimization Problems Using Large Language Models |
CoupleEvo: uses large language models to evolve heuristics for coupled optimization problems. |
large language model |
✅ |
|
| 8 |
GlazyBench: A Benchmark for Ceramic Glaze Property Prediction and Image Generation |
GlazyBench: a benchmark dataset for ceramic glaze property prediction and image generation. |
large language model, multimodal |
|
|
| 9 |
Super-Level-Set Regression: Conditional Quantiles via Volume Minimization |
Proposes Super-Level-Set Regression (SLS), which learns conditional quantiles directly by minimizing volume, for multivariate regression. |
multimodal |
|
|
| 10 |
Rethinking Adapter Placement: A Dominant Adaptation Module Perspective |
Proposes DomLoRA, which achieves parameter-efficient fine-tuning by placing a single adapter and outperforms standard LoRA. |
instruction following |
|
|
| 11 |
MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems |
MASPO: a joint prompt optimization framework for LLM-based multi-agent systems. |
large language model |
✅ |
|
| 12 |
Process Matters more than Output for Distinguishing Humans from Machines |
Proposes CogCAPTCHA30, a set of cognitive tasks that distinguishes humans from machines by process features rather than outputs. |
large language model |
|
|
| 13 |
PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors |
PrefixGuard: turns LLM-agent traces into online failure-warning monitors. |
large language model |
|
|
| 14 |
Constraint Decay: The Fragility of LLM Agents in Backend Code Generation |
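Reveals the fragility of LLM agents under structural constraints in backend code generation and identifies a "constraint decay" phenomenon. |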
large language model |
|
|
| 15 |
SCRuB: Social Concept Reasoning under Rubric-Based Evaluation |
Proposes the SCRuB framework for evaluating the social concept reasoning abilities of large language models. |
large language model |
|
|
| 16 |
Knowledge Graphs, the Missing Link in Agentic AI-based Formal Verification |
Proposes a knowledge-graph-based agentic AI approach to formal verification that improves the quality of generated SystemVerilog assertions. |
large language model |
|
|
| 17 |
From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work |
Proposes execution lineage, which uses deterministic graphs to address the reproducibility of AI-native workflows. |
large language model |
|
|
| 18 |
Human-AI Co-Evolution and Epistemic Collapse: A Dynamical Systems Perspective |
Proposes a dynamical-systems model of human-AI co-evolution, showing that reliance on AI may lead to cognitive degradation. |
large language model |
|
|
| 19 |
Fine-Tuning Small Language Models for Solution-Oriented Windows Event Log Analysis |
Fine-tunes small language models for solution-oriented Windows event log analysis. |
large language model |
|
|
| 20 |
Improving the Efficiency of Language Agent Teams with Adaptive Task Graphs |
The LATTE framework uses adaptive task graphs to make language agent teams more efficient and reduce resource consumption. |
large language model |
|
|
| 21 |
Measuring Black-Box Confidence via Reasoning Trajectories: Geometry, Coverage, and Verbalization |
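Proposes black-box confidence estimation based on the geometry and coverage of reasoning trajectories and on verbalized confidence. |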
chain-of-thought |
|
|
| 22 |
Addressing Labelled Data Scarcity: Taxonomy-Agnostic Annotation of PII Values in HTTP Traffic using LLMs |
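Proposes an LLM-based method for annotating PII values in HTTP traffic, addressing both the scarcity of labeled data and the rigidity of fixed taxonomies. |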
large language model |
|
|
| 23 |
Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions |
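A large-scale study revealing the security vulnerabilities and compatibility risks of the library versions chosen in LLM-generated code. |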
large language model |
✅ |
|
| 24 |
OmicsLM: A Multimodal Large Language Model for Multi-Sample Omics Reasoning |
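Proposes OmicsLM, a multimodal large language model that tightly integrates quantitative transcriptomic data with natural-language biological reasoning. |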
large language model, multimodal, instruction following |
|
|
| 25 |
ICU-Bench: Benchmarking Continual Unlearning in Multimodal Large Language Models |
Proposes the ICU-Bench benchmark for evaluating privacy-oriented unlearning by multimodal large models in continual learning scenarios. |
large language model, multimodal |
|
|
| 26 |
Causal Probing for Internal Visual Representations in Multimodal Large Language Models |
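Proposes a causal-intervention probing framework that reveals the encoding mechanisms and scaling behavior of internal visual representations in multimodal large models. |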
large language model, multimodal |
|
|
| 27 |
AstroAlertBench: Evaluating the Accuracy, Reasoning, and Honesty of Multimodal LLMs in Astronomical Classification |
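Proposes the AstroAlertBench benchmark, which evaluates the accuracy, reasoning, and honesty of multimodal large models in classifying astronomical transient events. |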
large language model, multimodal |
|
|
| 28 |
An Interpretable and Scalable Framework for Evaluating Large Language Models |
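Proposes an IRT evaluation framework based on Majorization-Minimization, making capability evaluation of large models interpretable and efficiently scalable. |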
large language model |
|
|
| 29 |
Quantum-enhanced Large Language Models on Quantum Hardware via Cayley Unitary Adapters |
Proposes quantum-enhanced large language models using Cayley unitary adapters, demonstrating performance gains on real quantum hardware. |
large language model |
|
|
| 30 |
Saliency-Aware Regularized Quantization Calibration for Large Language Models |
Proposes Saliency-Aware Regularized Quantization Calibration (SARQC) to improve the performance of quantized large language models. |
large language model |
|
|
| 31 |
Systematic Evaluation of Large Language Models for Post-Discharge Clinical Action Extraction |
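Proposes a two-stage prompting framework and systematically evaluates large language models on post-discharge clinical action extraction. |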
large language model |
|
|
| 32 |
LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution |
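Proposes the LCC-LLM framework and the LCCD dataset, achieving accurate malware attribution through code-centric retrieval augmentation and multi-task reasoning. |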
large language model |
|
|
| 33 |
DataDignity: Training Data Attribution for Large Language Models |
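Proposes the DataDignity framework and the FakeWiki benchmark, using supervised contrastive learning for training data attribution in large language models. |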
large language model |
|
|
| 34 |
Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning |
Extracts search trees from LLM reasoning traces, revealing myopic characteristics in their planning. |
large language model, chain-of-thought |
|
|
| 35 |
Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models |
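Proposes the Dynamic Boundary Evaluation (DBE) framework, which uses adaptive search to address the saturation and bias of static benchmarks for large models. |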
large language model, instruction following |
|
|
| 36 |
CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs |
Proposes the CrossCult-KIBench benchmark and the MCKI method to tackle cross-cultural knowledge insertion and alignment in multimodal large models. |
large language model, multimodal |
|
|
| 37 |
LLM-Driven Design Space Exploration of FPGA-based Accelerators |
Proposes the SECDA-DSE framework, which uses large language models to drive automated design space exploration for FPGA accelerators. |
large language model, chain-of-thought |
|
|
| 38 |
Null Space Constrained Contrastive Visual Forgetting for MLLM Unlearning |
Proposes a null-space-constrained contrastive visual forgetting method for efficient knowledge removal from multimodal large models. |
large language model, multimodal |
|
|
| 39 |
LeakDojo: Decoding the Leakage Threats of RAG Systems |
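Proposes the LeakDojo evaluation framework, which systematically exposes the knowledge leakage risks of retrieval-augmented generation (RAG) systems. |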
large language model, instruction following |
✅ |
|
| 40 |
Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs |
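Proposes an MLLM jailbreak framework that exploits the reconstruction-concealment tradeoff, raising attack success rates through character removal and keyword perturbation. |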
large language model, multimodal |
|
|
| 41 |
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety |
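Proposes the SafeHarbor framework, which uses hierarchical memory augmentation and a self-evolving mechanism to address over-refusal in LLM agent safety defenses. |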
large language model, foundation model |
✅ |
|
| 42 |
CITE: Anytime-Valid Statistical Inference in LLM Self-Consistency |
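Proposes the CITE algorithm, which provides anytime-valid statistical inference and error control for self-consistency sampling in large models. |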
large language model |
|
|
| 43 |
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems |
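Proposes an active learning framework based on ensemble Kalman inversion to optimize the communication structure of LLM-based multi-agent systems. |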
large language model |
|
|
| 44 |
From Surface Learning to Deep Understanding: A Grounded AI Tutoring System for Moodle |
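Proposes a retrieval-augmented generation (RAG) AI teaching assistant for Moodle that precisely traces answers to course content and supports Socratic interaction. |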
large language model |
|
|
| 45 |
How Well Do LLMs Perform on the Simplest Long-Chain Reasoning Tasks: An Empirical Study on the Equivalence Class Problem |
Empirically evaluates large language models on long-chain reasoning tasks using the Equivalence Class Problem (ECP). |
large language model |
|
|
| 46 |
LLM-Guided Open Hypothesis Learning from Autonomous Scanning Probe Microscopy Experiments |
Proposes an LLM-guided open hypothesis learning framework that enables autonomous scientific discovery with scanning probe microscopy. |
large language model |
|
|
| 47 |
When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic–Actor Loop for Agentic Reasoning |
Proposes the SCALAR framework, which uses a structured critic-actor loop to improve AI reasoning in theoretical physics research. |
large language model |
|
|
| 48 |
A Self-Healing Framework for Reliable LLM-Based Autonomous Agents |
Proposes a self-healing framework for autonomous LLM agents that improves robustness through failure detection and dynamic replanning. |
large language model |
|
|
| 49 |
Towards Annotation-Free Validation of MLLMs: A Vision-Language Logical Consistency Metric |
Proposes the Vision-Language Logical Consistency Metric (VL-LCM) for annotation-free evaluation of MLLMs. |
large language model |
|
|
| 50 |
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models |
Identifies a "granularity axis" in how large models represent social roles: a micro-to-macro latent causal direction. |
large language model |
|
|
| 51 |
Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios |
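Proposes the Event-Causal RAG framework, which performs causal reasoning over very long videos using event causal graphs and a dual-storage mechanism. |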
foundation model |
|
|
| 52 |
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost |
Proposes Post-Reasoning, which uses a post-hoc reasoning mechanism to improve non-thinking models at zero inference cost. |
large language model |
|
|
| 53 |
Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs |
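Proposes a knowledge-first framework for automatic heuristic design that uses LLMs to tightly integrate code and knowledge in combinatorial optimization. |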
large language model |
|
|
| 54 |
Visual Fingerprints for LLM Generation Comparison |
Proposes a visual-fingerprint method for comparing LLM output tendencies across generation conditions. |
large language model |
|
|
| 55 |
Safety Anchor: Defending Harmful Fine-tuning via Geometric Bottlenecks |
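Proposes Safety Bottleneck Regularization (SBR), which uses geometric anchors to defend large models against harmful fine-tuning attacks. |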
large language model |
|
|
| 56 |
MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System |
Proposes MAS-Algorithm, a multi-agent workflow that improves AI performance on algorithmic programming problems through modular collaboration. |
chain-of-thought |
|
|
| 57 |
Taklif.AI: LLM-Powered Platform for Interest-Based Personalized College Assignments |
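Proposes the Taklif.AI platform, which uses large language models to generate assignments personalized to students' interests and cultural backgrounds. |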
large language model |
|
|
| 58 |
CircuitFormer: A Circuit Language Model for Analog Topology Design from Natural Language Prompt |
Proposes CircuitFormer and CKT, a circuit-specific tokenizer, enabling automated analog circuit topology design from natural language. |
large language model |
✅ |
|
| 59 |
ReFlect: An Effective Harness System for Complex Long-Horizon LLM Reasoning |
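Proposes the ReFlect reasoning framework, which uses a deterministic harness for error detection and automatic recovery in long-horizon tasks. |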
chain-of-thought |
|
|
| 60 |
An Empirical Study of Proactive Coding Assistants in Real-World Software Development |
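Reveals the gap between simulated and real-world use of proactive coding assistants, and introduces the ProCodeBench benchmark and a dataset of real-world behavior. |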
large language model |
|
|
| 61 |
Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering |
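Proposes the Adaptive Multi-Principle Steering (AMPS) framework to address safety risks in the reasoning chains of large reasoning models (LRMs). |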
chain-of-thought |
|
|
| 62 |
Text-Graph Synergy: A Bidirectional Verification and Completion Framework for RAG |
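Proposes the TGS-RAG framework, which uses a bidirectional text-knowledge-graph synergy mechanism to address information silos and lost reasoning paths in RAG. |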
large language model |
|
|
| 63 |
From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms |
Proposes a framework for the evolution of LLM agent memory: a three-stage paradigm from storage to reflection to experience. |
large language model |
|
|
| 64 |
Prober.ai: Gated Inquiry-Based Feedback via LLM-Constrained Personas for Argumentative Writing Development |
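Proposes Prober.ai, an argumentative writing assistant that combines gated, inquiry-based feedback with LLM-constrained personas to mitigate the cognitive debt of AI-assisted writing. |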
large language model |
|
|