| 1 |
MISID: A Multimodal Multi-turn Dataset for Complex Intent Recognition in Strategic Deception Games |
MISID: a multimodal, multi-turn dataset for complex intent recognition in strategic deception games |
large language model multimodal |
|
|
| 2 |
Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models |
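Proposes Heuristic Classification of Thoughts prompting (HCoT), which integrates expert-system heuristic reasoning into large language models. |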
large language model chain-of-thought |
|
|
| 3 |
MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents |
Proposes MultiDocFusion to address information loss when processing long industrial documents. |
large language model multimodal |
|
|
| 4 |
Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks |
CodeRQ-Bench: a benchmark, together with the VERA evaluator, for assessing the reasoning quality of LLMs on coding tasks |
large language model |
✅ |
|
| 5 |
Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints |
Proposes Coupled Weight and Activation Constraints (CWAC) to prevent safety drift while fine-tuning large language models. |
large language model |
|
|
| 6 |
A Scoping Review of Large Language Model-Based Pedagogical Agents |
Reviews pedagogical agents built on large language models, with the aim of advancing educational innovation. |
large language model |
|
|
| 7 |
Modality-Native Routing in Agent-to-Agent Networks: A Multimodal A2A Protocol Extension |
Proposes MMA2A, which uses modality-native routing to improve the accuracy of multi-agent systems on cross-modal reasoning tasks. |
multimodal |
|
|
| 8 |
Modeling Co-Pilots for Text-to-Model Translation |
Proposes Text2Model and Text2Zinc for automatically translating natural-language text into combinatorial optimization models. |
large language model chain-of-thought |
|
|
| 9 |
RPRA: Predicting an LLM-Judge for Efficient but Performant Inference |
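Proposes the RPRA framework, which makes inference with small models more efficient by predicting an LLM judge's verdict and adapting inference accordingly. |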
large language model |
|
|
| 10 |
PAL: Personal Adaptive Learner |
PAL: a personalized adaptive learning platform that improves the learning experience through real-time interaction |
multimodal |
|
|
| 11 |
LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software |
LogicEval: a systematic evaluation of automated repair techniques for logical vulnerabilities in real-world software |
large language model |
|
|
| 12 |
CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference |
CoDe-R: refines decompiler output using LLMs with rationale guidance and adaptive inference, significantly improving code executability. |
large language model |
✅ |
|
| 13 |
BEAM: Bi-level Memory-adaptive Algorithmic Evolution for LLM-Powered Heuristic Design |
Proposes BEAM to address the limitations of existing LHH methods in heuristic design. |
large language model |
|
|
| 14 |
AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance |
AISafetyBenchExplorer: builds a catalogue of AI safety benchmarks, revealing fragmented measurement and weak benchmark governance |
large language model |
|
|
| 15 |
DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant |
DeepTest 2026: a tool competition that benchmarks an LLM-based automotive assistant and evaluates fault-detection tools |
large language model |
|
|
| 16 |
IDEA: An Interpretable and Editable Decision-Making Framework for LLMs via Verbal-to-Numeric Calibration |
IDEA: a framework that makes LLM decision-making interpretable and editable through verbal-to-numeric calibration |
large language model |
✅ |
|
| 17 |
A Two-Stage LLM Framework for Accessible and Verified XAI Explanations |
Proposes a two-stage LLM framework that improves the accessibility and reliability of explainable AI (XAI) explanations. |
large language model |
|
|
| 18 |
Operationalising the Right to be Forgotten in LLMs: A Lightweight Sequential Unlearning Framework for Privacy-Aligned Deployment in Politically Sensitive Environments |
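Proposes a lightweight sequential unlearning framework for deploying privacy-compliant large language models in politically sensitive environments. |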
large language model |
|
|
| 19 |
Is Vibe Coding the Future? An Empirical Assessment of LLM Generated Codes for Construction Safety |
Evaluates the reliability of LLM-generated code for construction safety and exposes the potential risks of "vibe coding" |
large language model |
|
|
| 20 |
GAM: Hierarchical Graph-based Agentic Memory for LLM Agents |
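Proposes GAM, a hierarchical graph-structured memory framework for LLM agents that addresses knowledge retention and adaptability over long-term interactions. |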
large language model |
|
|
| 21 |
Designing Reliable LLM-Assisted Rubric Scoring for Constructed Responses: Evidence from Physics Exams |
Designs a reliable LLM-assisted rubric scoring system for physics exams, improving scoring consistency and efficiency |
large language model |
|
|
| 22 |
Beyond Scores: Diagnostic LLM Evaluation via Fine-Grained Abilities |
Proposes a cognitive diagnosis framework for evaluating large language models at the level of fine-grained abilities. |
large language model |
|
|
| 23 |
EMBER: Autonomous Cognitive Behaviour from Learned Spiking Neural Network Dynamics in a Hybrid LLM Architecture |
EMBER: autonomous cognitive behaviour arising from learned spiking neural network dynamics in a hybrid LLM architecture |
large language model |
|
|