| 1 |
BEAR: Towards Beam-Search-Aware Optimization for Recommendation with Large Language Models |
BEAR: proposes a beam-search-aware optimization method for recommendation with large language models |
large language model |
|
|
| 2 |
Evaluating Large Language Models for Security Bug Report Prediction |
Evaluates large language models for security bug report prediction |
large language model |
|
|
| 3 |
RAudit: A Blind Auditing Protocol for Large Language Model Reasoning |
Proposes RAudit to address blind auditing of large language model reasoning |
large language model |
|
|
| 4 |
Chain-of-thought obfuscation learned from output supervision can generalise to unseen tasks |
Chain-of-thought obfuscation learned from output supervision can generalize to unseen tasks |
chain-of-thought |
|
|
| 5 |
Eroding the Truth-Default: A Causal Analysis of Human Susceptibility to Foundation Model Hallucinations and Disinformation in the Wild |
Proposes a two-axis framework, JudgeGPT and RogueGPT, to analyze human susceptibility to large-model hallucinations and disinformation |
foundation model |
|
|
| 6 |
Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models |
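Proposes steganography and detection methods based on embedding-space geometry, improving the security of covert communication in large language models. |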
large language model |
|
|
| 7 |
Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling |
Proposes SABER, which uses small sample sizes to predict the adversarial risk of large language models under Best-of-N sampling. |
large language model |
|
|
| 8 |
EntroCut: Entropy-Guided Adaptive Truncation for Efficient Chain-of-Thought Reasoning in Small-scale Large Reasoning Models |
Proposes EntroCut, which uses entropy-guided adaptive truncation to improve the CoT reasoning efficiency of small-scale LRMs. |
chain-of-thought |
|
|
| 9 |
Make Anything Match Your Target: Universal Adversarial Perturbations against Closed-Source MLLMs via Multi-Crop Routed Meta Optimization |
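Proposes MCRMO-Attack, which raises the success rate of universal targeted transferable adversarial attacks on closed-source multimodal large language models. |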
large language model multimodal |
|
|
| 10 |
Quantifying Model Uniqueness in Heterogeneous AI Ecosystems |
Proposes a statistical framework for auditing model uniqueness in heterogeneous AI ecosystems |
large language model foundation model |
|
|
| 11 |
Alignment among Language, Vision and Action Representations |
Reveals alignment among language, vision, and action representations, facilitating cross-modal knowledge transfer. |
embodied AI large language model |
|
|
| 12 |
Darwinian Memory: A Training-Free Self-Regulating Memory System for GUI Agent Evolution |
Proposes the Darwinian Memory system to address insufficient context for GUI agents in long-horizon tasks |
large language model multimodal |
|
|
| 13 |
On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study |
Examines the impact of code comments on automated bug fixing |
large language model |
|
|
| 14 |
UCPO: Uncertainty-Aware Policy Optimization |
Proposes the UCPO framework to address bias in uncertainty-based reinforcement learning policy optimization for LLMs. |
large language model |
|
|
| 15 |
High-quality generation of dynamic game content via small language models: A proof of concept |
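Proposes a small-language-model-based method for high-quality dynamic game content generation, addressing narrative coherence and high operating costs. |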
large language model |
|
|
| 16 |
OrLog: Resolving Complex Queries with LLMs and Probabilistic Reasoning |
OrLog: combines LLMs with probabilistic reasoning to resolve complex queries, improving retrieval precision. |
large language model |
|
|
| 17 |
From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics |
The ContextMATH benchmark reveals LLMs' shortcomings in problem formulation for contextual mathematical reasoning |
large language model |
|
|
| 18 |
Protecting Private Code in IDE Autocomplete using Differential Privacy |
Uses differential privacy to protect private code in IDE code autocompletion |
large language model |
|
|
| 19 |
Game-Theoretic Co-Evolution for LLM-Based Heuristic Discovery |
Proposes the ASRO framework to address overfitting in LLM-based heuristic discovery |
large language model |
|
|
| 20 |
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering |
MEnvAgent: a scalable polyglot environment construction framework for verifiable software engineering |
large language model |
✅ |
|
| 21 |
Conditional Performance Guarantee for Large Reasoning Models |
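Proposes the G-PAC reasoning framework, which provides group-conditional performance guarantees for large reasoning models and improves efficiency. |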
chain-of-thought |
|
|
| 22 |
How Far Can Pretrained LLMs Go in Symbolic Music? Controlled Comparisons of Supervised and Preference-based Adaptation |
Compares supervised and preference-based learning to assess the potential of pretrained LLMs in symbolic music |
large language model |
|
|
| 23 |
Qualitative Evaluation of LLM-Designed GUI |
Evaluates LLM-designed GUIs: an analysis of usability, customizability, and fit with user needs |
large language model |
|
|
| 24 |
AutoRefine: From Trajectories to Reusable Expertise for Continual LLM Agent Refinement |
AutoRefine: distills reusable experience from trajectories to continually refine LLM agents |
large language model |
|
|
| 25 |
Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support |
Proposes a Task-Aware LLM Council (TALC) for adaptive decision support. |
large language model |
|
|
| 26 |
MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics |
MCP-Diag: a deterministic, protocol-driven architecture for AI-native network diagnostics |
large language model |
|
|
| 27 |
SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly |
SYMPHONY: a multi-agent planning framework with synergistic heterogeneous language models that improves complex task solving |
large language model |
|
|
| 28 |
PerfGuard: A Performance-Aware Agent for Visual Content Generation |
PerfGuard: a performance-aware agent framework for visual content generation |
large language model |
✅ |
|
| 29 |
Decoding in Geometry: Alleviating Embedding-Space Crowding for Complex Reasoning |
Proposes CraEG, which mitigates embedding-space crowding in LLM reasoning through geometry-guided reweighting |
large language model |
|
|