| 1 | Knowing When Not to Answer: Evaluating Abstention in Multimodal Reasoning Systems | Proposes the MM-AQA benchmark to evaluate how well multimodal reasoning systems abstain from answering when appropriate. | multimodal | | |
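| 2 | IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation | Proposes the IUQ framework, which uses interrogative (question-based) uncertainty quantification to make long-form LLM generations more trustworthy. | large language model | ✅ | |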
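| 3 | Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models | Proposes K-Token Merging, which compresses sequences in the latent embedding space to reduce the computational cost of long-context processing in LLMs. | large language model | | |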
| 4 | QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies | Proposes the QuantCode-Bench benchmark to evaluate the ability of LLMs to generate executable quantitative trading strategies. | large language model | | |
| 5 | Blinded Multi-Rater Comparative Evaluation of a Large Language Model and Clinician-Authored Responses in CGM-Informed Diabetes Counseling | A retrieval-augmented LLM outperforms clinicians in CGM-informed diabetes counseling. | large language model | | |
| 6 | SPAGBias: Uncovering and Tracing Structured Spatial Gender Bias in Large Language Models | Proposes the SPAGBias framework to uncover and trace structured spatial gender bias in LLMs. | large language model | | |
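| 7 | Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models | Analyzes the reasoning dynamics of vision-language models and reveals the limits of monitoring how much they rely on each modality. | multimodal chain-of-thought | | |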
| 8 | CURaTE: Continual Unlearning in Real Time with Ensured Preservation of LLM Knowledge | CURaTE: a framework for real-time continual unlearning that guarantees preservation of LLM knowledge. | large language model | | |
| 9 | RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding | Proposes RACER, a training-free method that combines retrieval and logits information to accelerate LLM inference. | large language model | ✅ | |
| 10 | Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding | Proposes using the wording of schema keys as an instruction channel to improve structured generation under constrained decoding. | large language model | | |
| 11 | CausalDetox: Causal Head Selection and Intervention for Language Model Detoxification | CausalDetox: detoxifies language models through causal head selection and intervention. | large language model | | |
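| 12 | From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning | Proposes SpecGuard, which performs verification-aware speculative decoding using model-internal signals to make multi-step reasoning more efficient. | large language model | | |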
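| 13 | Segment-Level Coherence for Robust Harmful Intent Probing in LLMs | Proposes a streaming probe based on segment-level coherence that makes detection of harmful intent in LLMs more robust in the CBRN (chemical, biological, radiological, and nuclear) domain. | large language model | | |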
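| 14 | Modeling LLM Unlearning as an Asymmetric Two-Task Learning Problem | Models LLM unlearning as an asymmetric two-task learning problem, improving the balance between forgetting targeted knowledge and preserving model capabilities. | large language model | | |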
| 15 | Pushing the Boundaries of Multiple Choice Evaluation to One Hundred Options | Proposes evaluation with a large number of answer options to assess language models more reliably under dense distractors. | large language model | | |
| 16 | PeerPrism: Peer Evaluation Expertise vs Review-writing AI | Proposes the PeerPrism benchmark for distinguishing the contributions of human expert evaluation from those of AI writing in peer review. | large language model | ✅ | |