| 1 |
Performance of Large Language Models in Supporting Medical Diagnosis and Treatment |
Evaluates the performance of large language models in medical diagnosis and treatment, and analyzes their cost-effectiveness. |
large language model chain-of-thought |
|
|
| 2 |
VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge |
VisualPuzzles: decouples multimodal reasoning evaluation from domain knowledge to focus on general reasoning ability |
large language model multimodal |
|
|
| 3 |
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models |
Proposes LLM-SRBench to address the problem of evaluating scientific equation discovery |
large language model |
|
|
| 4 |
You've Changed: Detecting Modification of Black-Box Large Language Models |
Proposes a method for detecting modification of black-box large language models by comparing distributions of text features |
large language model |
|
|
| 5 |
Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models |
Proposes a latent-space reasoning benchmark to evaluate the implicit reasoning abilities of large language models. |
large language model |
|
|
| 6 |
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA |
CheckboxQA dataset: addresses the blind spot of large language models in checkbox understanding |
large language model |
✅ |
|
| 7 |
Probing then Editing Response Personality of Large Language Models |
Proposes a probing-then-editing framework for controlling the response personality of large language models. |
large language model |
✅ |
|
| 8 |
The Mirage of Performance Gains: Why Contrastive Decoding Fails to Mitigate Object Hallucinations in MLLMs? |
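Reveals the limitations of contrastive decoding: it fails to effectively mitigate object hallucinations in multimodal large language models |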
large language model multimodal |
|
|
| 9 |
Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning |
Proposes Weight-of-Thought reasoning, which uses neural network weights to enhance LLM reasoning |
large language model chain-of-thought |
|
|
| 10 |
HELIOS: Adaptive Model And Early-Exit Selection for Efficient LLM Inference Serving |
HELIOS: an adaptive model and early-exit selection framework for efficient LLM inference serving |
large language model |
|
|
| 11 |
LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks |
LLM unlearning study reveals a stronger-than-expected coreset effect in existing benchmarks |
large language model |
✅ |
|
| 12 |
Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data |
RefAlign: aligns language models using reference answers, without binary human preference data |
large language model |
✅ |
|
| 13 |
HalluSearch at SemEval-2025 Task 3: A Search-Enhanced RAG Pipeline for Hallucination Detection |
HalluSearch: a search-enhanced RAG pipeline for hallucination detection |
large language model |
|
|
| 14 |
C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation |
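Proposes C-FAITH, a Chinese fine-grained hallucination evaluation benchmark for automated evaluation of hallucinations in large language models. |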
large language model |
|
|
| 15 |
Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment |
Proposes the CRV+CogPO framework to improve the cognitive alignment of small models on complex reasoning tasks |
chain-of-thought |
|
|
| 16 |
DioR: Adaptive Cognitive Detection and Contextual Retrieval Optimization for Dynamic Retrieval-Augmented Generation |
Proposes DioR, which optimizes dynamic RAG with adaptive cognitive detection and contextual retrieval to improve LLM generation quality. |
large language model |
|
|
| 17 |
DataPuzzle: Breaking Free from the Hallucinated Promise of LLMs in Data Analysis |
DataPuzzle: proposes a multi-agent framework to address LLM hallucination in data analysis and improve trustworthiness. |
large language model |
|
|
| 18 |
C-MTCSD: A Chinese Multi-Turn Conversational Stance Detection Dataset |
Proposes C-MTCSD, a large-scale Chinese multi-turn conversational stance detection dataset, to improve social media analysis. |
large language model |
|
|
| 19 |
Guiding Reasoning in Small Language Models with LLM Assistance |
SMART framework: uses LLM assistance to help small models perform complex reasoning |
large language model |
|
|
| 20 |
Augmented Relevance Datasets with Fine-Tuned Small LLMs |
Uses fine-tuned small LLMs to augment relevance datasets and improve ranking model performance |
large language model |
|
|