| 1 |
How does Chain of Thought Think? Mechanistic Interpretability of Chain-of-Thought Reasoning with Sparse Autoencoding |
Uses sparse autoencoders to study the mechanistic interpretability of chain-of-thought (CoT) reasoning |
large language model chain-of-thought |
|
|
| 2 |
FeynTune: Large Language Models for High-Energy Theory |
FeynTune: fine-tuning specialized large language models for high-energy theory research |
large language model |
|
|
| 3 |
The Moral Gap of Large Language Models |
Reveals the limitations of large language models in moral reasoning and shows that fine-tuned models outperform prompt engineering |
large language model |
|
|
| 4 |
Evaluating Large Language Models (LLMs) in Financial NLP: A Comparative Study on Financial Report Analysis |
Comparatively evaluates large language models on financial report analysis, showing that GPT models perform best |
large language model |
|
|
| 5 |
Deep Learning Approaches for Multimodal Intent Recognition: A Survey |
Surveys deep learning for multimodal intent recognition, covering methods, datasets, challenges, and future directions |
multimodal |
|
|
| 6 |
GIIFT: Graph-guided Inductive Image-free Multimodal Machine Translation |
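Proposes GIIFT, a graph-guided inductive framework for image-free multimodal machine translation that substantially improves translation quality |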
multimodal |
|
|
| 7 |
StyleAdaptedLM: Enhancing Instruction Following Models with Efficient Stylistic Transfer |
StyleAdaptedLM: enhancing instruction-following models through efficient style transfer |
instruction following |
|
|
| 8 |
SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models |
SCOPE: evaluating large language models via stochastic and counter-biased option placement |
large language model |
|
|
| 9 |
HIVMedQA: Benchmarking large language models for HIV medical decision support |
HIVMedQA: benchmarking large language model performance on HIV medical decision support |
large language model |
|
|
| 10 |
Hybrid and Unitary PEFT for Resource-Efficient Large Language Models |
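Proposes a hybrid PEFT method combining orthogonal stabilization and gradient alignment for efficient fine-tuning of large language models |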
large language model |
|
|
| 11 |
Synthetic Data Generation for Phrase Break Prediction with Large Language Model |
Uses large language models to generate synthetic data, addressing the data-annotation bottleneck in phrase break prediction |
large language model |
|
|
| 12 |
EH-Benchmark Ophthalmic Hallucination Benchmark and Agent-Driven Top-Down Traceable Reasoning Workflow |
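Proposes EH-Benchmark, an ophthalmic hallucination benchmark, together with an agent-driven traceable reasoning workflow that improves ophthalmic diagnostic accuracy |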
large language model multimodal |
✅ |
|
| 13 |
GrAInS: Gradient-based Attribution for Inference-Time Steering of LLMs and VLMs |
GrAInS: inference-time steering of LLMs and VLMs using gradient-based attribution |
large language model multimodal |
|
|
| 14 |
BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit |
Proposes BadReasoner, which attacks large reasoning models by planting tunable overthinking backdoors |
large language model chain-of-thought |
✅ |
|
| 15 |
TRPrompt: Bootstrapping Query-Aware Prompt Optimization from Textual Rewards |
TRPrompt: bootstrapping query-aware prompt optimization from textual rewards to improve LLM reasoning |
large language model |
|
|
| 16 |
Not All Features Deserve Attention: Graph-Guided Dependency Learning for Tabular Data Generation with Language Models |
Proposes GraDe, which uses graph-guided dependency learning to improve language-model performance on tabular data generation |
large language model |
|
|
| 17 |
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy |
CLEAR: easy error analysis using an LLM as a judge |
large language model |
|
|
| 18 |
Trusted Knowledge Extraction for Operations and Maintenance Intelligence |
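Proposes a trusted knowledge extraction approach for aviation operations and maintenance intelligence, addressing data confidentiality and integration challenges |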
large language model |
|
|
| 19 |
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs |
AQuilt: low-cost synthesis of high-quality training data for domain-specialist LLMs through logical reasoning and self-inspection |
large language model |
✅ |
|
| 20 |
GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface |
Proposes GLiNER2 to address the diversity and efficiency challenges of information extraction tasks |
large language model |
✅ |
|
| 21 |
Enhancing RAG Efficiency with Adaptive Context Compression |
Proposes ACC-RAG, which improves RAG efficiency through adaptive context compression |
large language model |
|
|
| 22 |
CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages |
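CodeMixBench: a multilingual benchmark for evaluating the code-mixing (mixed-language) abilities of LLMs, revealing performance bottlenecks when languages from different families are mixed |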
large language model |
✅ |
|
| 23 |
Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs |
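Proposes WINO, a revokable decoding algorithm that substantially improves both the speed and the output quality of diffusion large language models (DLLMs) |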
large language model |
|
|
| 24 |
Protecting Vulnerable Voices: Synthetic Dataset Generation for Self-Disclosure Detection |
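Proposes an LLM-based method for generating synthetic datasets to detect self-disclosure of personal information, helping protect vulnerable groups on social platforms |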
large language model |
|
|
| 25 |
Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation |
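Proposes ARG-Designer, which automatically designs multi-agent communication topologies via autoregressive graph generation, improving task adaptability |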
large language model |
✅ |
|
| 26 |
Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation |
Shows that instruction tuning, while improving LLM usability, also makes models substantially more likely to accept misinformation |
large language model |
|
|
| 27 |
Safeguarding RAG Pipelines with GMTP: A Gradient-based Masked Token Probability Method for Poisoned Document Detection |
Proposes GMTP, which detects poisoned documents in RAG pipelines using gradient analysis and masked token probabilities |
large language model |
|
|
| 28 |
MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning |
MathOPEval: a fine-grained benchmark for evaluating the visual-operation capabilities of MLLMs in mathematical reasoning |
large language model |
|
|
| 29 |
Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs |
Proposes prompt-based methods that increase the diversity of LLM-generated synthetic reviews while preserving privacy |
large language model |
|
|
| 30 |
Resource Consumption Red-Teaming for Large Vision-Language Models |
Proposes RECITE, which performs resource-consumption red-teaming of LVLMs through vision-guided optimization |
large language model |
|
|
| 31 |
NeuralDB: Scaling Knowledge Editing in LLMs to 100,000 Facts with Neural KV Database |
Proposes NeuralDB for efficiently editing large numbers of facts in LLMs |
large language model |
|
|