| 1 |
Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning |
Proposes a resource-efficient fine-tuning method for LLaMA-3.2-3B to improve CoT reasoning in the medical domain |
large language model chain-of-thought |
|
|
| 2 |
Finish First, Perfect Later: Test-Time Token-Level Cross-Validation for Diffusion Large Language Models |
Proposes Tolerator, which improves the decoding strategy of diffusion large language models through token-level cross-validation. |
large language model |
|
|
| 3 |
Imperceptible Jailbreaking against Large Language Models |
Proposes an LLM jailbreak attack based on invisible Unicode variation selectors |
large language model |
✅ |
|
| 4 |
Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation" |
Reproduces XRec, an LLM-based explainable recommendation framework, and explores the impact of MoE embeddings |
large language model |
✅ |
|
| 5 |
FocusMed: A Large Language Model-based Framework for Enhancing Medical Question Summarization with Focus Identification |
FocusMed: an LLM-based framework for medical question summarization with enhanced focus identification |
large language model |
✅ |
|
| 6 |
A Lightweight Large Language Model-Based Multi-Agent System for 2D Frame Structural Analysis |
Proposes a lightweight LLM-based multi-agent system that automates finite element modeling for 2D frame structural analysis. |
large language model |
|
|
| 7 |
Guided Query Refinement: Multimodal Hybrid Retrieval with Test-Time Optimization |
Proposes Guided Query Refinement (GQR) to improve the efficiency and performance of visual document retrieval. |
multimodal |
✅ |
|
| 8 |
FedSRD: Sparsify-Reconstruct-Decompose for Communication-Efficient Federated Large Language Models Fine-Tuning |
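Proposes the FedSRD framework, which addresses the communication bottleneck in federated LLM fine-tuning through sparsify-reconstruct-decompose. |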
large language model |
|
|
| 9 |
Large Language Models Preserve Semantic Isotopies in Story Continuations |
Shows that large language models preserve semantic isotopies in story continuations |
large language model |
|
|
| 10 |
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs |
Proposes SwiReasoning, which switches between explicit and latent reasoning to improve LLM reasoning performance and efficiency. |
large language model chain-of-thought |
|
|
| 11 |
Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper) |
Investigates how prompt politeness affects LLM accuracy: impolite prompts perform better |
large language model |
|
|
| 12 |
WeatherArchive-Bench: Benchmarking Retrieval-Augmented Reasoning for Historical Weather Archives |
Proposes WeatherArchive-Bench for evaluating retrieval-augmented reasoning over historical weather archives. |
large language model |
|
|
| 13 |
RAG Makes Guardrails Unsafe? Investigating Robustness of Guardrails under RAG-style Contexts |
Shows that RAG-style context augmentation reduces the reliability of LLM guardrail models |
large language model |
|
|
| 14 |
TeachLM: Post-Training LLMs for Education Using Authentic Learning Data |
TeachLM: post-trains LLMs on authentic learning data to improve their effectiveness in educational applications |
large language model |
|
|
| 15 |
Do LLMs Align with My Task? Evaluating Text-to-SQL via Dataset Alignment |
Evaluates LLM performance on Text-to-SQL through dataset alignment |
large language model |
|
|
| 16 |
SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants? |
Proposes SimulatorArena to evaluate the reliability of user simulators in multi-turn conversations with AI assistants |
large language model |
|
|
| 17 |
Camellia: Benchmarking Cultural Biases in LLMs for Asian Languages |
Camellia: a benchmark for evaluating cultural biases of LLMs in Asian languages |
large language model |
|
|
| 18 |
Proactive defense against LLM Jailbreak |
ProAct: a proactive defense framework against LLM jailbreak attacks |
large language model |
|
|
| 19 |
Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches |
Surveys retrieval-augmented code generation, focusing on repository-level approaches |
large language model |
|
|
| 20 |
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA |
Proposes PsiloQA, a multilingual span-level hallucination detection dataset, and evaluates several detection methods. |
large language model |
|
|
| 21 |
Instability in Downstream Task Performance During LLM Pretraining |
Improves the stability of downstream task performance during LLM pretraining through checkpoint ensembling |
large language model |
|
|
| 22 |
Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models |
Proposes a Gricean maxims benchmark to evaluate the pragmatic reasoning of small-scale language models (BabyLMs). |
large language model |
|
|
| 23 |
A Low-Resource Speech-Driven NLP Pipeline for Sinhala Dyslexia Assistance |
Proposes a low-resource, speech-driven NLP pipeline for Sinhala dyslexia assistance |
multimodal |
|
|
| 24 |
A novel hallucination classification framework |
Proposes a novel hallucination classification framework for automatically detecting hallucinations during LLM inference. |
large language model |
|
|
| 25 |
GenQuest: An LLM-based Text Adventure Game for Language Learners |
GenQuest: an LLM-based text adventure game that supports language learning |
large language model |
|
|
| 26 |
On the Role of Unobserved Sequences on Sample-based Uncertainty Quantification for LLMs |
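Highlights the role of unobserved sequences in uncertainty quantification for LLMs and recommends that future work take them into account. |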
large language model |
|
|