| # | Title | Summary | Keywords | Read |
|---|-------|---------|----------|------|
| 1 | InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training | Proposes InSerter, a speech instruction-following method based on unsupervised interleaved pre-training. | large language model, instruction following | |
| 2 | MCiteBench: A Multimodal Benchmark for Generating Text with Citations | Proposes MCiteBench to evaluate multimodal LLMs' ability to generate text with citations, addressing hallucination. | large language model, multimodal | |
| 3 | InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model | InfiniSST: simultaneous translation of unbounded speech using large language models. | large language model | ✅ |
| 4 | Multilingual Relative Clause Attachment Ambiguity Resolution in Large Language Models | Studies how LLMs resolve relative clause attachment ambiguity across languages, revealing cross-lingual processing differences. | large language model | |
| 5 | Large Language Models for Multilingual Previously Fact-Checked Claim Detection | Evaluates LLM performance on multilingual detection of previously fact-checked claims. | large language model | |
| 6 | Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent | Proposes the GA-Rollback framework to address error propagation arising from LLM agents' one-pass reasoning. | large language model | |
| 7 | AILS-NTUA at SemEval-2025 Task 4: Parameter-Efficient Unlearning for Large Language Models using Data Chunking | Proposes a parameter-efficient LLM unlearning method based on data chunking for removing sensitive content. | large language model | |
| 8 | AILS-NTUA at SemEval-2025 Task 3: Leveraging Large Language Models and Translation Strategies for Multilingual Hallucination Detection | Uses large language models and translation strategies for multilingual hallucination detection. | large language model | |
| 9 | MedEthicEval: Evaluating Large Language Models Based on Chinese Medical Ethics | MedEthicEval: builds a Chinese medical ethics benchmark to evaluate LLMs' ethical reasoning. | large language model | |
| 10 | Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm | Proposes Add-One-In, an incremental sample selection method that uses an LLM to pick high-quality, diverse training data for large models. | large language model | ✅ |
| 11 | PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models | PromptCoT: synthesizes Olympiad-level math problems to improve LLMs' mathematical reasoning. | large language model | ✅ |
| 12 | FairSense-AI: Responsible AI Meets Sustainability | FairSense-AI: a responsible and sustainable multimodal AI framework for bias detection and mitigation. | large language model, multimodal | ✅ |
| 13 | OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale | Proposes the OmniSQL framework to synthesize high-quality Text-to-SQL data at scale and trains open-source models on it. | large language model, chain-of-thought | |
| 14 | Prompting Science Report 1: Prompt Engineering is Complicated and Contingent | Prompt engineering is complicated and contingent: benchmarking-criteria choices and prompting strategies significantly affect measured LLM performance. | large language model | |
| 15 | Multi-Agent System for AI-Assisted Extraction of Narrative Arcs in TV Series | Proposes a multi-agent system for AI-assisted extraction of narrative arcs in TV series. | multimodal | |
| 16 | LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation | Proposes the LINGOLY-TOO benchmark, which disentangles reasoning from knowledge in language models via templatised orthographic obfuscation. | large language model | |
| 17 | Implicit Bias in LLMs: A Survey | Surveys implicit bias in large language models: its effects and detection methods. | large language model | |
| 18 | MPO: Boosting LLM Agents with Meta Plan Optimization | MPO: boosts LLM agent capability via meta plan optimization. | large language model | |
| 19 | Improving LLM-as-a-Judge Inference with the Judgment Distribution | Improves LLM-as-a-judge inference by leveraging the judgment distribution. | chain-of-thought | |
| 20 | SAFE: A Sparse Autoencoder-Based Framework for Robust Query Enrichment and Hallucination Mitigation in LLMs | Proposes the SAFE framework, which uses sparse autoencoders to enrich LLM queries and mitigate hallucinations. | large language model | |
| 21 | The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models | Proposes Unsupervised Prefix Fine-Tuning (UPFT), which efficiently improves LLM reasoning without labeled data. | large language model | |
| 22 | SteerConf: Steering LLMs for Confidence Elicitation | SteerConf: improves calibration and reliability by steering LLM confidence elicitation. | large language model | |
| 23 | Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs' Decoding Layers | Proposes the HCL framework to quantitatively evaluate the hallucination-creativity trade-off across LLM decoding layers. | large language model | ✅ |
| 24 | Multidimensional Consistency Improves Reasoning in Language Models | Multidimensional consistency improves language model reasoning, especially on math problems. | large language model | |
| 25 | Put the Space of LoRA Initialization to the Extreme to Preserve Pre-trained Knowledge | Proposes LoRA-Null, which initializes LoRA in the null space of activations to effectively mitigate catastrophic forgetting during LLM fine-tuning. | large language model | ✅ |
| 26 | Towards Event Extraction with Massive Types: LLM-based Collaborative Annotation and Partitioning Extraction | Proposes LLM-based collaborative annotation and partitioning extraction to tackle event extraction with massive type inventories. | large language model | |
| 27 | LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs | Proposes the LADM framework, which uses attention-based dependency measurement to select long-context training data and improve LLMs' long-context capability. | large language model | |
| 28 | Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization | Proposes difference-aware personalization learning (DPL) to enhance LLM personalization. | large language model | ✅ |
| 29 | Hierarchical Re-ranker Retriever (HRR) | Proposes the Hierarchical Re-ranker Retriever (HRR), addressing the difficulty of choosing context-retrieval granularity in LLM applications. | large language model | |
| 30 | An Efficient and Precise Training Data Construction Framework for Process-supervised Reward Model in Mathematical Reasoning | EpicPRM: an efficient and precise training-data construction framework for process-supervised reward models in mathematical reasoning. | large language model | ✅ |
| 31 | PanguIR Technical Report for NTCIR-18 AEOLLM Task | PanguIR proposes multi-model collaboration, automatic prompt optimization, and in-context learning (ICL) optimization to improve LLM-based automatic evaluation. | large language model | |
| 32 | DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | DeLTa: a decoding strategy based on logit trajectory prediction that improves LLMs' factuality and reasoning ability. | large language model | |
| 33 | Limited Effectiveness of LLM-based Data Augmentation for COVID-19 Misinformation Stance Detection | Evaluates LLM-based data augmentation for COVID-19 misinformation stance detection and finds its gains limited. | large language model | |
| 34 | Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions | Proposes the Recipe2Plan benchmark to evaluate LLMs' efficient multitask planning under temporal constraints. | large language model | ✅ |
| 35 | Enhancing LLM Reliability via Explicit Knowledge Boundary Modeling | Proposes an explicit knowledge-boundary modeling framework to improve the reliability of large language models. | large language model | |
| 36 | Call for Rigor in Reporting Quality of Instruction Tuning Data | Calls for rigor in hyperparameter selection when evaluating and reporting instruction-tuning data quality. | large language model | |
| 37 | Measuring Intrinsic Dimension of Token Embeddings | Measures the intrinsic dimension of token embeddings to assess language model redundancy and guide LoRA application. | large language model | |