| 1 | MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks | MCIF: the first multilingual, cross-modal instruction-following benchmark built from scientific talks | large language model multimodal instruction following | | |
| 2 | HITSZ's End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track | HITSZ proposes an end-to-end speech translation system for the IWSLT 2025 Indic track, combining Whisper ASR with an Indic LLM | large language model chain-of-thought | | |
| 3 | Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization | Proposes aligning the timestamps of ASR transcripts and speaker diarization to improve speech emotion recognition accuracy | multimodal TAMP | | |
| 4 | Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning | Proposes a dynamic factuality alignment method to mitigate geospatial knowledge hallucination in large language models | large language model | | |
| 5 | Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models? | Small-scale data poisoning exacerbates dialect-linked biases in large language models | large language model | | |
| 6 | Retrieval augmented generation based dynamic prompting for few-shot biomedical named entity recognition using large language models | Proposes a few-shot biomedical named entity recognition method based on dynamic prompting with retrieval-augmented generation | large language model | | |
| 7 | SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models | Proposes the SpeechIQ evaluation framework, which assesses the speech understanding of voice LLMs across cognitive levels | large language model | | |
| 8 | An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case | Reveals how large language models express gender stereotypes in Italian, highlighting the problem of bias in non-English contexts | large language model | | |
| 9 | RoD-TAL: A Benchmark for Answering Questions in Romanian Driving License Exams | Proposes the RoD-TAL benchmark dataset to evaluate LLMs and VLMs on question answering for Romanian driving license exams | large language model multimodal chain-of-thought | | |
| 10 | How Much Do Large Language Model Cheat on Evaluation? Benchmarking Overestimation under the One-Time-Pad-Based Framework | Proposes the ArxivRoll framework to address overestimation in large language model evaluation | large language model | ✅ | |
| 11 | Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents | Proposes the DebateCV framework, which uses multi-agent debate to improve the accuracy and trustworthiness of complex claim verification | large language model | | |
| 12 | LLaVA-NeuMT: Selective Layer-Neuron Modulation for Efficient Multilingual Multimodal Translation | LLaVA-NeuMT: efficient multilingual multimodal translation through selective layer-neuron modulation | multimodal | | |
| 13 | MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks | Proposes MMESGBench, the first multimodal understanding and complex reasoning benchmark for ESG tasks | multimodal | ✅ | |
| 14 | Large language models provide unsafe answers to patient-posed medical questions | Evaluates the safety of large language models in medical question answering, revealing potential risks to patients | large language model | | |
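| 15 | SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models | Proposes the SLoW method, which selects dictionary entries for low-frequency words to improve LLM translation performance and reduce token consumption | large language model | | |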
| 16 | Identifying Fine-grained Forms of Populism in Political Discourse: A Case Study on Donald Trump's Presidential Campaigns | Proposes a new dataset and evaluates the ability of LLMs to identify fine-grained forms of populism in political discourse | large language model instruction following | | |
| 17 | MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts? | MOCHA: evaluating the robustness of code LLMs against multi-turn malicious coding prompts | large language model | | |
| 18 | Diverse LLMs or Diverse Question Interpretations? That is the Ensembling Question | Compares model diversity with question-interpretation diversity to improve LLM ensembling for binary question answering | large language model | | |
| 19 | A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation | Proposes the Multi-TAG framework, which improves LLM performance on complex math reasoning through multi-tool aggregation | large language model | | |
| 20 | Smooth Reading: Bridging the Gap of Recurrent LLM to Self-Attention LLM on Long-Context Tasks | Proposes the Smooth Reading method, which raises recurrent LLM performance on long-context tasks to a level comparable with self-attention LLMs | large language model | | |
| 21 | Uncovering Cross-Linguistic Disparities in LLMs using Sparse Autoencoders | Uses sparse autoencoders to uncover cross-linguistic capability disparities in LLMs | large language model | | |
| 22 | Ta-G-T: Subjectivity Capture in Table to Text Generation via RDF Graphs | Proposes the Ta-G-T framework, which incorporates subjectivity into table-to-text generation via RDF graphs | large language model | | |
| 23 | Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks | Evaluates compressed multilingual Transformers across diverse language benchmarks to promote inclusive NLP | large language model | | |
| 24 | TokenSmith: Streamlining Data Editing, Search, and Inspection for Large-Scale Language Model Training and Interpretability | TokenSmith: streamlines the editing, search, and inspection of training data for large-scale language models | large language model | | |
| 25 | Injecting External Knowledge into the Reasoning Process Enhances Retrieval-Augmented Generation | Proposes the Passage Injection method, which makes RAG systems more robust to noisy retrieval results | large language model | ✅ | |
| 26 | Jailbreaking Large Language Diffusion Models: Revealing Hidden Safety Flaws in Diffusion-Based Text Generation | Proposes the PAD attack, revealing safety vulnerabilities in large language diffusion models | large language model | | |
| 27 | Adaptive Learning Systems: Personalized Curriculum Design Using LLM-Powered Analytics | Proposes an adaptive learning system based on LLM-powered analytics for personalized curriculum design | large language model | | |