| 1 |
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences? |
Proposes the TempVS benchmark to evaluate how well multimodal large language models understand the order of events |
large language model multimodal |
✅ |
|
| 2 |
FloorPlan-DeepSeek (FPDS): A multimodal approach to floorplan generation using vector-based next room prediction |
Proposes FPDS to address the iterative nature of architectural floorplan generation |
large language model multimodal |
|
|
| 3 |
Accelerating Diffusion Large Language Models with SlowFast Sampling: The Three Golden Principles |
Proposes SlowFast sampling to address the efficiency problem of diffusion language models |
large language model |
|
|
| 4 |
Large Language Models for Detection of Life-Threatening Texts |
Uses large language models to detect life-threatening texts |
large language model |
|
|
| 5 |
BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining |
Proposes BioPars to address Persian biomedical text mining |
large language model |
✅ |
|
| 6 |
Do Language Models Have Bayesian Brains? Distinguishing Stochastic and Deterministic Decision Patterns within Large Language Models |
Proposes a method for distinguishing the decision patterns of language models, to address the question of Bayesian inference |
large language model |
|
|
| 7 |
Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models |
Evaluates the ability of large language models in multi-user dialogue state tracking |
large language model |
|
|
| 8 |
Team QUST at SemEval-2025 Task 10: Evaluating Large Language Models in Multiclass Multi-label Classification of News Entity Framing |
Proposes a three-stage retrieval framework to improve multiclass multi-label classification of news entity framing |
large language model |
✅ |
|
| 9 |
FormosanBench: Benchmarking Low-Resource Austronesian Languages in the Era of Large Language Models |
Proposes FORMOSANBENCH to evaluate LLM performance on low-resource Austronesian languages |
large language model |
|
|
| 10 |
Code Execution as Grounded Supervision for LLM Reasoning |
Proposes a supervision method based on code execution to improve LLM reasoning |
large language model chain-of-thought |
|
|
| 11 |
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization |
Proposes ReCUT to balance reasoning length and accuracy in large language models |
large language model chain-of-thought |
✅ |
|
| 12 |
Conversational Search: From Fundamentals to Frontiers in the LLM Era |
Proposes conversational search systems to meet complex information needs |
large language model instruction following |
|
|
| 13 |
AC/DC: LLM-based Audio Comprehension via Dialogue Continuation |
Proposes an audio comprehension model based on dialogue continuation to address the instruction-following problem |
large language model instruction following |
|
|
| 14 |
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science |
Proposes AutoMind to address insufficient flexibility in data science automation |
large language model |
✅ |
|
| 15 |
ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark |
Proposes ChineseHarm-Bench to address shortcomings in Chinese harmful content detection |
large language model |
✅ |
|
| 16 |
Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization |
Proposes semi-nonnegative matrix factorization to decompose MLP activations into features |
large language model |
|
|
| 17 |
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers |
Proposes a unified mechanism for understanding out-of-context reasoning in Transformers |
large language model |
|
|
| 18 |
UCD: Unlearning in LLMs via Contrastive Decoding |
Proposes a contrastive decoding method to address unlearning in large language models |
large language model |
|
|
| 19 |
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models |
Proposes PREMISE to address redundant computation in large reasoning models |
chain-of-thought |
|
|
| 20 |
ClusterUCB: Efficient Gradient-Based Data Selection for Targeted Fine-Tuning of LLMs |
Proposes ClusterUCB to address data selection for fine-tuning large language models |
large language model |
|
|
| 21 |
No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning |
Proposes the SEAR framework to address adaptive prompting in temporal table reasoning |
large language model |
|
|
| 22 |
Hybrid-NL2SVA: Integrating RAG and Finetuning for LLM-based NL2SVA |
Proposes the Hybrid-NL2SVA framework to address the automation of NL2SVA |
large language model |
|
|
| 23 |
Slimming Down LLMs Without Losing Their Minds |
Proposes efficient fine-tuning methods based on LoRA and QLoRA to improve large language model performance |
large language model |
|
|
| 24 |
NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors |
Proposes a retrieval-augmented prompting system for identifying mistakes in AI tutors |
large language model |
✅ |
|
| 25 |
Reliable Reasoning Path: Distilling Effective Guidance for LLM Reasoning with Knowledge Graphs |
Proposes the RRP framework to address knowledge graph path extraction in LLM reasoning |
large language model |
|
|
| 26 |
ClimateChat: Designing Data and Methods for Instruction Tuning LLMs to Answer Climate Change Queries |
Proposes an automated method for constructing climate change instruction data to improve LLM performance |
large language model |
|
|
| 27 |
Reasoning Isn't Enough: Examining Truth-Bias and Sycophancy in LLMs |
Evaluates truth-bias and sycophancy in large language models |
large language model |
|
|
| 28 |
The Biased Samaritan: LLM biases in Perceived Kindness |
Proposes a new method for evaluating bias in large language models |
large language model |
|
|
| 29 |
Do We Still Need Audio? Rethinking Speaker Diarization with a Text-Based Approach Using Multiple Prediction Models |
Proposes a text-based speaker diarization method to address audio quality issues |
multimodal |
|
|
| 30 |
From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review |
Proposes an LLM-based pairwise comparison mechanism to improve the peer review process |
large language model |
|
|
| 31 |
Dynamic Epistemic Friction in Dialogue |
Proposes a dynamic epistemic friction model to improve belief updating in human-machine dialogue |
large language model |
|
|
| 32 |
Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning |
Proposes an automated evaluation method for mathematical reasoning based on LLM judges |
large language model |
|
|
| 33 |
Mitigating Negative Interference in Multilingual Sequential Knowledge Editing through Null-Space Constraints |
Proposes LangEdit to address negative interference in multilingual knowledge editing |
large language model |
✅ |
|
| 34 |
Assessing RAG and HyDE on 1B vs. 4B-Parameter Gemma LLMs for Personal Assistants Integretion |
Evaluates RAG and HyDE on Gemma LLMs to improve personal assistant performance |
large language model |
|
|
| 35 |
One Tokenizer To Rule Them All: Emergent Language Plasticity via Multilingual Tokenizers |
Proposes a universal tokenizer to improve the adaptability of multilingual models |
large language model |
|
|
| 36 |
TaxoAdapt: Aligning LLM-Based Multidimensional Taxonomy Construction to Evolving Research Corpora |
Proposes TaxoAdapt to address the evolving nature of scientific literature classification and retrieval |
large language model |
|
|
| 37 |
Inferring Adjective Hypernyms with Language Models to Increase the Connectivity of Open English Wordnet |
Proposes a language-model-based method for inferring adjective hypernyms to increase the connectivity of Open English Wordnet |
large language model |
|
|
| 38 |
Spelling-out is not Straightforward: LLMs' Capability of Tokenization from Token to Characters |
Reveals the complexity of character-level spelling in LLMs and the internal representations behind it |
large language model |
|
|
| 39 |
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models |
Studies bias in language models and its effects |
large language model |
|
|
| 40 |
Beyond the Battlefield: Framing Analysis of Media Coverage in Conflict Reporting |
Uses computational methods to analyze framing bias in conflict reporting |
large language model |
|
|
| 41 |
"Check My Work?": Measuring Sycophancy in a Simulated Educational Context |
Studies how user suggestions influence large language models, with a view to educational equity |
large language model |
|
|