| 1 |
PersianMedQA: Evaluating Large Language Models on a Persian-English Bilingual Medical Question Answering Benchmark |
Proposes PersianMedQA to evaluate large language models on bilingual medical question answering |
large language model instruction following chain-of-thought |
✅ |
|
| 2 |
Spurious Correlations and Beyond: Understanding and Mitigating Shortcut Learning in SDOH Extraction with Large Language Models |
Proposes new methods to mitigate shortcut learning in social determinants of health (SDOH) extraction |
large language model chain-of-thought |
|
|
| 3 |
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings |
Proposes a taxonomy of multimodal challenges to improve toxic Chinese detection |
large language model multimodal |
|
|
| 4 |
Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities |
Evaluates the potential of large language models for cryptanalysis and side-channel vulnerabilities |
large language model chain-of-thought |
|
|
| 5 |
When Large Multimodal Models Confront Evolving Knowledge: Challenges and Pathways |
Proposes the EVOKE benchmark to address evolving knowledge in multimodal models |
multimodal instruction following |
|
|
| 6 |
MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs |
Proposes the MMAFFBen benchmark to address the evaluation of multilingual, multimodal affective analysis |
large language model multimodal |
✅ |
|
| 7 |
Don't Reinvent the Wheel: Efficient Instruction-Following Text Embedding based on Guided Space Transformation |
Proposes GSTransform to reduce the computational overhead of instruction-following text embedding |
instruction following |
✅ |
|
| 8 |
Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation |
Proposes a Position Sensitivity Index to address bias in multimodal RAG systems |
multimodal |
|
|
| 9 |
Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks |
Examines the effect of multilinguality on zero-shot transfer and offers new insights |
zero-shot transfer |
|
|
| 10 |
Revisiting Epistemic Markers in Confidence Estimation: Can Markers Accurately Reflect Large Language Models' Uncertainty? |
Proposes a marker-based confidence evaluation method to address LLM uncertainty |
large language model |
✅ |
|
| 11 |
HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America |
Proposes the HESEIA dataset to evaluate social biases in language models |
large language model |
|
|
| 12 |
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration |
Proposes Soft Reasoning to address the limited reasoning ability of large language models |
large language model |
✅ |
|
| 13 |
TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis |
Proposes TRIDENT to enhance the safety of large language models |
large language model |
|
|
| 14 |
Disentangling Language and Culture for Evaluating Multilingual Large Language Models |
Proposes a dual evaluation framework to assess the capabilities of multilingual large language models |
large language model |
|
|
| 15 |
Harnessing Large Language Models for Scientific Novelty Detection |
Uses large language models to tackle scientific novelty detection |
large language model |
|
|
| 16 |
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation |
Proposes the CaMMT benchmark to address multimodal challenges in translating cultural content |
multimodal |
|
|
| 17 |
Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts |
Compares data collection strategies to improve emotion-labeled multimodal social media posts |
multimodal |
|
|
| 18 |
Multilingual Gloss-free Sign Language Translation: Towards Building a Sign Language Foundation Model |
Proposes a multilingual gloss-free sign language translation model to address low-resource settings |
foundation model |
|
|
| 19 |
Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research |
Proposes the AGORA framework to address standardization and evaluation in language agent development |
large language model multimodal chain-of-thought |
|
|
| 20 |
Advantageous Parameter Expansion Training Makes Better Large Language Models |
Proposes Advantageous Parameter Expansion Training to improve large language model performance |
large language model |
|
|
| 21 |
Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models |
Proposes a new framework that rethinks error accumulation in large language models |
large language model |
|
|
| 22 |
Effects of Theory of Mind and Prosocial Beliefs on Steering Human-Aligned Behaviors of LLMs in Ultimatum Games |
Examines how theory of mind and prosocial beliefs affect the alignment of LLM behavior with humans |
large language model chain-of-thought |
✅ |
|
| 23 |
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation |
Proposes the FinMME dataset to address the lack of multimodal evaluation in finance |
large language model multimodal |
✅ |
|
| 24 |
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text |
Proposes LegalEval-Q to address quality evaluation of generated legal text |
large language model |
✅ |
|
| 25 |
Lossless Token Sequence Compression via Meta-Tokens |
Proposes a lossless token sequence compression method to optimize large language model performance |
large language model |
|
|
| 26 |
Model Unlearning via Sparse Autoencoder Subspace Guided Projections |
Proposes an SAE-guided subspace projection unlearning method to address privacy concerns |
large language model |
|
|
| 27 |
HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs |
Proposes HD-NDEs to address hallucination detection in large language models |
large language model |
|
|
| 28 |
An evaluation of LLMs for generating movie reviews: GPT-4o, Gemini-2.0 and DeepSeek-V3 |
Proposes a framework to evaluate how effectively LLMs generate movie reviews |
large language model |
|
|
| 29 |
Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings |
Proposes multilingual Matryoshka embeddings to address news article clustering |
large language model |
|
|
| 30 |
Multiple LLM Agents Debate for Equitable Cultural Alignment |
Proposes a multi-agent debate framework to promote cultural adaptability |
large language model |
|
|
| 31 |
Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX |
Proposes POLLUX to evaluate the generative capabilities of Russian-language LLMs |
large language model |
|
|
| 32 |
Bench4KE: Benchmarking Automated Competency Question Generation |
Proposes Bench4KE to address standardized evaluation of knowledge engineering automation |
large language model |
|
|
| 33 |
Cross-Attention Speculative Decoding |
Proposes cross-attention speculative decoding to simplify large language model inference |
large language model |
|
|
| 34 |
Localizing Persona Representations in LLMs |
Studies how to localize persona representations in large language models |
large language model |
|
|
| 35 |
Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations |
Proposes an LLM-driven theorem proving method to improve the reliability and robustness of NLI explanations |
large language model |
|
|
| 36 |
COSMIC: Generalized Refusal Direction Identification in LLM Activations |
Proposes the COSMIC framework to automatically identify refusal behavior in large language models |
large language model |
|
|
| 37 |
LKD-KGC: Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing |
Proposes LKD-KGC to improve the efficiency of domain-specific knowledge graph construction |
large language model |
|
|
| 38 |
CASPER: A Large Scale Spontaneous Speech Dataset |
Proposes the CASPER dataset to address the scarcity of spontaneous speech data |
large language model |
|
|
| 39 |
MultiHoax: A Dataset of Multi-hop False-Premise Questions |
Proposes the MultiHoax dataset to address multi-hop false-premise questions |
large language model |
|
|
| 40 |
The Impact of Disability Disclosure on Fairness and Bias in LLM-Driven Candidate Selection |
Examines how disability disclosure affects fairness in LLM-driven candidate selection |
large language model |
|
|
| 41 |
Guiding Generative Storytelling with Knowledge Graphs |
Proposes a knowledge-graph-assisted story generation method to improve narrative quality |
large language model |
|
|
| 42 |
From Macro to Micro: Probing Dataset Diversity in Language Model Fine-Tuning |
Proposes dataset diversity control strategies to improve language model fine-tuning |
large language model |
|
|
| 43 |
BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization |
Proposes SCRIPT to address challenges in multilingual pretokenization |
large language model |
|
|
| 44 |
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings |
Proposes A*-Thought to address reasoning efficiency in low-resource settings |
chain-of-thought |
✅ |
|
| 45 |
Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections |
Proposes an AI tool to detect and contextualize harmful language in cultural heritage collections |
large language model |
|
|
| 46 |
ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation |
Proposes ClueAnchor to address insufficient knowledge extraction in RAG systems |
large language model |
✅ |
|
| 47 |
LLM Inference Enhanced by External Knowledge: A Survey |
Surveys how external knowledge can enhance LLM inference and improve its accuracy |
large language model |
|
|
| 48 |
HiCaM: A Hierarchical-Causal Modification Framework for Long-Form Text Modification |
Proposes the HiCaM framework to address content inconsistency in long-form text modification |
large language model |
|
|
| 49 |
Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation |
Proposes a data leakage simulation method to improve the transparency of LLM evaluation |
large language model |
|
|
| 50 |
Semi-structured LLM Reasoners Can Be Rigorously Audited |
Proposes semi-structured reasoning models to address the auditability of large language models |
large language model |
|
|
| 51 |
CLaSp: In-Context Layer Skip for Self-Speculative Decoding |
Proposes CLaSp to address layer skipping in self-speculative decoding |
large language model |
|
|
| 52 |
CrossICL: Cross-Task In-Context Learning via Unsupervised Demonstration Transfer |
Proposes CrossICL to address cross-task learning via unsupervised demonstration transfer |
large language model |
|
|
| 53 |
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models |
Proposes R-KV to compress redundant KV cache in reasoning models |
chain-of-thought |
|
|