| 1 |
Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding |
Proposes the WiserUI-Bench benchmark to evaluate how well MLLMs understand the effect of UI/UX design on user behavior |
large language model multimodal |
|
|
| 2 |
Chain-of-Thought Tokens are Computer Program Variables |
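Shows that tokens in chain-of-thought (CoT) behave like program variables and can effectively support solving complex reasoning tasks |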
large language model chain-of-thought |
✅ |
|
| 3 |
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design |
Argues for designing LLMs as argumentation-capable "reasonable parrots" that foster critical thinking |
large language model |
|
|
| 4 |
A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition |
Introduces the U-MNER framework and the Twitter2015-Urdu dataset to advance research on Urdu multimodal named entity recognition |
multimodal |
|
|
| 5 |
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders |
Uses sparse autoencoders to uncover language-specific features in large language models |
large language model |
✅ |
|
| 6 |
Performance Evaluation of Large Language Models in Bangla Consumer Health Query Summarization |
Evaluates the performance of large language models on Bangla consumer health query summarization |
large language model |
|
|
| 7 |
Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization |
Proposes a scalable multi-stage influence function based on EK-FAC for analyzing how much a fine-tuned LLM depends on its pretraining data |
large language model |
✅ |
|
| 8 |
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging |
Transfers the reasoning ability of large language models to vision-language models via model merging |
large language model multimodal |
|
|
| 9 |
Crosslingual Reasoning through Test-Time Scaling |
Improves the crosslingual reasoning of English-centric language models through test-time scaling |
large language model chain-of-thought |
|
|
| 10 |
KG-HTC: Integrating Knowledge Graphs into LLMs for Effective Zero-shot Hierarchical Text Classification |
Proposes KG-HTC, which integrates knowledge graphs with LLMs to tackle zero-shot hierarchical text classification |
large language model |
✅ |
|
| 11 |
ComPO: Preference Alignment via Comparison Oracles |
Proposes ComPO, which performs preference alignment via comparison oracles to address noisy preferences in LLM alignment |
large language model |
|
|
| 12 |
UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections |
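Builds a dataset of misleading narratives surrounding UK general elections and evaluates how well large language models detect them |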
large language model |
|
|
| 13 |
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations |
clem:todd: a framework for systematically benchmarking realisations of LLM-based task-oriented dialogue systems |
large language model |
|
|
| 14 |
Ultra-FineWeb: Efficient Data Filtering and Verification for High-Quality LLM Training Data |
Ultra-FineWeb: efficient data filtering and verification to improve the quality of LLM training data |
large language model |
|
|
| 15 |
Frame In, Frame Out: Do LLMs Generate More Biased News Headlines than Humans? |
Finds that large language models are more likely than humans to generate biased news headlines |
large language model |
|
|
| 16 |
RICo: Refined In-Context Contribution for Automatic Instruction-Tuning Data Selection |
Proposes RICo, which uses in-context learning to improve instruction-tuning data selection and thereby model performance |
large language model |
|
|
| 17 |
Product of Experts with LLMs: Boosting Performance on ARC Is a Matter of Perspective |
Uses a product of experts with LLMs to boost performance on ARC; the choice of perspective is key |
large language model |
|
|
| 18 |
Reliably Bounding False Positives: A Zero-Shot Machine-Generated Text Detection Framework via Multiscaled Conformal Prediction |
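Proposes a zero-shot machine-generated text detection framework based on multiscaled conformal prediction that reliably bounds the false positive rate |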
large language model |
|
|
| 19 |
Rethinking Invariance in In-context Learning |
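Proposes InvICL to address in-context learning's sensitivity to demonstration order and the weak performance of existing order-invariant methods |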
large language model |
✅ |
|