| 1 |
11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis |
Proposes 11Plus-Bench to evaluate the spatial reasoning ability of multimodal large language models |
large language model multimodal |
|
|
| 2 |
Uncertainty-Aware Collaborative System of Large and Small Models for Multimodal Sentiment Analysis |
Proposes an uncertainty-aware collaborative system to address performance and efficiency issues in multimodal sentiment analysis |
large language model multimodal |
|
|
| 3 |
Prompting Strategies for Language Model-Based Item Generation in K-12 Education: Bridging the Gap Between Small and Large Language Models |
Proposes structured prompting strategies to improve the quality of item generation in K-12 education |
large language model chain-of-thought |
|
|
| 4 |
Survey of Specialized Large Language Model |
Systematically evaluates specialized large language models for applications in professional domains |
large language model multimodal |
|
|
| 5 |
MathBuddy: A Multimodal System for Affective Math Tutoring |
Proposes MathBuddy to address the impact of affective states on math learning |
multimodal |
✅ |
|
| 6 |
Dhati+: Fine-tuned Large Language Models for Arabic Subjectivity Evaluation |
Proposes Dhati+ to address the shortage of data for Arabic subjectivity evaluation |
large language model |
|
|
| 7 |
INSEva: A Comprehensive Chinese Benchmark for Large Language Models in Insurance |
Proposes the INSEva benchmark to address the lack of AI evaluation in the insurance domain |
large language model |
|
|
| 8 |
Geopolitical Parallax: Beyond Walter Lippmann Just After Large Language Models |
Proposes geopolitical parallax analysis to address bias in large language models |
large language model |
|
|
| 9 |
Logical Reasoning with Outcome Reward Models for Test-Time Scaling |
Proposes outcome reward models to improve logical reasoning in reasoning tasks |
large language model chain-of-thought |
|
|
| 10 |
Do MLLMs Really Understand the Charts? |
Proposes ChartVRBench to address the shortcomings of multimodal large language models in chart understanding |
large language model multimodal |
|
|
| 11 |
Benchmarking Hindi LLMs: A New Suite of Datasets and a Comparative Analysis |
Proposes five Hindi LLM evaluation datasets to address evaluation challenges |
large language model |
|
|
| 12 |
LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation |
Proposes Layer Fused Decoding to improve how retrieval-augmented generation models use external knowledge |
large language model |
|
|
| 13 |
Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts |
Proposes the HAMLET framework to evaluate large language models' comprehension of long texts |
large language model |
✅ |
|
| 14 |
Language Models Identify Ambiguities and Exploit Loopholes |
Studies the ability of large language models to identify ambiguities and exploit loopholes |
large language model |
|
|
| 15 |
Forewarned is Forearmed: Pre-Synthesizing Jailbreak-like Instructions to Enhance LLM Safety Guardrail to Potential Attacks |
Proposes the IMAGINE framework to strengthen the safety of large language models |
large language model |
|
|
| 16 |
AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios |
Proposes AgentCoMa to address problems that mix commonsense and mathematical reasoning |
large language model |
|
|
| 17 |
Scalable and consistent few-shot classification of survey responses using text embeddings |
Proposes a text-embedding-based classification framework to address the analysis of open-ended survey responses |
large language model |
|
|
| 18 |
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables |
Proposes T2R-bench to address report generation from industrial tables |
large language model |
|
|
| 19 |
Spotlight Attention: Towards Efficient LLM Generation via Non-linear Hashing-based KV Cache Retrieval |
Proposes Spotlight Attention to address KV cache efficiency in LLM generation |
large language model |
|
|
| 20 |
Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models |
Proposes the CSKS framework to adjust LLMs' sensitivity to contextual knowledge |
large language model |
✅ |
|
| 21 |
Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs |
Proposes Router Lens and CEFT to improve context faithfulness in mixture-of-experts models |
large language model |
|
|
| 22 |
ArgCMV: An Argument Summarization Benchmark for the LLM-era |
Proposes the ArgCMV dataset to address the shortcomings of existing argument summarization benchmarks |
large language model |
|
|
| 23 |
Functional Consistency of LLM Code Embeddings: A Self-Evolving Data Synthesis Framework for Benchmarking |
Proposes a functional consistency framework to improve the performance of code embedding models |
large language model |
|
|
| 24 |
Rule Synergy Analysis using LLMs: State of the Art and Implications |
Uses LLMs to analyze rule synergy to address reasoning in complex environments |
large language model |
|
|