| 1 |
DataProphet: Demystifying Supervision Data Generalization in Multimodal LLMs |
DataProphet: reveals how supervision data generalizes in multimodal LLMs, enabling training-free dataset selection. |
large language model, multimodal |
|
|
| 2 |
Measuring Faithfulness Depends on How You Measure: Classifier Sensitivity in LLM Chain-of-Thought Evaluation |
CoT faithfulness evaluation is strongly affected by the choice of classifier; a single metric is unreliable. |
chain-of-thought |
|
|
| 3 |
Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models |
Proposes Semantic Token Clustering (STC), a method for efficiently quantifying uncertainty in large language models. |
large language model |
|
|
| 4 |
When Contextual Inference Fails: Cancelability in Interactive Instruction Following |
Proposes BWIM, an interactive benchmark that reveals deficiencies in LLMs' clarification behavior when contextual inference fails. |
instruction following |
|
|
| 5 |
PoC: Performance-oriented Context Compression for Large Language Models via Performance Prediction |
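Proposes PoC, a performance-oriented context compression method for large language models that uses performance prediction to guarantee a performance lower bound. |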
large language model |
|
|
| 6 |
TextReasoningBench: Does Reasoning Really Improve Text Classification in Large Language Models? |
Proposes TextReasoningBench to evaluate the effectiveness of reasoning strategies in text classification. |
large language model |
|
|
| 7 |
Borderless Long Speech Synthesis |
Proposes the Borderless long speech synthesis framework, enabling agent-driven borderless speech generation. |
instruction following, chain-of-thought |
|
|
| 8 |
Rethinking Ground Truth: A Case Study on Human Label Variation in MLLM Benchmarking |
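Proposes an evaluation method for multimodal large language models that accounts for human label variation, improving robustness in content moderation scenarios. |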
large language model, multimodal |
|
|
| 9 |
Reasoning Gets Harder for LLMs Inside A Dialogue |
Reveals that LLM reasoning ability degrades in dialogue settings; proposes BOULDER, a dynamic benchmark. |
large language model |
|
|
| 10 |
Current LLMs still cannot 'talk much' about grammar modules: Evidence from syntax |
Evaluates large language models' understanding of grammar modules, using ChatGPT's Arabic translation as a case study. |
large language model |
|
|
| 11 |
Predicting States of Understanding in Explanatory Interactions Using Cognitive Load-Related Linguistic Cues |
Uses cognitive load-related linguistic cues to predict states of understanding in explanatory interactions. |
multimodal |
|
|
| 12 |
An Agentic Approach to Generating XAI-Narratives |
Proposes a multi-agent framework for generating XAI narratives, improving the faithfulness and coherence of explanations. |
large language model |
|
|
| 13 |
Overreliance on AI in Information-seeking from Video Content |
Reveals the risk of overreliance on AI in AI-assisted information seeking from video content, which lowers accuracy. |
large language model |
|
|
| 14 |
Structured Prompting for Arabic Essay Proficiency: A Trait-Centric Evaluation Approach |
Proposes a structured prompting framework for Arabic essay scoring, improving the accuracy of linguistic trait evaluation. |
large language model |
|
|