| 1 |
ISO-Bench: Benchmarking Multimodal Causal Reasoning in Visual-Language Models through Procedural Plans |
ISO-Bench: benchmarks multimodal causal reasoning in vision-language models using procedural plans |
multimodal chain-of-thought |
|
|
| 2 |
Traits Run Deep: Enhancing Personality Assessment via Psychology-Guided LLM Representations and Multimodal Apparent Behaviors |
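Proposes the Traits Run Deep framework, which enhances personality assessment with psychology-guided LLM representations and multimodal behaviors. |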
large language model multimodal |
✅ |
|
| 3 |
Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition |
Proposes a concept bottleneck model for hate and counter speech recognition that improves transparency and performance. |
large language model |
|
|
| 4 |
BALSAM: A Platform for Benchmarking Arabic Large Language Models |
BALSAM: a comprehensive benchmarking platform for evaluating Arabic large language models |
large language model |
|
|
| 5 |
CliCARE: Grounding Large Language Models in Clinical Guidelines for Decision Support over Longitudinal Cancer Electronic Health Records |
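CliCARE: grounds large language models in clinical guidelines to provide decision support over longitudinal cancer electronic health records |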
large language model |
|
|
| 6 |
What is an "Abstract Reasoner"? Revisiting Experiments and Arguments about Large Language Models |
Improves large language model performance on abstract reasoning tasks by fine-tuning the input encoding |
large language model |
|
|
| 7 |
NeedleChain: Measuring Intact Context Comprehension Capability of Large Language Models |
Proposes the NeedleChain benchmark to evaluate how well large language models integrate information when the entire context is relevant |
large language model |
|
|
| 8 |
Resource-Efficient Adaptation of Large Language Models for Text Embeddings via Prompt Engineering and Contrastive Fine-tuning |
Proposes a resource-efficient method for adapting LLMs to text embeddings, combining prompt engineering and contrastive fine-tuning. |
large language model |
|
|
| 9 |
Listening to the Unspoken: Exploring "365" Aspects of Multimodal Interview Performance Assessment |
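Proposes an interview performance assessment framework that fuses multimodal information, improving the comprehensiveness and fairness of assessment. |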
multimodal |
✅ |
|
| 10 |
Multilingual Political Views of Large Language Models: Identification and Steering |
A large-scale study reveals the multilingual political leanings of LLMs and proposes a steering method |
large language model |
✅ |
|
| 11 |
A Benchmark Dataset and Evaluation Framework for Vietnamese Large Language Models in Customer Support |
Proposes the CSConDa dataset and an evaluation framework for assessing Vietnamese large language models in customer support scenarios |
large language model |
✅ |
|
| 12 |
PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs |
PATENTWRITER: a benchmark for patent drafting with LLMs, aimed at improving patent application efficiency |
large language model chain-of-thought |
|
|
| 13 |
IFEvalCode: Controlled Code Generation |
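Proposes the IFEvalCode benchmark, which improves the instruction-following ability of code LLMs through forward and backward constraint generation |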
large language model instruction following |
|
|
| 14 |
RASL: Retrieval Augmented Schema Linking for Massive Database Text-to-SQL |
RASL: proposes a retrieval-augmented schema linking method to address the challenges of Text-to-SQL over massive databases. |
large language model |
|
|
| 15 |
Uncovering the Fragility of Trustworthy LLMs through Chinese Textual Ambiguity |
Reveals the fragility of large language models when handling ambiguity in Chinese text |
large language model |
✅ |
|
| 16 |
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations |
C3: a bilingual benchmark for spoken dialogue models that explores challenges in complex conversations |
large language model |
|
|
| 17 |
Exploring In-Context Learning for Frame-Semantic Parsing |
Explores in-context learning for frame-semantic parsing, achieving high performance without fine-tuning |
large language model |
|
|
| 18 |
Opportunities and Challenges of LLMs in Education: An NLP Perspective |
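Discusses the opportunities and challenges of LLMs in education from an NLP perspective, focusing on two application scenarios: assistance and assessment. |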
large language model |
|
|
| 19 |
Investigating Hallucination in Conversations for Low Resource Languages |
Studies hallucination in large language models in conversational settings for low-resource languages |
large language model |
|
|
| 20 |
Heartificial Intelligence: Exploring Empathy in Language Models |
Evaluates empathy in language models: large models surpass humans in cognitive empathy but still lag in affective empathy |
large language model |
|
|
| 21 |
WINELL: Wikipedia Never-Ending Updating with LLM Agents |
WINELL: continuously updates Wikipedia knowledge using LLM agents |
instruction following |
|
|
| 22 |
PersonaTwin: A Multi-Tier Prompt Conditioning Framework for Generating and Evaluating Personalized Digital Twins |
PersonaTwin: a multi-tier prompt conditioning framework for generating and evaluating personalized digital twins |
large language model |
|
|
| 23 |
Hierarchical Verification of Speculative Beams for Accelerating LLM Inference |
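Proposes the Hierarchical Verification Tree (HVT) to accelerate LLM inference, improving inference efficiency and reducing energy consumption. |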
large language model |
|
|