| 1 |
VeriTaS: The First Dynamic Benchmark for Multimodal Automated Fact-Checking |
Proposes VeriTaS, the first dynamic benchmark for multimodal automated fact-checking, addressing data leakage caused by LLM pretraining. |
foundation model multimodal |
|
|
| 2 |
Resisting Manipulative Bots in Memecoin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning |
Proposes a Memecoin copy-trading method built on a chain-of-thought multi-agent system that resists manipulative bots. |
large language model chain-of-thought |
|
|
| 3 |
What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting |
Proposes the What If TSF benchmark for evaluating scenario-guided multimodal time series forecasting models. |
large language model multimodal |
✅ |
|
| 4 |
Enriching Semantic Profiles into Knowledge Graph for Recommender Systems Using Large Language Models |
Proposes SPiKE, a model that uses large language models to enrich semantic representations in knowledge-graph recommender systems. |
large language model |
|
|
| 5 |
Uncovering Political Bias in Large Language Models using Parliamentary Voting Records |
Proposes the PoliBias benchmark, which uses parliamentary voting records to reveal political bias in large language models. |
large language model |
|
|
| 6 |
An Under-Explored Application for Explainable Multimodal Misogyny Detection in code-mixed Hindi-English |
Proposes an explainable multimodal misogyny detection web application for code-mixed Hindi-English content. |
multimodal |
|
|
| 7 |
MPCI-Bench: A Benchmark for Multimodal Pairwise Contextual Integrity Evaluation of Language Model Agents |
MPCI-Bench: a benchmark for evaluating the multimodal contextual integrity of language model agents. |
multimodal |
|
|
| 8 |
DNF: Dual-Layer Nested Fingerprinting for Large Language Model Intellectual Property Protection |
Proposes dual-layer nested fingerprinting to protect the intellectual property of large language models. |
large language model |
|
|
| 9 |
ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios |
ViDoRe V3: a comprehensive multimodal RAG benchmark for evaluating retrieval-augmented generation in complex real-world scenarios. |
multimodal visual grounding |
|
|
| 10 |
MEMEWEAVER: Inter-Meme Graph Reasoning for Sexism and Misogyny Detection |
MemeWeaver: a framework for sexism and misogyny detection based on inter-meme graph reasoning. |
multimodal |
|
|
| 11 |
MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness |
Proposes the MirrorBench framework for evaluating how well user-proxy agents generate human-like dialogue. |
large language model |
✅ |
|
| 12 |
Why AI Alignment Failure Is Structural: Learned Human Interaction Structures and AGI as an Endogenous Evolutionary Shock |
The structural roots of AI alignment failure: learned human interaction structures and AGI as an endogenous evolutionary shock. |
large language model |
|
|
| 13 |
Prism: Towards Lowering User Cognitive Load in LLMs via Complex Intent Understanding |
Prism: lowering user cognitive load in LLM interactions through complex intent understanding. |
large language model |
|
|
| 14 |
Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement |
Proposes a learner-tailored program repair method for fixing code errors made by programming learners. |
large language model |
|
|
| 15 |
SUMMPILOT: Bridging Efficiency and Customization for Interactive Summarization System |
SUMMPILOT: an interactive summarization system that balances efficiency with user customization. |
large language model |
|
|
| 16 |
M3-BENCH: Process-Aware Evaluation of LLM Agents Social Behaviors in Mixed-Motive Games |
Proposes M3-Bench for evaluating the social behaviors of LLM agents in mixed-motive games. |
large language model |
|
|
| 17 |
Regulatory gray areas of LLM Terms |
Analyzes regulatory gray areas in LLM terms of service, revealing uncertainties for research use. |
large language model |
|
|
| 18 |
Improving LLM Reasoning with Homophily-aware Structural and Semantic Text-Attributed Graph Compression |
Proposes the HS2C framework, which uses homophily to compress text-attributed graphs and improve LLM reasoning performance. |
large language model |
|
|
| 19 |
The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios |
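Proposes EvoEnv, a dynamic evaluation environment for benchmarking multimodal large models on learning, exploration, and scheduling in workplace scenarios. |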
large language model |
✅ |
|