| 1 |
Can Large Language Models Simulate Human Responses? A Case Study of Stated Preference Experiments in the Context of Heating-related Choices |
Uses large language models to simulate human behavior in stated preference experiments on heating choices |
large language model chain-of-thought |
|
|
| 2 |
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information |
Proposes S2S-Arena, a benchmark for evaluating speech-to-speech models on instruction following with paralinguistic information |
large language model instruction following |
|
|
| 3 |
Dynamic Knowledge Integration for Evidence-Driven Counter-Argument Generation with Large Language Models |
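Proposes a dynamic knowledge integration method that improves the quality of evidence-driven counter-arguments generated by large language models |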
large language model |
|
|
| 4 |
Evaluating open-source Large Language Models for automated fact-checking |
Evaluates the capabilities and limitations of open-source large language models in automated fact-checking |
large language model |
|
|
| 5 |
Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter |
Proposes Chain of Strategy Optimization to improve the emotional support capability of large language models |
large language model |
|
|
| 6 |
Coreference as an indicator of context scope in multimodal narrative |
Reveals differences between large language models and humans in coreference patterns in multimodal narrative |
multimodal |
✅ |
|
| 7 |
AdaSpec: Adaptive Speculative Decoding for Fast, SLO-Aware Large Language Model Serving |
AdaSpec: adaptive speculative decoding that accelerates LLM serving while meeting SLOs |
large language model |
✅ |
|
| 8 |
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching |
Proposes Sketch-of-Thought, a cognitive-inspired sketching method that improves LLM reasoning efficiency and reduces token usage |
large language model chain-of-thought |
|
|
| 9 |
RocketEval: Efficient Automated LLM Evaluation via Grading Checklist |
RocketEval: efficient automated LLM evaluation via checklist grading |
large language model chain-of-thought |
✅ |
|
| 10 |
IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining |
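Proposes IDEA Prune, an integrated enlarge-and-prune pipeline for generative language model pretraining that improves the performance of pruned models |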
large language model |
|
|
| 11 |
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation |
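Proposes the QG-SMS framework, which uses student modeling and simulation to enhance test item analysis and improve the evaluation of question generation |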
large language model |
|
|
| 12 |
WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs |
Proposes WikiBigEdit, a large-scale benchmark for evaluating the limits of lifelong knowledge editing in LLMs |
large language model |
|
|
| 13 |
ORANSight-2.0: Foundational LLMs for O-RAN |
ORANSight-2.0: foundational large language models tailored to O-RAN that improve performance on domain-specific tasks |
large language model |
|
|
| 14 |
SINdex: Semantic INconsistency Index for Hallucination Detection in LLMs |
Proposes SINdex, an LLM hallucination detection method based on semantic inconsistency |
large language model |
|
|
| 15 |
SANDWiCH: Semantical Analysis of Neighbours for Disambiguating Words in Context ad Hoc |
SANDWiCH: proposes a multilingual word sense disambiguation framework based on semantic analysis of neighbours, achieving a new SOTA |
large language model |
|
|
| 16 |
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs |
Reveals the vulnerability of mixture-of-LLM architectures to deception and proposes defense mechanisms to improve robustness |
large language model |
|
|
| 17 |
AutoIOT: LLM-Driven Automated Natural Language Programming for AIoT Applications |
AutoIOT: LLM-driven automated natural language programming for AIoT applications |
large language model |
|
|
| 18 |
Knowledge Updating? No More Model Editing! Just Selective Contextual Reasoning |
Proposes Selective Contextual Reasoning (SCR), which updates LLM knowledge without model editing |
large language model |
|
|
| 19 |
Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning |
Analyzes and mitigates reward modeling issues in LLM reasoning |
chain-of-thought |
|
|
| 20 |
EMCee: Improving Multilingual Capability of LLMs via Bridging Knowledge and Reasoning with Extracted Synthetic Multilingual Context |
EMCee: improves the multilingual capability of LLMs by bridging knowledge and reasoning with extracted synthetic multilingual context |
large language model |
|
|
| 21 |
DETQUS: Decomposition-Enhanced Transformers for QUery-focused Summarization |
DETQUS: uses decomposition-enhanced Transformers for query-focused table summarization |
large language model |
|
|
| 22 |
MastermindEval: A Simple But Scalable Reasoning Benchmark |
Proposes MastermindEval, a simple, scalable reasoning benchmark for evaluating the deductive reasoning of large language models |
large language model |
|
|
| 23 |
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio |
MM-StoryAgent: proposes a multi-agent framework for generating immersive narrated storybook videos |
large language model |
|
|