| 1 |
Question: How do Large Language Models perform on the Question Answering tasks? Answer: |
A comparative study of large language model performance on question answering tasks, with single-inference prompt optimization |
large language model instruction following |
|
|
| 2 |
DateLogicQA: Benchmarking Temporal Biases in Large Language Models |
Proposes the DateLogicQA benchmark for evaluating temporal reasoning biases in large language models.
large language model |
|
|
| 3 |
Refining Answer Distributions for Improved Large Language Model Reasoning |
Proposes a method for refining answer distributions to improve large language model reasoning.
large language model |
|
|
| 4 |
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations |
Proposes Compressed Chain of Thought (CCoT), which improves language model reasoning efficiency through dense representations.
chain-of-thought |
|
|
| 5 |
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study |
Studies the algorithmic fidelity of large language models in generating German public opinions.
large language model |
|
|
| 6 |
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval |
CLASP: contrastive language-speech pretraining for multilingual multimodal information retrieval.
multimodal |
|
|
| 7 |
RCLMuFN: Relational Context Learning and Multiplex Fusion Network for Multimodal Sarcasm Detection |
Proposes the RCLMuFN model, which improves multimodal sarcasm detection through relational context learning and multiplex fusion.
multimodal |
|
|
| 8 |
SnakModel: Lessons Learned from Training an Open Danish Large Language Model |
SnakModel: training and optimization practices for a Danish large language model based on Llama2-7B.
large language model |
|
|
| 9 |
DSGram: Dynamic Weighting Sub-Metrics for Grammatical Error Correction in the Era of Large Language Models |
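Proposes the DSGram framework, which uses dynamically weighted sub-metrics to make grammatical error correction evaluation more effective in the era of large language models.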
large language model |
|
|
| 10 |
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement |
RAG-Star: improves the deliberative reasoning of LLMs through retrieval-augmented verification and refinement.
large language model chain-of-thought |
|
|
| 11 |
An Automated Explainable Educational Assessment System Built on LLMs |
AERA Chat: an automated, explainable educational assessment system built on LLMs.
large language model |
|
|
| 12 |
Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation |
Proposes a multi-stage parameter-efficient fine-tuning method to extend Llama models to Persian.
large language model |
|
|
| 13 |
Training Dynamics of a 1.7B LLaMa Model: A Data-Efficient Approach |
Training a 1.7-billion-parameter LLaMa model: a data-efficient approach.
large language model |
✅ |
|
| 14 |
DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation |
Proposes DnDScore, which improves factuality verification for long-form text generation through decomposition and decontextualization.
large language model |
|
|
| 15 |
Memory-Augmented Agent Training for Business Document Understanding |
Proposes the Matrix framework, which improves LLM performance on business document understanding through memory-augmented agent training.
large language model |
|
|
| 16 |
AI PERSONA: Towards Life-long Personalization of LLMs |
Proposes the AI Persona framework for life-long personalization of large language models (LLMs).
large language model |
|
|
| 17 |
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark |
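Proposes AIR-Bench, an automated heterogeneous information retrieval benchmark that addresses the difficulty of evaluating emerging domains.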
large language model |
✅ |
|
| 18 |
NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation |
Proposes NAVCON, a cognitively inspired and linguistically grounded corpus for vision-and-language navigation.
VLN |
|
|
| 19 |
Unlocking LLMs: Addressing Scarce Data and Bias Challenges in Mental Health |
Proposes the IC-AnnoMI dataset to address data scarcity and bias for LLMs in the mental health domain.
large language model |
|
|
| 20 |
Adaptations of AI models for querying the LandMatrix database in natural language |
Adapts AI models to the LandMatrix database to enable natural language querying.
large language model |
✅ |
|
| 21 |
Truthful Text Sanitization Guided by Inference Attacks |
Proposes a generalization-based text sanitization method guided by inference attacks, balancing privacy protection and utility.
large language model |
|
|
| 22 |
Benchmarking and Understanding Compositional Relational Reasoning of LLMs |
Proposes the GAR benchmark for evaluating and understanding the compositional relational reasoning of LLMs.
large language model |
✅ |
|
| 23 |
More Tokens, Lower Precision: Towards the Optimal Token-Precision Trade-off in KV Cache Compression |
Proposes quantized pruning to optimize the token-precision trade-off in KV cache compression, improving long-context LLM performance.
large language model |
✅ |
|
| 24 |
Trigger$^3$: Refining Query Correction via Adaptive Model Selector |
Proposes Trigger$^3$, which refines query correction through adaptive model selection.
large language model |
|
|
| 25 |
Exploring Cross-lingual Latent Transplantation: Mutual Opportunities and Open Challenges |
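Proposes a cross-lingual latent transplantation framework to improve the multilingual capabilities and cultural adaptability of LLMs.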
large language model |
|
|