| 1 |
From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking |
Survey: exploring jailbreak attacks on large language models and multimodal large language models |
large language model multimodal |
|
|
| 2 |
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph |
LM-Polygraph: a comprehensive benchmark for uncertainty quantification in large language models |
large language model |
✅ |
|
| 3 |
Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization |
Rethinks large language model pruning: the benefits and pitfalls of reconstruction error minimization |
large language model |
|
|
| 4 |
Large Language Models have Intrinsic Self-Correction Ability |
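Reveals the intrinsic self-correction ability of large language models and stresses the importance of zero temperature and fair prompts |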
large language model |
|
|
| 5 |
Talking the Talk Does Not Entail Walking the Walk: On the Limits of Large Language Models in Lexical Entailment Recognition |
Evaluates the limits of large language models in lexical entailment recognition, revealing their difficulty with verb semantics |
large language model |
|
|
| 6 |
ICLEval: Evaluating In-Context Learning Ability of Large Language Models |
ICLEval: proposes a new benchmark for evaluating the in-context learning ability of large language models |
large language model |
✅ |
|
| 7 |
ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models |
Proposes the ESC-Eval framework for evaluating emotion support conversations in large language models |
large language model |
✅ |
|
| 8 |
Safely Learning with Private Data: A Federated Learning Framework for Large Language Model |
Proposes FL-GLM, a secure federated learning framework for large language models that addresses privacy leakage and efficiency issues |
large language model |
|
|
| 9 |
InternLM-Law: An Open Source Chinese Legal Large Language Model |
Proposes InternLM-Law, an open-source Chinese legal large language model for handling complex queries in the legal domain |
large language model |
|
|
| 10 |
70B-parameter large language models in Japanese medical question-answering |
Uses 70B-parameter large language models with instruction tuning to improve Japanese medical question answering |
large language model |
|
|
| 11 |
Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video |
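Proposes the Sports Intelligence benchmark to evaluate the sports understanding capabilities of language models, filling a gap in multimodal sports understanding |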
large language model multimodal chain-of-thought |
|
|
| 12 |
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models |
Proposes PE-Rank, which uses passage embeddings to speed up listwise reranking with large language models |
large language model |
✅ |
|
| 13 |
Towards Retrieval Augmented Generation over Large Video Libraries |
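Proposes a retrieval-augmented generation question-answering system over large video libraries to support efficient reuse of video content |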
large language model TAMP |
|
|
| 14 |
DEM: Distribution Edited Model for Training with Mixed Data Distributions |
Proposes the Distribution Edited Model (DEM) to efficiently handle training with mixed data distributions |
instruction following |
|
|
| 15 |
Synthetic Lyrics Detection Across Languages and Genres |
Proposes a synthetic lyrics detection method to address copyright and content transparency concerns |
large language model |
|
|
| 16 |
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network |
Proposes a brain-like language processing model based on a shallow untrained multihead attention network |
large language model |
|
|
| 17 |
ToVo: Toxicity Taxonomy via Voting |
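ToVo: proposes a voting-based toxicity classification method that addresses the limited transparency, customizability, and reproducibility of existing models |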
chain-of-thought |
|
|
| 18 |
Assessing Good, Bad and Ugly Arguments Generated by ChatGPT: a New Dataset, its Methodology and Associated Tasks |
Proposes the ArGPT dataset for evaluating and improving the quality of arguments generated by large language models |
large language model |
|
|
| 19 |
News Deja Vu: Connecting Past and Present with Semantic Search |
News Deja Vu: uses semantic search to connect historical and modern news in support of social science research |
large language model |
|
|
| 20 |
Detecting AI-Generated Text: Factors Influencing Detectability with Current Methods |
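Surveys methods for detecting AI-generated text, analyzes the key factors influencing detectability, and offers recommendations for future research |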
large language model |
|
|
| 21 |
How language models extrapolate outside the training data: A case study in Textualized Gridworld |
Proposes a cognitive-map-based CoT framework to improve language model extrapolation in textualized Gridworld |
chain-of-thought |
|
|
| 22 |
Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics |
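Proposes a fine-grained citation evaluation framework and analyzes the effectiveness of existing faithfulness metrics on generated text |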
large language model |
|
|
| 23 |
A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation |
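Proposes an LLM-ranking-based evaluation method for automatic counter-narrative generation that significantly improves correlation with human judgments |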
large language model |
|
|
| 24 |
Unsupervised Extraction of Dialogue Policies from Conversations |
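Proposes an unsupervised, graph-based method for extracting dialogue policies, improving the efficiency of task-oriented dialogue system development |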
large language model |
|
|
| 25 |
PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data |
PARIKSHA: a large-scale study of agreement between human and LLM evaluators on multilingual and multi-cultural data |
large language model |
|
|
| 26 |
Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation |
Proposes the Retrieve-Plan-Generation framework, which strengthens knowledge-intensive LLM generation through iterative planning and retrieval |
large language model |
|
|
| 27 |
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems |
In RAG systems, base LLMs outperform instruct LLMs by 20% on average |
large language model |
|
|
| 28 |
AgriLLM: Harnessing Transformers for Farmer Queries |
AgriLLM: uses Transformers to answer farmer queries |
large language model |
|
|
| 29 |
OATH-Frames: Characterizing Online Attitudes Towards Homelessness with LLM Assistants |
Proposes the OATH-Frames framework, which uses LLM assistance to analyze attitudes towards homelessness in online media |
large language model |
|
|
| 30 |
Word Matters: What Influences Domain Adaptation in Summarization? |
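Studies how words influence domain adaptation in summarization and proposes a performance prediction method based on learning difficulty |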
large language model |
|
|
| 31 |
How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions |
Evaluates how well large language models represent values across cultures, based on Hofstede's cultural dimensions |
large language model |
|
|