| 1 |
EmoBench-M: Benchmarking Emotional Intelligence for Multimodal Large Language Models |
EmoBench-M: a benchmark for evaluating the emotional intelligence of multimodal large language models |
large language model, multimodal |
✅ |
|
| 2 |
Multimodal Medical Code Tokenizer |
Proposes MedTok, a multimodal tokenizer for medical codes that combines text descriptions with relational information |
foundation model, multimodal |
|
|
| 3 |
UltraIF: Advancing Instruction Following from the Wild |
UltraIF: a method for improving LLM instruction following using real-world instructions |
large language model, instruction following |
✅ |
|
| 4 |
Verifiable Format Control for Large Language Model Generations |
Proposes the VFF dataset and a progressive training method to improve small LLMs' ability to control output formats such as JSON |
large language model, instruction following |
|
|
| 5 |
Experiments with Large Language Models on Retrieval-Augmented Generation for Closed-Source Simulation Software |
Explores retrieval-augmented generation with large language models for closed-source simulation software |
large language model |
|
|
| 6 |
LLMs to Support a Domain Specific Knowledge Assistant |
Uses LLMs to generate a high-quality dataset and build a knowledge assistant for the sustainability reporting domain |
large language model, chain-of-thought |
|
|
| 7 |
Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection |
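Proposes the FairOPT algorithm, which improves the robustness of AI-generated text detection through group-adaptive threshold optimization. |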
large language model |
|
|
| 8 |
Active Task Disambiguation with LLMs |
Proposes an LLM-based active task disambiguation method to address ambiguous task definitions in real-world settings. |
large language model |
|
|
| 9 |
Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization |
Explores uncertainty-based on-device LLM routing, from benchmarking to generalization |
large language model |
|
|
| 10 |
ChameleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters |
ChameleonLLM: proposes a batch-aware dynamic low-rank adaptation method based on inference-time clustering |
large language model |
|
|
| 11 |
The Best Instruction-Tuning Data are Those That Fit |
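GRAPE: optimizes instruction-tuning data selection for the characteristics of the target model, significantly improving performance |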
large language model |
|
|
| 12 |
Exploring Imbalanced Annotations for Effective In-Context Learning |
Proposes the RCB method to address degraded in-context learning performance under class-imbalanced annotations |
large language model |
|
|
| 13 |
MultiQ&A: An Analysis in Measuring Robustness via Automated Crowdsourcing of Question Perturbations and Answers |
MultiQ&A: evaluates LLM robustness and consistency through crowdsourced question perturbations and answers |
large language model |
|
|
| 14 |
Controlled LLM Decoding via Discrete Auto-regressive Biasing |
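Proposes a discrete auto-regressive biasing method to address the difficulty of balancing fluency and constraint satisfaction in controlled LLM text generation. |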
large language model |
|
|
| 15 |
Building A Unified AI-centric Language System: analysis, framework and future work |
Proposes a unified AI-centric language system framework aimed at improving AI model efficiency and reducing bias. |
large language model |
|
|
| 16 |
The simulation of judgment in LLMs |
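Simulates the judgment process of LLMs to reveal how their evaluation criteria differ from those of humans, and discusses potential risks. |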
large language model |
|
|
| 17 |
MAQInstruct: Instruction-based Unified Event Relation Extraction |
MAQInstruct: improves event relation extraction through instruction tuning and bipartite graph matching |
large language model |
|
|
| 18 |
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models |
Proposes a self-backtracking mechanism that improves the reasoning ability and efficiency of language models |
large language model |
|
|
| 19 |
It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers |
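Proposes ModernBERT-Large-Instruct, which uses the MLM head to make BERT-like models act as generative classifiers, improving zero-shot and fine-tuned performance. |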
large language model |
|
|
| 20 |
Division-of-Thoughts: Harnessing Hybrid Language Model Synergy for Efficient On-Device Agents |
Proposes the Division-of-Thoughts framework, which uses hybrid language model synergy to improve the efficiency of on-device AI agents. |
large language model |
|
|
| 21 |
A Comparison of DeepSeek and Other LLMs |
Compares DeepSeek with mainstream LLMs on text classification tasks and builds a new dataset. |
large language model |
|
|
| 22 |
My LLM might Mimic AAE -- But When Should it? |
The study finds that Black users want large language models to be selective and context-aware when generating African American English. |
large language model |
✅ |
|
| 23 |
Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis |
Proposes reference-level feedback to guide data synthesis and improve the performance of instruction-tuned LLMs |
large language model |
|
|
| 24 |
ULPT: Prompt Tuning with Ultra-Low-Dimensional Optimization |
Proposes Ultra-Low-Dimensional Prompt Tuning (ULPT) for efficient fine-tuning of large language models. |
large language model |
|
|
| 25 |
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization |
Proposes Content-Format Integrated Prompt Optimization (CFPO) to improve large language model performance across multiple tasks. |
large language model |
✅ |
|
| 26 |
Reformulation for Pretraining Data Augmentation |
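Proposes the Massive Genre-Audience (MGA) reformulation method, which mitigates pretraining data repetition and improves the scalability of large language models. |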
large language model |
|
|
| 27 |
Sports and Women's Sports: Gender Bias in Text Generation with Olympic Data |
Uses Olympic data to reveal gender bias in large language models' sports-related text generation |
large language model |
|
|
| 28 |
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs |
Studies LLM sensitivity to input order and reveals performance degradation across different tasks |
large language model |
|
|
| 29 |
Quantification of Biodiversity from Historical Survey Text with LLM-based Best-Worst Scaling |
Uses LLMs and Best-Worst Scaling to quantify biodiversity from historical texts |
large language model |
|
|
| 30 |
PsyPlay: Personality-Infused Role-Playing Conversational Agents |
PsyPlay: proposes a personality-infused role-playing conversational agent framework |
large language model |
|
|
| 31 |
Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective |
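Proposes an output-perturbation-based algorithm for identifying critical KV cache entries, improving the efficiency of long-sequence LLM inference |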
large language model |
|
|
| 32 |
Enhancing Hallucination Detection through Noise Injection |
Enhances hallucination detection in large language models through noise injection |
large language model |
|
|
| 33 |
Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing |
Proposes the BLUE strategy, which improves the precision and generalization of locate-then-edit methods in model editing |
large language model |
✅ |
|