| 1 |
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models |
Proposes the MINER framework to address the interpretability of multimodal large language models |
large language model multimodal |
|
|
| 2 |
Rational Metareasoning for Large Language Models |
Proposes a metareasoning-based optimization method for LLMs that lowers inference cost while maintaining performance |
large language model chain-of-thought |
|
|
| 3 |
Grounding Partially-Defined Events in Multimodal Data |
Proposes the MultiVENT-G benchmark for grounding and understanding partially defined events in multimodal data. |
multimodal |
|
|
| 4 |
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models |
Data Advisor: a dynamic data curation method for safety alignment of large language models |
large language model |
|
|
| 5 |
Narrative-of-Thought: Improving Temporal Reasoning of Large Language Models via Recounted Narratives |
Proposes Narrative-of-Thought to improve the performance of large language models on temporal reasoning tasks |
large language model |
✅ |
|
| 6 |
Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models |
Proposes the cultural and core capability benchmarks Thai-H6 and ThaiCLI for developing Thai large language models |
large language model |
|
|
| 7 |
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification |
Proposes an attribute-controlled fine-tuning framework for LLMs to improve the safety of model outputs |
large language model |
|
|
| 8 |
Mitigating the Risk of Health Inequity Exacerbated by Large Language Models |
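Proposes the EquityGuard framework to detect and mitigate the risk of health inequity exacerbated by large language models in medical applications. |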
large language model |
|
|
| 9 |
Efficient Inference for Large Language Model-based Generative Recommendation |
Proposes the AtSpeed framework to accelerate inference for LLM-based generative recommendation systems. |
large language model |
✅ |
|
| 10 |
Investigating large language models for their competence in extracting grammatically sound sentences from transcribed noisy utterances |
Evaluates the ability of large language models to extract grammatically correct sentences from noisy speech transcripts |
large language model |
|
|
| 11 |
Explanation sensitivity to the randomness of large language models: the case of journalistic text classification |
Reveals how LLM randomness affects explanations, using journalistic text classification as a case study |
large language model |
|
|
| 12 |
Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes |
Proposes the WeSaR reparameterization method to mitigate loss spikes in large language model pre-training |
large language model |
|
|
| 13 |
Formality is Favored: Unraveling the Learning Preferences of Large Language Models on Data with Conflicting Knowledge |
Reveals the learning preferences of large language models on data with conflicting knowledge: a bias toward formal text |
large language model |
|
|
| 14 |
Superficial Safety Alignment Hypothesis |
Proposes the Superficial Safety Alignment Hypothesis (SSAH) and identifies the key neuron components for LLM safety alignment. |
large language model instruction following |
|
|
| 15 |
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe |
SFTMix: uses Mixup to improve language model instruction tuning without requiring high-quality datasets. |
large language model instruction following |
|
|
| 16 |
$\textbf{Only-IF}$: Revealing the Decisive Effect of Instruction Diversity on Generalization |
Reveals the decisive effect of instruction diversity on LLM generalization, guiding data collection for instruction tuning |
large language model instruction following |
|
|
| 17 |
On Instruction-Finetuning Neural Machine Translation Models |
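Proposes an instruction fine-tuning method that transfers the instruction-following ability of LLMs to smaller NMT models, enabling customized translation. |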
large language model instruction following |
|
|
| 18 |
A Recipe For Building a Compliant Real Estate Chatbot |
Builds a compliant real estate chatbot that addresses discriminatory behavior and performs comparably to GPT-4o |
large language model instruction following |
|
|
| 19 |
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles |
Proposes TurtleBench, which evaluates the reasoning ability of large language models using puzzles from real users |
large language model chain-of-thought |
|
|
| 20 |
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References |
RevisEval: improves the reliability of LLM-as-a-Judge through response-adapted references |
large language model instruction following |
|
|
| 21 |
Fill In The Gaps: Model Calibration and Generalization with Synthetic Data |
Proposes a model calibration method based on synthetic data that improves generalization while preserving model accuracy. |
large language model |
|
|
| 22 |
Causal Micro-Narratives |
Proposes a causal micro-narrative classification method based on a topic ontology, applied to the analysis of inflation narratives. |
large language model |
|
|
| 23 |
Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates |
Cookbook: improves LLM generative abilities through programmatic data generation templates |
large language model |
|
|
| 24 |
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery |
ScienceAgentBench: a rigorous evaluation benchmark for language agents in data-driven scientific discovery |
large language model |
|
|
| 25 |
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths |
Proposes a reasoning paths optimization method to improve complex problem solving |
large language model |
|
|
| 26 |
Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation |
Proposes the D3 framework, which uses multi-agent debate for reliable, interpretable, and low-cost evaluation of large language models. |
large language model |
|
|
| 27 |
Exploring the Personality Traits of LLMs through Latent Features Steering |
Explores the personality traits of LLMs through latent feature steering |
large language model |
|
|
| 28 |
Post-hoc Study of Climate Microtargeting on Social Media Ads with LLMs: Thematic Insights and Fairness Evaluation |
Uses large language models to analyze climate microtargeting ads on social media, revealing thematic preferences and fairness issues. |
large language model |
|
|
| 29 |
Mirror-Consistency: Harnessing Inconsistency in Majority Voting |
Proposes Mirror-Consistency, which exploits inconsistency in large language model voting to improve reasoning ability and confidence calibration |
large language model |
|
|
| 30 |
Rationale-Aware Answer Verification by Pairwise Self-Evaluation |
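Proposes the REPS method, which uses pairwise self-evaluation to improve an answer verifier's ability to judge the soundness of the reasoning process |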
large language model |
|
|
| 31 |
PECAN: LLM-Guided Dynamic Progress Control with Attention-Guided Hierarchical Weighted Graph for Long-Document QA |
PECAN: a long-document QA method based on LLM-guided dynamic progress control and an attention-guided hierarchical weighted graph |
large language model |
|
|
| 32 |
The LLM Effect: Are Humans Truly Using LLMs, or Are They Being Influenced By Them Instead? |
Shows that LLM-assisted analysis is faster but may introduce anchoring bias that affects the depth of analysis. |
large language model |
|
|
| 33 |
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs |
Proposes MathHay, an automated benchmark for evaluating the long-context mathematical reasoning ability of LLMs. |
large language model |
|
|