| 1 |
K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology |
K-MetBench: a benchmark for fine-grained evaluation of expert reasoning, locality, and multimodality in meteorology |
large language model multimodal |
✅ |
|
| 2 |
AdapTime: Enabling Adaptive Temporal Reasoning in Large Language Models |
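AdapTime: proposes an adaptive temporal reasoning method that improves the ability of large language models to handle temporal information |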
large language model |
|
|
| 3 |
The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models |
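Reveals the persona collapse phenomenon in large language models and proposes a quantitative framework for evaluating population diversity |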
large language model |
|
|
| 4 |
Generating Place-Based Compromises Between Two Points of View |
Proposes an LLM prompting method based on empathic neutrality to generate more acceptable compromises between viewpoints |
large language model foundation model chain-of-thought |
|
|
| 5 |
Zero-shot Large Language Models for Automatic Readability Assessment |
Proposes a zero-shot LLM-based method for automatic readability assessment that significantly improves assessment performance |
large language model |
|
|
| 6 |
A Multi-Dimensional Audit of Politically Aligned Large Language Models |
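Proposes a multi-dimensional evaluation framework for auditing the effectiveness, fairness, truthfulness, and persuasiveness of politically aligned large language models. |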
large language model |
|
|
| 7 |
Culture-Aware Machine Translation in Large Language Models: Benchmarking and Investigation |
Proposes CanMT to address the shortcomings of large language models in culture-aware translation |
large language model |
|
|
| 8 |
Aligned Multi-View Scripts for Universal Chart-to-Code Generation |
Proposes CharLuMA, which uses aligned multi-language scripts to improve chart-to-code generation |
multimodal language conditioned |
✅ |
|
| 9 |
MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG |
Proposes MEG-RAG to address the problem of multimodal evidence selection |
large language model multimodal |
|
|
| 10 |
OS-SPEAR: A Toolkit for the Safety, Performance, Efficiency, and Robustness Analysis of OS Agents |
OS-SPEAR: a toolkit for analyzing the safety, performance, efficiency, and robustness of OS agents |
large language model multimodal |
✅ |
|
| 11 |
A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations |
A survey of split learning for LLM fine-tuning: models, systems, and privacy optimizations |
large language model |
|
|
| 12 |
Kwai Summary Attention Technical Report |
Proposes Kwai Summary Attention (KSA), which compresses long-text context through learnable summary tokens to reduce the cost of sequence modeling. |
large language model |
|
|
| 13 |
Stabilizing Efficient Reasoning with Step-Level Advantage Selection |
Proposes Step-level Advantage Selection (SAS) to stabilize efficient reasoning and improve LLM performance under short contexts. |
large language model |
|
|
| 14 |
DepthKV: Layer-Dependent KV Cache Pruning for Long-Context LLM Inference |
DepthKV: a layer-dependent KV cache pruning method for long-context LLM inference |
large language model |
|
|
| 15 |
Skill Retrieval Augmentation for Agentic AI |
Proposes the Skill Retrieval Augmentation (SRA) paradigm to address the skill-scaling bottleneck in agentic AI |
large language model |
|
|
| 16 |
The Pragmatic Persona: Discovering LLM Persona through Bridging Inference |
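Proposes a bridging-inference-based framework for LLM persona discovery that improves semantic coherence and the stability of persona identification. |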
large language model |
✅ |
|
| 17 |
Contextual Linear Activation Steering of Language Models |
Proposes Contextual Linear Activation Steering (CLAS), which improves behavioral control of large language models with limited data. |
large language model |
|
|
| 18 |
Can LLMs Act as Historians? Evaluating Historical Research Capabilities of LLMs via the Chinese Imperial Examination |
Proposes the ProHist-Bench benchmark to evaluate the reasoning ability of LLMs in historical research on the imperial examination. |
large language model |
✅ |
|
| 19 |
Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style |
The study shows that users can edit LLM-generated text to bring it closer to their personal writing style, although traces of the LLM remain. |
large language model |
|
|
| 20 |
Differentiable Faithfulness Alignment for Cross-Model Circuit Transfer |
Proposes the Differentiable Faithfulness Alignment (DFA) framework for transferring neural circuit information across models. |
zero-shot transfer |
✅ |
|
| 21 |
PeeriScope: A Multi-Faceted Framework for Evaluating Peer Review Quality |
PeeriScope: a comprehensive multi-faceted framework for evaluating peer review quality |
large language model |
✅ |
|