| 1 |
Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers |
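Proposes Proactive Interactive Reasoning (PIR), turning LLMs from passive solvers into proactive inquirers and improving reasoning performance. |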
large language model chain-of-thought |
✅ |
|
| 2 |
On the Paradoxical Interference between Instruction-Following and Task Solving |
Reveals the paradoxical interference of instruction following with LLM task-solving ability and proposes SUSTAINSCORE to quantify it. |
large language model instruction following |
|
|
| 3 |
Scaling Reasoning Hop Exposes Weaknesses: Demystifying and Improving Hop Generalization in Large Language Models |
Proposes a test-time reasoning correction method that improves the hop generalization of large language models. |
large language model chain-of-thought |
|
|
| 4 |
A Federated and Parameter-Efficient Framework for Large Language Model Training in Medicine |
Proposes the Fed-MedLoRA framework for federated, parameter-efficient training of large language models in medicine. |
large language model |
|
|
| 5 |
$G^2$-Reader: Dual Evolving Graphs for Multimodal Document QA |
Proposes G²-Reader, a dual evolving graph framework that addresses structural fragmentation and retrieval drift in multimodal document QA. |
multimodal |
|
|
| 6 |
Mil-SCORE: Benchmarking Long-Context Geospatial Reasoning and Planning in Large Language Models |
Mil-SCORE: a benchmark for long-context geospatial reasoning and planning in military scenarios. |
large language model |
|
|
| 7 |
Temporal Guidance for Large Language Models |
Proposes the Temporal Guidance (TeGu) method, which improves LLM generation quality while reducing computational overhead. |
large language model |
|
|
| 8 |
SHARP: Social Harm Analysis via Risk Profiles for Measuring Inequities in Large Language Models |
SHARP: social harm analysis via risk profiles to measure inequities in large language models. |
large language model |
|
|
| 9 |
Parametric Knowledge is Not All You Need: Toward Honest Large Language Models via Retrieval of Pretraining Data |
Retrieves pretraining data to make large language models more honest when answering questions. |
large language model |
|
|
| 10 |
When "Better" Prompts Hurt: Evaluation-Driven Iteration for LLM Applications |
Proposes an evaluation-driven iteration workflow for LLM applications that addresses trade-offs in prompt engineering. |
large language model instruction following |
|
|
| 11 |
CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding |
CausalEmbed: an auto-regressive multi-vector generation method in latent space for visual document embedding. |
large language model multimodal |
|
|
| 12 |
FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale |
FineInstructions: scales synthetic instruction data to pre-training scale to improve LLM performance. |
large language model |
✅ |
|
| 13 |
ECO: Quantized Training without Full-Precision Master Weights |
ECO: a quantized training method that requires no full-precision master weights. |
large language model |
|
|
| 14 |
FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning |
FIT framework: addresses catastrophic forgetting in continual LLM unlearning. |
large language model |
|
|
| 15 |
Thinking Out of Order: When Output Order Stops Reflecting Reasoning Order in Diffusion Language Models |
Proposes masked diffusion language models to address the reasoning-order problem of autoregressive models. |
chain-of-thought |
|
|
| 16 |
Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text |
Proposes the Learn-to-Distance algorithm, which adaptively learns a text distance to detect LLM-generated content. |
large language model |
|
|
| 17 |
Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning |
Proposes the OG-MAR framework, which improves the cultural alignment of LLMs through ontology-guided multi-agent reasoning. |
large language model |
|
|
| 18 |
Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation |
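Proposes the CoNL framework, which self-evolves LLMs through meta-evaluation to address the difficulty of training on non-verifiable tasks. |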
large language model |
|
|
| 19 |
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning |
Proposes VTC-R1, which improves long-context reasoning efficiency through vision-text compression. |
large language model |
✅ |
|
| 20 |
MasalBench: A Benchmark for Contextual and Cross-Cultural Understanding of Persian Proverbs in LLMs |
MasalBench: builds a benchmark for Persian proverb understanding to evaluate the contextual and cross-cultural abilities of LLMs. |
large language model |
✅ |
|
| 21 |
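Embodied Task Planning via Graph-Informed Action Generation with Large Language Model |
Proposes the GiG framework, which uses graph-structured information to improve the long-horizon strategic coherence of LLMs in embodied task planning. |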
large language model |
|
|
| 22 |
Zonkey: A Hierarchical Diffusion Language Model with Differentiable Tokenization and Probabilistic Attention |
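Zonkey: a hierarchical diffusion language model with differentiable tokenization and a probabilistic attention mechanism, enabling end-to-end optimization. |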
large language model |
|
|
| 23 |
Why Attention Patterns Exist: A Unifying Temporal Perspective Analysis |
Proposes the TAPPA framework, which gives a unified temporal explanation of LLM attention patterns and guides inference acceleration. |
large language model |
✅ |
|
| 24 |
Scale-Dependent Semantic Dynamics Revealed by Allan Deviation |
Uses Allan deviation to reveal the scale dependence of semantic dynamics. |
large language model |
|
|
| 25 |
AdaptBPE: From General Purpose to Specialized Tokenizers |
AdaptBPE proposes a post-training tokenizer adaptation method that improves LLM efficiency for specific domains or languages. |
large language model |
✅ |
|
| 26 |
DimStance: Multilingual Datasets for Dimensional Stance Analysis |
DimStance: presents multilingual datasets for affective-dimension stance analysis, supporting fine-grained emotion-aware stance detection. |
large language model |
|
|
| 27 |
User-Centric Evidence Ranking for Attribution and Fact Verification |
Proposes an evidence ranking task that optimizes how efficiently and accurately users read evidence during fact verification. |
large language model |
|
|
| 28 |
MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation |
MGSM-Pro: a robust strategy for evaluating multilingual mathematical reasoning. |
large language model |
|
|
| 29 |
Scaling Embeddings Outperforms Scaling Experts in Language Models |
In language models, scaling embeddings outperforms scaling experts; also introduces the LongCat-Flash-Lite model. |
large language model |
|
|