| # | Title | Summary | Keywords | |
|---|-------|---------|----------|---|
| 1 | Think Less, Label Better: Multi-Stage Domain-Grounded Synthetic Data Generation for Fine-Tuning Large Language Models in Telecommunications | Proposes a multi-stage synthetic data generation method grounded in a domain knowledge graph for fine-tuning LLMs in the telecommunications domain. | large language model, instruction following | |
| 2 | CATCH: A Novel Data Synthesis Framework for High Therapy Fidelity and Memory-Driven Planning Chain of Thought in AI Counseling | Proposes the CATCH framework to improve therapy fidelity and decision soundness in AI counseling. | large language model, chain-of-thought | |
| 3 | BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses | BiasFreeBench: a comprehensive benchmark for evaluating and mitigating bias in LLM responses. | large language model | |
| 4 | RoBiologyDataChoiceQA: A Romanian Dataset for improving Biology understanding of Large Language Models | Introduces RoBiologyDataChoiceQA, a dataset for improving LLMs' understanding of biology. | large language model | |
| 5 | TraceDet: Hallucination Detection from the Decoding Trace of Diffusion Large Language Models | TraceDet: hallucination detection using the decoding traces of diffusion LLMs. | large language model | |
| 6 | Direct Token Optimization: A Self-contained Approach to Large Language Model Unlearning | Proposes Direct Token Optimization (DTO), a self-contained unlearning method for LLMs. | large language model | |
| 7 | TAMA: Tool-Augmented Multimodal Agent for Procedural Activity Understanding | Proposes TAMA, a tool-augmented multimodal agent for procedural activity understanding. | multimodal | |
| 8 | OceanGym: A Benchmark Environment for Underwater Embodied Agents | OceanGym: a comprehensive benchmark environment for underwater embodied agents, addressing extreme-environment challenges. | embodied AI, large language model | ✅ |
| 9 | Personalized Scientific Figure Caption Generation: An Empirical Study on Author-Specific Writing Style Transfer | Studies personalized scientific figure caption generation, exploring author-specific writing style transfer. | large language model, multimodal | |
| 10 | BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs | BatonVoice: an operationalist framework that leverages the linguistic intelligence of LLMs to enhance controllable speech synthesis. | large language model, instruction following | |
| 11 | Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts | Proposes Latent Thinking Optimization, improving LLM reasoning via reward modeling in the latent space. | large language model, chain-of-thought | |
| 12 | SafePassage: High-Fidelity Information Extraction with Black Box LLMs | SafePassage: high-fidelity information extraction with black-box LLMs, substantially reducing hallucination. | large language model | |
| 13 | Catalog-Native LLM: Speaking Item-ID Dialect with Less Entanglement for Recommendation | Proposes IDIOMoE, strengthening LLMs' collaborative-filtering ability for recommendation via an item-ID dialect and an MoE architecture. | large language model | |
| 14 | TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics | TAU: a benchmark for cultural sound understanding beyond the semantic level. | multimodal | |
| 15 | Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing | Proposes Dynamic Boosted Annealing (DBA), decoupling general and domain learning for efficient LLM fine-tuning. | large language model | |
| 16 | Uncertainty-Aware Answer Selection for Improved Reasoning in Multi-LLM Systems | Proposes a calibrated log-likelihood-based answer selection method for multi-LLM systems, improving reasoning performance. | large language model | |
| 17 | Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction | Proposes IR-Tuning, an efficient layer-wise LLM fine-tuning framework for text revision intention prediction. | large language model | |
| 18 | TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture | TUMIX: a multi-agent test-time scaling method based on a tool-use mixture. | large language model | |
| 19 | Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It | Proposes the PREFDISCO evaluation framework, revealing LLMs' limitations in just-in-time personalized reasoning. | large language model | |
| 20 | Deconstructing Self-Bias in LLM-generated Translation Benchmarks | Reveals the self-bias problem in LLM-generated translation benchmarks and proposes mitigation strategies. | large language model | |
| 21 | PerQ: Efficient Evaluation of Multilingual Text Personalization Quality | Proposes PerQ, an efficient metric for evaluating multilingual text personalization quality. | large language model | |
| 22 | RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity | Proposes the RoleConflictBench benchmark for evaluating LLMs' contextual sensitivity in role-conflict scenarios. | large language model | |
| 23 | LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts | Proposes LD-MoLE, a mixture of LoRA experts with learnable dynamic routing, improving LLM fine-tuning performance. | large language model | |
| 24 | Submodular Context Partitioning and Compression for In-Context Learning | Proposes the Sub-CP framework, which uses submodular objectives for context partitioning and compression to improve in-context learning. | large language model | |
| 25 | Evaluation Sheet for Deep Research: A Use Case for Academic Survey Writing | Proposes an evaluation framework for assessing LLM performance on academic survey writing. | large language model | |
| 26 | Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization | Proposes Matryoshka MoE, enabling elastic adjustment of expert utilization in MoE models at inference time. | large language model | |
| 27 | CreAgentive: An Agent Workflow Driven Multi-Category Creative Generation Engine | Proposes CreAgentive to address four key problems in multi-category creative generation. | large language model | |
| 28 | Automatic Fact-checking in English and Telugu | Builds an English–Telugu bilingual dataset and explores the effectiveness of LLMs for fact-checking in English and Telugu. | large language model | |
| 29 | Fast-dLLM v2: Efficient Block-Diffusion LLM | Fast-dLLM v2: an efficient block-diffusion language model that accelerates parallel text generation. | large language model | |
| 30 | VietBinoculars: A Zero-Shot Approach for Detecting Vietnamese LLM-Generated Text | VietBinoculars: a zero-shot method for detecting Vietnamese LLM-generated text. | large language model | |
| 31 | Explaining novel senses using definition generation with open language models | Uses open language models to generate definitions explaining novel word senses, outperforming closed-source models. | large language model | |
| 32 | IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation | IMProofBench: a new benchmark for evaluating AI on research-level mathematical proof generation. | large language model | |
| 33 | Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis | LENS: improving reward model learning under limited preference data via latent space synthesis. | large language model | ✅ |
| 34 | RE$^2$: Improving Chinese Grammatical Error Correction via Retrieving Appropriate Examples with Explanation | RE$^2$: improving Chinese grammatical error correction by retrieving appropriate examples with explanations. | large language model | |
| 35 | Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel | Proposes KERN as a replacement for Softmax to address the routing problem in MoE models. | large language model | |
| 36 | ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations | ReFACT: a benchmark for scientific confabulation detection with positional error annotations. | large language model | ✅ |
| 37 | Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities | Proposes a method for decoupling atomic mathematical abilities, probing the nature of LLMs' mathematical reasoning. | large language model | |
| 38 | The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale | Proposes the Media Bias Detector framework for large-scale annotation and analysis of bias in news media. | large language model | |