| 1 |
Multimodal Programming in Computer Science with Interactive Assistance Powered by Large Language Model |
Building an interactive programming assistance system with large language models to improve computer science teaching |
large language model, multimodal |
|
|
| 2 |
Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators |
Large language models can assist human event annotation but cannot independently produce high-quality annotations |
large language model |
|
|
| 3 |
Delusions of Large Language Models |
Reveals a new form of LLM hallucination: high-confidence hallucinations ("delusions") and strategies to mitigate them |
large language model |
|
|
| 4 |
Alignment for Efficient Tool Calling of Large Language Models |
Proposes a multi-objective alignment framework that improves LLM tool-calling efficiency and reduces unnecessary calls |
large language model |
|
|
| 5 |
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models |
InftyThink: breaks the length limits of long-context reasoning in LLMs, enabling unbounded-depth reasoning |
large language model |
|
|
| 6 |
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation |
Uses large language models as encoders to improve the efficiency and generalization of neural machine translation |
large language model |
|
|
| 7 |
WildIFEval: Instruction Following in the Wild |
WildIFEval: a large-scale dataset of real user instructions for evaluating LLM instruction following under complex constraints |
instruction following |
|
|
| 8 |
Effectiveness of Zero-shot-CoT in Japanese Prompts |
Compares the effectiveness of zero-shot CoT prompting in Japanese versus English |
chain-of-thought |
|
|
| 9 |
PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts |
PFDial: a structured dialogue instruction fine-tuning method based on UML flowcharts, improving process-driven dialogue systems |
large language model |
✅ |
|
| 10 |
DependEval: Benchmarking LLMs for Repository Dependency Understanding |
DependEval: a hierarchical benchmark for evaluating LLMs' understanding of repository dependencies |
large language model |
|
|
| 11 |
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation |
FEA-Bench: a benchmark for evaluating code LLMs on repository-level code generation for feature implementation |
large language model |
|
|
| 12 |
Enhancing NLP Robustness and Generalization through LLM-Generated Contrast Sets: A Scalable Framework for Systematic Evaluation and Adversarial Training |
Uses LLM-generated contrast sets to improve the robustness and generalization of NLP models |
large language model |
|
|
| 13 |
Evaluating and Aligning Human Economic Risk Preferences in LLMs |
Evaluates and aligns human economic risk preferences in LLMs to improve the soundness of their decisions |
large language model |
|
|
| 14 |
BingoGuard: LLM Content Moderation Tools with Risk Levels |
BingoGuard: builds LLM content moderation tools with risk-level assessment |
large language model |
|
|
| 15 |
SafeSpeech: A Comprehensive and Interactive Tool for Analysing Sexist and Abusive Language in Conversations |
SafeSpeech: a comprehensive, interactive tool for analysing sexist and abusive language in conversations |
large language model |
|
|