| # | Title | Summary | Tags | |
|---|---|---|---|---|
| 1 | Capabilities of GPT-5 across critical domains: Is it the next breakthrough? | Compares the capabilities of GPT-4 and GPT-5 across critical domains, highlighting potential breakthroughs. | large language model, multimodal | |
| 2 | STEM: Efficient Relative Capability Evaluation of LLMs through Structured Transition Samples | Proposes the STEM method for efficiently evaluating the relative capabilities of large language models. | large language model | |
| 3 | J6: Jacobian-Driven Role Attribution for Multi-Objective Prompt Optimization in LLMs | Proposes J6 to address multi-objective optimization for large language models. | large language model | |
| 4 | Mitigating Jailbreaks with Intent-Aware LLMs | Proposes Intent-FT to defend large language models against jailbreak attacks. | large language model | ✅ |
| 5 | Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation | Proposes FineCE to address confidence estimation during the LLM generation process. | large language model | |
| 6 | CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection | Proposes the CAMF framework for detecting machine-generated text. | large language model | |
| 7 | CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures | Proposes the CORE metric to quantify the quality of multi-agent LLM interactions. | large language model | ✅ |