| 1 |
Vis-CoT: A Human-in-the-Loop Framework for Interactive Visualization and Intervention in LLM Chain-of-Thought Reasoning |
Vis-CoT: a human-in-the-loop framework for interactive visualization of LLM chain-of-thought reasoning |
large language model chain-of-thought |
|
|
| 2 |
CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models |
Proposes Causal Attention Tuning (CAT), a method for injecting fine-grained causal knowledge into large language models. |
large language model |
✅ |
|
| 3 |
Trusted Uncertainty in Large Language Models: A Unified Framework for Confidence Calibration and Risk-Controlled Refusal |
Proposes the UniCR framework, which calibrates uncertainty evidence to enable risk-controlled refusal in large language models. |
large language model |
|
|
| 4 |
On the Alignment of Large Language Models with Global Human Opinion |
Proposes a framework based on the World Values Survey to evaluate how well large language models align with global human opinion. |
large language model |
✅ |
|
| 5 |
Can Large Language Models Master Complex Card Games? |
Explores LLM capabilities in complex card games: achieving human-like intelligence through fine-tuning |
large language model |
✅ |
|
| 6 |
WATCHED: A Web AI Agent Tool for Combating Hate Speech by Expanding Data |
Proposes WATCHED, an AI agent that combines an LLM with specialized tools to assist content moderators in combating online hate speech. |
large language model chain-of-thought |
|
|
| 7 |
ShortageSim: Simulating Drug Shortages under Information Asymmetry |
ShortageSim: the first simulation framework for regulatory interventions in drug shortages under information asymmetry |
large language model |
|
|
| 8 |
Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs |
Revisiting LLM prompt sensitivity: an artifact of the evaluation method or a model flaw? |
large language model |
|
|
| 9 |
Where Should I Study? Biased Language Models Decide! Evaluating Fairness in LMs for Academic Recommendations |
Proposes a multi-dimensional evaluation framework to address bias in language model recommendations |
large language model |
|
|
| 10 |
Benchmarking the Detection of LLMs-Generated Modern Chinese Poetry |
Builds a detection benchmark for modern Chinese poetry and evaluates how well existing models identify LLM-generated poems. |
large language model |
|
|
| 11 |
Do Retrieval Augmented Language Models Know When They Don't Know? |
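Studies the refusal ability of retrieval-augmented language models (RALMs) and proposes improvements that balance refusal against answering correctly. |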
large language model |
|
|
| 12 |
Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA |
Proposes the Reason-KE framework, which uses explicit reasoning chains for robust LLM knowledge editing in multi-hop QA |
large language model |
|
|
| 13 |
LLMs cannot spot math errors, even when allowed to peek into the solution |
LLMs struggle to find errors in math solution steps, even when allowed to see the reference answer |
large language model |
|
|
| 14 |
LongCat-Flash Technical Report |
LongCat-Flash: a 560-billion-parameter MoE language model with efficient computation and advanced agent capabilities |
foundation model |
✅ |
|
| 15 |
Natural Context Drift Undermines the Natural Language Understanding of Large Language Models |
Proposes a framework for analyzing how the natural evolution of text affects LLM question-answering ability |
large language model |
|
|
| 16 |
Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition |
Proposes serialized output prompting to improve multi-talker speech recognition performance |
large language model |
|
|
| 17 |
Assessing Large Language Models on Islamic Legal Reasoning: Evidence from Inheritance Law Evaluation |
Evaluates large language models on Islamic inheritance law reasoning |
large language model |
✅ |
|
| 18 |
REFRAG: Rethinking RAG based Decoding |
Proposes REFRAG to address the decoding efficiency problem in RAG |
large language model |
|
|