| 1 |
From Answers to Rationales: Self-Aligning Multimodal Reasoning with Answer-Oriented Chain-of-Thought |
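Proposes the SMART framework, which self-aligns multimodal reasoning through answer-oriented chain-of-thought to improve model generalization and robustness. |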
large language model multimodal chain-of-thought |
✅ |
|
| 2 |
Mixture of Reasonings: Teach Large Language Models to Reason with Adaptive Strategies |
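Proposes the Mixture of Reasonings (MoR) framework to improve the adaptive reasoning ability of large language models on complex tasks. |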
large language model chain-of-thought |
|
|
| 3 |
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks |
Proposes SciArena to address gaps in the evaluation of scientific literature tasks. |
foundation model |
|
|
| 4 |
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America |
Proposes La Leaderboard for evaluating LLM performance on Spanish and its varieties. |
large language model |
|
|
| 5 |
TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation |
TransLaw: a multi-agent collaborative translation framework for benchmarking LLMs on Hong Kong legal judgments. |
large language model |
|
|
| 6 |
AI Analyst: Framework and Comprehensive Evaluation of Large Language Models for Financial Time Series Report Generation |
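AI Analyst: proposes a framework and comprehensive evaluation method for generating financial time series reports with large language models. |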
large language model |
|
|
| 7 |
Leveraging Large Language Models for Spontaneous Speech-Based Suicide Risk Detection |
Uses large language models to detect suicide risk from spontaneous speech. |
large language model |
|
|
| 8 |
GAF-Guard: An Agentic Framework for Risk Management and Governance in Large Language Models |
GAF-Guard: an agentic framework for risk management and governance of large language models. |
large language model |
✅ |
|
| 9 |
MassTool: A Multi-Task Search-Based Tool Retrieval Framework for Large Language Models |
MassTool: a multi-task search-based tool retrieval framework for large language models. |
large language model |
✅ |
|
| 10 |
"For Argument's Sake, Show Me How to Harm Myself!": Jailbreaking LLMs in Suicide and Self-Harm Contexts |
Proposes a multi-step prompt-based adversarial attack for suicide and self-harm contexts that successfully bypasses LLM safety guardrails. |
large language model |
|
|
| 11 |
Mathematics Isn't Culture-Free: Probing Cultural Gaps via Entity and Scenario Perturbations |
Probes the effect of cultural differences on math problem solving through entity and scenario perturbations. |
large language model |
|
|
| 12 |
Stylometry recognizes human and LLM-generated texts in short samples |
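Stylometry can effectively distinguish human-written from LLM-generated short texts, addressing model attribution and AI ethics concerns. |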
large language model |
|
|
| 13 |
A Comparative Study of Competency Question Elicitation Methods from Ontology Requirements |
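Compares methods for eliciting competency questions (CQs) from ontology requirements, revealing the strengths and weaknesses of LLM-generated CQs. |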
large language model |
|
|
| 14 |
Many LLMs Are More Utilitarian Than One |
Shows that multi-agent LLM systems lean more utilitarian in moral judgments than single agents. |
large language model |
✅ |
|
| 15 |
LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing |
LitBench: a benchmark and dataset for reliable evaluation of creative writing. |
large language model |
✅ |
|
| 16 |
Transferable Modeling Strategies for Low-Resource LLM Tasks: A Prompt and Alignment-Based Approach |
Proposes a prompt- and alignment-based transfer learning strategy for low-resource LLM tasks. |
large language model |
|
|