| 1 |
P-CoT: A Pedagogically-motivated Participatory Chain-of-Thought Prompting for Phonological Reasoning in LLMs |
Proposes the P-CoT prompting method to improve LLM performance on phonological reasoning tasks |
large language model chain-of-thought |
|
|
| 2 |
Argument Quality Annotation and Gender Bias Detection in Financial Communication through Large Language Models |
Uses large language models to assess argument quality in financial texts and detect gender bias |
large language model |
|
|
| 3 |
Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models |
Uses large language models to automate regulatory compliance verification in financial auditing |
large language model |
|
|
| 4 |
Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language |
Proposes a German-language gender bias evaluation dataset, revealing challenges unique to multilingual LLMs |
large language model |
|
|
| 5 |
Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning |
Agentar-Fin-R1: enhancing financial intelligence through domain knowledge, efficient training, and advanced reasoning |
large language model foundation model |
✅ |
|
| 6 |
Can External Validation Tools Improve Annotation Quality for LLM-as-a-Judge? |
Proposes a tool-augmented AI evaluation system that improves LLM evaluation quality on factuality, math, and code tasks |
large language model |
✅ |
|
| 7 |
Obscured but Not Erased: Evaluating Nationality Bias in LLMs via Name-Based Bias Benchmarks |
Proposes a name-based bias evaluation method that reveals hidden nationality bias in LLMs |
large language model |
|
|
| 8 |
LingBench++: A Linguistically-Informed Benchmark and Reasoning Framework for Multi-Step and Cross-Cultural Inference with LLMs |
LingBench++: a linguistically informed benchmark for multi-step reasoning and cross-cultural inference with LLMs |
large language model |
|
|
| 9 |
Test-Time-Matching: Decouple Personality, Memory, and Linguistic Style in LLM-based Role-Playing Language Agent |
Proposes the Test-Time-Matching framework, which personalizes LLM-based role-playing language agents without any training |
large language model |
|
|
| 10 |
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning |
Proposes TIM, a thread-based reasoning model that overcomes LLM context-length limits to enable long-horizon reasoning |
large language model |
|
|
| 11 |
How Deep Is Representational Bias in LLMs? The Cases of Caste and Religion |
Systematically audits GPT-4 Turbo to reveal representational bias in LLMs |
large language model |
✅ |
|
| 12 |
PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization |
Proposes PICACO to address pluralistic value alignment in large language models |
large language model |
|
|
| 13 |
The Ever-Evolving Science Exam |
Proposes EESE, a dynamically evolving science exam benchmark for reliably evaluating the scientific understanding of foundation models |
foundation model |
✅ |
|
| 14 |
ICR Probe: Tracking Hidden State Dynamics for Reliable Hallucination Detection in LLMs |
Proposes ICR Probe, which tracks the dynamics of LLM hidden states for reliable hallucination detection |
large language model |
|
|
| 15 |
Towards Enforcing Company Policy Adherence in Agentic Workflows |
Proposes an agent workflow framework that can enforce company policies, addressing policy adherence in LLM agents |
large language model |
|
|
| 16 |
Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction |
Proposes the AOE benchmark to evaluate how well LLMs extract structured table information from complex documents |
large language model |
|
|
| 17 |
iShumei-Chinchunmei at SemEval-2025 Task 4: A balanced forgetting and retention multi-task framework using effective unlearning loss |
Proposes an effective unlearning loss that balances forgetting and retention in LLMs, addressing the erasure of sensitive content |
large language model |
|
|
| 18 |
Towards Compute-Optimal Many-Shot In-Context Learning |
Proposes a compute-optimal many-shot demonstration selection strategy for long-context in-context learning |
large language model |
|
|