| 1 |
Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training |
Qalb: a state-of-the-art Urdu large language model for 230 million speakers, built through systematic continued pre-training |
large language model foundation model |
|
|
| 2 |
Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques |
Improves large language model performance on sentiment classification and irony detection through advanced prompt engineering |
large language model chain-of-thought |
|
|
| 3 |
From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding |
Proposes the FRTR framework, which improves spreadsheet understanding through a retrieval-augmented multimodal approach |
large language model multimodal |
|
|
| 4 |
Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models |
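Proposes the Generation-Augmented Generation (GAG) framework for injecting private knowledge into large language models, improving domain performance |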
large language model multimodal |
|
|
| 5 |
BenchOverflow: Measuring Overflow in Large Language Models via Plain-Text Prompts |
BenchOverflow: measures excessive output (overflow) in large language models using plain-text prompts |
large language model |
|
|
| 6 |
PATS: Personality-Aware Teaching Strategies with Large Language Model Tutors |
PATS: personality-aware teaching strategies with large language model tutors, improving personalized tutoring |
large language model |
|
|
| 7 |
Nationality and Region Prediction from Names: A Comparative Study of Neural Models and Large Language Models |
Compares the performance of neural models and large language models at predicting nationality and region from names |
large language model |
|
|
| 8 |
Analyzing Bias in False Refusal Behavior of Large Language Models for Hate Speech Detoxification |
Analyzes bias in the false refusal behavior of large language models during hate speech detoxification |
large language model |
|
|
| 9 |
A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding |
Proposes XMPIE, a parallel cross-lingual benchmark dataset for multimodal idiomaticity understanding |
multimodal |
|
|
| 10 |
It's All About the Confidence: An Unsupervised Approach for Multilingual Historical Entity Linking using Large Language Models |
Proposes MHEL-LLaMo, a confidence-based unsupervised method for multilingual historical entity linking |
large language model |
|
|
| 11 |
sui-1: Grounded and Verifiable Long-Form Summarization |
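Proposes the sui-1 model, which addresses unfaithful summaries from existing large language models through long-form summarization with traceable citations |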
large language model chain-of-thought |
|
|
| 12 |
Prompt-Based Clarity Evaluation and Topic Detection in Political Question Answering |
A study of prompt-based methods for clarity evaluation and topic detection in political question answering |
large language model chain-of-thought |
|
|
| 13 |
STAR: Detecting Inference-time Backdoors in LLM Reasoning via State-Transition Amplification Ratio |
STAR: detects inference-time backdoor attacks in LLM reasoning via the state-transition amplification ratio |
chain-of-thought |
|
|
| 14 |
Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs |
Proposes T-SPIN, which uses a triplet loss and an entropy constraint for stable and efficient self-play fine-tuning of LLMs |
large language model |
|
|
| 15 |
Modeling LLM Agent Reviewer Dynamics in Elo-Ranked Review System |
Proposes an Elo-rating-based model of LLM reviewer dynamics to improve the accuracy of review decisions |
large language model |
✅ |
|
| 16 |
Moral Lenses, Political Coordinates: Towards Ideological Positioning of Morally Conditioned LLMs |
Studies the ideological positioning of morally conditioned LLMs by steering them with moral values |
large language model |
|
|
| 17 |
Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis |
Proposes the LogiSafetyGen framework and the LogiSafetyBench benchmark to evaluate implicit compliance in LLM tool invocation |
large language model |
|
|
| 18 |
Inferring Latent Intentions: Attributional Natural Language Inference in LLM Agents |
Proposes the Att-NLI framework, which improves intention-based reasoning by LLMs in multi-agent environments |
large language model |
|
|
| 19 |
RAGShaper: Eliciting Sophisticated Agentic RAG Skills via Automated Data Synthesis |
RAGShaper: improves sophisticated agentic RAG skills through automated data synthesis |
large language model |
|
|
| 20 |
How Order-Sensitive Are LLMs? OrderProbe for Deterministic Structural Reconstruction |
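Proposes the OrderProbe benchmark for evaluating LLM structural reconstruction ability on four-character expressions in Chinese, Japanese, and Korean |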
large language model |
|
|
| 21 |
CLaS-Bench: A Cross-Lingual Alignment and Steering Benchmark |
CLaS-Bench: a cross-lingual alignment and steering benchmark for evaluating the multilingual steering ability of LLMs |
large language model |
|
|
| 22 |
AgriAgent: Contract-Driven Planning and Capability-Aware Tool Orchestration in Real-World Agriculture |
AgriAgent: contract-driven planning and capability-aware tool orchestration for agricultural scenarios |
multimodal |
|
|
| 23 |
WISE-Flow: Workflow-Induced Structured Experience for Self-Evolving Conversational Service Agents |
WISE-Flow: enables self-evolution of conversational service agents through workflow-induced structured experience |
large language model |
|
|