| 1 |
Agent-RewardBench: Towards a Unified Benchmark for Reward Modeling across Perception, Planning, and Safety in Real-World Multimodal Agents |
Proposes Agent-RewardBench to address reward modeling for multimodal agents |
large language model multimodal |
|
|
| 2 |
Towards Transparent AI: A Survey on Explainable Large Language Models |
Surveys research progress and challenges in explainable large language models |
large language model |
|
|
| 3 |
Evaluating List Construction and Temporal Understanding capabilities of Large Language Models |
Proposes the TLQA benchmark to address temporal understanding and list construction in large language models |
large language model |
✅ |
|
| 4 |
Potemkin Understanding in Large Language Models |
Proposes a formal framework for evaluating the understanding capabilities of large language models |
large language model |
|
|
| 5 |
Weak-to-Strong GraphRAG: Aligning Weak Retrievers with Large Language Models for Graph-based Retrieval Augmented Generation |
Proposes Refined Graph-based RAG to address the alignment of weak retrievers with large language models |
large language model |
|
|
| 6 |
Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis |
Applies Integrated Information Theory to analyze manifestations of consciousness in large language models |
large language model |
|
|
| 7 |
Large Language Models Acing Chartered Accountancy |
Proposes the CA-Ben benchmark to evaluate large language models' capabilities in accounting |
large language model |
|
|
| 8 |
Mitigating Hidden Confounding by Progressive Confounder Imputation via Large Language Models |
Proposes the ProCI framework to address hidden confounding |
large language model |
|
|
| 9 |
Structuralist Approach to AI Literary Criticism: Leveraging Greimas Semiotic Square for Large Language Models |
Proposes the GLASS framework to improve LLMs' literary criticism capabilities |
large language model |
|
|
| 10 |
Enhancing Automatic Term Extraction with Large Language Models via Syntactic Retrieval |
Proposes a syntactic-retrieval-based prompting strategy to enhance automatic term extraction |
large language model |
|
|
| 11 |
Theory of Mind in Action: The Instruction Inference Task |
Proposes the Instruction Inference task to evaluate agents' theory-of-mind capabilities |
large language model chain-of-thought |
|
|
| 12 |
Leveraging LLM-Assisted Query Understanding for Live Retrieval-Augmented Generation |
Proposes Omni-RAG to address the handling of complex user queries |
large language model chain-of-thought |
|
|
| 13 |
Detecting Referring Expressions in Visually Grounded Dialogue with Autoregressive Language Models |
Proposes a text-based autoregressive model for detecting referring expressions in visually grounded dialogue |
large language model multimodal |
|
|
| 14 |
Small Encoders Can Rival Large Decoders in Detecting Groundedness |
Proposes lightweight encoders to address the shortcomings of large decoders in groundedness detection |
large language model |
✅ |
|
| 15 |
Exploring the Structure of AI-Induced Language Change in Scientific English |
Explores AI-induced structural changes in scientific English |
large language model |
|
|
| 16 |
Domain Knowledge-Enhanced LLMs for Fraud and Concept Drift Detection |
Proposes a domain knowledge-enhanced LLM framework for fraud and concept drift detection |
large language model |
|
|
| 17 |
Exploring the change in scientific readability following the release of ChatGPT |
Analyzes changes in the readability of scientific papers after the release of ChatGPT |
large language model |
|
|
| 18 |
(Fact) Check Your Bias |
Studies how language model bias affects fact-checking outcomes |
large language model |
✅ |
|
| 19 |
Text2Cypher Across Languages: Evaluating and Finetuning LLMs |
Proposes multilingual Text2Cypher evaluation and fine-tuning methods to improve database query generation |
large language model |
|
|
| 20 |
Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning |
Proposes Double-Checker to enhance the reasoning of slow-thinking LLMs |
large language model |
✅ |
|
| 21 |
Cat and Mouse -- Can Fake Text Generation Outpace Detector Systems? |
Explores the adversarial contest between fake text generation and detection systems |
large language model |
|
|
| 22 |
Prompt-Guided Turn-Taking Prediction |
Proposes a text-prompt-based dynamic turn-taking prediction model to improve dialogue systems |
large language model |
|
|
| 23 |
MT2-CSD: A New Dataset and Multi-Semantic Knowledge Fusion Method for Conversational Stance Detection |
Proposes the MT2-CSD dataset and the LLM-CRAN method to address conversational stance detection |
large language model |
|
|
| 24 |
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language |
Proposes FineWeb2 to address multilingual pre-training data processing |
large language model |
|
|