| 1 |
Med-RewardBench: Benchmarking Reward Models and Judges for Medical Multimodal Large Language Models |
Proposes Med-RewardBench to address the evaluation of medical multimodal large language models |
large language model multimodal |
|
|
| 2 |
Challenges and Applications of Large Language Models: A Comparison of GPT and DeepSeek family of models |
Compares GPT and DeepSeek models to address the challenges of large language models |
large language model |
|
|
| 3 |
Evaluating Large Language Models for Financial Reasoning: A CFA-Based Benchmark Study |
Proposes a CFA-based benchmark study to evaluate the financial reasoning performance of large language models |
large language model |
|
|
| 4 |
Exploring Reasoning-Infused Text Embedding with Large Language Models for Zero-Shot Dense Retrieval |
Proposes RITE to address document retrieval for complex queries |
large language model |
|
|
| 5 |
Beyond the Surface: Probing the Ideological Depth of Large Language Models |
Proposes the concept of ideological depth to analyze the political leanings of large language models |
large language model |
|
|
| 6 |
Is this chart lying to me? Automating the detection of misleading visualizations |
Proposes the Misviz benchmark for automatically detecting misleading visualizations |
large language model multimodal |
|
|
| 7 |
Morae: Proactively Pausing UI Agents for User Choices |
Proposes Morae to help blind and low-vision users make choices in UI tasks |
multimodal |
|
|
| 8 |
Going over Fine Web with a Fine-Tooth Comb: Technical Report of Indexing Fine Web for Problematic Content Search and Retrieval |
Proposes an ElasticSearch-based framework to improve the indexing and analysis of LLM training data |
large language model |
|
|
| 9 |
PiCSAR: Probabilistic Confidence Selection And Ranking for Reasoning Chains |
Proposes PiCSAR to address the scoring of reasoning chains |
large language model |
|
|
| 10 |
Reasoning-Intensive Regression |
Proposes MENTAT to address reasoning-intensive regression |
large language model |
|
|
| 11 |
QZhou-Embedding Technical Report |
Proposes QZhou-Embedding to improve the representational capability of text embedding models |
foundation model |
|
|
| 12 |
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning |
Proposes the Middo framework to optimize training data for LLM fine-tuning |
large language model |
✅ |
|
| 13 |
A Survey on Current Trends and Recent Advances in Text Anonymization |
Surveys text anonymization techniques that address privacy protection challenges |
large language model |
|
|
| 14 |
COCORELI: Cooperative, Compositional Reconstitution & Execution of Language Instructions |
Proposes COCORELI to address limitations in executing complex instructions |
large language model |
|
|
| 15 |
Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing |
Compares multimodal vision-based and text-based parsing strategies for invoice processing |
large language model |
|
|
| 16 |
Normality and the Turing Test |
Revisits the Turing test through the concept of normality |
large language model |
|
|
| 17 |
Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance |
Proposes a fine-tuning framework that isolates core parameters to improve large language model performance |
large language model |
|
|
| 18 |
Discovering Semantic Subdimensions through Disentangled Conceptual Representations |
Proposes a disentangled conceptual representation model to discover semantic subdimensions |
large language model |
|
|
| 19 |
Automatic Reviewers Fail to Detect Faulty Reasoning in Research Papers: A New Counterfactual Evaluation Framework |
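Proposes an automated counterfactual evaluation framework to test whether automatic reviewers detect logical flaws in research papers |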
large language model |
|
|