| 1 |
Do LLMs Understand Romanian Driving Laws? A Study on Multimodal and Fine-Tuned Question Answering |
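Evaluates LLMs on question answering about Romanian driving laws and explores the impact of domain-specific fine-tuning and multimodal input. |
large language model, multimodal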
|
|
| 2 |
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning |
Proposes a logic-grounded evaluation framework to address the bottlenecks in multimodal reasoning. |
large language model, multimodal
|
|
| 3 |
A Cross-Lingual Analysis of Bias in Large Language Models Using Romanian History |
Analyzes bias in large language models across languages, using Romanian history as a case study. |
large language model |
|
|
| 4 |
Spiral of Silence in Large Language Model Agents |
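Proposes a framework for evaluating the spiral of silence among LLM agents, revealing how history and persona signals influence group opinion. |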
large language model |
|
|
| 5 |
Assessing Large Language Models in Updating Their Forecasts with New Information |
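EVOLVECAST: a framework that evaluates how well LLMs update their forecasts after receiving new information, revealing a conservative bias. |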
large language model |
|
|
| 6 |
Jackal: A Real-World Execution-Based Benchmark Evaluating Large Language Models on Text-to-JQL Tasks |
Proposes Jackal, a real-world, execution-based benchmark for evaluating large language models on text-to-JQL tasks. |
large language model |
|
|
| 7 |
GEAR: A General Evaluation Framework for Abductive Reasoning |
Proposes GEAR, a general, label-free evaluation framework for abductive reasoning, and uses it to improve the reasoning ability of LLMs. |
large language model, instruction following
|
|
| 8 |
Large-Scale Constraint Generation -- Can LLMs Parse Hundreds of Constraints? |
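Introduces the Large-Scale Constraint Generation (LSCG) problem and designs FoCusNet to improve how well LLMs parse large sets of complex constraints. |
large language model, chain-of-thought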
|
|
| 9 |
Vision-Grounded Machine Interpreting: Improving the Translation Process through Visual Cues |
Proposes a vision-grounded approach to machine simultaneous interpreting that uses visual information to improve translation quality. |
multimodal, visual grounding
|
|
| 10 |
EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos |
Introduces the EduVidQA dataset and uses multimodal large language models to tackle question answering over lecture videos. |
large language model, multimodal
|
|
| 11 |
Open-DeBias: Toward Mitigating Open-Set Bias in Language Models |
Proposes Open-DeBias to mitigate open-set bias in language models. |
large language model, zero-shot transfer
|
|
| 12 |
VIVA+: Human-Centered Situational Decision-Making |
VIVA+: a cognition-driven multimodal decision-making benchmark for human-centered situations. |
large language model, multimodal
|
|
| 13 |
The Role of Logic and Automata in Understanding Transformers |
Surveys the capabilities of Transformers from the perspectives of logic, automata, and circuit complexity. |
large language model |
|
|
| 14 |
From Personal to Collective: On the Role of Local and Global Memory in LLM Personalization |
Proposes LoGo, a framework that combines local and global memory to improve LLM personalization. |
large language model |
|
|
| 15 |
DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding |
DiffuSpec: uses diffusion language models for speculative decoding, significantly speeding up LLM inference. |
large language model |
|
|
| 16 |
Aligning LLMs for Multilingual Consistency in Enterprise Applications |
Proposes a batch-wise alignment fine-tuning strategy to address multilingual consistency of LLMs in enterprise applications. |
large language model |
|
|
| 17 |
Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Distributional Semantics |
Proposes a pragmatic-inference-based method for acquiring moral reasoning, improving how well LLM moral reasoning generalizes. |
large language model |
|
|
| 18 |
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning |
Proposes Q-Tuning, which jointly optimizes sample and token pruning to make supervised fine-tuning (SFT) of large models more efficient. |
large language model |
|
|
| 19 |
Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions |
Proposes KAPPA to close the knowledge-prediction gap of large language models on multiple-choice questions. |
large language model |
|
|
| 20 |
Evaluating Program Semantics Reasoning with Type Inference in System F |
Proposes the TF-Bench benchmark, which evaluates the program-semantics reasoning of LLMs through type inference in System F. |
large language model |
|
|
| 21 |
LLM Hallucination Detection: HSAD |
Proposes HSAD, which detects hallucinations by analyzing LLM hidden-layer signals in the frequency domain. |
large language model |
|
|