| 1 | RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning | Proposes RBF++ to quantify and optimize reasoning boundaries in chain-of-thought reasoning | large language model multimodal chain-of-thought | ✅ | |
| 2 | KIT's Offline Speech Translation and Instruction Following Submission for IWSLT 2025 | Proposes an LLM-based approach to offline speech translation and instruction following | large language model instruction following | | |
| 3 | FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models | Proposes FlightGPT to address multimodal fusion and interpretability in UAV vision-and-language navigation | VLN multimodal chain-of-thought | | |
| 4 | Are Large Language Models Good at Detecting Propaganda? | Evaluates the effectiveness of large language models at propaganda detection | large language model | | |
| 5 | Krikri: Advancing Open Large Language Models for Greek | Proposes Llama-Krikri-8B to improve large language model performance for Greek | large language model | | |
| 6 | Simulation Agent: A Framework for Integrating Simulation and Large Language Models for Enhanced Decision-Making | Proposes a simulation agent framework for complex decision-making problems | large language model | | |
| 7 | From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery | Proposes a three-level taxonomy aimed at raising the autonomy of large language models in scientific discovery | large language model | ✅ | |
| 8 | SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science | Proposes SeedBench to address evaluation in seed science | large language model | | |
| 9 | ToolSpectrum : Towards Personalized Tool Utilization for Large Language Models | Proposes ToolSpectrum to address personalized tool selection | large language model | ✅ | |
| 10 | Role-Playing Evaluation for Large Language Models | Proposes a role-playing evaluation benchmark to address difficulties in evaluating large language models | large language model | ✅ | |
| 11 | The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation | Studies the effect of language diversity on fine-tuning large language models for translation | large language model | | |
| 12 | Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset | Proposes a method for assessing adolescent suicide risk using multimodal speech features | multimodal | | |
| 13 | An Empirical Study of Many-to-Many Summarization with Large Language Models | Proposes a many-to-many summarization method to improve multilingual processing | large language model | | |
| 14 | I'll believe it when I see it: Images increase misinformation sharing in Vision-Language Models | Presents a study of how visual content affects the tendency of VLMs to share information | large language model multimodal | ✅ | |
| 15 | SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information | Proposes the SAKURA benchmark to evaluate multi-hop reasoning in large audio-language models | large language model multimodal | | |
| 16 | Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning | Proposes the TREA dataset to evaluate temporal reasoning in audio-language models | large language model multimodal | | |
| 17 | MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix | Proposes the MMAR benchmark to evaluate deep reasoning in audio-language models | large language model chain-of-thought | | |
| 18 | SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs | Proposes SQLForge to enhance the text-to-SQL reasoning of LLMs | large language model | | |
| 19 | Assessing GPT Performance in a Proof-Based University-Level Course Under Blind Grading | Evaluates GPT performance under blind grading in a university algorithms course | large language model | | |
| 20 | Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents | Proposes guided search strategies for software engineering problems in non-serializable environments | large language model | | |
| 21 | What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts | Proposes a requirements-aware prompt optimization mechanism to address underspecification in LLM prompts | instruction following | | |
| 22 | Sense and Sensitivity: Examining the Influence of Semantic Recall on Long Context Code Reasoning | Proposes SemTrace to address semantic recall in long-context code reasoning | large language model | | |
| 23 | GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection | Proposes the GUARD framework to address selective unlearning in large language models | large language model | | |
| 24 | Rank, Chunk and Expand: Lineage-Oriented Reasoning for Taxonomy Expansion | Proposes the LORex framework to address noise and context limits in taxonomy expansion | PaLM-E | | |
| 25 | What's in a prompt? Language models encode literary style in prompt embeddings | Proposes a method for analyzing literary style through language model prompt embeddings | large language model | | |
| 26 | RAR: Setting Knowledge Tripwires for Retrieval Augmented Rejection | Proposes RAR to address content moderation for large language models | large language model | | |
| 27 | HeteroSpec: Leveraging Contextual Heterogeneity for Efficient Speculative Decoding | Proposes HeteroSpec to address the inefficiency of autoregressive decoding | large language model | | |
| 28 | Are LLMs Better Formalizers than Solvers on Complex Problems? | Evaluates LLMs' formalization ability versus solving ability on complex problems | large language model | | |
| 29 | Positional Fragility in LLMs: How Offset Effects Reshape Our Understanding of Memorization Risks | Proposes a positional fragility theory to assess memorization risks in large language models | large language model | | |
| 30 | What if Deception Cannot be Detected? A Cross-Linguistic Study on the Limits of Deception Detection from Text | Proposes a belief-based deception framework to re-examine deception detection from text | large language model | | |
| 31 | Language-Specific Latent Process Hinders Cross-Lingual Performance | Proposes a language-specific latent process to address cross-lingual performance issues | large language model | | |