| 1 |
Evaluating Large Language Models for Stance Detection on Financial Targets from SEC Filing Reports and Earnings Call Transcripts |
Uses large language models to address stance detection on financial targets in SEC filings and earnings call transcripts |
large language model chain-of-thought |
|
|
| 2 |
MMTutorBench: The First Multimodal Benchmark for AI Math Tutoring |
Proposes MMTutorBench, the first multimodal benchmark for AI math tutoring |
large language model multimodal |
|
|
| 3 |
SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human Conversations |
SI-Bench: a social intelligence benchmark for evaluating large language models in human-to-human conversations |
large language model chain-of-thought |
✅ |
|
| 4 |
MAP4TS: A Multi-Aspect Prompting Framework for Time-Series Forecasting with Large Language Models |
MAP4TS: a multi-aspect prompting framework for time-series forecasting with large language models |
large language model multimodal |
|
|
| 5 |
Agent-based Automated Claim Matching with Instruction-following LLMs |
Proposes an agent-based automated claim matching method that uses instruction-following LLMs to improve matching performance. |
instruction following |
|
|
| 6 |
Large Language Models Report Subjective Experience Under Self-Referential Processing |
Induces large language models to produce subjective experience reports through self-referential processing |
large language model |
|
|
| 7 |
M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset |
M4FC: a multimodal, multilingual, multicultural, multitask real-world fact-checking dataset |
multimodal |
|
|
| 8 |
Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models |
AdaSearch: an adaptive blockwise search algorithm for inference-time alignment of large language models |
large language model |
|
|
| 9 |
Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages? |
Evaluates how well ASR foundation models generalize to regional dialect features in low-resource languages |
foundation model |
|
|
| 10 |
LangLingual: A Personalised, Exercise-oriented English Language Learning Tool Leveraging Large Language Models |
LangLingual: a personalized, exercise-oriented English learning tool built on large language models |
large language model |
|
|
| 11 |
Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs |
Proposes an LLM cascade that requires no training labels for product quality assessment in e-commerce catalogs. |
large language model chain-of-thought |
|
|
| 12 |
Evaluating Long-Term Memory for Long-Context Question Answering |
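Systematically evaluates multiple memory-augmentation methods for long-context question answering, improving efficiency while preserving accuracy. |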
large language model foundation model |
|
|
| 13 |
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models |
ISA-Bench: a benchmark for evaluating the instruction sensitivity of large audio language models |
large language model instruction following |
|
|
| 14 |
ENTP: Enhancing Low-Quality SFT Data via Neural-Symbolic Text Purge-Mix |
ENTP: enhances low-quality SFT data through neural-symbolic text purging and mixing |
large language model instruction following |
|
|
| 15 |
A Survey on LLM Mid-Training |
A survey of LLM mid-training, the stage that bridges pre-training and post-training to strengthen specific capabilities |
large language model foundation model |
|
|
| 16 |
Automatización de Informes Geotécnicos para Macizos Rocosos con IA (Automating Geotechnical Reports for Rock Masses with AI) |
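Proposes a method based on multimodal large language models for automatically generating geotechnical engineering reports, improving efficiency and reducing subjective error. |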
large language model multimodal |
|
|
| 17 |
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences |
Proposes Omni-Reward for generalist omni-modal reward modeling that supports free-form preferences. |
multimodal |
|
|
| 18 |
Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation |
Proposes Token-wise Projected LoRA (TopLoRA) for finer-grained parameter-efficient fine-tuning. |
large language model |
✅ |
|
| 19 |
TimeStampEval: A Simple LLM Eval and a Little Fuzzy Matching Trick to Improve Search Accuracy |
Proposes the TimeStampEval benchmark and the Assisted Fuzzy method to improve LLM accuracy when retrieving timestamps from noisy text. |
TAMP |
|
|
| 20 |
Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs |
Proposes the Combo-Eval framework and the NLR-BIRD dataset for evaluating LLM-generated natural language representations of Text-to-SQL system outputs. |
large language model |
|
|
| 21 |
EMTSF: Extraordinary Mixture of SOTA Models for Time Series Forecasting |
Proposes EMTSF, a mixture-of-experts time-series forecasting framework that combines SOTA models |
large language model |
|
|
| 22 |
Detecting Religious Language in Climate Discourse |
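Proposes a dual approach to detecting religious language in climate discourse, comparing a rule-based model with large language models. |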
large language model |
|
|
| 23 |
Beyond Direct Generation: A Decomposed Approach to Well-Crafted Screenwriting with LLMs |
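Proposes DSR, a dual-stage refinement framework that addresses the difficulty LLMs have in balancing creativity and formatting when generating high-quality screenplays. |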
large language model |
|
|
| 24 |
Your LLM Agents are Temporally Blind: The Misalignment Between Tool Use Decisions and Human Time Perception |
Reveals the temporal blindness of LLM agents: their tool-use decisions are misaligned with human time perception |
large language model |
|
|
| 25 |
Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language |
Evaluates the pragmatic gap in how LLMs process figurative language in cultural contexts |
large language model |
|
|
| 26 |
BitSkip: An Empirical Analysis of Quantization and Early Exit Composition |
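The BitSkip framework reveals counterintuitive effects of combining quantization with early exit: an 8-bit quantized model outperforms more complex 4-bit models. |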
large language model |
|
|
| 27 |
LimRank: Less is More for Reasoning-Intensive Information Reranking |
Proposes LimRank, which fine-tunes LLMs on a small amount of high-quality data for efficient reasoning-intensive information reranking. |
instruction following |
|
|
| 28 |
How AI Forecasts AI Jobs: Benchmarking LLM Predictions of Labor Market Changes |
Proposes a benchmark for LLM-based labor market forecasting to assess AI's impact on employment. |
large language model |
|
|
| 29 |
LightKGG: Simple and Efficient Knowledge Graph Generation from Textual Data |
LightKGG: efficient knowledge graph generation with small language models, lowering the barrier to AI applications |
large language model |
|
|
| 30 |
BaZi-Based Character Simulation Benchmark: Evaluating AI on Temporal and Persona Reasoning |
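Proposes a BaZi-based (Chinese Four Pillars of Destiny) benchmark for AI character simulation, aimed at improving AI's temporal and persona reasoning |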
large language model |
|
|
| 31 |
Fast-MIA: Efficient and Scalable Membership Inference for LLMs |
Fast-MIA: an efficient, scalable tool for evaluating membership inference attacks on large language models |
large language model |
✅ |
|
| 32 |
Knocking-Heads Attention |
Proposes Knocking-Heads Attention, which improves the representational capacity of large language models through cross-head interaction. |
large language model |
|
|
| 33 |
Retracing the Past: LLMs Emit Training Data When They Get Lost |
Proposes the Confusion-Inducing Attack (CIA), which extracts LLM training data by maximizing model uncertainty |
large language model |
|
|
| 34 |
Measuring Teaching with LLMs |
Uses custom LLMs and sentence embeddings for objective, scalable assessment of teaching quality |
large language model |
|
|
| 35 |
MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs |
Proposes the MAD-Fact framework for evaluating the factual accuracy of LLMs in long-form generation |
large language model |
|
|
| 36 |
Language Server CLI Empowers Language Agents with Process Rewards |
Lanser-CLI empowers language agents with process rewards, addressing API hallucinations and erroneous edits. |
large language model |
✅ |
|