| 1 |
$R^2$-dLLM: Accelerating Diffusion Large Language Models via Spatio-Temporal Redundancy Reduction |
Proposes $R^2$-dLLM to address decoding redundancy in diffusion large language models |
large language model |
|
|
| 2 |
Cross-Model Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Across Three Large Language Models |
Evaluates the cross-model consistency of large language models when repeatedly generating exercise prescriptions |
large language model |
|
|
| 3 |
Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI |
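Shows that large language models influence peer review opinions, shifting reviewers' focus from in-depth evaluation to surface-level clarity. |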
large language model |
|
|
| 4 |
IndiaFinBench: An Evaluation Benchmark for Large Language Model Performance on Indian Financial Regulatory Text |
IndiaFinBench: the first evaluation benchmark for large language models on Indian financial regulatory text |
large language model |
✅ |
|
| 5 |
Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest |
Comprehensively evaluates large language models on three major social media analytics tasks and establishes a reproducible benchmark. |
large language model |
|
|
| 6 |
Are Large Language Models Economically Viable for Industry Deployment? |
Proposes EDGE-EVAL to bridge the economic and efficiency gap in evaluating LLMs for industrial deployment |
large language model |
|
|
| 7 |
Do Emotions Influence Moral Judgment in Large Language Models? |
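Shows that emotions influence the moral judgments of large language models, and that the effect shrinks as model capability increases |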
large language model |
|
|
| 8 |
AlignCultura: Towards Culturally Aligned Large Language Models? |
Proposes AlignCultura, which aims to improve the alignment of large language models along cultural dimensions |
large language model |
|
|
| 9 |
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment |
Proposes ReTAS, which uses dialectical alignment to address actor-observer asymmetry in multi-agent systems |
large language model chain-of-thought |
|
|
| 10 |
Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language |
Proposes the Chat2Workflow benchmark for evaluating the ability of large language models to generate executable visual workflows. |
large language model |
✅ |
|
| 11 |
LePREC: Reasoning as Classification over Structured Factors for Assessing Relevance of Legal Issues |
Proposes the LePREC framework, which uses structured reasoning to improve LLM precision in assessing the relevance of legal issues. |
large language model |
|
|
| 12 |
Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms |
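Explores the deployment trade-offs of small language models under agent paradigms, improving performance in resource-constrained settings |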
large language model |
|
|
| 13 |
Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs |
LocQA exposes implicit local and global biases in multilingual LLMs |
large language model |
|
|
| 14 |
CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks |
CulturALL: proposes a multilingual cultural commonsense benchmark for evaluating LLM competence in real-world scenarios. |
large language model |
|
|
| 15 |
ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning |
Proposes ShadowPEFT, which achieves parameter-efficient LLM fine-tuning through a depth-shared shadow network. |
large language model |
|
|
| 16 |
SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning |
SAMoRA: proposes a semantic-aware mixture of LoRA experts for task-adaptive learning |
large language model |
✅ |
|
| 17 |
Discovering a Shared Logical Subspace: Steering LLM Logical Reasoning via Alignment of Natural-Language and Symbolic Views |
Proposes a method for LLM logical reasoning based on aligning natural-language and symbolic views |
large language model |
|
|
| 18 |
Epistemic orientation in parliamentary discourse is associated with deliberative democracy |
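Proposes an LLM-based EMI metric to assess the association between the epistemic orientation of parliamentary speech and deliberative democracy |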
large language model |
|
|
| 19 |
Bangla Key2Text: Text Generation from Keywords for a Low Resource Language |
Bangla Key2Text: builds a large-scale keyword-to-text generation dataset for Bangla, a low-resource language. |
large language model |
|
|
| 20 |
Beyond Rating: A Comprehensive Evaluation and Benchmark for AI Reviews |
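Proposes the Beyond Rating framework, which comprehensively evaluates AI review quality from the perspective of textual argumentation, addressing the limitation that existing benchmarks focus only on rating prediction. |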
large language model |
|
|
| 21 |
What Makes an LLM a Good Optimizer? A Trajectory Analysis of LLM-Guided Evolutionary Search |
Uses trajectory analysis to reveal the inner mechanisms of LLMs as optimizers, informing the design of evolutionary search systems |
large language model |
|
|
| 22 |
Lost in Translation: Do LVLM Judges Generalize Across Languages? |
Proposes MM-JudgeBench to evaluate how well LVLM judge models generalize across multilingual settings. |
multimodal |
|
|
| 23 |
DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing |
DASH-KV: accelerates long-context LLM inference via asymmetric KV cache hashing |
large language model |
✅ |
|
| 24 |
Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation |
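Proposes a framework based on computational argumentation for evaluating how faithfully LLM-generated summaries of parliamentary debates preserve argumentative content. |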
large language model |
|
|
| 25 |
HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing |
HarDBench: proposes a benchmark for draft-based co-authoring jailbreak attacks to safeguard human-LLM collaboration |
large language model |
✅ |
|
| 26 |
Headlines You Won't Forget: Can Pronoun Insertion Increase Memorability? |
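Studies how inserting personal pronouns affects the memorability of news headlines, and evaluates the quality of automatic insertion by LLMs. |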
large language model |
|
|
| 27 |
Mind the Unseen Mass: Unmasking LLM Hallucinations via Soft-Hybrid Alphabet Estimation |
Proposes SHADE, which uses soft-hybrid alphabet estimation to address LLM hallucinations under small sample sizes. |
large language model |
|
|
| 28 |
Construction of Knowledge Graph based on Language Model |
Proposes a framework for constructing hyper-relational knowledge graphs based on lightweight language models |
large language model |
|
|
| 29 |
Detoxification for LLM: From Dataset Itself |
Proposes the HSPD framework, which reduces LLM toxicity at the source through data cleaning, improving model safety. |
large language model |
✅ |
|
| 30 |
HoWToBench: Holistic Evaluation for LLM's Capability in Human-level Writing using Tree of Writing |
Proposes the HoWToBench benchmark and the Tree-of-Writing evaluation method to address the difficulty of evaluating LLM writing ability. |
large language model |
|
|