| 1 |
Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning |
Proposes a symbolic grounding method to address bottlenecks in abstract visual reasoning |
large language model multimodal visual grounding |
|
|
| 2 |
Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems |
Proposes the DiffMAS framework for end-to-end optimization of implicit communication in multi-agent language systems. |
large language model |
|
|
| 3 |
GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion |
GS-Quant: a granular semantic and generative structural quantization method for knowledge graph completion |
large language model |
✅ |
|
| 4 |
How English Print Media Frames Human-Elephant Conflicts in India |
Uses web-scale text analysis to reveal negative framing of human-elephant conflict in India's English-language media |
large language model |
|
|
| 5 |
Ideological Bias in LLMs' Economic Causal Reasoning |
Reveals ideological bias in LLMs' economic causal reasoning, especially where interventionist and market-oriented views conflict. |
large language model |
|
|
| 6 |
Spatial Metaphors for LLM Memory: A Critical Analysis of the MemPalace Architecture |
MemPalace: an LLM memory system built on spatial metaphors, overstated in its claims but offering architectural insights |
large language model |
|
|
| 7 |
Can MLLMs "Read" What is Missing? |
Proposes the MMTR-Bench benchmark to evaluate how well multimodal large language models reconstruct text from visual context |
large language model multimodal instruction following |
✅ |
|
| 8 |
ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures |
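Proposes the ReCAPA framework, which mitigates cascading failures in vision-language-action systems through hierarchical predictive correction |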
vision-language-action VLA large language model |
|
|
| 9 |
Foundation models for discovering robust biomarkers of neurological disorders from dynamic functional connectivity |
Proposes the RE-CONFIRM framework and the Hub-LoRA fine-tuning method to improve the robustness of biomarker identification for brain disorders. |
foundation model |
✅ |
|
| 10 |
Transient Turn Injection: Exposing Stateless Multi-Turn Vulnerabilities in Large Language Models |
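Proposes the Transient Turn Injection attack, exposing stateless multi-turn conversation vulnerabilities in large language models |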
large language model |
|
|
| 11 |
Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation |
Proposes an online recruitment method based on LLM data augmentation and a category-aware MoE, improving person-job matching. |
large language model chain-of-thought |
|
|
| 12 |
CoFEE: Reasoning Control for LLM-Based Feature Discovery |
CoFEE: a reasoning control framework for LLM-based feature discovery that improves feature quality and efficiency |
large language model |
|
|
| 13 |
Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework |
Proposes the ESRRSim framework to evaluate emergent strategic reasoning risks in large language models |
large language model |
|
|
| 14 |
Ethics Testing: Proactive Identification of Generative AI System Harms |
Proposes ethics testing to proactively identify potential harms in generative AI systems |
large language model |
|
|
| 15 |
Memanto: Typed Semantic Memory with Information-Theoretic Retrieval for Long-Horizon Agents |
Memanto: typed semantic memory with information-theoretic retrieval for long-horizon agents |
large language model |
|
|
| 16 |
Call-Chain-Aware LLM-Based Test Generation for Java Projects |
CAT: a call-chain-aware LLM test generation method that improves test coverage for Java projects. |
large language model |
|
|
| 17 |
Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows |
Proposes the Tool Attention mechanism, which eliminates the MCP/Tools Tax in scalable agentic workflows through dynamic tool gating and lazy schema loading. |
large language model |
✅ |
|
| 18 |
Thinking with Reasoning Skills: Fewer Tokens, More Accuracy |
Proposes reusable reasoning skills to improve reasoning accuracy and efficiency |
chain-of-thought |
|
|
| 19 |
DryRUN: On the Role of Public Tests in LLM-Driven Code Generation |
DryRUN: LLMs generate and self-correct code autonomously, without public test cases |
large language model |
|
|
| 20 |
Unbiased Prevalence Estimation with Multicalibrated LLMs |
Proposes multicalibrated large language models to correct bias in class prevalence estimation |
large language model |
|
|
| 21 |
Efficient Agent Evaluation via Diversity-Guided User Simulation |
Proposes DIVERT, which efficiently evaluates LLM customer-service agents through diversity-guided user simulation |
large language model |
|
|
| 22 |
SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference |
SparKV: an overhead-aware KV cache loading framework for on-device LLM inference |
large language model |
|
|
| 23 |
SQLyzr: A Comprehensive Benchmark and Evaluation Platform for Text-to-SQL |
SQLyzr: a comprehensive benchmark and evaluation platform for Text-to-SQL |
large language model |
✅ |
|
| 24 |
Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models |
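The DAVinCI framework improves the factual reliability of claims generated by language models through dual attribution and verification. |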
large language model |
|
|
| 25 |
Doubly Saturated Ramsey Graphs: A Case Study in Computer-Assisted Mathematical Discovery |
Combines SAT solvers with LLM-generated code to discover an infinite family of doubly saturated Ramsey graphs |
large language model |
|
|