cs.CV (2024-11-29)

📊 35 papers in total | 🔗 15 with code

🎯 Navigation by area of interest

Pillar 3: Spatial Perception & Semantics (14 papers, 🔗 7 with code)
Pillar 9: Embodied Foundation Models (10 papers, 🔗 5 with code)
Pillar 2: RL Algorithms & Architecture (6 papers, 🔗 3 with code)
Pillar 6: Video Extraction & Matching (2 papers)
Pillar 1: Robot Control (1 paper)
Pillar 4: Generative Motion (1 paper)
Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 3: Spatial Perception & Semantics (14 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting	GuardSplat: an efficient, robust watermarking scheme for 3D Gaussian Splatting that protects the copyright of 3D assets	3D gaussian splatting, 3DGS, gaussian splatting	✅
2	TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting	TexGaussian: generates high-quality PBR materials using octree-based 3D Gaussian Splatting	3D gaussian splatting, gaussian splatting, splatting	✅
3	GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding	Proposes the GREAT framework for open-vocabulary 3D object affordance grounding	open-vocabulary, open vocabulary, affordance	✅
4	T-3DGS: Removing Transient Objects for 3D Scene Reconstruction	T-3DGS: a 3D scene reconstruction method that removes transient objects	3DGS, gaussian splatting, splatting
5	Tortho-Gaussian: Splatting True Digital Orthophoto Maps	TOrtho-Gaussian: orthographic Gaussian splatting for generating true digital orthophoto maps	3D gaussian splatting, 3DGS, gaussian splatting
6	Gaussian Splashing: Direct Volumetric Rendering Underwater	Gaussian Splashing: a fast volumetric rendering method for underwater scenes that improves rendering speed and detail sharpness	depth estimation, 3D gaussian splatting, 3DGS	✅
7	Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding	Proposes FreeGS to address semantic consistency in unsupervised 3D scene understanding	3D gaussian splatting, 3DGS, gaussian splatting	✅
8	ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model	Proposes ROSE for open-set dense segmentation	open-vocabulary, open vocabulary, multimodal
9	MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications	MonoPP: metric-scaled, self-supervised monocular depth estimation for automotive applications using planar-parallax geometry	depth estimation, monocular depth
10	DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering	DeSplat: a distractor-free rendering method based on decomposed Gaussian splatting	gaussian splatting, splatting	✅
11	Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction	Uni-SLAM: uncertainty-aware neural implicit SLAM for real-time dense indoor scene reconstruction	visual SLAM, scene reconstruction	✅
12	LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis	LokiTalk: learns fine-grained, generalizable facial correspondences to enhance NeRF-based talking head synthesis	NeRF, neural radiance field
13	Quantifying the synthetic and real domain gap in aerial scene understanding	Proposes metrics based on multi-model consensus and depth structure to quantify the domain gap between synthetic and real aerial scenes	scene understanding
14	Incremental Multi-Scene Modeling via Continual Neural Graphics Primitives	Proposes C-NGP, which uses continual learning to incrementally model multiple scenes within a single neural radiance field	NeRF, neural radiance field

🔬 Pillar 9: Embodied Foundation Models (10 papers)

#	Title	One-line summary	Tags	🔗	⭐
15	Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings	Proposes Dynamic Visual-Token Exit (DyVTE) to accelerate inference in multimodal large language models	large language model, multimodal	✅
16	SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters	Proposes SOLAMI, a social vision-language-action modeling framework for immersive interaction with 3D autonomous characters	vision-language-action, VLA, multimodal
17	Interleaved-Modal Chain-of-Thought	Proposes Interleaved-Modal Chain-of-Thought (ICoT) to improve vision-language model performance on complex reasoning tasks	large language model, multimodal, chain-of-thought
18	GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis	GalaxAlign: a galaxy morphology analysis method that mimics the multimodal guidance given to citizen scientists	foundation model, multimodal	✅
19	DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness	DLaVA: a document language and vision assistant for answer localization with improved interpretability and trustworthiness	large language model, multimodal, chain-of-thought	✅
20	Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise	Proposes CUFIT, a curriculum fine-tuning method for vision foundation models in medical image classification with noisy labels	foundation model
21	Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation	Sparrow: a data-efficient Video-LLM approach based on text-to-image augmentation	large language model, multimodal	✅
22	SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks	SURE-VQA: a framework for systematically evaluating the robustness of vision-language models on medical VQA tasks	large language model, multimodal	✅
23	STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training	STEP: spatio-temporal graph-guided self-training that strengthens the compositional reasoning of Video-LLMs	large language model, chain-of-thought
24	LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos	Proposes the LongVALE benchmark for time-aware, omni-modal understanding of long videos	large language model

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
25	ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration	ReconDreamer: builds world models through online restoration to improve driving scene reconstruction quality	world model, dreamer, 3DGS
26	SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders	SkelMamba: an efficient state space model for skeleton-based action recognition of neurological disorders	Mamba, SSM, state space model
27	FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation	FlowCLAS: enhances normalizing flows with contrastive learning for anomaly segmentation	contrastive learning, foundation model
28	Pretrained Reversible Generation as Unsupervised Visual Representation Learning	Proposes Pretrained Reversible Generation (PRG) for unsupervised visual representation learning, improving downstream task performance	flow matching, representation learning	✅
29	DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation	Proposes DELT to address insufficient diversity in dataset distillation	distillation	✅
30	FairDD: Fair Dataset Distillation	Proposes the FairDD framework to address bias toward protected attributes in dataset distillation	distillation	✅

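🔬 Pillar 6: Video Extraction & Matching (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
31	SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens	Proposes SAT-HMR, based on scale-adaptive tokens, for real-time multi-person 3D human mesh estimation	HMR
32	FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling	FreeCloth: a free-form generation approach that improves the modeling of humans in challenging clothing	SMPL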

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
33	SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation	SIMS: simulates stylized human-scene interactions via retrieval-augmented script generation	locomotion, motion planning, human-scene interaction

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
34	MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks	MoTe: learns a single motion-text diffusion model to handle multiple motion generation tasks	text-to-motion, text-driven motion, motion generation

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
35	The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications	Proposes SASS to address the fusion and real-time processing of distributed, heterogeneous sensor data in urban streetscape applications	spatiotemporal, multimodal
