We develop computer vision methods, including 3D object detection, hand pose estimation, geo-localization, and indoor 3D reconstruction, with applications to augmented reality and robotics.
We propose an efficient transformer-based architecture for 3D pose estimation of two hands and an object during complex interactions, from a single RGB image.
We propose a novel method for reconstructing floor plans from noisy 3D point clouds, based on Monte Carlo Tree Search (MCTS) with an integrated refinement step.
We explore how a general AI algorithm can be used for 3D scene understanding in order to reduce the need for training data. More precisely, we propose a modification of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans.
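To give a flavor of how MCTS can drive such retrieval, here is a minimal, generic sketch of the algorithm: it searches over sequences of proposals (e.g. candidate objects or layout components) using the standard selection/expansion/simulation/backpropagation loop. The `children` and `score` functions, the UCB constant, and all names are illustrative placeholders, not the papers' actual interface.

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    # Upper Confidence Bound: trade off average reward vs. exploration.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def mcts(root_state, children, score, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend the tree greedily by UCB until a leaf.
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: add the leaf's children, if it has any.
        for s in children(node.state):
            node.children.append(Node(s, parent=node))
        if node.children:
            node = random.choice(node.children)
        # 3. Simulation: here, simply score the reached state.
        reward = score(node.state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited first move.
    return max(root.children, key=lambda n: n.visits).state
```

In the scene-understanding setting, a "state" would be a partial set of selected object/layout proposals and `score` would measure how well they fit the scan; the refinement step of the floor-plan work is omitted here.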
We develop a method for automatic hand-object 3D pose annotation of sequences captured with one or more RGB-D cameras. We use this method to create a large-scale hand-object dataset and make it public, along with baseline results for hand pose estimation from a single RGB image.
We introduce a novel method for estimating the 3D room layout from a single image.
Because acquiring annotations for color images is a difficult task, we introduce a novel learning method for 3D pose estimation from color images.
We introduce a novel approach for object 3D pose estimation, which is inherently robust to partial occlusions of the object.
We present a scalable approach to retrieve 3D models for objects in the wild. Our method builds on the fact that knowing the object pose significantly reduces the complexity of the task.
We propose a simple and efficient method for exploiting synthetic images when training a Deep Network to predict a 3D pose from an image.
We propose a simple and efficient method for physics-based hand object interaction in VR.
Given simple 2.5D city maps, we show how to exploit recent results in semantic segmentation to efficiently track a camera in urban environments.
BB8 is a novel method for 3D object detection and pose estimation from color images only. It predicts the 3D poses of the objects in the form of 2D projections of the 8 corners of their 3D bounding boxes.
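The corner-based parameterization can be sketched as follows: the 2D training targets are obtained by projecting the 8 corners of the object's 3D bounding box through a pinhole camera (at test time, the predicted 2D-3D correspondences would be fed to a PnP solver, not shown here). The intrinsics, pose, and box size below are made-up values for illustration only.

```python
import numpy as np

def box_corners(size):
    # The 8 corners of an axis-aligned box centered at the origin.
    sx, sy, sz = np.asarray(size) / 2.0
    return np.array([[x, y, z] for x in (-sx, sx)
                               for y in (-sy, sy)
                               for z in (-sz, sz)])

def project(points_3d, R, t, K):
    # Pinhole projection: x = K (R X + t), then divide by depth.
    cam = points_3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

K = np.array([[600., 0., 320.],     # toy intrinsics
              [0., 600., 240.],
              [0., 0., 1.]])
R = np.eye(3)                       # identity rotation for the sketch
t = np.array([0., 0., 1.0])         # object 1 m in front of the camera
corners_2d = project(box_corners((0.1, 0.1, 0.1)), R, t, K)  # shape (8, 2)
```

Regressing these 8 projected points instead of a rotation avoids parameterizing the rotation space directly, and the pose is then recovered from the 2D-3D corner correspondences.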
We propose a novel illumination normalization method that lets us learn to detect objects and estimate their 3D poses under challenging illumination conditions from very few training samples.
We introduce novel methods for predicting the 3D joint locations of a hand from a depth map using Convolutional Neural Networks (CNNs).
We propose methods for accurate camera pose estimation in urban environments from single images and 2.5D maps made of the surrounding buildings’ outlines and their heights.
We present a method for large-scale geo-localization and global tracking of mobile devices in urban outdoor environments.
We introduce a simple but powerful approach to computing descriptors for object views that efficiently capture both the object identity and 3D pose.
We present a method that estimates in real-time and under challenging conditions the 3D pose of a known object.
State-of-the-art keypoint detectors are surprisingly sensitive to drastic imaging changes caused by weather and lighting conditions; we introduce a learning-based approach to detect keypoints that remain repeatable under such changes.
We propose an approach to detect flying objects such as UAVs and aircraft even when they occupy a small portion of the field of view, possibly move against complex backgrounds, and are filmed by a camera that itself moves.
We propose a novel approach to synthesizing images that are effective for training object detectors.