In the DACH project V-MAV, the aim was to improve image-based algorithms for controlling Micro Aerial Vehicles (MAVs). The three partners, TU Graz, TU Munich and ETH Zurich, worked on localization and pose estimation for MAVs using multi-camera and visual-inertial systems (i.e., camera systems that additionally use accelerometer, gyroscope and compass data), on embedded image-processing algorithms (i.e., specifically designed image-processing hardware that allows the MAV to be built smaller and the image processing to run faster), and on mapping the environment from images taken with MAVs. The mapping part also used additional scene meta-information (e.g., semantic information: which part of the scene is a tree, which part is a house) to improve the mapping result.
In our project part, we mainly focused on visual localization and mapping. Our investigations in visual localization resulted in an image-based localization method that runs in real time and can therefore be used for navigating an MAV. Our system mainly uses vertical lines to compute the camera motion. Such lines occur frequently in man-made environments (e.g., at windows, doors and building outlines), so they are particularly well suited to improving localization results in such settings. To detect vertical lines efficiently, we used an Inertial Measurement Unit (IMU), which delivers accelerometer and gyroscope data, to estimate the gravity direction and then detected the lines parallel to that direction. In a further step, we incorporated the IMU information directly into our localization algorithm to improve the localization quality.
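The gravity-based line selection can be sketched as follows: the IMU's gravity estimate fixes the vertical vanishing point in the image, and a detected 2D line segment is kept only if it points toward that vanishing point. This is a minimal illustration under assumed conventions (segment format, intrinsic matrix, angle threshold), not the project's actual implementation:

```python
import numpy as np

def vertical_vanishing_point(K, g_cam):
    # Image of the scene's vertical direction: project the gravity vector,
    # expressed in camera coordinates, with the intrinsic matrix K.
    v = K @ g_cam
    return v[:2] / v[2]

def filter_vertical_segments(segments, K, g_cam, max_angle_deg=3.0):
    # Keep only segments that point toward the vertical vanishing point,
    # i.e. segments that are plausibly images of vertical 3D lines.
    vp = vertical_vanishing_point(K, g_cam)
    cos_thresh = np.cos(np.radians(max_angle_deg))
    kept = []
    for p1, p2 in segments:
        p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
        d = p2 - p1
        d /= np.linalg.norm(d)
        to_vp = vp - 0.5 * (p1 + p2)     # midpoint-to-vanishing-point direction
        to_vp /= np.linalg.norm(to_vp)
        if abs(d @ to_vp) >= cos_thresh:  # abs(): segment orientation sign is arbitrary
            kept.append((tuple(p1), tuple(p2)))
    return kept
```

Note that this sketch assumes the gravity vector is not parallel to the image plane; otherwise the vanishing point lies at infinity and the test would reduce to a pure direction comparison.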
In the area of 3D mapping, we investigated several reconstruction techniques to create compact models, which can be transmitted easily over a network, and visually appealing 3D models, especially for urban environments. Our first algorithm delivered very compact and visually appealing representations of buildings and specific scene structures. To obtain similar results for arbitrary urban environments, we developed an additional approach that detects planes in the scene and makes these planar surfaces available as constraints in the final reconstruction step. Via the reconstruction parameters, one can control how closely the reconstruction should follow the planes. Additionally, we used semantic information (i.e., which parts of the image are trees, buildings or streets) to improve the 3D reconstruction result. We used artificial intelligence methods to segment images into different semantic classes. Then, using this semantic information, we adjusted the 3D reconstruction parameters depending on the semantic class (e.g., a façade should be planar, a tree should have a smooth surface) and showed that this improves the reconstruction result.
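As a toy illustration of class-dependent reconstruction parameters, one can weight a smoothness prior per pixel by its semantic label, so that planar classes such as façades are regularized strongly while vegetation receives only gentle smoothing. The class names, weights and the simple Jacobi-style depth-map smoother below are assumptions for illustration, not the method actually used in the project:

```python
import numpy as np

# Illustrative per-class smoothness weights (assumed values):
# planar classes get strong regularization, vegetation a gentler one.
SMOOTHNESS = {"facade": 5.0, "street": 4.0, "tree": 1.0, "sky": 0.0}

def regularize_depth(depth, labels, iters=5):
    # Jacobi-style smoothing of a depth map: each pixel is pulled toward the
    # average of its 4 neighbours with a strength given by its semantic class.
    d = depth.astype(float).copy()
    w = np.vectorize(SMOOTHNESS.get)(labels)  # per-pixel weight from the label map
    lam = w / (w + 1.0)                       # blending factor in [0, 1)
    for _ in range(iters):
        # np.roll wraps around the image borders; acceptable for this toy example.
        avg = 0.25 * (np.roll(d, 1, 0) + np.roll(d, -1, 0)
                      + np.roll(d, 1, 1) + np.roll(d, -1, 1))
        d = (1.0 - lam) * d + lam * avg
    return d
```

Applied to a noisy depth map with a "facade" region and a "tree" region, the façade pixels converge toward a flat surface noticeably faster, which is the qualitative behaviour a semantically weighted reconstruction prior aims for.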