The Semantic Drone Dataset focuses on semantic understanding of urban scenes to increase the safety of autonomous drone flight and landing procedures. The imagery depicts more than 20 houses from a nadir (bird's-eye) view, acquired at altitudes of 5 to 30 meters above ground. A high-resolution camera was used to acquire images at a size of 6000x4000 px (24 Mpx). The training set contains 400 publicly available images; the test set is made up of 200 private images.
Person Detection
For the task of person detection, the dataset contains bounding-box annotations for both the training and test sets.
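The annotation file format is not specified here, but a standard step when evaluating detections against bounding-box ground truth is intersection-over-union (IoU) matching. A minimal sketch, assuming boxes are given as `(xmin, ymin, xmax, ymax)` tuples in pixel coordinates (that format is an assumption, not the dataset's documented one):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes.

    Boxes are (xmin, ymin, xmax, ymax); this tuple layout is an
    assumption for illustration, not the dataset's annotation format.
    """
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A predicted box is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.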
Semantic Segmentation
We prepared pixel-accurate annotations for the same training and test sets. The complexity of the dataset is limited to 20 classes, listed in the following table.
Table 1: Semantic classes of the Drone Dataset
- tree
- grass
- other vegetation
- dirt
- gravel
- rocks
- water
- paved area
- pool
- person
- dog
- car
- bicycle
- roof
- wall
- fence
- fence-pole
- window
- door
- obstacle
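When working with the segmentation labels it is convenient to map class names to integer IDs and summarize a label mask by per-class pixel frequency. A minimal sketch; the integer ID assignment below is an assumption for illustration, not the dataset's official label encoding:

```python
from collections import Counter

# 20 semantic classes; the order (and hence the IDs) is an assumption
# for illustration, not the dataset's official encoding.
CLASSES = [
    "tree", "grass", "other vegetation", "dirt", "gravel",
    "rocks", "water", "paved area", "pool", "person",
    "dog", "car", "bicycle", "roof", "wall",
    "fence", "fence-pole", "window", "door", "obstacle",
]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}


def class_frequencies(mask):
    """Fraction of pixels per class for a mask given as a 2D list of IDs."""
    counts = Counter(cid for row in mask for cid in row)
    total = sum(counts.values())
    return {CLASSES[cid]: n / total for cid, n in counts.items()}
```

For example, a 2x2 mask `[[0, 0], [1, 9]]` would report 50% tree, 25% grass, and 25% person. Such class statistics are commonly used to weight the loss against the strong class imbalance typical of aerial scenes.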
Additional Data Available
- High-resolution images at 1 Hz
- Fish-eye stereo images at 5 Hz with synchronized IMU measurements
- Thermal images at 1 Hz
- Ground control points
- 3D ground truth of 3 houses acquired by a total station