Databases
GEINTRA Overhead ToF People Detection (GOTPD1 & GOTPD2) Databases
Introduction
The GOTPD1 & GOTPD2 (GEINTRA Overhead ToF People Detection) datasets are multimodal databases of recordings from a Kinect 2 camera placed in an overhead or frontal position, monitoring the movement of people under or in front of it. GOTPD1 contains depth and infrared data; GOTPD2 contains depth, infrared and RGB data (RGB only for the frontal camera). The databases were designed to fulfill the following objectives:
- Allow evaluation and fine-tuning of the ToF data acquisition system in the GEINTRA research group.
- Allow the evaluation of people detection, headgear accessory identification and human activity detection algorithms based on data generated by ToF cameras (including RGB, depth and infrared) placed in overhead and frontal positions.
- Provide quality data to the research community working on people detection and identification tasks.
The people detection task (and the data provided) can also be extended to practical applications such as video-surveillance, access control, people flow analysis, behaviour analysis or event capacity management.
Sample videos
We provide here some video examples generated from recordings from the database (depth only):
Additional information
You can access full details on the dataset in the corresponding document, and get a data sample (see the documentation for references on file formats). The documents refer mainly to the GOTPD1 dataset.
If you want to get a copy of the datasets, please contact javier.maciasguarasa@uah.es or download them from Kaggle. Supplemental data (23 additional sequences) that we have used to extend the GOTPD1 data for DNN-based approaches can be found at this additional link (we stored it there because Kaggle does not support datasets over 20 GB).
Licensing
GEINTRA Overhead ToF People Detection (GOTPD1&GOTPD2) Databases by Javier Macias-Guarasa, Cristina Losada-Gutierrez, David Fuentes-Jimenez, Raquel Garcia-Jimenez, Carlos A. Luna, Alvaro Fernandez-Rincon, and Manuel Mazo is distributed as-is, and licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
If you make use of these databases and/or their related documentation, you are kindly requested to cite the following papers:
- Carlos A. Luna, Cristina Losada-Gutierrez, David Fuentes-Jimenez, Alvaro Fernandez-Rincon, Manuel Mazo, Javier Macias-Guarasa. Robust People Detection Using Depth Information from an Overhead Time-of-Flight Camera. Expert Systems with Applications, available online 26 November 2016, ISSN 0957-4174. http://dx.doi.org/10.1016/j.eswa.2016.11.019 (http://www.sciencedirect.com/science/article/pii/S0957417416306480)
- C. A. Luna, J. Macias-Guarasa, C. Losada-Gutierrez, M. Marron-Romera, M. Mazo, S. Luengo-Sanchez, and R. Macho-Pedroso. Headgear Accessories Classification Using an Overhead Depth Sensor. Sensors, vol. 17, 2017. http://dx.doi.org/10.3390/s17081845
GEINTRA Person Detection Database (GFPD)
Introduction
The Depth Person Detection database (GFPD-UAH) is a multimodal database (depth and infrared data) of recordings from an Intel RealSense D435 and a Kinect 2 depth camera placed in a frontal elevated position, monitoring people movements. It was designed to fulfill the following objectives:
- Allow evaluation and fine-tuning of the ToF data acquisition system in the GEINTRA research group.
- Allow the evaluation of people detection, headgear accessory identification and human activity detection algorithms based on data generated by ToF cameras (including RGB, depth and infrared) placed in overhead and frontal positions.
- Provide quality data to the research community working on people detection and identification tasks.
The people detection task (and the data provided) can also be extended to practical applications such as video-surveillance, access control, people flow analysis, behaviour analysis or event capacity management.
Sample video
To give you an idea of what to expect, you can have a look at the following video we prepared from the data (https://www.youtube.com/watch?v=ugaDzk5Ua9M).
Database info
GFPD is composed of sequences covering a broad variety of conditions, with scenarios including:
- Single and multiple persons
- Persons with and without accessories (hats, caps)
- Persons with different complexion, height, hair color, and hair configuration
- Persons actively moving and performing additional actions (such as using their mobile phones, moving their fists up and down, moving their arms, etc.)
The depth information is stored as 16-bit images (.png), with each pixel representing distance in millimeters.
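As a minimal sketch of how such a frame can be handled once decoded (a real file would be read with, e.g., OpenCV's cv2.imread(path, cv2.IMREAD_UNCHANGED); the tiny synthetic array below stands in for an actual file):

```python
import numpy as np

# Minimal sketch (not part of the dataset tooling): convert a decoded 16-bit
# depth frame from millimeters to meters and mask out empty pixels.
depth_mm = np.array([[0, 1500],
                     [3000, 4500]], dtype=np.uint16)

depth_m = depth_mm.astype(np.float32) / 1000.0  # millimeters -> meters
valid = depth_mm > 0                            # 0 typically marks no depth return
print(depth_m[valid].min(), depth_m[valid].max())  # 1.5 4.5
```

Keeping the invalid-pixel mask separate from the converted values avoids treating missing returns as zero distance.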
File naming conventions
To ease adapting the experimental setup for specific tasks, we have designed a (verbose) naming convention for the file names. Each file is named following this structure: seq-PXX-MYY-AUUUU-GXX-CWW-SVVV, where:
- PXX : Number of persons in the scene. XX is the maximum number of people than can be seen simultaneously in the scene. Note that there may be multiple users recorded in a given sequence, but at most XX will be seen at the same time.
- MYY : Movement information. YY is written in decimal but is meant to be interpreted as a bitmask, with the following convention:
  – 00: N/A
  – 01: static
  – 02: mostly regular movement around the scene
  – 04: mostly random movement
  – 08: reduced movements (almost static, probably turning on)
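A sketch of how the PXX and MYY fields could be decoded, assuming the decimal YY value is interpreted as the bitwise OR of the movement flags (the helper name parse_name and the sample file names are hypothetical):

```python
import re

# Movement bitmask flags as described in the naming convention above.
MOVEMENT_FLAGS = {1: "static", 2: "mostly regular around scene",
                  4: "mostly random", 8: "reduced movements"}

def parse_name(name):
    """Extract (number of persons, movement labels) from a GFPD file name."""
    m = re.match(r"seq-P(\d+)-M(\d+)", name)
    if not m:
        raise ValueError(f"unexpected file name: {name}")
    persons = int(m.group(1))
    mask = int(m.group(2))
    moves = [label for bit, label in MOVEMENT_FLAGS.items() if mask & bit]
    return persons, moves or ["N/A"]

print(parse_name("seq-P02-M03-A0000-G01-C01-S001"))
# (2, ['static', 'mostly regular around scene'])
```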
Filename extensions
The distributed filenames have an extension that identifies their type, as follows:
- png: Depth and GT information file.
Depth Camera Specifications
1. The first camera used in our recordings is a Kinect 2 for Windows device, with the following main characteristics:
- Depth technology: Time of Flight
- Depth sensing: 512 x 424, 30 Hz, FOV: 70 x 60, one mode: 0.5–4.5 meters
- RGB sensor: 1080p color camera, 30 Hz (15 Hz in low light)
- Active infrared (IR) capabilities: 512 x 424, 30 Hz
- Microphone array (4 microphones)
- Intrinsic parameters obtained from calibration:
fx=367.286994337726;
fy=367.286855347968;
cx=255.165695200749;
cy=211.824600345805;
2. The second camera used for the recordings is an Intel RealSense D435, with the following characteristics:
- Depth technology: Active IR stereo
- Depth sensing: up to 1280 x 720, 90 FPS, FOV 86° × 57° (±3°), maximum approximate depth 10 meters. Accuracy varies depending on calibration, scene, and lighting conditions.
- RGB sensor: up to 1920 × 1080, 30 FPS, FOV 69.4° × 42.5° × 77° (±3°)
- Intrinsic parameters obtained from calibration:
fx=915.5;
fy=915.5;
cx=645.5;
cy=366.3;
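As an illustration of how such intrinsics are typically used, the following sketch applies the standard pinhole back-projection to the Kinect 2 calibration values listed above (the helper name backproject is hypothetical):

```python
# Standard pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy,
# using the Kinect 2 intrinsics from the calibration above.
fx, fy = 367.286994337726, 367.286855347968
cx, cy = 255.165695200749, 211.824600345805

def backproject(u, v, z_mm):
    """Map pixel (u, v) with depth z_mm (millimeters) to (X, Y, Z) in meters."""
    z = z_mm / 1000.0
    return (u - cx) * z / fx, (v - cy) * z / fy, z

# The principal point lies on the optical axis, so X = Y = 0 there:
print(backproject(cx, cy, 2000))  # (0.0, 0.0, 2.0)
```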
Disclaimer and Licensing
GEINTRA Person Detection Database by David Fuentes-Jimenez, Cristina Losada-Gutierrez, David Casillas-Perez, Javier Macias-Guarasa, Roberto Martin-Lopez, Daniel Pizarro and Carlos A. Luna is distributed as-is, and is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
If you make use of this database and/or its related documentation, you are kindly requested to cite the paper:
- David Fuentes-Jimenez, Cristina Losada-Gutierrez, David Casillas-Perez, Javier Macias-Guarasa, Roberto Martin-Lopez, Daniel Pizarro and Carlos A. Luna. Towards Dense People Detection with Deep Learning and Depth Images, 2020, arXiv:2007.07171. (https://arxiv.org/abs/2007.07171)
GEINTRA Synthetic Depth People Detection Dataset (GESDPD)
Introduction
The GESDPD is a depth image database containing 22000 frames that simulate images taken with a sensor in an elevated frontal position, in an indoor environment. It was designed to fulfill the following objectives:
- Allow the training and evaluation of people detection algorithms based on depth or RGB-D data, without the need for manual labeling.
- Provide quality synthetic data to the research community working on people detection tasks.
The people detection task can also be extended to practical applications such as video-surveillance,
access control, people flow analysis, behaviour analysis or event capacity management.
General contents
GESDPD is composed of 22000 synthetic depth images that simulate images taken with a sensor in an elevated frontal position, in a rectangular indoor working environment. They have been generated using the simulation software Blender.
The synthetic images show a room with different persons walking in different directions. The camera perspective is not stationary: it moves around the room throughout the dataset, which avoids a constant background. Some examples of the different views are shown in the next figures.
Quantitative details on the database content are provided below:
- Number of frames: 22000
- Number of different people: 4 (3 men and 1 woman)
- Number of labeled people: 20800
- Image resolution: 320 × 240 pixels
For each image, we provide the depth map and the ground truth including the position of each person in the scene.
To give you an idea on what to expect, the next figure shows some examples of images from the dataset. In this figure, depth values are represented in millimeters, using a colormap.
Geometry details
As mentioned before, the dataset simulates images taken with a sensor in an elevated frontal position, in a rectangular indoor working environment. Specifically, the camera was placed at a height of 3 meters, and it rotates along the sequence. The room (whose layout is shown in the next figure) measures 8.56 × 5.02 m, with a height of 3.84 m.
File Formats
Depth data
The depth information (distance to the camera plane) is stored as a .png image, in which each pixel represents the depth value in millimeters as a two-byte (little-endian) unsigned integer. Its values range from 0 to 15000.
Position Ground Truth Data
The ground truth information is also provided as a .png file, with the same dimensions as the generated images (320 × 240 pixels). The ground truth files carry the same number in their names as the corresponding depth files.
To label people positions, a Gaussian function is placed over the centroid of the head of each person in the scene, so that the centroid corresponds to the 2D position of the center of the head and has a normalized value of one. The standard deviation is 15 pixels for all the Gaussians, regardless of the size of each head and its distance to the camera. This value was calculated from an estimate of the average diameter of a person's head, taking into account anthropometric considerations.
It is worth highlighting that, when two heads are very close or overlap each other, instead of adding both Gaussian functions, the maximum of the two values prevails. This modification yields a set of Gaussians that are always separated, so that the CNN can learn to generate that separation between Gaussians in its output. The next figure shows an example of two Gaussian functions.
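The labeling scheme described above can be sketched as follows (the helper name gt_map is hypothetical; image size and sigma are taken from the text):

```python
import numpy as np

# One Gaussian (sigma = 15 px, peak normalized to 1) per head centroid,
# combined with a per-pixel maximum instead of a sum so that nearby heads
# remain separated in the ground truth map.
H, W, SIGMA = 240, 320, 15.0

def gt_map(centroids):
    """Build a ground truth map from a list of (x, y) head centers in pixels."""
    ys, xs = np.mgrid[0:H, 0:W]
    gt = np.zeros((H, W))
    for x0, y0 in centroids:
        g = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * SIGMA ** 2))
        gt = np.maximum(gt, g)  # max, not sum
    return gt

gt = gt_map([(100, 120), (115, 120)])  # two close heads
print(gt.max())  # peaks stay at 1.0 thanks to the maximum rule
```

With a sum, the two overlapping Gaussians would merge into a single blob with a peak above one; the maximum keeps each peak normalized and the modes distinct.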
Disclaimer, Licensing, Request and Contributions
Licensing and disclaimer
The GEINTRA Synthetic Depth People Detection (GESDPD) Database by David Fuentes-Jimenez, Roberto Martin-Lopez, Cristina Losada-Gutierrez, Javier Macias-Guarasa and Carlos A. Luna is distributed as-is, and licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
If you make use of this database and/or its related documentation, you are kindly requested to cite the following papers:
- David Fuentes-Jimenez, Cristina Losada-Gutierrez, David Casillas-Perez, Javier Macias-Guarasa, Roberto Martin-Lopez, Daniel Pizarro and Carlos A. Luna. Towards Dense People Detection with Deep Learning and Depth Images, 2020, arXiv:2007.07171. (https://arxiv.org/abs/2007.07171)
- R. M. López, D. F. Jiménez, C. L. Gutiérrez, and C. L. Vázquez, “Detección de personas en imágenes de profundidad mediante redes neuronales convolucionales,” in Libro de actas del XXVI Seminario Anual de Automática, Electrónica Industrial e Instrumentación, 2019, pp. 114–119.
Also, if you derive additional data, information, publications, etc., using GESDPD, please tell us, so that we can also publicize your contributions.