People Detection benchmark repository

Content


Dataset: The chosen dataset has been extracted from the Change Detection dataset 2012/2014 [1]. It provides a realistic, camera-captured, diverse set of videos, selected to cover a wide range of detection challenges and representative of typical indoor and outdoor visual data captured today in surveillance, smart environment, and video database scenarios. The Change Detection 2012 and 2014 datasets include the following challenges: baseline, dynamic background, camera jitter, intermittent object motion, shadows, thermal signatures, bad weather, low framerate, night videos, PTZ and turbulence. The proposed people detection challenge includes 19 sequences selected from the whole original dataset (53 sequences). Each sequence is accompanied by accurate people detection ground truth. We have grouped all the test sequences into complexity categories according to two aspects: classification complexity and background complexity. The following table describes each sequence in terms of complexity and length (a programmatic encoding of this index is sketched after the table):

 #   Video            Background complexity        Classification complexity   #Frames
 1   office           Baseline                     Low                            2050
 2   pedestrians      Baseline                     Low                            1099
 3   PETS2006         Baseline                     Medium                         1200
 4   fall             Dynamic Background           High                           4000
 5   overpass         Dynamic Background           Medium                         3000
 6   badminton        Camera Jitter                Medium                         1150
 7   sidewalk         Camera Jitter                High                           1200
 8   abandonedBox     Intermittent Object Motion   High                           4500
 9   sofa             Intermittent Object Motion   Low                            2750
10   tramstop         Intermittent Object Motion   High                           3200
11   winterdriveway   Intermittent Object Motion   High                           2500
12   backdoor         Shadow                       Low                            2000
13   busStation       Shadow                       Low                            1250
14   copyMachine      Shadow                       High                           3400
15   cubicle          Shadow                       Medium                         7400
16   peopleInShade    Shadow                       Low                            1199
17   skating          Bad Weather                  High                           3900
18   wetSnow          Bad Weather                  High                           3500
19   zoomInZoomOut    PTZ                          Medium                         1130

Total                                                                           50428
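
As a quick cross-check, the table can also be encoded programmatically. Below is a minimal Python sketch: the sequence names, categories, and frame counts are copied from the table above, while all identifiers (Sequence, SEQUENCES, by_background) are our own illustrative choices, not part of the benchmark distribution.

```python
# Hypothetical index of the 19 benchmark sequences; names, categories, and
# frame counts come from the table above, everything else is illustrative.
from collections import defaultdict
from typing import NamedTuple

class Sequence(NamedTuple):
    name: str
    background: str       # background complexity category
    classification: str   # classification complexity (Low / Medium / High)
    frames: int

SEQUENCES = [
    Sequence("office", "Baseline", "Low", 2050),
    Sequence("pedestrians", "Baseline", "Low", 1099),
    Sequence("PETS2006", "Baseline", "Medium", 1200),
    Sequence("fall", "Dynamic Background", "High", 4000),
    Sequence("overpass", "Dynamic Background", "Medium", 3000),
    Sequence("badminton", "Camera Jitter", "Medium", 1150),
    Sequence("sidewalk", "Camera Jitter", "High", 1200),
    Sequence("abandonedBox", "Intermittent Object Motion", "High", 4500),
    Sequence("sofa", "Intermittent Object Motion", "Low", 2750),
    Sequence("tramstop", "Intermittent Object Motion", "High", 3200),
    Sequence("winterdriveway", "Intermittent Object Motion", "High", 2500),
    Sequence("backdoor", "Shadow", "Low", 2000),
    Sequence("busStation", "Shadow", "Low", 1250),
    Sequence("copyMachine", "Shadow", "High", 3400),
    Sequence("cubicle", "Shadow", "Medium", 7400),
    Sequence("peopleInShade", "Shadow", "Low", 1199),
    Sequence("skating", "Bad Weather", "High", 3900),
    Sequence("wetSnow", "Bad Weather", "High", 3500),
    Sequence("zoomInZoomOut", "PTZ", "Medium", 1130),
]

# Group sequences by background complexity, as done for the evaluation.
by_background = defaultdict(list)
for seq in SEQUENCES:
    by_background[seq.background].append(seq.name)

assert sum(seq.frames for seq in SEQUENCES) == 50428  # matches the table total
```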

Ground Truth: In complex environments with multiple people and partial occlusions, it is often not obvious where to draw the line when deciding whether a person should be annotated. In our set of sequences, people appear in every state of occlusion, from fully visible to a single visible body part. We therefore decided to annotate every case in which a human could clearly detect the person. Consequently, each person was annotated as a single entity (blob) covering his or her connected or disconnected visible parts, whenever at least the head or most of the torso is visible. To carry out the annotation task, we used the Video Image Annotation Tool.
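
For illustration, the sketch below shows how such per-frame annotations might be loaded. The plain-text format assumed here (one `frame_id x y width height` record per line) is hypothetical, not the actual export format of the Video Image Annotation Tool; the parser should be adapted to the real ground-truth files.

```python
# Minimal sketch of loading per-frame people annotations. The plain-text
# format assumed here (one "frame_id x y width height" record per line) is
# hypothetical; adapt the parser to the actual export of the annotation tool.
from collections import defaultdict

def load_ground_truth(path):
    """Return a dict mapping frame id -> list of (x, y, w, h) person boxes."""
    boxes = defaultdict(list)
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and comments
            frame_id, x, y, w, h = (int(v) for v in line.split())
            boxes[frame_id].append((x, y, w, h))
    return boxes
```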

Evaluation metrics: In order to evaluate different people detection approaches, we need to quantify their performance. Global sequence performance is usually measured in terms of Precision-Recall (PR) curves [2]. In order to evaluate not only the yes/no detection decision but also the precise location and extent of each person, we use three evaluation criteria, defined by [3], that allow comparing hypotheses at different scales: relative distance, cover and overlap. A detection is considered true if d_r ≤ 0.5 (corresponding to a deviation of up to 25% of the true object size) and both cover and overlap are above 50%. Only one hypothesis per object is accepted as correct, so any additional hypothesis on the same object is counted as a false positive. The integrated Average Precision (AP) is generally used to summarize the overall performance, represented geometrically as the area under the PR curve (AUC-PR). For more details please refer to the "Evaluation" section.
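
A minimal sketch of these per-detection criteria follows, assuming axis-aligned (x, y, w, h) boxes: cover is the fraction of the ground-truth box explained by the hypothesis, overlap is the fraction of the hypothesis box lying on the ground truth, and the relative distance is taken here as the center distance normalized by half the ground-truth diagonal (so d_r = 0.5 matches the 25% deviation stated above). This normalization is our reading of [3], not a verbatim transcription.

```python
# Sketch of the per-detection criteria from [3]: a hypothesis box is accepted
# if relative distance <= 0.5 and both cover and overlap exceed 50%.
# Boxes are (x, y, w, h). The normalization of the relative distance
# (center distance over half the ground-truth diagonal) is an assumption.
import math

def intersection_area(gt, hyp):
    gx, gy, gw, gh = gt
    hx, hy, hw, hh = hyp
    iw = max(0, min(gx + gw, hx + hw) - max(gx, hx))
    ih = max(0, min(gy + gh, hy + hh) - max(gy, hy))
    return iw * ih

def is_true_detection(gt, hyp):
    inter = intersection_area(gt, hyp)
    cover = inter / (gt[2] * gt[3])      # fraction of the GT box explained
    overlap = inter / (hyp[2] * hyp[3])  # fraction of the hypothesis on GT
    gcx, gcy = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2
    hcx, hcy = hyp[0] + hyp[2] / 2, hyp[1] + hyp[3] / 2
    d_r = math.hypot(hcx - gcx, hcy - gcy) / (0.5 * math.hypot(gt[2], gt[3]))
    return d_r <= 0.5 and cover > 0.5 and overlap > 0.5
```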

Results: In this section, we describe the experiments performed over the proposed dataset, covering different approaches from the state of the art. First, we present the experimental results for each video sequence; then, we present the results for each of the proposed complexity categories: background complexity and classification complexity. In addition, we report the average AUC and the algorithm ranking for each evaluation (see the aggregation sketch below). For more details please refer to the "Results" section.
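
As a sketch of this aggregation step, assuming per-sequence AP values are already available for every algorithm (the names and numbers below are placeholders, not benchmark results):

```python
# Hedged sketch of the aggregation: given per-sequence AP values for each
# algorithm (placeholder names and values, not benchmark results), compute
# the average AP and rank the algorithms from best to worst.
ap_per_sequence = {
    "algorithm_A": {"office": 0.80, "fall": 0.55},  # placeholder values
    "algorithm_B": {"office": 0.75, "fall": 0.60},
}

mean_ap = {
    algo: sum(scores.values()) / len(scores)
    for algo, scores in ap_per_sequence.items()
}
ranking = sorted(mean_ap, key=mean_ap.get, reverse=True)  # best first
print(ranking, mean_ap)
```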

Main reference:

Associated references:

[1] 2012/2014 IEEE Change Detection Workshops, http://www.changedetection.net/.

[2] B. Leibe, A. Leonardis, and B. Schiele, “Robust object detection with interleaved categorization and segmentation,” IJCV, vol. 77, no. 1-3, pp. 259–289, 2008.

[3] B. Leibe, E. Seemann, and B. Schiele, “Pedestrian detection in crowded scenes,” in Proc. of CVPR, 2005, pp. 878–885.

Note that this people detection benchmark is available for research purposes only.


Work partially supported by project TEC2011-25995 EventVideo.