This section presents, via the left menu, a description of the test
sequences for each category, along with frame samples, low-resolution
video previews and the event annotations. The annotations were
created using the VIPER toolkit. The video files
were encoded with the MPEG-1 codec in order to be compatible
with the VIPER toolkit.

The dataset contains 17 sequences recorded with a stationary camera at
a resolution of 320x240 pixels and 12 fps. The dataset focuses on two types of human-related events: interactions and activities. In
particular, two activities (Hand Up and Walking) and three human-object
interactions (Leave, Get and Use object) have been annotated.
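
Since the sequences are recorded at 12 fps, an annotated event expressed as a range of frames can be mapped to time in seconds. The following is only a minimal sketch: it assumes 0-based, inclusive frame indices, and the function name `frame_span_to_seconds` is hypothetical, not part of the dataset or of the annotation toolkit.

```python
FPS = 12  # frame rate stated for the dataset sequences


def frame_span_to_seconds(first_frame, last_frame, fps=FPS):
    """Convert an inclusive span of 0-based frame indices to (start, end) seconds.

    Assumption: frame i covers the interval [i / fps, (i + 1) / fps).
    """
    if first_frame < 0 or last_frame < first_frame:
        raise ValueError("invalid frame span")
    start = first_frame / fps
    end = (last_frame + 1) / fps
    return start, end
```

For example, under these assumptions a span covering frames 24 to 35 corresponds to the interval from 2.0 s to 3.0 s. If the annotations use 1-based frame numbers, subtract one from each index first.
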
We have grouped all the test sequences into three categories according to
a subjective estimate of the analysis complexity, considering:
Foreground complexity (S1), defined as the difficulty of extracting the foreground due to
the presence of edges, multiple textures, lighting changes,
reflections, shadows and objects belonging to the background.
Tracking complexity (S2),
defined as the difficulty of tracking foreground blobs in the sequence. It
mainly differentiates crowded from less-populated sequences.
Feature complexity (S3), defined as the
difficulty of classifying moving and temporarily stationary foreground in a
scenario in order to extract and analyze relevant features.
Event complexity (S4),
defined as the
difficulty of detecting and recognizing the annotated events in a
scenario. It is related to the speed at which the event is executed, the
(partial) occlusion of the action performed and the variability in
appearance of the actor.
Sample frames of these categories are shown in the following images.
A summary of the annotated events in the dataset
and the associated complexity of each category is available in the
following table. The complexity estimation codes are Low (L), Medium (M), High (H) and
Very High (V). The events are Leave-object (LEA), Get-object (GET),
Use-object (USE), Hand Up (HUP) and Walking (WLK).
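
The codes above can be represented in a small lookup structure when processing the annotations programmatically. The sketch below is illustrative only: the class names, the `parse_profile` helper and the four-letter profile string (one letter per criterion, in S1 to S4 order) are assumptions made for this example, and the string "LMHV" is a made-up value, not taken from the dataset.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    """Complexity estimation codes used in the summary."""
    L = "Low"
    M = "Medium"
    H = "High"
    V = "Very High"


class Event(Enum):
    """Annotated event codes."""
    LEA = "Leave-object"
    GET = "Get-object"
    USE = "Use-object"
    HUP = "Hand Up"
    WLK = "Walking"


@dataclass(frozen=True)
class CategoryProfile:
    """Complexity of one category for the four criteria S1 to S4."""
    foreground: Complexity  # S1
    tracking: Complexity    # S2
    feature: Complexity     # S3
    event: Complexity       # S4


def parse_profile(codes):
    """Parse a four-letter string in S1..S4 order (hypothetical encoding)."""
    if len(codes) != 4:
        raise ValueError("expected exactly four complexity codes")
    return CategoryProfile(*(Complexity[c] for c in codes))


# Illustrative value only; not a real category from the dataset.
example = parse_profile("LMHV")
```

Here `example.event` is `Complexity.V`, and `Event["HUP"].value` expands the code to "Hand Up".
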