Video Processing and Understanding Lab
Universidad Autónoma de Madrid
Escuela Politécnica Superior

PID2021-125051OB-I00 HVD (2022-2025)
Harvesting Visual Data: enabling computer vision in unfavourable data scenarios

Supported by the
Ministerio de Ciencia e Innovación
of the Spanish Government

Project proposal overview

Computer Vision (CV) underpins the ongoing revolution of Artificial Intelligence (AI) applications based on visual information processing. Together with the broader AI field, Computer Vision is currently booming, the focus of unprecedented initiatives and funding efforts aiming to boost a transformational technology. The main reason for this explosive growth is the development of Machine Learning schemes (mainly Deep Convolutional Neural Networks, CNNs) that achieve human-level success rates in problems requiring the analysis of visual information (e.g., object detection or semantic segmentation [1][2]). In fact, the multitude of reports (e.g., from ocde.org or the European Commission) that declare AI the most impactful General-Purpose Technology (GPT) of our time rely on Computer Vision for this revolution in applications based on visual information processing.

The time is now: effective CV-based solutions are not a distant prospect but a present reality, already transforming and defining the world we live in. The deep learning (DL) revolution, unprecedented in the field of CV, is taking place in both the scientific and industrial arenas, thanks to the broad and open availability of research ideas that address many of the most demanded applications while also enabling new ones for the public good. However, adapting existing ideas, canonical models, methods, and training procedures to specific domains has revealed new limitations, problems and challenges that endanger their fundamental promise of universality and, if unattended, may endanger the societal rights protected by EU law.

Some of these problems are intrinsic to the use of AI and deep CNNs. Our proposal in this project is to delve into some of the limitations of these approaches when applied to visual signals, namely, issues related to the availability of annotated data. Data is what enables today's CV models. Despite having received substantially less attention than algorithms, methods, or models, increasing efforts are being made to provide large-scale accessible repositories, e.g., by encouraging fair data access and data sharing. Our main goal is to reduce the strong dependency of AI and deep CNN algorithms on the availability of large-scale annotated training data.

Regarding annotated data availability, current state-of-the-art solutions are mainly based on supervised learning, so their success requires large human-annotated datasets (such as ImageNet [3]), which demand a great deal of expensive supervision, time and effort. Moreover, for some sensitive domains, such as traffic collisions, annotations and content are rarely available, which in practice prevents the creation of large and complete datasets. Some applications require a continuous addition of data or annotations while preserving the previously encoded model knowledge, for example due to the inclusion of additional tasks or additional target classes. Besides, these trained systems are usually tailored to a specific task and cannot be adapted to other tasks without re-training. To cope with these issues, a large research effort has been devoted to systems that can: model the intrinsic patterns in the data without (fully) relying on human labeling; continuously adapt the training process as new data becomes available; and extrapolate useful training information from complementary synthetic datasets for which annotations can be obtained automatically. In this direction, we propose (1) to explore the capabilities of training methods using real data in the absence of annotations through unsupervised and self-supervised approaches; and (2) to explore the creation and use of synthetic data to complement the training processes.
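To illustrate the self-supervised direction mentioned above, the sketch below shows the core idea of a contrastive objective (the NT-Xent loss popularized by SimCLR, here chosen only as one representative example, not as the project's actual method): two augmented views of the same image are pulled together in embedding space while all other samples are pushed apart, so no human labels are needed. The function name and the toy data are illustrative assumptions; a minimal NumPy version is:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two batches of embeddings.

    z1, z2: arrays of shape (N, D) with embeddings of two augmented
    views of the same N images; row i of z1 and row i of z2 form a
    positive pair, every other row acts as a negative.
    """
    # L2-normalise so dot products become cosine similarities.
    z = np.concatenate([z1, z2], axis=0)                  # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature                           # (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                        # exclude self-pairs

    n = z1.shape[0]
    # Index of each sample's positive partner: i <-> i + n.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])

    # Cross-entropy: negative log-softmax probability of the positive pair.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()

# Toy usage: embeddings of two "views" of 4 images (hypothetical data).
rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 8))
z2 = z1 + 0.05 * rng.normal(size=(4, 8))  # slightly perturbed views
print(nt_xent_loss(z1, z2))
```

In practice the embeddings would come from a CNN encoder applied to two random augmentations of each image; minimizing this loss trains the encoder without any annotations.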

Last update 01/09/2022