Video Processing and Understanding Lab
Universidad Autónoma de Madrid, Escuela Politécnica Superior

SI1/PJI/2019-00414 AISEEME (2020-2022)
Aiding diagnosis by self-supervised deep learning from unlabeled medical imaging

Supported by the
Consejería de Educación e Investigación
of the
Comunidad de Madrid
Work Packages

WP1: Infrastructure and datasets

This work package aims to establish and maintain the development framework used by the remaining work packages.

T1.1: Infrastructure update and maintenance (M1-M24)

Arrangement and configuration of the available equipment, and acquisition of complementary equipment, to establish the infrastructure needed to meet the project objectives.

T1.2: Collection and generation of datasets (M1-M16)

Support for the other tasks by generating training and test data and the associated evaluation methodologies. This includes selecting appropriate datasets (images and associated ground truth) and generating them if required.


Milestones:

  • M1.1 Acquisition of required hardware (T1.1, M2).
  • M1.2 Collection/generation of datasets for WP2 and WP3 (T1.2, M4).
  • M1.3 Collection/generation of datasets for WP4 (T1.2, M13).

Deliverables (* denotes updates delivered only if required):

  • D1.1 System infrastructure (T1.1, M4, M16*, M24*).
  • D1.2 Evaluation datasets (T1.2, M8, M16).

WP2: Enabling technologies

To survey current technologies for self-supervised learning, pretext tasks, skin lesion assessment, and lung nodule malignancy evaluation.

T2.1: Self-supervised frameworks and pretext tasks  (M1-M4)

To compare state-of-the-art SSL approaches, exploring the influence of the CNN architecture, the pretext task and the training schedule. Object recognition will be used as the target task for comparison. 
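A well-known example of a pretext task that such a comparison might include is rotation prediction, where the rotation applied to an image serves as a free supervisory label. The sketch below is a minimal pure-Python illustration of this idea, not project code; the function names are assumptions.

```python
import random

def rot90_cw(img):
    """Rotate a 2D image (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def make_rotation_pretext_sample(img):
    """Rotation-prediction pretext task: rotate the image by a random
    multiple of 90 degrees; the rotation index is the pretext label."""
    k = random.randrange(4)          # 0, 1, 2 or 3 quarter turns
    rotated = img
    for _ in range(k):
        rotated = rot90_cw(rotated)
    return rotated, k                # (network input, pretext label)

# Toy 2x2 "image"; a CNN would be trained to predict k from the pixels,
# learning useful features without any manual annotation.
x, y = make_rotation_pretext_sample([[1, 2], [3, 4]])
```

Because the label is generated by the transformation itself, arbitrarily large pretext training sets can be built from unlabeled images.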

T2.2: Skin lesion assessment (M5-M8)

To compare state-of-the-art skin lesion assessment approaches based on deep learning. Special attention will be paid to the preprocessing of the input data and to temporal schemes that account for the evolution of the lesion.

T2.3: Lung nodule malignancy evaluation (M5-M8)

To compare state-of-the-art approaches to lung nodule detection and malignancy evaluation based on deep learning.


Milestones:

  • M2.1 Implementation and performance evaluation of self-supervised approaches (T2.1, M4).
  • M2.2 Implementation and performance evaluation of skin lesion and lung nodule malignancy approaches (T2.2, T2.3, M8).


Deliverables:

  • D2 Enabling technologies: algorithms and findings (M8).

WP3: Curriculum-based multi-task self-supervised learning

To select a set of pretext tasks and an associated priority ordering/categorization to build a task curriculum for guiding multi-task SSL regimes. To assess the effect of the CNN architecture and training schedule on the performance of the task curriculum. To compare the performance of the predefined task curriculum against one obtained automatically through a self-paced scheme. To measure the impact of the proposed scheme on SSL training in the object recognition use case.

T3.1: Empirical definition and completion of a pretext task curriculum (M9-M10)

To arrange existing pretext tasks into cognitive categories and define additional tasks if required. To define learning orderings (curricula) based on this organization and measure the impact of these curricula on the learning outcomes.
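Such a curriculum can be represented simply as an ordered list of categories, each grouping pretext tasks, from which a staged training schedule is derived. The sketch below is illustrative only: the category names and task assignments are assumptions, not results of this task.

```python
# Hypothetical categorization of common pretext tasks into cognitive
# categories, ordered from low-level to high-level cues. The names and
# assignments below are illustrative assumptions, not project findings.
CURRICULUM = [
    ("low-level appearance", ["colorization", "denoising"]),
    ("spatial layout",       ["rotation prediction", "jigsaw puzzles"]),
    ("semantic context",     ["context prediction", "instance discrimination"]),
]

def curriculum_schedule(curriculum):
    """Flatten the ordered categories into a training schedule:
    a list of (stage, task) pairs, with earlier stages first."""
    return [(stage, task)
            for stage, (_, tasks) in enumerate(curriculum)
            for task in tasks]

for stage, task in curriculum_schedule(CURRICULUM):
    print(stage, task)
```

Measuring learning outcomes then amounts to training under different orderings of the same categories and comparing performance on the target task.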

T3.2: Evaluation of the impact of the architecture and training schedule (M11-M12)

To evaluate the dependencies between the task curriculum and the training framework. To gain insight into the advantages/disadvantages of different architectures and learning schedules given a task curriculum, for the object recognition use case.

T3.3: Self-paced multi-task self-supervision (M13-M16)

To define a learning framework that automatically derives a pretext task curriculum for a given target task. To compare the performance of the resulting curriculum against those defined in T3.1 in the object recognition use case.
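One classic way such a curriculum can emerge automatically is hard self-paced weighting: tasks whose current loss falls below a growing pace parameter are admitted to training, so easy tasks enter first and harder ones later. The sketch below illustrates this standard scheme under assumed toy values; it is not the framework to be designed here.

```python
def self_paced_task_weights(task_losses, lam):
    """Hard self-paced weighting: a pretext task enters the curriculum
    once its current loss drops below the pace parameter lam."""
    return [1.0 if loss < lam else 0.0 for loss in task_losses]

# Toy pace loop: lam grows each round, so easy tasks (low loss) are
# selected first and harder tasks join the curriculum later.
losses = [0.2, 0.9, 0.5, 1.4]        # assumed losses of four pretext tasks
for lam in (0.3, 0.6, 1.0, 1.5):
    weights = self_paced_task_weights(losses, lam)
    # tasks with weight 1.0 would contribute to the multi-task SSL
    # loss in this round of training
```

The ordering in which tasks receive nonzero weight defines a curriculum learned from the data rather than fixed a priori, which is what this task compares against the empirical curricula of T3.1.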


Milestones:

  • M3.1 Empirical design of the task curriculum (T3.1, M10).
  • M3.2 Empirical design of the SSL framework (T3.2, M13).
  • M3.3 Self-paced learned design of a task curriculum (T3.3, M16).


Deliverables:

  • D3 Design of a curriculum-based multi-task self-supervised learning regime (M16).

WP4: Use cases in medical imaging

To assess the advantages/disadvantages of the multi-task SSL frameworks designed in WP3 for vision tasks in the medical image domain. The target tasks will include two use cases: skin lesion assessment and lung nodule malignancy detection. The evaluation framework will include databases and methodologies identified in WP1, and baseline results obtained in WP2.

T4.1: Multi-task SSL approaches for skin lesion assessment  (M17-M24)

Evaluation of multi-task SSL frameworks (including a task curriculum approach) for the assessment of skin lesions in skin images. Comparison with baseline results derived from WP2.

T4.2: Multi-task SSL approaches for lung nodule malignancy detection  (M17-M24)

Evaluation of multi-task SSL frameworks (including a task curriculum approach) for lung nodule malignancy detection. Comparison with baseline results derived from WP2.


Milestones:

  • M4.1 Initial results and detected limitations for the use cases (T4.1, T4.2, M20).
  • M4.2 Final results for the use cases (T4.1, T4.2, M24).


Deliverables:

  • D4 Multi-task SSL framework for applications in medical imaging (M24).

WP5: Management and dissemination

To coordinate project activities, follow up on progress, and oversee dissemination.

T5.1: Management (M1-M24)

This task comprises the following activities: monthly assessment of project progress and achievements, control of milestone and deliverable deadlines, workplan updates and corrective actions if required, and administrative issues.

T5.2: Dissemination (M1-M24)

This task coordinates the compilation and internal publication of intermediate results, as well as final publications in journals and conferences. It will also coordinate the dissemination of results via the project web page and Newsletters.


Deliverables:

  • D5 Results report (M8, with updates every 8 months).
