Official project site of Using a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition (Under review in Elsevier Pattern Recognition).
Abstract
Knowledge Distillation (KD) is a strategy for defining a set of transferability gangways to improve the efficiency of Convolutional Neural Networks. Feature-based Knowledge Distillation is a subfield of KD that relies on intermediate network representations, either unaltered or depth-reduced via maximum activation maps, as the source knowledge.
In this paper, we propose and analyse the use of a 2D frequency transform of the activation maps before transferring them. We pose that, by using global image cues rather than pixel estimates, this strategy enhances knowledge transferability in tasks such as scene recognition, which are defined by strong spatial and contextual relationships between multiple and varied concepts.
To validate the proposed method, a novel and extensive evaluation of the state of the art in scene recognition is presented. Experimental results provide strong evidence that the proposed strategy enables the student network to better focus on the relevant image areas learnt by the teacher network, hence leading to better descriptive features and higher transferred performance than every other state-of-the-art alternative.
Proposed Method
Example of the proposed Knowledge-Distillation gangways between two ResNet architectures representing the teacher and the student models. In this case, the intermediate feature representations for Knowledge Distillation are extracted from the basic Residual Blocks. We propose a novel matching approach based on a 2D discrete linear transform of the activation maps. This novel technique, for which we here leverage the simple yet effective Discrete Cosine Transform (DCT), allows comparing the 2D relationships captured by the transformed coefficients. In the proposed approach, the matching is moved from a pixel-to-pixel fashion to a correlation in the frequency domain, where each of the coefficients integrates spatial information from the whole image.
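The idea above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: the function names (`attention_map`, `dct_kd_loss`), the channel-wise maximum reduction, the orthonormal DCT-II, and the L2 distance between normalized coefficients are assumptions chosen to mirror the description in the text.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix of size (n, n).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)  # first row scaled so that C @ C.T == I
    return C

def dct2(a):
    # 2D DCT: transform rows and columns of a (H, W) map.
    return dct_matrix(a.shape[0]) @ a @ dct_matrix(a.shape[1]).T

def attention_map(features):
    # Depth-reduce a (C, H, W) feature tensor to a (H, W) activation
    # map via the channel-wise maximum, as in attention-based KD.
    return features.max(axis=0)

def dct_kd_loss(teacher_feats, student_feats):
    # Compare normalized DCT coefficients of the teacher and student
    # activation maps: each coefficient mixes information from the
    # whole map, so the match is global rather than pixel-to-pixel.
    t = dct2(attention_map(teacher_feats))
    s = dct2(attention_map(student_feats))
    t = t / (np.linalg.norm(t) + 1e-8)
    s = s / (np.linalg.norm(s) + 1e-8)
    return float(np.sum((t - s) ** 2))
```

Because the DCT is orthonormal, the transform preserves the energy of the activation map; the loss is zero only when teacher and student maps agree up to scale.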
Results
State-of-the-art Results
Results provide strong evidence that the proposed DCT-based metric enables the student network to better focus on the relevant image areas learnt by the teacher model, hence increasing the overall performance for Scene Recognition.
Qualitative Activation Maps Results
Example of the obtained activation maps at different levels of depth. Top rows represent activation maps for vanilla ResNet-18 and ResNet-50 CNNs respectively. Bottom row represents the activation maps obtained by the proposed DCT Attention-based KD method when ResNet-50 acts as the teacher network and ResNet-18 acts as the student. AT activation maps are also included for comparison.
Related Work
López-Cifuentes, A., Escudero-Viñolo, M., Bescós, J., & García-Martín, Á. (2020). Semantic-aware scene recognition. Pattern Recognition, 102, 107256.
Comment: Scene recognition models enhanced by the use of context information as semantic segmentation.
López-Cifuentes, A., Escudero-Viñolo, M., Gajic, A., & Bescós, J. (2021). Visualizing the Effect of Semantic Classes in the Attribution of Scene Recognition Models. In Proceedings of ICPR International Workshops and Challenges.
Comment: Visualizing the attribution of scene recognition models by perturbing the input images with semantic segmentation.
Citation
If you find this work useful, please consider citing:
López-Cifuentes, A., Escudero-Viñolo, M., Bescós, J., & San Miguel, J. C. (2022). Using a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition.
@InProceedings{Lopez2022using,
  author = "L{\'o}pez-Cifuentes, Alejandro and Escudero-Vi{\~{n}}olo, Marcos and Besc{\'o}s, Jes{\'u}s and San Miguel, Juan Carlos",
  title  = "Using a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition",
  year   = "2022",
}
Acknowledgement: This study has been partially supported by the Spanish Government through its TEC2017-88169-R MobiNetVideo project.