Document Type
Article
Publication Date
9-8-2021
Publication Title
Signals
Volume
2
Issue
3
First page number
604
Last page number
618
Abstract
Spatiotemporal representations learned using 3D convolutional neural networks (CNNs) are currently used in state-of-the-art approaches for action-related tasks. However, 3D-CNNs are notoriously memory- and compute-intensive compared with simpler 2D-CNN architectures. We propose to hallucinate the spatiotemporal representations of a 3D-CNN teacher with a 2D-CNN student. By requiring the 2D-CNN to predict the future and intuit upcoming activity, it is encouraged to gain a deeper understanding of actions and how they evolve. The hallucination task is treated as an auxiliary task, which can be used with any other action-related task in a multitask learning setting. Through thorough experimental evaluation, it is shown that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition tasks. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as those with limited computing power and/or lower bandwidth. We also observed that our hallucination task has utility not only during the training phase, but also during the pre-training phase.
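The auxiliary-task setup described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss functions, so everything below is an assumption made for illustration: the hallucination loss is taken to be the mean squared error between the student (2D-CNN) and teacher (3D-CNN) feature vectors, the main task is taken to be classification with softmax cross-entropy, and the two are combined with a hypothetical scalar `weight`. The function names are likewise illustrative and do not come from the article.

```python
import math


def hallucination_loss(student_feat, teacher_feat):
    """Mean squared error between student and teacher feature vectors.

    Assumption: the article's actual distance measure may differ.
    """
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, stabilised by subtracting the max logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(logits, label, student_feat, teacher_feat, weight=1.0):
    """Main-task loss plus a weighted hallucination (auxiliary) loss.

    `weight` is a hypothetical hyperparameter, not a value from the article.
    """
    main = cross_entropy(logits, label)
    aux = hallucination_loss(student_feat, teacher_feat)
    return main + weight * aux


if __name__ == "__main__":
    # Toy numbers only; real features would come from the two networks.
    student = [0.2, 0.9, -0.4]   # 2D-CNN output (hallucinated representation)
    teacher = [0.0, 1.0, -0.5]   # 3D-CNN output (target representation)
    logits = [2.0, 0.5, -1.0]    # main-task class scores
    print(multitask_loss(logits, 0, student, teacher, weight=0.5))
```

In this sketch the teacher features serve only as a training target, so the 3D-CNN would not be needed at inference time, which matches the deployment benefit the abstract describes.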
Keywords
action recognition; scene recognition; action quality assessment; activity recognition; deep learning; computer vision; convolutional neural networks; multitask learning; transfer learning
Disciplines
Health Information Technology
File Format
File Size
1694 KB
Language
English
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Repository Citation
Parmar, P., Morris, B. (2021). HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN. Signals, 2(3), 604-618.
http://dx.doi.org/10.3390/signals2030037