Abstract
In recent years, Transformers have demonstrated success on computer vision (CV) tasks, with the Vision Transformer (ViT) competing with CNNs on image classification when using pre-trained models. Many of these deep learning models are designed by experts, which demands knowledge, time, and labor. Neural Architecture Search (NAS) seeks to automate the process of designing a neural network architecture. In this paper, I propose NSGA-ViT, a multi-objective evolutionary NAS method for designing Transformer-based networks for computer vision tasks. NSGA-ViT uses a multi-objective genetic algorithm (NSGA-II) to design a ViT network with two objectives: maximizing performance and minimizing network size. NSGA-ViT explores a search space of self-attention and convolution operations to discover a Transformer architecture that outperforms ViT on CIFAR-10 while containing half as many parameters.
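To illustrate the kind of search loop the abstract describes, below is a minimal sketch of a two-objective NSGA-II search over a layer-wise choice between self-attention and convolution blocks, written with the pymoo library. This is not the paper's implementation: `decode_architecture`, `train_and_evaluate`, and `count_params` are hypothetical placeholder stubs standing in for the search-space decoding and training pipeline, and the genome encoding, population size, and generation count are illustrative assumptions.

```python
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

N_LAYERS = 12  # assumed depth of candidate networks (illustrative)

def decode_architecture(genes):
    # Hypothetical: map the integer genome to a concrete network description.
    return list(genes)

def train_and_evaluate(arch):
    # Hypothetical stand-in for training the candidate and returning test accuracy.
    return float(np.random.rand())

def count_params(arch):
    # Hypothetical stand-in for counting the candidate's trainable parameters.
    return float(sum(arch)) * 1e6 + 1e6

class ViTSearchProblem(ElementwiseProblem):
    def __init__(self):
        # One gene per layer: 0 = self-attention block, 1 = convolution block.
        super().__init__(n_var=N_LAYERS, n_obj=2, xl=0, xu=1)

    def _evaluate(self, x, out, *args, **kwargs):
        arch = decode_architecture(np.round(x).astype(int))
        error = 1.0 - train_and_evaluate(arch)  # NSGA-II minimizes, so use 1 - accuracy
        size = count_params(arch)
        out["F"] = [error, size]

algorithm = NSGA2(pop_size=20)
result = minimize(ViTSearchProblem(), algorithm, ("n_gen", 30), seed=1, verbose=True)
# result.F holds the Pareto front of (error, parameter count) trade-offs,
# from which a final architecture would be selected.
```

In a real run, each fitness evaluation would train (or partially train) the decoded network, which is why evolutionary NAS methods typically rely on proxy tasks or weight sharing to keep the search tractable.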