In recent years, Transformers have demonstrated success in computer vision (CV) tasks: the Vision Transformer (ViT) competes with convolutional neural networks (CNNs) on image classification when pre-trained models are used. Many of these deep learning models are hand-designed by experts, which requires expertise, time, and labor. Neural Architecture Search (NAS) seeks to automate the process of designing a neural network architecture. In this paper, I
propose NSGA-ViT, a multi-objective evolutionary NAS method for designing Transformer-based networks for computer vision tasks. NSGA-ViT uses the multi-objective genetic algorithm NSGA-II to design a ViT network with two objectives: maximizing classification performance and minimizing network size. NSGA-ViT explores a search space of self-attention and convolution operations and discovers a Transformer architecture that outperforms ViT on CIFAR-10 while containing half as many parameters.
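To illustrate the two-objective selection underlying NSGA-II (this is a minimal sketch, not the method's actual implementation), the snippet below computes the Pareto front of candidate architectures scored by hypothetical (accuracy, parameter-count) pairs; NSGA-II's non-dominated sorting builds its first front in exactly this way.

```python
# Illustrative sketch: Pareto-front selection for two objectives,
# maximizing accuracy and minimizing parameter count.
# The candidate scores below are hypothetical, for demonstration only.

def dominates(a, b):
    """True if candidate a Pareto-dominates candidate b.

    Each candidate is (accuracy, n_params): accuracy is maximized,
    n_params is minimized.
    """
    acc_a, size_a = a
    acc_b, size_b = b
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better

def pareto_front(population):
    """Return candidates not dominated by any other (NSGA-II front 0)."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q != p)]

# Hypothetical (accuracy, millions-of-parameters) pairs.
candidates = [(0.91, 86.0), (0.90, 43.0), (0.88, 43.0), (0.92, 120.0)]
print(pareto_front(candidates))
```

Here (0.88, 43.0) is dropped because (0.90, 43.0) matches its size with strictly higher accuracy; the surviving candidates each trade accuracy against size, which is the trade-off curve NSGA-ViT searches along.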