Have you ever wondered how PyTorch works under the hood? If the answer is yes, then this tutorial and series of articles is for you.
We start gently by introducing the concept of the derivative, then move on to function optimization. From there, we build our first neural network, followed by convolutional neural networks for classification, regression, segmentation, and image generation. Finally, we conclude with object detection - YOLO.
This tutorial is created to reveal and recreate the internal workings of the famous PyTorch framework, following this roadmap:
Introduction and motivation for gradient-based optimization.
We will extend this basis by taking the derivative of a complex function the way PyTorch does it. For that, we will introduce the execution graph and auto-differentiation.
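To give a flavour of what is coming, here is a minimal sketch of reverse-mode auto-differentiation on scalars. The `Value` class and its API are illustrative placeholders, not the final design of the series:

```python
class Value:
    """A scalar that records the operations applied to it (the execution graph)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad          # d(a+b)/da = 1
            other.grad += out.grad         # d(a+b)/db = 1
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, visited = [], set()
        def visit(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward_fn()

x, y = Value(2.0), Value(3.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0  (dz/dx = y + 1, dz/dy = x)
```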
Finally, we will build our first framework for function minimization, which supports scalars.
Example: 1D function optimization.
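As a preview, minimizing a 1D function such as f(x) = (x - 3)^2 by gradient descent could look like this; the gradient is written by hand here, whereas the series computes it with the autodiff machinery above:

```python
def f(x):
    return (x - 3.0) ** 2

def df(x):            # analytic gradient; later replaced by auto-differentiation
    return 2.0 * (x - 3.0)

x, lr = 0.0, 0.1
for step in range(100):
    x -= lr * df(x)   # move against the gradient
print(x)              # converges to the minimum at x = 3
```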
The framework will be extended to work on multi-dimensional arrays (tensors).
Example: Logistic regression.
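As a taste of the tensor version, logistic regression boils down to a handful of array operations. This is a NumPy sketch; the series implements the same idea with its own tensor class:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                      # 200 samples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(np.float64)     # linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))         # sigmoid
    grad_logits = (p - y) / len(y)                 # gradient of mean BCE wrt logits
    w -= lr * (X.T @ grad_logits)
    b -= lr * grad_logits.sum()

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print(((p > 0.5) == y).mean())                     # training accuracy, close to 1.0
```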
Logistic regression will then be rewritten in PyTorch style. For that, we need to introduce Module - the basic higher-order building block. Loss functions and optimization algorithms will be placed into their respective classes. We will conclude with a simple NN regression on 2D data.
Example: a simple NN from scratch using Module.
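The Module abstraction might look roughly like this; a simplified sketch whose `Parameter` and `Linear` classes are illustrative, with the exact API developed later in the series:

```python
import numpy as np

class Parameter:
    """A tensor that should be updated by the optimizer."""
    def __init__(self, data):
        self.data = data
        self.grad = np.zeros_like(data)

class Module:
    """Base building block: collects parameters from itself and its sub-modules."""
    def parameters(self):
        params = []
        for value in vars(self).values():
            if isinstance(value, Parameter):
                params.append(value)
            elif isinstance(value, Module):
                params.extend(value.parameters())
        return params

    def __call__(self, *args):
        return self.forward(*args)

class Linear(Module):
    def __init__(self, in_features, out_features):
        self.weight = Parameter(np.random.randn(in_features, out_features) * 0.01)
        self.bias = Parameter(np.zeros(out_features))

    def forward(self, x):
        return x @ self.weight.data + self.bias.data

layer = Linear(2, 1)
print(len(layer.parameters()))  # 2: weight and bias
```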
In order for our models to work, they need data. Therefore, we will create Dataset - a class describing how a single sample is loaded, DataLoader - a class responsible for data batching, and transformation classes - for data augmentation.
Example: clothes classification using NN.
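A minimal sketch of the two classes and how they cooperate; the `ToyDataset` here is a made-up stand-in for a real dataset:

```python
import numpy as np

class Dataset:
    """Describes how to load a single (sample, label) pair."""
    def __len__(self):
        raise NotImplementedError
    def __getitem__(self, index):
        raise NotImplementedError

class DataLoader:
    """Shuffles indices and yields batches of samples from a Dataset."""
    def __init__(self, dataset, batch_size, shuffle=True):
        self.dataset, self.batch_size, self.shuffle = dataset, batch_size, shuffle

    def __iter__(self):
        indices = np.arange(len(self.dataset))
        if self.shuffle:
            np.random.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            batch = [self.dataset[i] for i in indices[start:start + self.batch_size]]
            xs, ys = zip(*batch)
            yield np.stack(xs), np.array(ys)

class ToyDataset(Dataset):
    def __init__(self, n=10):
        self.x = np.random.randn(n, 3)
        self.y = np.arange(n)
    def __len__(self):
        return len(self.x)
    def __getitem__(self, index):
        return self.x[index], self.y[index]

for xb, yb in DataLoader(ToyDataset(), batch_size=4):
    print(xb.shape, yb.shape)  # (4, 3) (4,) for full batches, smaller last batch
```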
In order to speed up the calculations, we will implement GPU support via CuPy.
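Because CuPy mirrors the NumPy API, GPU support can amount to little more than swapping the array backend. A sketch, assuming CuPy and a CUDA-capable device are available:

```python
import numpy as np
try:
    import cupy as cp          # requires a CUDA-capable GPU
    xp = cp
except ImportError:
    xp = np                    # fall back to CPU

# The same code now runs on whichever backend `xp` points to.
a = xp.ones((1024, 1024), dtype=xp.float32)
b = xp.ones((1024, 1024), dtype=xp.float32)
c = a @ b
print(type(c), float(c[0, 0]))  # 1024.0 on both backends
```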
To work with images, we will explain and implement convolution and max pooling layers (for CPU and GPU). Moreover, batch normalization and dropout will be implemented and explained; a minimal convolution sketch follows the list below.
Examples: ResNet classification, ResNet landmark detection
2D Convolution Layer (CPU/GPU).
2D Pooling Layer (CPU/GPU).
2D Batch Normalization Layer.
Dropout Layer.
ResNet based network for CIFAR10 image classification.
ResNet based network for face landmark regression.
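To make the central idea concrete, here is a naive (slow but readable) 2D convolution forward pass in NumPy; the series develops much faster CPU and GPU versions:

```python
import numpy as np

def conv2d_forward(x, weight, bias):
    """Naive 2D convolution, no padding, stride 1.
    x: (N, C_in, H, W), weight: (C_out, C_in, KH, KW), bias: (C_out,)"""
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    out = np.zeros((n, c_out, h - kh + 1, w - kw + 1))
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + kh, j:j + kw]  # (N, C_in, KH, KW)
            # Contract over channel and kernel axes -> (N, C_out)
            out[:, :, i, j] = np.tensordot(
                patch, weight, axes=([1, 2, 3], [1, 2, 3])) + bias
    return out

x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
b = np.zeros(4)
print(conv2d_forward(x, w, b).shape)  # (2, 4, 6, 6)
```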
Image-to-image translation (e.g. with autoencoders) requires image upsampling, as opposed to convolution and max pooling, which usually reduce image size.
Therefore, transposed convolution and upsampling layers will be introduced and implemented.
Examples: UNet car segmentation, DCGAN anime generation
2D transposed convolution implementation using 2D Convolution primitives.
Because transposed convolution may cause unwanted artifacts, we implement a 2D upsampling layer (see the sketch after the list below).
UNet implementation using 2D transposed convolution. Car segmentation.
DCGAN implementation. Anime image generation.
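As a hint at how both layers work, each reduces to a simple index manipulation: nearest-neighbour upsampling repeats pixels, while a stride-s transposed convolution starts by inserting s - 1 zeros between pixels and then applies an ordinary (padded) convolution to the result. A NumPy sketch of the two index steps:

```python
import numpy as np

def upsample_nearest(x, scale):
    """Nearest-neighbour upsampling: each pixel becomes a scale x scale block.
    x: (N, C, H, W) -> (N, C, H*scale, W*scale)"""
    return x.repeat(scale, axis=2).repeat(scale, axis=3)

def zero_insert(x, stride):
    """Insert (stride - 1) zeros between pixels - the first step of a
    transposed convolution; a regular convolution then finishes the job."""
    n, c, h, w = x.shape
    out = np.zeros((n, c, (h - 1) * stride + 1, (w - 1) * stride + 1))
    out[:, :, ::stride, ::stride] = x
    return out

x = np.arange(4, dtype=np.float64).reshape(1, 1, 2, 2)
print(upsample_nearest(x, 2)[0, 0])  # 4x4 grid of repeated pixels
print(zero_insert(x, 2)[0, 0])       # 3x3 grid with zeros between pixels
```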
To show that we can support state-of-the-art algorithms, an object detection algorithm - YOLO - will be created from scratch.
Example: YOLO