Let's Create Mini PyTorch from Scratch

A step-by-step tutorial (NumPy / CuPy)


Have you ever wondered how PyTorch works under the hood? If so, this tutorial and the accompanying series of articles is for you.

We start gently by introducing the concept of the derivative, then move on to function optimization. From there, we build our first neural network, followed by convolutional neural networks for classification, regression, segmentation, and image generation. Finally, we conclude with object detection using YOLO.

This tutorial is designed to reveal and recreate the internal workings of the famous PyTorch framework, following these principles:
  1. Moderate knowledge of Python and the PyTorch framework is assumed.
  2. Concepts are introduced step by step with samples.
  3. The focus is placed on computer vision samples.
  4. Sample datasets are chosen to provide fast training with acceptable results.
  5. Significant effort is put into code simplicity and readability.

Module 0 - Intro

Introduction & motivation for gradient-based optimization.


Module 1 - AutoDiff

We will extend the basics by taking the derivative of a more complex function the way PyTorch does it. For that, we will introduce the execution graph and auto-differentiation. Finally, we will build our first framework for function minimization, which supports scalars.
Example: 1D function optimization.
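To give a flavour of what this module builds, below is a minimal scalar auto-diff sketch driving the 1D optimization example; all class and method names are illustrative, not the tutorial's actual API.

```python
# A minimal scalar auto-diff sketch (illustrative names, not the tutorial's API).
class Scalar:
    def __init__(self, value, parents=()):
        self.value = value             # forward value
        self.grad = 0.0                # accumulated d(output)/d(self)
        self._parents = parents        # nodes this one depends on (execution graph)
        self._backward = lambda: None  # how to push gradients to the parents

    def __add__(self, other):
        out = Scalar(self.value + other.value, (self, other))
        def backward():
            self.grad += out.grad      # d(a+b)/da = 1
            other.grad += out.grad     # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Scalar(self.value * other.value, (self, other))
        def backward():
            self.grad += other.value * out.grad  # d(a*b)/da = b
            other.grad += self.value * out.grad  # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse order.
        order, visited = [], set()
        def visit(node):
            if node not in visited:
                visited.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# 1D optimization: minimize f(x) = x*x + x with gradient descent (minimum at x = -0.5).
x_value = 3.0
for _ in range(50):
    x = Scalar(x_value)
    y = x * x + x
    y.backward()
    x_value -= 0.1 * x.grad
print(round(x_value, 3))  # -0.5
```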


Module 2 - AutoDiff (Tensors)

The framework will be extended to work on multi-dimensional arrays (tensors).
Example: Logistic regression.

Post

Auto-diff - Tensor Support

Here we extend the scalar auto-diff engine to work with tensors.
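As a small taste of that extension, here is a hedged sketch of a single tensor-valued graph node - matrix multiplication backed by NumPy - together with its backward rule; the class name and layout are illustrative only.

```python
import numpy as np

# Sketch of one tensor-valued graph node: matrix multiplication with its backward rule.
class MatMulNode:
    def __init__(self, a, b):
        self.a, self.b = a, b        # inputs (np.ndarray)
        self.value = a @ b           # forward pass
        self.grad_a = None
        self.grad_b = None

    def backward(self, grad_out):
        # Chain rule for C = A @ B:
        #   dL/dA = dL/dC @ B^T,  dL/dB = A^T @ dL/dC
        self.grad_a = grad_out @ self.b.T
        self.grad_b = self.a.T @ grad_out

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)
node = MatMulNode(a, b)
node.backward(np.ones_like(node.value))   # pretend dL/dC is all ones
print(node.grad_a.shape, node.grad_b.shape)  # (4, 3) (3, 2)
```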


Module 3 - Module Class

Logistic regression will be rewritten in PyTorch style. For that we introduce Module - a basic higher-order building block. Loss functions and optimization algorithms will be separated into their own components. We will conclude with a simple NN regression on 2D data.
Example: a simple NN from scratch using Module.

Post

Module Class

A Module class as a basic building block for layers.
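A rough sketch of what such a base class might look like is shown below; the interface (parameters(), forward(), __call__) is illustrative and may differ from the tutorial's actual Module.

```python
import numpy as np

# A rough sketch of a PyTorch-style Module base class (interface is illustrative).
class Module:
    def parameters(self):
        # Collect parameters from attributes that are arrays or sub-modules.
        params = []
        for value in vars(self).values():
            if isinstance(value, np.ndarray):
                params.append(value)
            elif isinstance(value, Module):
                params.extend(value.parameters())
        return params

    def __call__(self, x):
        return self.forward(x)

class Linear(Module):
    def __init__(self, in_features, out_features):
        self.weight = np.random.randn(in_features, out_features) * 0.01
        self.bias = np.zeros(out_features)

    def forward(self, x):
        return x @ self.weight + self.bias

layer = Linear(3, 2)
print(len(layer.parameters()))        # 2 (weight and bias)
print(layer(np.ones((5, 3))).shape)   # (5, 2)
```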


Module 4 - Dataset & DataLoader

For our models to work, they need data. Therefore, we will create Dataset - a class describing how a single sample is loaded, DataLoader - a class responsible for data batching, and transformation classes for data augmentation.
Example: clothes classification using NN.

Post

Dataset & DataLoader

Dataset & DataLoader classes.
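Below is a hedged sketch of how these classes could look; the class names mirror the description above, while the random-image dataset is purely illustrative.

```python
import numpy as np

# Minimal Dataset / DataLoader sketch (illustrative, not the tutorial's exact classes).
class Dataset:
    def __len__(self):
        raise NotImplementedError

    def __getitem__(self, index):
        raise NotImplementedError

class RandomImageDataset(Dataset):
    """Returns a fake 8x8 grayscale image and a label for each index."""
    def __init__(self, count=100):
        self.count = count

    def __len__(self):
        return self.count

    def __getitem__(self, index):
        image = np.random.rand(8, 8).astype(np.float32)
        label = index % 10
        return image, label

class DataLoader:
    """Yields shuffled mini-batches stacked into arrays."""
    def __init__(self, dataset, batch_size=8, shuffle=True):
        self.dataset, self.batch_size, self.shuffle = dataset, batch_size, shuffle

    def __iter__(self):
        indices = np.arange(len(self.dataset))
        if self.shuffle:
            np.random.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            batch = [self.dataset[i] for i in indices[start:start + self.batch_size]]
            images, labels = zip(*batch)
            yield np.stack(images), np.array(labels)

for images, labels in DataLoader(RandomImageDataset(), batch_size=16):
    print(images.shape, labels.shape)  # (16, 8, 8) (16,)
    break
```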


Module 5 - 2D Convolution (and Friends)

To speed up computation, we will implement GPU support via CuPy. To work with images, we will explain and implement convolution and max-pooling layers (for CPU and GPU). Moreover, batch normalization and dropout will be implemented and explained.
Examples: ResNet classification, ResNet landmark detection.
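One common way to share code between CPU and GPU is to select the array backend at import time, since CuPy mirrors a large part of the NumPy API; below is a minimal sketch of that pattern (the tutorial's actual mechanism may differ).

```python
import numpy as np

# Pick the array backend at import time: CuPy if it is installed, else NumPy.
# (A common pattern; the tutorial's actual mechanism may differ.)
try:
    import cupy as cp
    xp = cp
except ImportError:
    xp = np

def relu(x):
    # Works unchanged for NumPy and CuPy arrays because both share the same API subset.
    return xp.maximum(x, 0)

print(relu(xp.asarray([-1.0, 2.0, -3.0])))  # [0. 2. 0.]
```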

Post

2D Convolution

2D Convolution Layer (CPU/GPU).
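For orientation, here is a naive, loop-based forward pass for 2D convolution (no padding, stride 1); a real layer also needs stride, padding, the backward pass and the GPU path, so treat this only as a sketch.

```python
import numpy as np

# Naive, loop-based 2D convolution forward pass (for clarity, not speed).
# Input x: (N, C_in, H, W), weight: (C_out, C_in, KH, KW). No padding, stride 1.
def conv2d_forward(x, weight, bias):
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((n, c_out, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, :, i:i + kh, j:j + kw]   # (N, C_in, KH, KW)
            # Multiply every patch with every filter and sum over C_in, KH, KW.
            out[:, :, i, j] = np.tensordot(patch, weight, axes=([1, 2, 3], [1, 2, 3]))
    return out + bias.reshape(1, c_out, 1, 1)

x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(4, 3, 3, 3)
b = np.zeros(4)
print(conv2d_forward(x, w, b).shape)  # (2, 4, 6, 6)
```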

Post

2D Pooling

2D Pooling Layer (CPU/GPU).
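A minimal sketch of 2x2 max pooling with stride 2, using a reshape trick; it assumes even spatial dimensions and omits the backward pass.

```python
import numpy as np

# 2x2 max pooling with stride 2, done with a reshape trick (assumes H and W are even).
def max_pool2d(x):
    n, c, h, w = x.shape
    x = x.reshape(n, c, h // 2, 2, w // 2, 2)
    return x.max(axis=(3, 5))

x = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)
print(max_pool2d(x)[0, 0])
# [[ 5.  7.]
#  [13. 15.]]
```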

Post

2D Batch Norm

2D Batch Normalization Layer.
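A training-mode sketch of 2D batch normalization; running statistics for inference and the backward pass are omitted.

```python
import numpy as np

# 2D batch normalization forward pass (training mode): normalize each channel
# over the batch and spatial dimensions, then scale and shift.
def batch_norm2d(x, gamma, beta, eps=1e-5):
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # per-channel mean
    var = x.var(axis=(0, 2, 3), keepdims=True)     # per-channel variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

x = np.random.randn(8, 3, 4, 4)
out = batch_norm2d(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=(0, 2, 3)).round(6))  # ~[0. 0. 0.]
```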

Post

Dropout

An Implementation of Dropout Layer.
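A minimal sketch of inverted dropout (the variant PyTorch uses), which scales the surviving activations during training so that inference needs no rescaling.

```python
import numpy as np

# Inverted dropout: zero a fraction p of activations during training and scale the
# rest by 1/(1-p), so no rescaling is needed at inference time.
def dropout(x, p=0.5, training=True):
    if not training or p == 0.0:
        return x
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask

x = np.ones((2, 4))
print(dropout(x, p=0.5))           # roughly half zeros, survivors scaled to 2.0
print(dropout(x, training=False))  # unchanged at inference
```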

Post

Sample: CIFAR10 classification - ResNet

ResNet-based network for CIFAR10 image classification.

Post

Sample: Face Landmark Detection - ResNet

ResNet-based network for face landmark regression.


Module 6 - Upsampling Layers

Image-to-image translation (e.g. autoencoders) requires image upsampling, as opposed to convolution and max pooling, which usually reduce image size. Therefore, transposed convolution and upsampling layers will be introduced and implemented.
Examples: UNet car segmentation, DCGAN anime generation

Post

Transposed 2D Convolution

2D transposed convolution implementation using 2D Convolution primitives.
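For intuition, here is a direct, scatter-based sketch of transposed convolution; note that the post above builds it from 2D convolution primitives instead, so this is an alternative illustration rather than the post's approach.

```python
import numpy as np

# Scatter-based transposed convolution. Weight layout (C_in, C_out, KH, KW) follows
# PyTorch's ConvTranspose2d convention; no padding or output padding.
def conv_transpose2d(x, weight, stride=1):
    n, c_in, h, w = x.shape
    _, c_out, kh, kw = weight.shape
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = np.zeros((n, c_out, out_h, out_w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            # Each input pixel scatters one kernel-sized patch into the output.
            patch = np.tensordot(x[:, :, i, j], weight, axes=([1], [0]))  # (N, C_out, KH, KW)
            out[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw] += patch
    return out

x = np.random.randn(1, 3, 4, 4)
w = np.random.randn(3, 8, 3, 3)
print(conv_transpose2d(x, w, stride=2).shape)  # (1, 8, 9, 9)
```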

Post

2D Upsampling

Because transposed convolution may cause unwanted artifacts, we implement a 2D upsampling layer.
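A minimal sketch of nearest-neighbour upsampling, the simplest of the upsampling modes:

```python
import numpy as np

# Nearest-neighbour 2D upsampling: repeat each pixel `scale` times along H and W.
def upsample_nearest2d(x, scale=2):
    return x.repeat(scale, axis=2).repeat(scale, axis=3)

x = np.arange(4, dtype=np.float32).reshape(1, 1, 2, 2)
print(upsample_nearest2d(x)[0, 0])
# [[0. 0. 1. 1.]
#  [0. 0. 1. 1.]
#  [2. 2. 3. 3.]
#  [2. 2. 3. 3.]]
```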

Post

Sample: Object Segmentation using UNet

UNet implementation using 2D transposed convolution. Car segmentation.

Post

Sample: Anime Generation using DCGAN

DCGAN implementation. Anime image generation.


Module 7 - YOLO from Scratch

To show that we can support state-of-the-art algorithms, an object detection algorithm - YOLO - will be created from scratch.
Example: YOLO
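As a hedged illustration of one small building block of any YOLO-style detector, here is an intersection-over-union helper for axis-aligned boxes; it is illustrative and not taken from the post.

```python
# Intersection-over-Union for axis-aligned boxes in (x1, y1, x2, y2) format -
# one small building block of any YOLO-style detector (illustrative helper).
def iou(box_a, box_b):
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # ~0.143 (25 / 175)
```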

Post

YOLO Object Detection from Scratch!

An implementation of YOLOv3 using BlazeFace backbone (from MediaPipe) for fast face detection.