Artificial Intelligence is different from “old school” computer science, because the way we get machines to solve problems has changed. Earlier, a programmer fed explicit instructions into the machine, and the machine could only produce what those instructions spelled out. But now, the machine has started learning! It is trained on one set of data (called the training data), and when it is given new data it has not seen before (called the test data), it gives you predictions! Wait! Am I trying to say that machines can have neurons like humans do? Well, sort of. Moreover, chatbots, language translation and speech recognition all use Natural Language Processing, which is commonly implemented with Deep Learning.

Convolutional Neural Networks (CNN)

Deep Learning is built on Neural Networks. A Neural Network is loosely modeled on the cells of the human brain: it learns from its own mistakes, with experience (yes, just like some humans). Given below is the structure of a neural network. Do not, I repeat, do not underestimate this simple-looking structure.

Fig. Neural Networks in Deep Learning

The structure shown above is the basis of an unimaginable future we’re going to experience, one where we get up and robots are ready with our favorite breakfast, and where the car knows exactly where our office is as soon as we sit in it (without us mentioning it). This might seem unreal, but who imagined in the 1800s that an elevator could work without someone manually pulling it?

Fig. Convolutional Neural Network

Basically, a CNN image classifier takes an input image, processes it and classifies it into one of a set of categories. The CNN passes the image through a series of convolution layers with filters, pooling operations and fully connected layers, and then applies the Softmax function, which assigns each category a probability between 0 and 1 (the probabilities add up to 1).
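To make the last step concrete, here is a minimal sketch of Softmax in plain Python. The three raw scores are made up for illustration; they are not taken from any model in this post.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score first keeps exp() numerically stable.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three categories.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The category with the highest raw score also gets the highest probability, so Softmax does not change the prediction; it only makes the scores easier to read.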

Image Classification

One of the major applications of Deep Learning is Image Recognition and Classification. It has a variety of uses, for example automated CAPTCHA solving and reading car number plates. There are many other applications too: Object Detection and Semantic Segmentation for self-driving cars, Image Analysis for the health sector, and Face Recognition used in various apps these days. In this blog, we’ll be discussing Image Classification.

We’ll use the MNIST data set. It has 70,000 images, out of which 60,000 are training images and 10,000 are testing ones. We’ll be using Google Colaboratory for coding. So, let’s start!

Importing the necessary libraries

We’ll start by importing the necessary libraries. torch is the PyTorch package, an open-source Machine Learning library, and its nn package is used for building neural networks. torchvision is PyTorch’s computer vision package. Finally, Matplotlib is a widely used library for data visualization.


Working with CUDA

CUDA is NVIDIA’s parallel computing platform and programming model. It lets programmers speed up computation-intensive applications by running them on GPUs, which can carry out many operations in parallel.
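A common way to use CUDA from PyTorch is to pick a device once and move tensors and models onto it. This is a small sketch of that pattern; the variable name device is my own choice.

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Tensors (and models) are moved to the chosen device with .to(device).
x = torch.zeros(2, 3).to(device)
print(x.device)
```

In Google Colaboratory a GPU is only visible if a GPU runtime has been selected; otherwise the same code simply runs on the CPU.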


Preparing the Dataset

Firstly, we’ll download the data set. Secondly, we’ll apply some transformations to it. Compose chains several transformations together so that they are applied one after another. Resize resizes the input PIL image to a given size. ToTensor() converts an input image into a tensor, that is, into numbers our model can work with. Basically, a tensor can be understood as the basic data structure of PyTorch. Normalize normalizes the tensor image with a mean and a standard deviation (the formula is shown below). Here, both the mean and the standard deviation are 0.5.

NORMALIZATION: output[channel] = (input[channel] - mean[channel]) / std[channel]


Defining the DataLoaders

The basic task of a data loader is to combine a data set and a sampler. As a result, it provides single- or multi-process iterators over the data set. Here, we’ll pass four parameters, of which only the data set is mandatory (batch_size, shuffle and drop_last are optional).

  • dataset: it is mandatory to pass the data set from which the data is loaded.
  • batch_size: the number of samples to load per batch.
  • shuffle: a boolean flag that determines whether the data is reshuffled at every epoch.
  • drop_last: a very crucial parameter here. It drops the last batch when it is incomplete. With a batch size of 64, the 10,000 test images leave only 16 images in the last batch, and a batch of a different size would create a problem when defining the shape in the training loop (see “Training the network”).
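The effect of drop_last is easy to see on a stand-in data set. The sketch below uses a tensor of zeros with the same size and shape as the MNIST test set instead of the real images, so it runs without any download.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the 10,000-image test set: zeros with the MNIST image shape.
images = torch.zeros(10_000, 1, 28, 28)
labels = torch.zeros(10_000, dtype=torch.long)
dataset = TensorDataset(images, labels)

keep_last = DataLoader(dataset, batch_size=64, shuffle=False, drop_last=False)
drop_last = DataLoader(dataset, batch_size=64, shuffle=False, drop_last=True)

print(len(keep_last), len(drop_last))   # number of batches in each loader
print(len(dataset) % 64)                # size of the incomplete final batch
```

With drop_last=True the loader yields one batch fewer, and every batch it does yield has exactly 64 images.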

Iterating through the batch

It is handy to be able to pull a single batch from a DataLoader without setting up a loop and returning the batches manually. We have next(iter(trainloader)) for this. Each image has a shape of 1*28*28 (one channel, 28*28 pixels). When we print images.shape, we get 64*1*28*28, which means we have a batch_size of 64 and each image is a single-channel image of 28*28 pixels.
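A sketch of pulling one batch, again on random stand-in data with the MNIST image shape rather than the real data set:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data (not MNIST): 256 single-channel 28x28 images.
dataset = TensorDataset(torch.rand(256, 1, 28, 28),
                        torch.randint(0, 10, (256,)))
trainloader = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)

# iter() builds an iterator over the loader; next() pulls the first batch.
images, labels = next(iter(trainloader))
print(images.shape)      # the whole batch
print(labels.shape)      # one label per image
print(images[0].shape)   # a single image from the batch
```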



torch.Size([1, 28, 28])

Visualizing the dataset

We’ll now visualize the data set using Matplotlib to get more familiar with it.
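One possible way to draw a small grid of images with their labels is sketched below. It uses a random stand-in batch, so the pictures are noise; with a real batch from the loader the same loop would show digits. The 2*5 grid size is an arbitrary choice of mine.

```python
import matplotlib.pyplot as plt
import torch

# Random stand-in batch; in the notebook this would come from the loader.
images = torch.rand(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))

fig, axes = plt.subplots(2, 5, figsize=(10, 4))
for ax, image, label in zip(axes.flat, images, labels):
    # imshow expects a 2D array for grayscale, so drop the channel axis.
    ax.imshow(image.squeeze(0).numpy(), cmap="gray")
    ax.set_title(str(int(label)))
    ax.axis("off")
plt.show()
```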


Creating the Convolutional Neural Network Structure

In this step, we’ll construct the network that we are going to train. This is a very crucial step. Conv2d applies a 2D convolution over an input signal composed of several input planes. Notice that the out_channels (or out_features) of one layer are the in_channels (or in_features) of the next layer. In the forward method (as the name suggests), the input is fed in the forward direction through the hidden layers and the activation function, yielding an output. Out of the many activation functions (tanh, sigmoid, exponential linear unit etc.), we’ll use ReLU (Rectified Linear Unit) because it is one of the most widely used and gives reliable results.
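The original class definition is not reproduced here, so the following is only a sketch of a network in the same spirit. The layer sizes (6 and 12 channels, 5*5 kernels, 120 and 60 hidden features) are assumptions of mine; only the class name Network and the use of Conv2d and ReLU come from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # out_channels of conv1 (6) is the in_channels of conv2.
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=5)
        # After two conv(5x5) + max-pool(2x2) stages, 28x28 becomes 4x4.
        self.fc1 = nn.Linear(in_features=12 * 4 * 4, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features=10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), kernel_size=2)  # 28 -> 24 -> 12
        x = F.max_pool2d(F.relu(self.conv2(x)), kernel_size=2)  # 12 -> 8 -> 4
        x = x.flatten(start_dim=1)                              # (N, 192)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Raw scores for the 10 digits; nn.CrossEntropyLoss applies
        # log-softmax internally, so no softmax layer is added here.
        return self.out(x)

model = Network()
scores = model(torch.zeros(2, 1, 28, 28))
print(scores.shape)
```

Feeding a dummy batch through the model, as in the last lines, is a quick way to catch a mismatch between one layer’s outputs and the next layer’s inputs before training starts.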


Creating an object

Now, we’ll create an object of the class Network. An object is an instance of a class.


Defining Loss, Learning rate and Optimizer

Cross-entropy loss (also called log loss) measures the performance of a classification model whose output is a probability between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label (thus we aim for a low value of cross-entropy loss). Adam is an optimization algorithm that builds on Stochastic Gradient Descent by adapting the step size for each parameter. You might be wondering: why not set a very high learning rate so that the model learns very fast? Unfortunately, a learning rate that is too high can make the loss diverge instead of settling down. And if it is too low, the model converges very slowly (or seems not to converge at all), as we’ll be making very little change to our weights at each step of training.
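A tiny worked example may help with the loss. For a single sample, cross-entropy is the negative logarithm of the probability the model gave to the correct class. The three probability lists below are made up for illustration.

```python
import math

def cross_entropy(probs, true_class):
    """Loss for one sample: -log of the probability given to the true class."""
    return -math.log(probs[true_class])

# Hypothetical predictions over three classes; the true class is index 1.
confident_right = cross_entropy([0.05, 0.90, 0.05], true_class=1)
unsure = cross_entropy([0.30, 0.40, 0.30], true_class=1)
confident_wrong = cross_entropy([0.90, 0.05, 0.05], true_class=1)
print(confident_right, unsure, confident_wrong)
```

The loss grows as the probability on the true class shrinks, and it is not capped at 1: a confident wrong answer is punished much harder than an unsure one.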


Training the network

All the steps above lead up to this one: TRAINING. We’ll train our model on the training data set (with the help of trainloader). We’ve set the number of epochs to 5 (epoch is the fancy name for passing the whole data set through the neural network forward and backward once); as the network learns more with each epoch, the accuracy increases and the loss decreases. We append the loss, iteration count and accuracy to lists at regular intervals (every 500 iterations in the output below) so that we can plot them later. The backward() function accumulates gradients, so we call optimizer.zero_grad() to zero all the gradients first, because we don’t want to mix the gradients between batches. Finally, the weights are updated from the gradients using optimizer.step(). On testing our model on the test set (using the testloader), we observe that as the model learns during each epoch, the accuracy increases and the loss decreases.
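The order of calls inside the loop matters, so here is a stripped-down sketch of it. To keep it short and runnable without a download, it trains a deliberately tiny placeholder model on random data for 2 epochs; the learning rate of 0.001 is also an assumption. It is not the training cell from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data so the sketch runs without downloading MNIST.
dataset = TensorDataset(torch.rand(256, 1, 28, 28),
                        torch.randint(0, 10, (256,)))
trainloader = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)

# Deliberately tiny placeholder model, not the CNN from this post.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # assumed learning rate

loss_list, iteration_list = [], []
iteration = 0
for epoch in range(2):
    for images, labels in trainloader:
        optimizer.zero_grad()               # clear gradients of the last batch
        outputs = model(images)             # forward pass
        loss = criterion(outputs, labels)   # compare scores with the labels
        loss.backward()                     # compute fresh gradients
        optimizer.step()                    # update the weights
        iteration += 1
        loss_list.append(loss.item())
        iteration_list.append(iteration)

print(iteration, loss_list[-1])
```

Since the labels here are random, the loss is not expected to fall the way it does on real digits; the point is only the sequence zero_grad, forward, loss, backward, step.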


In the output below, the loss trends downward as training proceeds (with some fluctuation from batch to batch), while the accuracy climbs from 95% to 98%.


Iteration: 500, Loss: 0.07350127398967743, Accuracy: 95%
Iteration: 1000, Loss: 0.08385323733091354, Accuracy: 96%
Iteration: 1500, Loss: 0.04026545211672783, Accuracy: 97%
Iteration: 2000, Loss: 0.02583783119916916, Accuracy: 98%
Iteration: 2500, Loss: 0.028324954211711884, Accuracy: 98%
Iteration: 3000, Loss: 0.1509086787700653, Accuracy: 98%
Iteration: 3500, Loss: 0.031392671167850494, Accuracy: 98%
Iteration: 4000, Loss: 0.015405401587486267, Accuracy: 98%
Iteration: 4500, Loss: 0.00860094279050827, Accuracy: 98%

Visualizing the loss and accuracy

In addition to the above, it helps to see the training progress at a glance. Therefore, we are going to plot the graphs of No. of Iterations vs Loss and No. of Iterations vs Accuracy.
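As a sketch of such plots, the block below draws both curves from the values in the training log above (losses rounded to four decimals). The list names and the side-by-side layout are my own choices.

```python
import matplotlib.pyplot as plt

# Values copied from the training log above (losses rounded to 4 decimals).
iterations = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500]
losses = [0.0735, 0.0839, 0.0403, 0.0258, 0.0283, 0.1509, 0.0314, 0.0154, 0.0086]
accuracies = [95, 96, 97, 98, 98, 98, 98, 98, 98]

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

ax_loss.plot(iterations, losses, color="red")
ax_loss.set_xlabel("No. of Iterations")
ax_loss.set_ylabel("Loss")
ax_loss.set_title("Iterations vs Loss")

ax_acc.plot(iterations, accuracies, color="green")
ax_acc.set_xlabel("No. of Iterations")
ax_acc.set_ylabel("Accuracy (%)")
ax_acc.set_title("Iterations vs Accuracy")

fig.tight_layout()
plt.show()
```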


Checking the network ourselves

We connect with the model better when we can check for ourselves whether it is predicting correctly. For this purpose, we’ll pass images from the test set to the trained model and compare its predictions with the actual labels. Try this on your own with different inputs to get a better feel for it. The model is working quite well! We’ve finally built a digit recognizer.
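Turning the model’s raw scores into predictions comes down to picking the highest score per image. The sketch below uses hand-made scores for three imaginary images instead of a trained model, and the third label is deliberately chosen to mismatch.

```python
import torch

# Hypothetical raw scores for 3 images over the 10 digit classes.
outputs = torch.zeros(3, 10)
outputs[0, 7] = 5.0
outputs[1, 2] = 4.0
outputs[2, 9] = 3.0
labels = torch.tensor([7, 2, 4])    # third label chosen to mismatch

predicted = outputs.argmax(dim=1)   # index of the highest score per image
correct = (predicted == labels).sum().item()
print(predicted.tolist(), correct / len(labels))
```

With a real model, outputs would come from something like model(images) on a test batch, usually inside a torch.no_grad() block since no gradients are needed for checking predictions.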



As shown above, PyTorch is very easy to work with. In addition, it is extremely powerful. It took a lot of research, reading and struggle before I was able to make this. MNIST is an easy data set to begin with, but it is a very important one for getting your concepts clear (as a beginner). In conclusion, I hope you enjoyed this blog and that it contributed something to your learning.



You can download the notebook for this code from: