Convolutional Neural Networks (CNNs) excel as feature extractors, especially for images. They scan an image and learn the features that matter most for the task at hand. This is particularly helpful for models that struggle with very high-dimensional inputs, because the convolutional layers extract only the important features and thereby reduce the size of the input those models have to handle.

In this article, we will develop a convolutional neural network for a handwritten digit classification dataset using the TensorFlow framework. 

First, we will import the packages required to build our model.

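A minimal set of imports for this walkthrough might look like the following: TensorFlow with its Keras API, plus NumPy and Matplotlib for the plots we make later on.

import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras import layers, Model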

Before we start building, let's learn a little about the data. We will use MNIST (the Modified National Institute of Standards and Technology dataset), a large collection of images of handwritten digits together with their corresponding labels. The dataset can be downloaded from Kaggle, but since it is a standard benchmark, TensorFlow already ships with a method that returns the MNIST data. We can load it from tensorflow.keras.datasets.

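A sketch of loading MNIST through the Keras datasets module and inspecting the array shapes:

# Load the MNIST images and labels as NumPy arrays
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

print(train_images.shape, train_labels.shape)  # (60000, 28, 28) (60000,)
print(test_images.shape, test_labels.shape)    # (10000, 28, 28) (10000,)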

As we can see, we have 60,000 training samples, each of shape 28×28. Since there is no channel dimension, we can infer that these are single-channel (grayscale) images. Each training sample has its label in the train_labels array. Similarly, we have 10,000 test samples and their corresponding labels in the test_labels array.

It is always good practice to analyze the data before building and training a model on it. Let's look at the distribution of the labels in our data.

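One way to inspect the label distribution is a simple bar chart of the number of samples per digit; the plotting code below is just one illustrative way to do it.

# Count how many training samples there are for each digit
digits, counts = np.unique(train_labels, return_counts=True)

plt.bar(digits, counts)
plt.xticks(digits)
plt.xlabel('Digit')
plt.ylabel('Number of training samples')
plt.title('Label distribution in the training set')
plt.show()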

From this, we can see that the data is fairly evenly distributed and not biased towards any particular label. Hence we don't have to worry about collecting more data or performing data augmentation to balance the classes.

Now we can plot a few samples with their corresponding labels and see what the data looks like.

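For example, we can display the first ten training images with their labels (the 2×5 grid layout is an arbitrary choice):

plt.figure(figsize=(8, 4))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(train_images[i], cmap='gray')
    plt.title(f'Label: {train_labels[i]}')
    plt.axis('off')
plt.tight_layout()
plt.show()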

There is a slight problem, though. The MNIST images are single-channel, but the channel dimension is missing from the array, which has shape (samples, height, width); the convolutional layers we will use expect an explicit channel axis. Also, the pixel values range from 0 to 255, and such a wide spread of input values can hurt the model's ability to learn; we want all the data on a similar, small scale. To tackle both problems, we will first reshape the data by adding a channel dimension and then normalize the pixel values, scaling them down from [0, 255] to [0, 1].

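A sketch of the reshaping and normalization step:

# Add an explicit channel dimension and scale pixel values to [0, 1]
train_images = train_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0
test_images = test_images.reshape(-1, 28, 28, 1).astype('float32') / 255.0

print(train_images.shape)  # (60000, 28, 28, 1)
print(test_images.shape)   # (10000, 28, 28, 1)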

Now our training and testing data are ready. At the moment, the labels are plain integers: the image of a five is labelled 5, the image of a seven is labelled 7, and so on. We will one-hot encode them so that, for example, the label 5 becomes the vector [0,0,0,0,0,1,0,0,0,0]. TensorFlow provides a method called "one_hot" for this, used as tf.one_hot(indices, depth), where indices are the labels to encode and depth is a scalar giving the number of classes.

By doing so we obtain vectorized labels of shape (number of samples, 10).

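Using tf.one_hot as described above:

# Convert the integer labels to one-hot vectors of length 10
train_labels = tf.one_hot(train_labels, depth=10)
test_labels = tf.one_hot(test_labels, depth=10)

print(train_labels.shape)  # (60000, 10)
print(test_labels.shape)   # (10000, 10)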

Now that our data is fully prepared, let's start building the CNN classification model. First, we need to design the architecture: how the data flows, how the shape changes at each layer, which activation functions to use, and where (and how much) dropout to apply for regularization. Look at the figure below and you will see how the layers are connected, with their output shapes and activation functions. This is the architecture we are going to implement.

TensorFlow has prebuilt classes for defining all the layers in the figure. Let's understand them one by one.

First, we have the Input layer, which takes in the input data with the shape specified as a parameter and passes it to the next layer. Next is the Conv2D layer, which creates convolutional filters that are convolved with the previous layer's output. The arguments we will use are filters, kernel_size, and activation. The filters parameter specifies the number of output filters of the convolution. kernel_size is a list or tuple of two integers specifying the height and width of the convolution window. activation is the activation function of the layer; here we use 'relu', as it performs very well with convolution operations. You should also try different activation functions and compare the results. There are other parameters as well, such as strides, which controls how far the convolution window slides at each step, and padding, which adds a border of zeros around the input so that information at the edges is preserved during convolution. The output of the first Conv2D layer is connected to another Conv2D layer, which applies a further convolution and passes its output to the next layer, MaxPooling2D.

The MaxPooling2D layer takes the maximum value within each window of the specified pool size, which we pass as a parameter. We can also adjust the strides and padding; by default, strides is None (it falls back to the pool size) and padding is 'valid'. We then apply a Dropout layer to regularize the model by randomly setting input units to 0 so that the model does not overfit; its rate parameter determines the fraction of units to drop. With this, the convolutional part is complete, and the features extracted from the input are passed to the fully connected layers for classification. But wait: the output at this point is still multi-dimensional, and a fully connected layer accepts only a one-dimensional vector per sample. So we apply a Flatten layer, whose job is to convert the multi-dimensional tensor into a single dimension. As you can see in the model diagram, the output of MaxPooling2D has shape (12, 12, 64), so flattening it returns a vector of length 12*12*64 = 9216.

Now, to classify the extracted features, we use Dense layers, which are fully connected layers. The first two Dense layers use 'relu' as the activation function, and the last layer, the output layer, uses 'softmax'. Since this is a multi-class classification problem, we need an output activation that returns a probability distribution over the classes.

Alright, now let's define a function that builds and returns our model.

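Here is a sketch of such a function using the Keras functional API. The layer order (Input → Conv2D → Conv2D → MaxPooling2D → Dropout → Flatten → three Dense layers) and the (12, 12, 64) pooling output follow the description above, while the first convolution's filter count (32), the Dense layer widths (128 and 64), and the dropout rate (0.25) are illustrative assumptions.

def create_model():
    # Input: 28x28 grayscale images with a single channel
    input_layer = layers.Input(shape=(28, 28, 1))

    # Two convolution layers followed by max pooling and dropout
    conv_1 = layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(input_layer)  # -> (26, 26, 32)
    conv_2 = layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu')(conv_1)       # -> (24, 24, 64)
    pooling = layers.MaxPooling2D(pool_size=(2, 2))(conv_2)                                 # -> (12, 12, 64)
    dropout = layers.Dropout(rate=0.25)(pooling)

    # Flatten the feature maps and classify with fully connected layers
    flatten = layers.Flatten()(dropout)                                                     # -> 9216
    dense_1 = layers.Dense(128, activation='relu')(flatten)
    dense_2 = layers.Dense(64, activation='relu')(dense_1)
    dense_layer_10 = layers.Dense(10, activation='softmax')(dense_2)

    return Model(inputs=input_layer, outputs=dense_layer_10)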

In the code above, Model groups the layers into an object with training and inference features. Here the input of the model is 'input_layer' and the output is 'dense_layer_10'.

Now that the create_model function is ready, let's call it to build our model. After that, we will set the loss function and the optimization algorithm to use for training. Since this is a multi-class classification model, we will use the categorical cross-entropy loss. To see how the model performs during training, we will also pass the metrics parameter and set it to ['accuracy']. With that, model building is complete, and we can call the summary() method to view the full details of the model.

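A sketch of building and compiling the model; the text does not name a specific optimizer, so Adam is used here as a common, assumed choice.

model = create_model()

# Categorical cross-entropy loss for multi-class classification;
# the Adam optimizer is an assumed (but typical) choice
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()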

Now we are all set to start training the model. Before that, we will define the batch size, which speeds up training by processing multiple training samples at a time, and the number of epochs to train for. Then we will 'fit' the model on the training data and training labels. For validation, we will also pass the test data and test labels to the 'validation_data' parameter.

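A sketch of the training call; the batch size of 64 is an assumed value, while the 10 epochs match the training run described below.

batch_size = 64   # assumed value; tune as needed
epochs = 10

history = model.fit(train_images, train_labels,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(test_images, test_labels))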

Great! Our model is now trained. From the output above we can see that, by the tenth epoch, the model reaches a training accuracy of 0.9864 (out of 1) with a loss of about 0.05. That is great, but what if it is due to overfitting?

To check, we evaluate the model on unseen data: the test data.

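Evaluating on the test set returns the loss and the accuracy metric we configured:

test_loss, test_accuracy = model.evaluate(test_images, test_labels)
print('Test loss:', test_loss)
print('Test accuracy:', test_accuracy)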

Now that's really encouraging: our CNN not only performs well on the training data but also performs well on the test data, meaning it is not overfitting. Let's also look at how the loss and accuracy of the model change with each epoch. This gives an idea of whether we need to train for more epochs or the model has already saturated.

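The history object returned by fit() records the per-epoch metrics, which we can plot:

plt.figure(figsize=(10, 4))

# Training vs. validation loss per epoch
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Training vs. validation accuracy per epoch
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='training accuracy')
plt.plot(history.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.show()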

From the plots we can clearly see that in the initial epochs the model performs poorly, and as training progresses it steadily reduces the loss and increases the accuracy until the curves flatten out. Once they flatten, there is little to gain from training any further.

We would definitely like to use our model to predict some outputs. However, the model returns an output vector of length 10, while the result we want is a single digit, just like the labels in the dataset. So we write a function that takes the predicted vector and returns the index of its largest element. We can do this because the output vector is produced by the softmax activation in the last layer of the model, which returns a probability distribution over the classes, so the class with the highest probability is our predicted output.

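A sketch of such a helper (the name predict_digit is hypothetical) applied to one test image:

def predict_digit(image):
    # The model expects a batch, so add a batch dimension before predicting
    probabilities = model.predict(image.reshape(1, 28, 28, 1))
    # The predicted digit is the index of the highest probability
    return np.argmax(probabilities)

sample = test_images[0]
plt.imshow(sample.reshape(28, 28), cmap='gray')
plt.title(f'Predicted: {predict_digit(sample)}')
plt.axis('off')
plt.show()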

Now that is outstanding: our model correctly predicted the input data. With results like these, we surely want to save the model so that the next time we use it we don't have to train it again.

To do so, we will simply use:

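For example, saving just the trained weights (the file path below is an arbitrary example):

# Save the trained weights to disk (path is an example)
model.save_weights('mnist_cnn.weights.h5')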

That will save our model's weights. Now, if you want to load the trained model, you can simply create a model instance and load the weights using the load_weights('path') method. You can then use it for prediction, or continue training from the pre-trained weights if there is scope for improvement.

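A sketch of restoring the model from the saved weights and reusing it (the same example path as above):

# Re-create the architecture and load the previously saved weights
loaded_model = create_model()
loaded_model.load_weights('mnist_cnn.weights.h5')

# Compile again if you want to evaluate or continue training
loaded_model.compile(optimizer='adam',
                     loss='categorical_crossentropy',
                     metrics=['accuracy'])

print('Prediction for the first test image:',
      np.argmax(loaded_model.predict(test_images[:1])))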

With this, we have successfully developed a Convolutional Neural Network model to classify the MNIST images. You can now try the many image classification datasets available online and apply a CNN to them. You can also compare the performance of various model designs and see which one performs best.