LeNet in depth

Musstafa
Mar 25, 2021

LeNet (more precisely, LeNet-5) was one of the first modern convolutional neural network (CNN) architectures. It was introduced in 1998 by Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner in their paper, Gradient-Based Learning Applied to Document Recognition. It is a compact and efficient convolutional neural network for handwritten character recognition.

Architecture of LeNet:

It has a 7-layer architecture (not counting the input): 3 Convolutional layers (C1, C3 and C5), 2 Sub-Sampling (Pooling) layers (S2 and S4), and 1 Fully Connected layer (F6), followed by the output layer.

Architecture in detail

The network starts with an input layer that takes a 32 * 32 image.

NOTE: Traditionally, the input layer is not counted as one of the layers of the network.

First Layer [Convolutional Layer (C1)]:

The input size = 32 * 32

Number of filters = 6

Kernel size = 5 * 5

Stride = 1

Feature map size: To calculate the feature map size we use the formula:

[(Input size - Kernel size) / Stride] + 1

Feature map size: [(32 - 5) / 1] + 1 = 28
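
The formula above can be checked with a small helper. This is only a minimal sketch: the function name feature_map_size is ours, and it assumes no padding, which is the case for every layer described in this article.

```python
def feature_map_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    """Width (or height) of the output of a convolution or pooling layer
    that uses no padding."""
    return (input_size - kernel_size) // stride + 1


# C1: 32 x 32 input, 5 x 5 kernel, stride 1
c1_size = feature_map_size(32, 5, 1)
print(c1_size)                 # 28
print(c1_size * c1_size * 6)   # number of neurons in C1: 4704
```

The same helper applies to the pooling layers, where the stride equals the kernel size.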

Number of Neurons = Feature map size * Number of filters = 28 * 28 * 6

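Trainable parameters: Weight + Bias = F * F * n_c^(l-1) * n_c^(l) + n_c^(l), where F is the kernel size, n_c^(l-1) is the number of input channels and n_c^(l) is the number of filters in the layer

Trainable parameters = 5 * 5 * 1 * 6 + 6 = 156

(5 * 5 = 25 weights and one bias per filter, for a total of 6 filters)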

Total Connections: Trainable parameters * feature map size (every filter is applied at each of the 28 * 28 output positions)

Total Connections: (5 * 5 + 1) * 6 * 28 * 28 = 122304
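
The parameter and connection counts for C1 can be reproduced in a few lines. This is a sketch under the assumption that every filter sees every input map (true for C1 and C5); the function name conv_layer_counts is ours.

```python
def conv_layer_counts(kernel_size: int, in_maps: int, out_maps: int, out_size: int):
    """Return (trainable parameters, connections) for a convolutional layer
    in which every filter is connected to every input map."""
    weights_per_filter = kernel_size * kernel_size * in_maps
    params = (weights_per_filter + 1) * out_maps      # +1 for the bias of each filter
    connections = params * out_size * out_size        # each filter is applied at every output position
    return params, connections


c1_params, c1_connections = conv_layer_counts(kernel_size=5, in_maps=1, out_maps=6, out_size=28)
print(c1_params, c1_connections)   # 156 122304
```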

Description: The input for LeNet is a 32 * 32 grayscale image, which passes through the first convolutional layer with 6 filters (feature maps) of size 5 * 5 and a stride of 1. The image dimensions change from 32 * 32 * 1 to 28 * 28 * 6.

Second Layer [Pooling Layer (S2)]:

The input size = 28 * 28

Number of filters = 6

Kernel size = 2 * 2

Stride = 2

Feature map size: To calculate the feature map size we use the formula:

[(Input size - Kernel size) / Stride] + 1

Feature map size: [(28 - 2) / 2] + 1 = 14

Number of Neurons = Feature map size * Number of filters = 14 * 14 * 6

Trainable parameters: (Coefficient + Bias) * filters = (1 + 1) * 6

Trainable parameters = 12

Total Connections: (inputs per pooling window + bias) * number of filters * feature map size

Total Connections: (2 * 2 + 1) * 6 * 14 * 14 = 5880

The size of each feature map in S2 is 1/4 of the size of the feature map in C1.

Description:

  • Pooling follows immediately after the first convolution. It uses 2 * 2 windows with a stride of 2, so S2 consists of 6 feature maps of size 14 * 14 (28 / 2 = 14).
  • Each unit in S2 sums the four values in a 2 * 2 region of the corresponding C1 map, multiplies the sum by a trainable coefficient, adds a trainable bias, and passes the result through a sigmoid-type activation function.
  • Each feature map therefore has two trainable parameters, giving 2 * 6 = 12 trainable parameters in total, while the layer has 5 * 14 * 14 * 6 = 5880 connections.
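
The subsampling step described in the list above can be sketched with NumPy. This is an illustration, not the original implementation: the function name subsample and the coefficient and bias values are made up, and plain tanh stands in for the squashing function.

```python
import numpy as np


def subsample(feature_map: np.ndarray, coefficient: float, bias: float) -> np.ndarray:
    """LeNet-style 2 x 2 subsampling of a single feature map: sum each
    non-overlapping 2 x 2 block, scale the sum by a trainable coefficient,
    add a trainable bias, then apply a squashing function (tanh here)."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)   # split into 2 x 2 blocks
    summed = blocks.sum(axis=(1, 3))                     # sum of the 4 values in each block
    return np.tanh(coefficient * summed + bias)


c1_map = np.ones((28, 28))                               # dummy C1 feature map
s2_map = subsample(c1_map, coefficient=0.25, bias=0.0)
print(s2_map.shape)   # (14, 14)
```

Unlike max pooling, this layer has trainable parameters: one coefficient and one bias per feature map.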

Third Layer [Convolutional Layer (C3)]:

The input size = 14 * 14

Number of filters = 16

Kernel size = 5 * 5

Stride = 1

Feature map size: To calculate the feature map size we use the formula:

[(Input size - Kernel size) / Stride] + 1

Feature map size: [(14 - 5) / 1] + 1 = 10

Number of Neurons = Feature map size * Number of filters = 10 * 10 * 16

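Trainable parameters: Weight + Bias. In the original paper C3 is not fully connected to S2: each of its 16 feature maps reads only a subset of the 6 S2 maps, for a total of 60 map-to-map connections, each with its own 5 * 5 kernel.

Trainable parameters: (5 * 5 * 60) + 16 = 1516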

Total Connections: Trainable parameters * final dimensions

Total Connections: 10 * 10 * 1516 = 151600

Description:

C3 is the second convolutional layer, with 16 filters of size 5 * 5 and a stride of 1. The activation function is tanh. The output size is now 10 * 10 * 16.
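
The figure of 1516 parameters comes from the sparse connection scheme between S2 and C3 in the original paper, where the 16 C3 maps are split into groups that read 3, 4 or all 6 of the S2 maps. The sketch below only counts the connections per group; it does not reproduce the exact table of which maps are connected, and the variable names are ours.

```python
# (number of C3 maps in the group, number of S2 maps each of them reads)
c3_groups = [
    (6, 3),   # 6 maps, each reading 3 contiguous S2 maps
    (6, 4),   # 6 maps, each reading 4 contiguous S2 maps
    (3, 4),   # 3 maps, each reading 4 non-contiguous S2 maps
    (1, 6),   # 1 map reading all 6 S2 maps
]

n_c3_maps = sum(n_maps for n_maps, _ in c3_groups)
map_links = sum(n_maps * n_inputs for n_maps, n_inputs in c3_groups)

c3_params = 5 * 5 * map_links + n_c3_maps   # one 5 x 5 kernel per link, one bias per C3 map
c3_connections = c3_params * 10 * 10        # applied at each of the 10 x 10 output positions

print(n_c3_maps, map_links)        # 16 60
print(c3_params, c3_connections)   # 1516 151600
```

A fully connected C3 would instead need 5 * 5 * 6 * 16 + 16 = 2416 parameters, which is what most modern re-implementations use.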

Fourth Layer [Pooling Layer(S4)]:

The input size = 10 * 10

Number of filters = 16

Kernel size = 2 * 2

Stride = 2

Feature map size: To calculate the feature map size we use the formula:

[(Input size - Kernel size) / Stride] + 1

Feature map size: [(10 - 2) / 2] + 1 = 5

Number of Neurons = Feature map size * Number of filters = 5 * 5 * 16

Trainable parameters: (Coefficient + Bias) * filters = (1 + 1) * 16

Trainable parameters = 32

Total Connections: (inputs per pooling window + bias) * number of filters * feature map size

Total Connections: (2 * 2 + 1) * 16 * 5 * 5 = 2000

Description:

  • S4 is a pooling layer with the same 2 * 2 window. The 16 feature maps of size 10 * 10 from C3 are pooled in 2 * 2 blocks to give 16 feature maps of size 5 * 5. This layer has 2 * 16 = 32 trainable parameters and 5 * 5 * 5 * 16 = 2000 connections.
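
The counts for both pooling layers follow the same pattern, which can be written as a small helper. This is a sketch; the function name subsampling_counts is ours.

```python
def subsampling_counts(n_maps: int, out_size: int, window: int = 2):
    """Return (trainable parameters, connections) for a LeNet-style
    subsampling layer."""
    params = 2 * n_maps                                   # one coefficient and one bias per map
    connections = (window * window + 1) * n_maps * out_size * out_size
    return params, connections


print(subsampling_counts(n_maps=6, out_size=14))   # S2: (12, 5880)
print(subsampling_counts(n_maps=16, out_size=5))   # S4: (32, 2000)
```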

Fifth Layer [Convolutional Layer (C5)]:

The input size = 5 * 5

Number of filters = 120

Kernel size = 5 * 5

Stride = 1

Feature map size: To calculate the feature map size we use the formula:

[(Input size - Kernel size) / Stride] + 1

Feature map size: [(5 - 5) / 1] + 1 = 1

Number of Neurons = Feature map size * Number of filters = 1 * 1 * 120

Trainable parameters: Weight + Bias = F * F * n_c^(l-1) * n_c^(l) + n_c^(l)

Trainable parameters: (5 * 5 * 16 * 120) + 120 = 48120

Description:

C5 is a convolutional layer with 120 filters. Because each of the 16 feature maps from S4 is 5 * 5, the same size as the convolution kernel, each output feature map is 1 * 1, so the layer produces 120 values. Each of them is connected to all 16 feature maps of S4. There are therefore (5 * 5 * 16 + 1) * 120 = 48120 parameters and, since each filter is applied at only one position, also 48120 connections.
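
Because the kernel covers the whole 5 * 5 * 16 input, C5 behaves exactly like a fully connected layer on the flattened S4 output. The NumPy sketch below illustrates this with random values; all array names are ours and the data is made up.

```python
import numpy as np

rng = np.random.default_rng(0)
s4 = rng.standard_normal((5, 5, 16))            # dummy S4 output: 16 maps of 5 x 5
kernels = rng.standard_normal((120, 5, 5, 16))  # 120 filters, each 5 x 5 x 16
biases = rng.standard_normal(120)

# Convolution view: each filter fits the input exactly once, so there is a single output position.
conv_out = np.array([np.sum(s4 * k) for k in kernels]) + biases

# Fully connected view: flatten the input and use the filters as rows of a weight matrix.
dense_out = kernels.reshape(120, -1) @ s4.reshape(-1) + biases

print(conv_out.shape)                    # (120,)
print(np.allclose(conv_out, dense_out))  # True
print(kernels.size + biases.size)        # 48120
```

This is why many re-implementations flatten the S4 output and use a dense layer with 120 units instead of a third convolution.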

Sixth Layer [Fully Connected Layer (F6)]:

Number of units = 84

Trainable parameters: Weight + Bias

Trainable parameters: (120 * 84) + 84 = 10164

Description:

Layer 6 (F6) is a fully connected layer with 84 nodes, each connected to all 120 outputs of C5. The number of trainable parameters, which is also the number of connections, is (120 + 1) * 84 = 10164.
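
Adding up the per-layer figures given so far is a quick sanity check. The snippet below simply tallies the numbers from this article for C1 through F6; the dictionary name is ours.

```python
trainable = {
    "C1": 156,
    "S2": 12,
    "C3": 1516,
    "S4": 32,
    "C5": 48120,
    "F6": (120 + 1) * 84,   # 120 weights and 1 bias for each of the 84 units
}

print(trainable["F6"])           # 10164
print(sum(trainable.values()))   # 60000
```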

Output Layer:

The output layer has 10 units, one for each digit from 0 to 9. In the original paper these are Euclidean radial basis function (RBF) units rather than a softmax layer, although modern implementations (including the Keras code below) usually use softmax. The unit whose output is closest to 0 determines the result: if that is unit i, the network recognizes the input as the digit i.

Each RBF unit takes the 84 outputs of F6 as its input. If x is the input from the previous layer, w_i is the weight vector of unit i and y_i is its output, the RBF output is calculated as:

y_i = Σ_j (x_j - w_ij)^2, with j running over the 84 inputs

The closer y_i is to 0, the closer the F6 output is to w_i, which is the 7 * 12 bitmap code (7 * 12 = 84) assigned to character i, and the more confident the network is that the input is the character i. This layer has 84 * 10 = 840 parameters and connections.
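
The RBF output can be sketched in NumPy as a squared Euclidean distance between the 84-dimensional F6 output and one code vector per class. The code vectors here are random +1/-1 values used purely for illustration (in the paper they are hand-designed bitmaps), and the function name rbf_output is ours.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical code vectors: one row of 84 values (+1 or -1) per digit class.
codes = rng.choice([-1.0, 1.0], size=(10, 84))


def rbf_output(x: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Squared Euclidean distance between x and each class code vector."""
    return np.sum((x - codes) ** 2, axis=1)


x = codes[3].copy()              # pretend F6 produced exactly the code of class 3
y = rbf_output(x, codes)
print(y.shape)                   # (10,)
print(int(np.argmin(y)))         # 3, the unit whose output is closest to 0
```

The predicted class is the unit with the smallest output, which is the opposite convention to softmax, where the largest output wins.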

Implementing LeNet with Keras

import keras
from keras.datasets import mnist
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Flatten
from keras.models import Sequential

# Load the MNIST dataset, already split into training and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape the images to (samples, height, width, channels)
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

# Normalization
x_train = x_train / 255
x_test = x_test / 255

# One Hot Encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Building the Model Architecture
# Note: this is a modernized variant (28 * 28 input, ReLU, max pooling,
# softmax output), not an exact copy of the 1998 network described above.
model = Sequential()

# C1 and S2
model.add(Conv2D(6, kernel_size=(5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))

# C3 and S4
model.add(Conv2D(16, kernel_size=(5, 5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# C5 and F6, implemented as dense layers on the flattened feature maps
model.add(Flatten())
model.add(Dense(120, activation='relu'))
model.add(Dense(84, activation='relu'))

# Output layer: one unit per digit
model.add(Dense(10, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

# Train, using the test set as validation data
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=1, validation_data=(x_test, y_test))

# Evaluate on the test set
score = model.evaluate(x_test, y_test)
print('Test Loss:', score[0])
print('Test accuracy:', score[1])
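
Since the Keras model takes 28 * 28 inputs and uses fully connected convolutions, its shapes and parameter counts differ from the figures derived earlier for the original network. The sketch below traces them by hand in plain Python, assuming the default Keras settings used above (no padding, pooling stride equal to the pool size); the helper name valid_size is ours.

```python
def valid_size(size: int, kernel: int, stride: int = 1) -> int:
    """Output size of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1


size = 28
size = valid_size(size, 5)        # Conv2D(6, 5 x 5)   -> 24
size = valid_size(size, 2, 2)     # MaxPooling2D       -> 12
size = valid_size(size, 5)        # Conv2D(16, 5 x 5)  -> 8
size = valid_size(size, 2, 2)     # MaxPooling2D       -> 4
flat = size * size * 16           # Flatten            -> 256

params = [
    (5 * 5 * 1 + 1) * 6,          # first Conv2D:  156
    (5 * 5 * 6 + 1) * 16,         # second Conv2D: 2416
    (flat + 1) * 120,             # Dense(120):    30840
    (120 + 1) * 84,               # Dense(84):     10164
    (84 + 1) * 10,                # Dense(10):     850
]
print(flat, sum(params))          # 256 44426
```

Calling model.summary() on the model above should report the same layer shapes and total.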
