Convolutional Neural Network (CNN) in AI landscape

Convolutional Neural Networks (CNNs) have revolutionized the field of artificial intelligence, particularly in computer vision. From facial recognition to self-driving cars, CNNs are at the core of many groundbreaking applications. This blog post breaks down their core concepts and showcases their power with simple code snippets.


What Makes CNNs Special?


Traditional neural networks struggle with image data because they treat each pixel as an independent feature. This approach ignores the spatial relationships between pixels, which are crucial for understanding an image. CNNs, on the other hand, are specifically designed to capture these spatial hierarchies through a clever mechanism: convolutional layers.

Imagine looking at an image. You don't just see individual dots; you see edges, textures, and shapes that combine to form objects. CNNs mimic this by using filters (also called kernels) that slide across the image, detecting these local patterns.


CNN Architecture

Building Blocks of a CNN


1. Convolutional Layer

This is the heart of a CNN. A filter (a small matrix of numbers) slides over the input image, performing a dot product with the local pixel values it covers. The result is a feature map, which highlights where the filter's pattern was detected.


Convolutional layer

Let's illustrate with a simple example. Suppose we have a 5x5 image and a 3x3 filter:


import numpy as np

# Sample 5x5 image
image = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0]
])

# Simple 3x3 horizontal edge detection kernel
kernel = np.array([
    [-1, -1, -1],
    [ 0,  0,  0],
    [ 1,  1,  1]
])

# Sliding a 3x3 kernel over a 5x5 image with stride 1
# produces a 3x3 feature map
feature_map = np.zeros((3, 3))

for i in range(3):
    for j in range(3):
        # Element-wise multiplication and sum over the 3x3 patch
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print("Original Image:\n", image)
print("\nFilter (Kernel):\n", kernel)
print("\nFeature Map (Output of Convolution):\n", feature_map)

In this simplified example, the filter detects horizontal edges. Different filters can detect vertical edges, corners, or more complex textures.
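As a quick illustrative sketch, transposing the same kernel turns it into a vertical edge detector. This reuses the image array from the snippet above:

# A 3x3 vertical edge detection kernel (the horizontal kernel transposed)
vertical_kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1]
])

vertical_feature_map = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        vertical_feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * vertical_kernel)

print("\nFeature Map (Vertical Edge Filter):\n", vertical_feature_map)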


2. Activation Function


After a convolution operation, an activation function is applied to introduce non-linearity. There are many activation functions, such as sigmoid and tanh, but the most common choice is the Rectified Linear Unit (ReLU), which simply outputs the input if it is positive and zero otherwise. This helps the network learn complex patterns.


ReLU Activation Function

def relu(x):
    return np.maximum(0, x)

activated_feature_map = relu(feature_map)
print("\nActivated Feature Map (ReLU):\n", activated_feature_map)

3. Pooling Layer


Pooling layers reduce the spatial dimensions of the feature maps, which helps in:

• Reducing computational load: Smaller feature maps mean fewer computations in later layers (and fewer parameters in the eventual fully connected layers), so training is faster.

• Controlling overfitting: By summarizing features, pooling makes the network more robust to small shifts or distortions in the input.

Max Pooling is the most popular type. It takes the maximum value from each patch of the feature map.


# 2x2 max pooling with stride 2 on the 3x3 activated_feature_map.
# Output size = floor((input_size - pool_size) / stride) + 1, so 3x3 -> 1x1 here.

pool_size, stride = 2, 2
out_h = (activated_feature_map.shape[0] - pool_size) // stride + 1
out_w = (activated_feature_map.shape[1] - pool_size) // stride + 1
pooled_output = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        # Take the maximum value in each pool_size x pool_size patch
        patch = activated_feature_map[i*stride:i*stride + pool_size,
                                      j*stride:j*stride + pool_size]
        pooled_output[i, j] = np.max(patch)

print("\nPooled Output (Max Pooling):\n", pooled_output)

4. Fully Connected Layer


After several convolutional and pooling layers, the flattened feature maps are fed into one or more fully connected layers, similar to a traditional neural network. These layers learn to classify the high-level features extracted by the convolutional layers.
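To make this concrete, here is a minimal NumPy sketch of a single fully connected layer acting on a flattened feature map. The weights, bias, and three-class output below are arbitrary placeholder values chosen purely for illustration:

# Flatten the pooled output from the earlier example into a 1D vector
flattened = pooled_output.flatten()

# Hypothetical weights and bias for a fully connected layer with 3 output classes
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, flattened.shape[0]))
bias = np.zeros(3)

# A dense layer is a matrix-vector product plus a bias
logits = weights @ flattened + bias

# Softmax converts the raw scores into class probabilities
probabilities = np.exp(logits) / np.sum(np.exp(logits))
print("\nClass probabilities:\n", probabilities)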


Simple CNN Architecture with Keras


Let's put it all together to define a basic CNN for image classification using Keras, a high-level neural networks API.


from tensorflow.keras import layers, models

# Define the CNN model
model = models.Sequential()

# First Convolutional Block
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3))) # 64x64 color image
model.add(layers.MaxPooling2D((2, 2)))

# Second Convolutional Block
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

# Third Convolutional Block
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Flatten the 3D feature maps to 1D vector
model.add(layers.Flatten())

# Fully Connected Layers
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax')) # Output layer for 10 classes

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

This code snippet defines a sequential model. It starts with three convolutional blocks, each consisting of a Conv2D layer followed by a MaxPooling2D layer. The Conv2D layers extract features, while the MaxPooling2D layers reduce dimensionality. Finally, the output is flattened and fed into two Dense (fully connected) layers for classification. The softmax activation in the final layer is used for multi-class classification, outputting probabilities for each of the 10 classes.
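As a rough sketch of how this model could be trained, the snippet below uses random placeholder arrays in place of a real dataset, just to exercise the pipeline end to end:

import numpy as np

# Placeholder data: 100 random 64x64 RGB images with integer labels in [0, 10)
x_train = np.random.rand(100, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 10, size=(100,))

# Train for a few epochs on the dummy data
model.fit(x_train, y_train, epochs=3, batch_size=16, validation_split=0.1)

# Predict class probabilities for a single image
print(model.predict(x_train[:1]).shape)  # (1, 10)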


Why CNNs are so Powerful


CNNs' ability to automatically learn hierarchical features from raw pixel data makes them incredibly effective. They can detect simple features like edges and corners in early layers, and then combine these into more complex patterns like eyes, noses, or wheels in deeper layers.


Conclusion


Convolutional Neural Networks are a cornerstone of modern AI, particularly in computer vision. By understanding their fundamental layers—convolution, activation, pooling, and fully connected—you can grasp how these networks learn to interpret images with remarkable accuracy. While the underlying mathematics can be complex, the core concepts are intuitive and powerful, opening up a world of possibilities for image-based AI applications.

 
 
