What is a neural network?

First, let’s brush up on how neural networks work in general.

What’s the problem with simple NNs?

Regular artificial neural networks do not scale very well to images. For example, in CIFAR, a dataset that is commonly used for training computer vision models, the images are only 32×32 px in size and have 3 color channels. That means that a single fully-connected neuron in the first hidden layer of such a network would have 32×32×3 = 3072 weights. That is still manageable. But now imagine a bigger image, for example, 300×300×3. Each neuron would then have 270,000 weights, and a layer with many such neurons multiplies that number, so training demands a great deal of computational power.
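The arithmetic above can be checked in a couple of lines. This is only a minimal sketch; the function name `weights_per_neuron` is made up for illustration.

```python
def weights_per_neuron(height, width, channels):
    """Weights one fully-connected neuron needs to see every input value."""
    return height * width * channels


cifar_weights = weights_per_neuron(32, 32, 3)
large_weights = weights_per_neuron(300, 300, 3)

print(cifar_weights)  # 3072
print(large_weights)  # 270000
```

Keep in mind that this is the count for a single neuron; a hidden layer usually has many of them.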

How does a CNN work?

A convolutional neural network, or ConvNet, is just a neural network that uses convolution. To understand the principle, we are going to work with a 2-dimensional convolution first.

What is convolution?

Convolution is a mathematical operation that merges two sets of information. In a CNN, a small matrix of weights called a filter (or kernel) slides over the input data; at each position, the overlapping values are multiplied element by element and summed. The resulting grid of sums is called a feature map.
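Here is a minimal sketch of a 2-dimensional convolution in plain Python, assuming no padding and a step of one pixel. The toy image and the edge-detecting filter are invented for illustration. Like most deep learning libraries, the sketch does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the overlapping values element by element and sum them.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter that responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, exactly where the dark region meets the bright one. In a real CNN the filter values are not written by hand but learned during training.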

Padding and striding

Before we go further, it’s also useful to talk about padding and striding. These techniques are often used in CNNs:

  • Padding. Padding expands the input matrix by adding extra pixels (usually zeros) to its borders. This is done because convolution reduces the size of the matrix. For example, a 5×5 matrix turns into a 3×3 matrix when a 3×3 filter goes over it.
  • Striding. When working with a convolutional layer, you often need an output that is smaller than the input. One way to achieve this is to use a pooling layer. Another way is to use striding. The idea behind stride is to skip some positions when the kernel slides over the input: for example, moving 2 or 3 pixels at a time instead of 1. It reduces spatial resolution and makes the network more computationally efficient.
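The effect of padding and stride on the output size can be sketched with the usual formula: floor((n + 2p - k) / s) + 1, where n is the input side, k the filter side, p the padding and s the stride. The helper names below are hypothetical, and the 3×3 filter for the 5×5 example is an assumption consistent with the 3×3 result mentioned above.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Side length of the output for an n-by-n input and a k-by-k filter."""
    return (n + 2 * padding - k) // stride + 1


def pad2d(matrix, padding, value=0):
    """Surround `matrix` with `padding` rows and columns filled with `value`."""
    width = len(matrix[0]) + 2 * padding
    top = [[value] * width for _ in range(padding)]
    bottom = [[value] * width for _ in range(padding)]
    middle = [[value] * padding + list(row) + [value] * padding for row in matrix]
    return top + middle + bottom


no_padding = conv_output_size(5, 3)             # 3: the 5x5 input shrinks to 3x3
with_padding = conv_output_size(5, 3, padding=1)  # 5: one ring of zeros keeps the size
with_stride = conv_output_size(5, 3, stride=2)    # 2: moving 2 pixels at a time

padded = pad2d([[1, 2], [3, 4]], padding=1)
print(padded)  # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
```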

3 layers of CNN

The goal of a CNN is to reduce images to a form that is easier to process without losing the features that are valuable for accurate prediction. Three types of layers work together to do this:

  • A convolutional layer is responsible for recognizing features in pixels.
  • A pooling layer is responsible for making these features more abstract.
  • A fully-connected layer is responsible for using the acquired features for prediction.

Convolutional layer

We’ve already described how convolutional layers work above. They are at the center of CNNs, enabling them to autonomously recognize features in images.

Pooling layer

A pooling layer receives the result from a convolutional layer and compresses it. The filter of a pooling layer is always smaller than a feature map. Usually, it takes a 2×2 square (patch) and compresses it into one value.

  • Maximum pooling. It calculates the maximum value for each patch of the feature map.
  • Average pooling. It calculates the average value for each patch of the feature map.
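Both kinds of pooling can be sketched with one small function. This version assumes non-overlapping 2×2 patches, which is a common setup but not the only one; the function name and the sample feature map are made up for illustration.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Compress each non-overlapping size-by-size patch into a single value."""
    if mode == "max":
        reduce_patch = max
    elif mode == "avg":
        reduce_patch = lambda patch: sum(patch) / len(patch)
    else:
        raise ValueError("mode must be 'max' or 'avg'")

    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            patch = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reduce_patch(patch))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="avg")
print(max_pooled)  # [[7, 2], [2, 8]]
print(avg_pooled)  # [[4.0, 1.0], [1.0, 5.0]]
```

A 4×4 feature map becomes 2×2 either way; the two modes differ only in which single value stands in for each patch.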

Fully-connected layer

The output of the last pooling layer is flattened into a vector and fed to a feed-forward neural network, and backpropagation is applied at every iteration of training. This layer gives the model the ability to finally classify images: there is a flow of information between each input pixel and each output class.
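The forward pass of this last stage can be sketched as follows. The weights are hand-picked, hypothetical numbers rather than trained ones, the softmax at the end is a common choice that the text above does not prescribe, and backpropagation is left out entirely.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into a flat list of values."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """Fully-connected layer: every output is connected to every input."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


pooled = [[7, 2], [2, 8]]   # e.g. the output of a pooling layer
inputs = flatten(pooled)    # [7, 2, 2, 8]

# Hypothetical weights for two output classes (one row per class).
weights = [
    [0.1, 0.0, 0.0, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
biases = [0.0, 0.0]

probabilities = softmax(dense(inputs, weights, biases))
predicted_class = probabilities.index(max(probabilities))
print(predicted_class)  # 0
```

During training, backpropagation would adjust `weights` and `biases` (and the filters of the convolutional layers) so that the probabilities match the correct labels more closely.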

Advantages of convolutional neural networks

Convolutional neural networks have several benefits that make them useful for many different applications. If you want to see them in practice, watch this thorough explanation by StatQuest: https://www.youtube.com/embed/HGwBXDKFk9I

Feature learning

CNNs don’t require manual feature engineering: they can pick up relevant features during training. Even if you’re working on a completely new task, you can take a pre-trained CNN and adjust its weights by feeding it your data. The CNN will adapt to the new task.

Computational efficiency

Thanks to convolution, CNNs are much more computationally efficient than regular neural networks. They use parameter sharing (the same filter weights are reused at every position of the input) and dimensionality reduction, which makes models easier and quicker to deploy. They can be optimised to run on a wide range of devices, even smartphones.
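To put a rough number on parameter sharing, the sketch below compares a convolutional layer with a fully-connected layer that produces an output of the same size. The layer sizes (a 32×32×3 input, 16 filters of 3×3) are assumptions chosen for illustration, and the function names are hypothetical.

```python
def conv_layer_params(kernel_size, in_channels, out_channels):
    """Each filter has kernel_size * kernel_size * in_channels weights plus one bias."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_layer_params(num_inputs, num_outputs):
    """Every output has its own weight for every input, plus one bias."""
    return num_inputs * num_outputs + num_outputs


# 16 filters of size 3x3 over a 3-channel image.
conv_params = conv_layer_params(kernel_size=3, in_channels=3, out_channels=16)

# A dense layer mapping the same 32x32x3 input to a 32x32x16 output.
dense_params = dense_layer_params(32 * 32 * 3, 32 * 32 * 16)

print(conv_params)   # 448
print(dense_params)  # 50348032
```

The convolutional layer needs far fewer parameters because its filters are reused at every position, and its parameter count does not grow with the image size.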

High accuracy

The current state-of-the-art NNs in image classification are not convolutional nets but, for example, vision transformers. However, CNNs have dominated image and video recognition and similar tasks for a very long time. They usually show higher accuracy than non-convolutional NNs, especially when there is a lot of data involved.

Drawbacks of ConvNet

However, ConvNets are not perfect. Even if a ConvNet seems like a very intelligent tool, it is still prone to adversarial attacks, and it needs a lot of data to train.

Adversarial attacks

Adversarial attacks are cases of feeding the network ‘bad’ examples (images that have been slightly modified in a particular way) to cause misclassification. Even a slight shift in pixel values can make a CNN go crazy. For example, criminals can fool a CNN-based face recognition system and pass unrecognized in front of the camera.
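The idea can be illustrated on a deliberately tiny model. The sketch below uses a toy linear classifier as a stand-in for a CNN, with made-up weights and pixel values, and nudges every pixel slightly in the direction that lowers the score. This sign-based nudge is one well-known way to build such examples and is an assumption here, not something the attacks described above are limited to.

```python
def sign(value):
    """Return -1, 0 or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def score(weights, bias, pixels):
    """Toy linear classifier: a positive score means 'recognized'."""
    return sum(w * x for w, x in zip(weights, pixels)) + bias


def perturb(weights, pixels, epsilon):
    """Shift each pixel by at most `epsilon` in the direction that lowers the score.

    For a linear model the gradient of the score with respect to a pixel
    is simply that pixel's weight, so its sign tells us which way to move.
    """
    return [x - epsilon * sign(w) for w, x in zip(weights, pixels)]


# Hypothetical model and input, chosen only for illustration.
weights = [0.5, -0.3, 0.8, -0.6]
bias = -0.2
pixels = [0.6, 0.4, 0.5, 0.5]

original_score = score(weights, bias, pixels)
adversarial = perturb(weights, pixels, epsilon=0.05)
adversarial_score = score(weights, bias, adversarial)

print(original_score > 0)     # True: the original input is recognized
print(adversarial_score > 0)  # False: the modified input is not
```

No pixel changes by more than 0.05, yet the decision flips. Real attacks on CNNs follow the same logic but use the gradient of the whole network instead of a single weight vector.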

Data-intensive training

For CNNs to showcase their magical power, they demand tons of training data. This data is not easy to collect and pre-process, which can be an obstacle to the wider adoption of the technology. That is why even today there are only a few good pre-trained models, such as GoogLeNet, VGG, Inception, and AlexNet. The majority are owned by global corporations.

What are convolutional neural networks used for?

Convolutional neural networks are used across many industries. Here are some common examples of their use for real-life applications.

Image classification

Convolutional neural networks are often used for image classification. By recognizing valuable features, a CNN can identify different objects in images. This ability makes CNNs useful in medicine, for example, for MRI diagnostics. CNNs can also be used in agriculture. The networks receive images from satellites like Landsat and can use this information to classify lands based on their level of cultivation. Consequently, this data can be used for making predictions about the fertility level of the grounds or developing a strategy for the optimal use of farmland. Handwritten digit recognition is also one of the earliest uses of CNNs for computer vision.

Object detection

Self-driving cars, AI-powered surveillance systems, and smart homes often use CNNs to identify and mark objects. A CNN can identify objects in photos and in real-time video, then classify and label them. This is how an automated vehicle finds its way around other cars and pedestrians, and how smart homes recognize the owner’s face among all others.

Audio visual matching

YouTube, Netflix, and other video streaming services use audio visual matching to improve their platforms. Sometimes the user’s requests can be very specific, for example, ‘movies about zombies in space’, but the search engine should satisfy even such exotic requests.

Object reconstruction

You can use CNN for 3D modelling of real objects in the digital space. Today there are CNN models that create 3D face models based on just one image. Similar technologies can be used for creating digital twins, which are useful in architecture, biotech, and manufacturing.

Speech recognition

Even though CNNs are often used to work with images, that is not the only possible use for them. ConvNets can help with speech recognition and natural language processing. For example, Facebook’s speech recognition technology is based on convolutional neural networks.

Summing up

To sum up, convolutional neural networks are an awesome tool for computer vision and similar areas because of their ability to recognize features in raw data.


