Transfer learning - Easiest Explanation for Beginners in 2020!

October 29, 2019

Transfer learning...? You’ve only heard about it, right? You’ve seen “transfer learning deep learning”, “transfer learning tensorflow” and “transfer learning tutorial keras” all over the internet, but you still don’t have a clear picture. Welcome.


Ever created a Deep Neural Network (or, more specifically, a Convolutional Neural Network)? It’s a complex and time-consuming process, right? So have you ever wondered how things are done in real-life scenarios, where high accuracy is required with limited computational resources and little time?


If you’ve ever created a deep learning model, you know how long it takes to train it on a GPU, let alone a CPU! Good deep learning models are computationally expensive and take a lot of time to train. I personally trained a DL model for 4 days straight on my laptop with a GPU, and the craziest part is, it still wasn’t fully trained.
So the question is: how do we train big deep learning models on our puny, cute machines? The answer is simple: Transfer Learning.
Again, let’s see what Miss Wikipedia has to say:

Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.

Well, if you’ve been reading my blog posts, you know the drill. Let’s make this definition simpler to understand.



We humans have come up with ways to transfer our knowledge between tasks. We tend to recognize knowledge gained from a previous experience or task and apply it to a similar task we encounter. The more similar the two tasks are, the more knowledge we can carry over, and the better it helps. Isn’t that just true?


For example, if you know how to roller-skate, it’s easier to learn to skateboard. If you know how to ride a bicycle, it’s easier to learn to ride a motorbike. And for all the techies out there: if you know one programming language, it’s easier to learn a new one, even one with completely different syntax! I’ve experienced that myself ;)


What do we observe in the above scenarios? C’mon, we’re machine learners here, you’ve gotta find the pattern… We don’t learn new things completely from scratch. We already have some basic knowledge of how to perform the new task, because we’ve had experience learning a similar task before. This saves us a lot of time and energy.
The same goes for Transfer Learning in Deep Learning.
If you ask me to explain Transfer Learning in one word, I would say it means ‘a head start’.




Traditionally, what exactly do we do?
  • We design a network structure.
  • We initialize its parameters (weights and biases) randomly.
  • We iteratively optimize those parameters to increase accuracy.
So tell me, which step takes the most time? And the most computational resources? Yes, optimization!
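To make the three steps concrete, here is a minimal pure-Python sketch of the traditional route. It assumes the simplest possible "network", a single logistic neuron trained with batch gradient descent, and a tiny made-up dataset; the data, learning rate and epoch count are hypothetical choices for illustration only.

```python
import math
import random

def predict(w, b, x):
    """Step 1, the network structure: one logistic neuron, sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, data):
    """Average binary cross-entropy over (features, label) pairs."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, y in data:
        p = predict(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train_from_scratch(data, epochs=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Step 2: random initial parameters (weights and bias).
    w = [rng.uniform(-1.0, 1.0) for _ in data[0][0]]
    b = rng.uniform(-1.0, 1.0)
    history = [log_loss(w, b, data)]
    # Step 3: iterate, moving every parameter against its gradient.
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y  # gradient of the loss w.r.t. the pre-activation z
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
        history.append(log_loss(w, b, data))
    return w, b, history

# Hypothetical toy data: the label is 1 when x0 + x1 > 1.
DATA = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((0.1, 0.7), 0),
        ((0.9, 0.8), 1), ((1.0, 0.6), 1), ((0.8, 0.9), 1)]

w, b, history = train_from_scratch(DATA)
print(f"loss before training: {history[0]:.3f}, after: {history[-1]:.3f}")
```

Every parameter starts out as noise, so all of the work falls on step 3. In a real CNN that loop runs over millions of parameters and many images, which is exactly the cost transfer learning helps you avoid.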


What if I told you that you don’t have to optimize the parameters from scratch? In fact, you don’t even have to design the model or … wait for it … perform most of the steps above!
Imagine a vast, high-performing model that has already been built for you, with parameters already optimized on big datasets, and all you have to do is use it “right away”! Well, that is what I meant by ‘a head start’. Okay, not quite “right away”: we still need to tune a few things.


And that is the power of TRANSFER LEARNING.


Researchers have built several (generally very large) networks and trained them on huge datasets such as ImageNet. As a result, these networks have learned very complex features, their parameters are highly optimized, and because they were trained on so much data, they generalize well. Basically, the same parameters can be reused on a similar kind of dataset, because the useful features will be largely the same. Won’t they?

How to use Transfer Learning?

There are multiple ways to use Transfer Learning, and which one you pick depends mainly on how much data you have.


1. Short On Data?

Transfer Learning got you covered. Do not worry.
If you don’t have much data, you can leave most of the model’s layers as they are and tune only the last layers to your needs, i.e., freeze the pre-trained layers so their weights are not updated during training, and train only the last layers. This reduces the risk of overfitting (there are far fewer parameters to learn from your small dataset) and lets you reuse most of the pre-trained parameters. This is the most common way Transfer Learning is used.
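Here is a minimal pure-Python sketch of this "freeze everything but the last layer" idea. The pre-trained layer is a hypothetical stand-in: its weights are hard-coded below, whereas in practice they would be loaded from a network trained on a large dataset such as ImageNet. The frozen layer only ever runs forward; gradient descent updates nothing but the small new head.

```python
import copy
import math
import random

# Hypothetical "pre-trained" layer: 3 ReLU units over 2 inputs. In a real project
# these values would be loaded from a model trained on a large dataset.
PRETRAINED_W = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
PRETRAINED_B = [-1.0, 0.0, 0.0]

def frozen_features(x):
    """Forward pass through the frozen layer; nothing here is ever updated."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(PRETRAINED_W, PRETRAINED_B)]

def head_prob(w, b, f):
    """New output layer: a single logistic unit on top of the frozen features."""
    z = sum(wi * fi for wi, fi in zip(w, f)) + b
    return 1.0 / (1.0 + math.exp(-z))

def head_loss(w, b, feats):
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for f, y in feats:
        p = head_prob(w, b, f)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(feats)

def train_head_only(data, epochs=500, lr=0.5, seed=0):
    # The frozen layer gives the same output every epoch, so compute features once.
    feats = [(frozen_features(x), y) for x, y in data]
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in feats[0][0]]  # only w and b are trained
    b = 0.0
    start = head_loss(w, b, feats)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for f, y in feats:
            err = head_prob(w, b, f) - y
            for i, fi in enumerate(f):
                grad_w[i] += err * fi
            grad_b += err
        w = [wi - lr * g / len(feats) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(feats)
    return w, b, start, head_loss(w, b, feats)

# Hypothetical small dataset: label 1 when x0 + x1 > 1.
DATA = [((0.0, 0.0), 0), ((0.2, 0.3), 0), ((0.1, 0.7), 0),
        ((0.9, 0.8), 1), ((1.0, 0.6), 1), ((0.8, 0.9), 1)]

backbone_before = copy.deepcopy((PRETRAINED_W, PRETRAINED_B))
w, b, start, end = train_head_only(DATA)
print(f"head loss: {start:.3f} -> {end:.3f}")
print("frozen layer unchanged:", (PRETRAINED_W, PRETRAINED_B) == backbone_before)
```

Because the frozen layer never changes, its outputs can be computed once and reused every epoch, which is a big part of why this approach is so cheap.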

2. Oh, so your grandpa left you the Data?

Well, if you’re rich in data, there are two ways you can go!
  • Re-train from Scratch: As the name suggests, this is nothing but retraining the whole model from scratch, i.e., from randomly initialized parameters! I understand your doubt… how is this even transfer learning, and how is it helpful?
Well, if you’ve got a huge dataset, training from scratch lets the model be optimized specifically for that one special dataset of yours. Moreover, the network architecture you’re using has already been carefully designed to achieve high accuracy. So re-training from scratch really does work. Transfer Learning this way takes more time and computational resources, but it can also give very high accuracy!


  • The recommended way: Everybody wants to know the best possible way, which, in the case of Transfer Learning, is not to re-train from scratch but to re-train some of the deeper layers.
What I mean is this: recall what we did with a small dataset. We left most of the layers in their original condition and fine-tuned only the last layers. But now that we’re rich (yeahh!), we can afford to train more layers than before and put Transfer Learning to good use.
So the first few layers (also called shallow layers) are left as they are, already optimized, and we re-train some of the deeper layers so that they get better optimized for our dataset.
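All of these options boil down to one decision per layer: keep its pre-trained weights frozen, update them, or throw them away and start again from random values. Below is a minimal, framework-free sketch of that bookkeeping. The layer names, weight values and gradients are hypothetical placeholders; real frameworks expose the same idea through a per-layer switch (Keras, for example, has a `layer.trainable` attribute).

```python
import random

def make_pretrained_model():
    """Hypothetical pre-trained model; names and weights are placeholders."""
    return [
        {"name": "conv1 (shallow)", "weights": [0.5, -0.2], "trainable": True},
        {"name": "conv2 (shallow)", "weights": [0.1, 0.3], "trainable": True},
        {"name": "conv3 (deep)", "weights": [-0.4, 0.7], "trainable": True},
        {"name": "classifier", "weights": [0.2, 0.9], "trainable": True},
    ]

def configure(model, strategy, n_deep=2, seed=0):
    """Set each layer's trainable flag according to the chosen strategy."""
    if strategy == "from_scratch":
        # Keep the architecture, discard the pre-trained values.
        rng = random.Random(seed)
        for layer in model:
            layer["weights"] = [rng.uniform(-1.0, 1.0) for _ in layer["weights"]]
            layer["trainable"] = True
    elif strategy == "fine_tune":
        # Freeze the shallow layers, re-train the last n_deep layers.
        for i, layer in enumerate(model):
            layer["trainable"] = i >= len(model) - n_deep
    elif strategy == "head_only":
        # Small dataset: only the last layer learns.
        for i, layer in enumerate(model):
            layer["trainable"] = i == len(model) - 1
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return model

def sgd_step(model, grads, lr=0.1):
    """One gradient-descent update that skips frozen layers."""
    for layer, g in zip(model, grads):
        if layer["trainable"]:
            layer["weights"] = [w - lr * gi for w, gi in zip(layer["weights"], g)]

model = configure(make_pretrained_model(), "fine_tune", n_deep=2)
placeholder_grads = [[1.0, 1.0] for _ in model]  # made-up gradients, only to show which layers move
sgd_step(model, placeholder_grads)
for layer in model:
    state = "trainable" if layer["trainable"] else "frozen"
    print(layer["name"], state, layer["weights"])
```

After the step, the two shallow layers still hold their pre-trained values while the deeper ones have moved. `n_deep` is the knob you turn up as your dataset grows.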


Why do we do this? The answer is simple. The shallow layers of most CNNs detect edges and a few other very basic features that generalize easily to other datasets. The more dataset-specific features are extracted in the deeper layers. Now do you get it?
There’s one thing we HAVE to do, though: we must replace the output layer so that it matches our data (for example, the number of classes: ImageNet has 1000, but your data might have just 2). Just imagine the fiasco if we don’t!
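Concretely, replacing the output layer means discarding the pre-trained classifier’s weight matrix (one row per ImageNet class) and creating a fresh, randomly initialized one with one row per class in your own data. A minimal sketch, assuming a hypothetical backbone that outputs a 512-dimensional feature vector (that size is an assumption; it depends on the network you pick), with a randomly filled 1000-class head standing in for the original one:

```python
import math
import random

FEATURE_DIM = 512  # assumed size of the backbone's output feature vector

def new_output_layer(num_classes, feature_dim=FEATURE_DIM, seed=0):
    """Fresh dense layer: one weight row and one bias per target class."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(feature_dim)  # small random values keep the initial logits modest
    weights = [[rng.uniform(-scale, scale) for _ in range(feature_dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    return weights, biases

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, layer):
    """Class probabilities from backbone features and an output layer."""
    weights, biases = layer
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

imagenet_head = new_output_layer(1000)  # stand-in for the original 1000-class head
my_head = new_output_layer(2)           # replacement head for a 2-class problem

features = [0.01] * FEATURE_DIM  # placeholder backbone output for one image
print(len(classify(features, imagenet_head)), len(classify(features, my_head)))  # 1000 2
```

Only the new head starts from random values, so it always needs training on your labeled data, whether or not you also fine-tune some deeper layers.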


I know it can be hard to believe that Transfer Learning works at all! Trust me, I was in awe too when I first heard about it, but it does work, and remarkably well. One explanation is that it skips most of the time normally spent on back-propagation and gradient descent, i.e., updating the weights and biases until they reach good values: the pre-trained weights already start close to good values, frozen layers are not updated at all, and the layers we do train typically need far fewer updates. Second, because these networks were trained on huge datasets, they have a vast knowledge base, so they work amazingly well on our own datasets, provided our data is somewhat similar to the data they were trained on. For example, if a model was trained on car images, Transfer Learning can make it work pretty well on truck images. I hope you’ve got the intuition by now.
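A quick back-of-the-envelope count shows where the savings come from. The sketch below assumes a hypothetical fully connected network (the layer widths are made up for illustration; a dense layer with n inputs and m outputs has n*m weights plus m biases) and counts how many parameters gradient descent actually has to update under each strategy.

```python
# Hypothetical layer widths: input, three hidden layers, 2-class output.
SIZES = [1024, 512, 256, 128, 2]

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

LAYER_PARAMS = [dense_params(a, b) for a, b in zip(SIZES, SIZES[1:])]
TOTAL = sum(LAYER_PARAMS)

def trainable_params(n_last_layers):
    """Parameters updated when only the last n layers are trainable."""
    return sum(LAYER_PARAMS[len(LAYER_PARAMS) - n_last_layers:])

for label, n in [("head only (small dataset)", 1),
                 ("fine-tune last 2 layers", 2),
                 ("re-train from scratch", len(LAYER_PARAMS))]:
    count = trainable_params(n)
    print(f"{label}: {count} of {TOTAL} parameters ({100 * count / TOTAL:.2f}%)")
# head only: 258, fine-tune last 2: 33154, from scratch: 689282
```

Freezing does not make the forward pass any cheaper, but every frozen parameter is one that back-propagation never has to update, and, in the small-data case, one that cannot overfit your data.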


Well, that covers the basics of Transfer Learning. Nowadays, Transfer Learning is used almost everywhere in industry. Hardly anyone (except researchers) builds a network from scratch; instead, people fall back on big, already-built networks such as ResNet, the VGG variants, AlexNet and many more. It’s really fun to use them.


So why wait? Go ahead and create your first Transfer Learning project!


