Overfitting and Underfitting in Machine Learning Algorithms

You have probably come across the terms overfitting and underfitting while testing a machine learning model. Let’s understand what each of them means.

Consider a course that has a syllabus, students and an examination.

Think of the syllabus as the features (independent variables) of a dataset, the content of the syllabus as the training dataset, the student as the model, and the examination as the test dataset.
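To make the analogy concrete, here is a minimal sketch of how a dataset can be divided into a training part (the syllabus content the student studies) and a test part (the examination). The `holdout_split` helper and the toy `rows` are hypothetical names chosen for illustration, not part of any library.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy dataset: (hours_studied, passed) pairs.
rows = [(hours, int(hours >= 5)) for hours in range(1, 11)]

train_rows, test_rows = holdout_split(rows)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

The model only ever sees `train_rows` while learning; `test_rows` is held back and used once, like an exam paper the student has not seen.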

CASE 1:- When a student prepared well from the given syllabus and also achieved good marks in the examination, we say the student performed well.

Machine Learning:- When the training metrics are good and the test metrics are equally good or close to the training metrics, we say the model generalizes well to unseen data.

CASE 2:- When the student prepared well but failed or scored poorly in the examination, we say the student is not performing well.

Machine Learning:- When the training metrics are good but the test metrics are much worse, this is OVERFITTING. The model has memorized the training data, including its noise, instead of learning patterns that carry over to new data.
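Overfitting can be reproduced with a model that memorizes its training data. The sketch below assumes a synthetic one-dimensional dataset with noisy labels and uses a 1-nearest-neighbour classifier written by hand; the function names and the noise level are illustrative choices only.

```python
import random

def make_data(n, rng, noise=0.2):
    """Points x in [0, 1); the true label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y              # simulate a mislabelled example
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, data):
    """Fraction of points in `data` that the 1-NN model built on `train` labels correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in data)
    return hits / len(data)

rng = random.Random(42)
train = make_data(100, rng)
test = make_data(200, rng)

train_acc = accuracy(train, train)   # every training point is its own nearest neighbour
test_acc = accuracy(train, test)
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Because each training point is its own nearest neighbour, the training accuracy is perfect by construction, while the test accuracy comes out lower since the memorized label noise does not carry over to new points. That gap between the two scores is the signature of overfitting.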

CASE 3:- When a student didn’t even prepare the syllabus and didn’t achieve good scores in the examination, we say the student’s performance is poor.

Machine Learning:- When both the training metrics and the test metrics are poor, we say the model is UNDERFITTING. The model is too simple, or has been trained too little, to capture the patterns in the data.
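The first three cases can be summarized as a small rule of thumb on the two scores. This is only a sketch: `diagnose_fit` is a hypothetical helper, and the `good` and `max_gap` thresholds are arbitrary illustrative values that depend on the metric and the problem.

```python
def diagnose_fit(train_score, test_score, good=0.8, max_gap=0.1):
    """Label a model from its train and test scores (higher is better).

    `good` and `max_gap` are illustrative thresholds, not standard values.
    """
    if train_score < good:
        return "underfitting"          # CASE 3: poor even on the training data
    if train_score - test_score > max_gap:
        return "overfitting"           # CASE 2: good on training, much worse on test
    return "good generalization"       # CASE 1: good on both, scores close together

print(diagnose_fit(0.95, 0.93))  # good generalization
print(diagnose_fit(0.99, 0.70))  # overfitting
print(diagnose_fit(0.55, 0.52))  # underfitting
```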

CASE 4:- Suppose a student prepared the whole syllabus, but before the exam someone gave him/her the questions that were going to be asked. In this case the student will almost certainly score well on that exam. But when a new test paper is given, he/she may perform much worse than on the leaked exam. This is called leakage of the question paper.

Machine Learning:- When information about the test set or the target variable, which would not be available at prediction time, is exposed to the model during training, the scenario is known as data leakage. The test score then looks better than the model's real performance on new data. It happens for 2 main reasons:

  1. Target or label leakage
  2. Train and test set contamination.
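The second reason, train and test set contamination, can be checked for directly in its simplest form: rows that appear in both sets. The sketch below assumes rows are stored as tuples; `find_contamination` and the sample rows are hypothetical and only catch exact duplicates, not subtler forms of leakage.

```python
def find_contamination(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training set."""
    seen = set(train_rows)
    return [row for row in test_rows if row in seen]

# Hypothetical rows: (feature_1, feature_2, label).
train_rows = [(1.0, 2.0, 0), (3.0, 1.5, 1), (2.2, 0.7, 0)]
test_rows = [(3.0, 1.5, 1), (4.1, 3.3, 1)]

leaked = find_contamination(train_rows, test_rows)
print(leaked)  # [(3.0, 1.5, 1)]
```

Any row returned here is a question the student has already seen with its answer, so it should be removed from one of the two sets before evaluating the model.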

Data leakage is a big topic in itself and deserves a separate article.

About DataMites Team

DataMites Team will publish articles on various topics like data science, machine learning, artificial intelligence, deep learning, python programming, statistics, DataMites® press releases and career guidance.
