
Demystifying Simple Linear Regression: A Beginner’s Guide to Machine Learning Basics


Machine Learning (ML) is the art of teaching machines to learn from data and make intelligent predictions or decisions without being explicitly programmed. This process involves four essential stages:

  • Training: In this step, massive datasets are provided as input to machine learning models. These models, fundamentally a collection of mathematical equations and computations, work to uncover patterns within the data.

  • Prediction: Once trained, the model is exposed to new, unseen data. It applies the learned patterns to make predictions or infer outcomes.

  • Evaluation: The predictions generated by the model are assessed for precision and accuracy to determine how effectively it performs.

  • Tuning: Finally, the model’s parameters are fine-tuned and optimized to enhance its performance, making it more reliable for future tasks.

This blog focuses on Linear Regression, a foundational algorithm in machine learning. We’ll explore how it works, its mathematical underpinnings, and its practical applications, providing you with a clear understanding of this key ML concept.


Simple Linear Regression Algorithm — The Theory


Regression is a technique for determining the statistical relationship between two or more variables where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables. Independent variables are controlled inputs. Dependent variables represent the output or outcome resulting from altering these inputs.

A regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”. The method is mostly used for forecasting and for finding cause-and-effect relationships between variables.

As the name suggests, in simple linear regression the relationship between the dependent and independent variable is expressed as a straight line.

Remember the straight-line equation from our school mathematics classes?

y = mx + c

where,

y = dependent variable, i.e. the variable to be predicted

x = independent variable, i.e. the input variable

m = slope of the straight line

c = intercept constant, i.e. the value of y when x = 0
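
To make the equation concrete, here is a minimal Python sketch; the slope m = 2 and intercept c = 3 are made-up values purely for illustration:

m, c = 2.0, 3.0  # illustrative slope and intercept, not taken from any real dataset

def predict_y(x):
    """Return y for a given x using the straight-line equation y = mx + c."""
    return m * x + c

print(predict_y(0))  # 3.0  -> when x = 0, y equals the intercept c
print(predict_y(4))  # 11.0 -> each unit increase in x adds m = 2 to y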


Graphical representation of Linear Regression

Simple Linear Regression Algorithm — The Intuition


Let’s understand with an example where the salary is to be estimated based on the years of experience of an individual.

The straight-line equation can be translated into the salary and experience variables as follows:

Salary = s0 + (s1 * Years of experience)

where s0 is the intercept, i.e. the base salary for someone with zero years of experience, and s1 is the slope, i.e. the increase in salary for each additional year of experience.

Below is a plot depicting:

Red marks: actual data points, i.e. salary versus years of experience

Green mark: predicted salary for a given level of experience

Black regression line: the best-fit line through all the given data points


Graphical representation of data points and prediction value in SLR

The regression model is found using the ordinary least squares method: for a candidate line, take the sum of the squared differences between each actual value and the value predicted by that line. Conceptually, this is done for the many different lines that could “fit” through the data, and the line with the minimum sum of squared differences is the best-fitting line. The equation of that line is your simple linear regression model.
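
If you want to see the least-squares idea in action before touching scikit-learn, here is a minimal NumPy sketch that computes the best-fit slope and intercept from the closed-form formulas; the experience and salary numbers are made up for illustration:

import numpy as np

# Made-up data: years of experience vs. salary (illustrative values only)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([40000, 45000, 52000, 58000, 63000], dtype=float)

# Ordinary least squares: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)                             # coefficients of the best-fit line
print(np.sum((y - (intercept + slope * x)) ** 2))   # the minimised sum of squared differences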


Simple Linear Regression Algorithm — The Implementation in Python


In this section, we will use Python in a Jupyter notebook to estimate the right salary for our candidate.

Importing required libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values  # feature matrix: every column except the last (years of experience)
y = dataset.iloc[:, 1].values    # target vector: the second column (salary)
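
Before splitting, it is worth taking a quick look at the data. The snippet below assumes salary_data.csv has two columns, YearsExperience and Salary; adjust if your copy of the file uses different column names:

print(dataset.head())   # first few rows of the dataset
print(dataset.shape)    # (number of rows, number of columns)
print(X[:5], y[:5])     # first few feature values and target values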

Split the dataset into two sets: a training set for training the model and a testing set for evaluating it.

Here the dataset is divided so that two-thirds form the training set and one-third forms the testing set.

from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0) 

Build the Regression model

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()   # create the linear regression model
regressor.fit(X_train, y_train)  # estimate the best-fit line from the training data
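
Once fitted, the learned slope and intercept can be read straight from the model; they correspond to s1 and s0 in the salary equation above:

print(regressor.coef_)       # s1: estimated salary increase per year of experience (returned as an array)
print(regressor.intercept_)  # s0: estimated base salary at zero years of experience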

Now, predict the salary of an individual with 5 years of experience.

y_pred = regressor.predict(np.array([5]).reshape(1, 1))  # predict() expects a 2D array, hence the reshape

Executing the above code gives the following output:

Output Display

Thus, you can offer your candidate a salary of Rs. 73,545.90, the value the model predicts for 5 years of experience!
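
As an optional sanity check (the Evaluation stage from the introduction), you can also score the model on the held-out test set. A minimal sketch using scikit-learn's built-in metrics:

from sklearn.metrics import mean_squared_error, r2_score

y_test_pred = regressor.predict(X_test)
print(r2_score(y_test, y_test_pred))            # closer to 1 means a better fit
print(mean_squared_error(y_test, y_test_pred))  # average squared prediction error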

In summary, there are a few key steps to follow for a simple linear regression model:

  1. Import the dataset

  2. Split the dataset into training and testing sets; typically 20-33% of the data is held out for testing

  3. Initialize the regression model

  4. Fit the model using the training set

  5. Finally, happy predicting!


Simple Linear Regression Algorithm — The Practical Application


  1. Predicting house prices based on a single feature like size

  2. Sales forecasting based on advertisement expenditure

  3. Temperature forecasting based on a single factor such as time of day or altitude

  4. Predicting patient health outcomes based on a relevant factor

  5. Estimating energy consumption based on weather conditions

 
 
