Demystifying Simple Linear Regression: A Beginner’s Guide to Machine Learning Basics
- Pranjali Srivastava
- Dec 11, 2024
Machine Learning (ML) is the art of teaching machines to learn from data and make intelligent predictions or decisions without being explicitly programmed. This process involves four essential stages:
Training: In this step, datasets (often large ones) are provided as input to a machine learning model. The model, fundamentally a collection of mathematical equations and computations, adjusts its parameters to capture patterns in the data.
Prediction: Once trained, the model is exposed to new, unseen data. It applies the learned patterns to make predictions or infer outcomes.
Evaluation: The model's predictions are compared against known outcomes using metrics such as accuracy (for classification) or mean squared error (for regression) to determine how well it performs.
Tuning: Finally, the model's settings (hyperparameters) are adjusted and the model is retrained to improve its performance, making it more reliable for future tasks.
This blog focuses on Simple Linear Regression, a foundational algorithm in machine learning. We’ll explore how it works, its mathematical underpinnings, and its practical applications, giving you a clear understanding of this key ML concept.
Simple Linear Regression Algorithm — The Theory
Regression is a technique for determining the statistical relationship between two or more variables, where a change in a dependent variable is associated with a change in one or more independent variables. Independent variables are the inputs (also called features or predictors). The dependent variable is the output or outcome we want to explain or predict.
A problem is a regression problem when the output variable is a real or continuous value, such as “salary” or “weight”. Regression is mostly used for forecasting and for studying how variables are related. On its own, however, a regression shows association, not cause and effect.
In simple linear regression, as the name suggests, the relationship between one independent variable and the dependent variable is expressed as a straight line.
Remember the straight-line equation from school mathematics?
y = mx + c
where,
y = dependent variable, i.e. the variable to be predicted
x = independent variable, i.e. the input variable
m = slope of the straight line
c = intercept, i.e. the value of y when x = 0
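To make m and c concrete, here is a minimal sketch that recovers the slope and intercept of the straight line through two points. The function name line_through and the sample points are hypothetical and were chosen only for illustration.

```python
def line_through(p1, p2):
    """Return (m, c) for the straight line y = m*x + c through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no finite slope")
    m = (y2 - y1) / (x2 - x1)  # slope: change in y per unit change in x
    c = y1 - m * x1            # intercept: value of y when x = 0
    return m, c


# Hypothetical points (1, 5) and (3, 9).
m, c = line_through((1, 5), (3, 9))
print(m, c)  # 2.0 3.0
```

The slope is the rise in y per unit step in x. The intercept then follows by substituting either point back into y = mx + c.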

Simple Linear Regression Algorithm — The Intuition
Let’s understand this with an example: estimating an individual’s salary from their years of experience.
The straight-line equation can be written in terms of salary and experience as follows:
Salary = s0 + (s1 * Years of experience)
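where s0 is the intercept, i.e. the predicted salary for someone with zero years of experience, and s1 is the slope, i.e. the predicted increase in salary for each additional year of experience.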
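As a minimal sketch, the salary equation can be evaluated for a few experience levels. The coefficient values S0 and S1 below are hypothetical and are not fitted to any data. Later, the regression algorithm estimates these two numbers from the dataset.

```python
def predict_salary(years_of_experience, s0, s1):
    """Salary = s0 + (s1 * years of experience), the straight-line model from the text."""
    return s0 + s1 * years_of_experience


# Hypothetical coefficients for illustration only (NOT fitted to any dataset).
S0 = 30000.0  # predicted salary at zero years of experience (intercept)
S1 = 8000.0   # predicted salary increase per additional year (slope)

for years in (0, 2, 5):
    print(years, predict_salary(years, S0, S1))
# 0 30000.0
# 2 46000.0
# 5 70000.0
```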
The graph for this example plots salary against years of experience and shows:
Red marks: actual data, i.e. the observed salary for each level of experience
Green mark: the predicted salary for a given level of experience
Black straight line: the regression line that best fits all the given data points

The regression line is found using the ordinary least squares (OLS) method. For any candidate line, OLS sums the squared differences between the actual values and the values predicted by that line. Conceptually, many different lines could “fit” through the data, and the one with the minimum sum of squared differences is the best-fitting line. In practice, OLS does not need to try lines one by one: the minimizing slope and intercept can be calculated directly. The equation of this best-fitting line is your simple linear regression model.
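For one input variable, the least-squares slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is ȳ − slope · x̄, where x̄ and ȳ are the means. Below is a minimal plain-Python sketch of that closed form. The function names ols_fit and sum_squared_errors and the four data points are hypothetical and are not taken from the article's salary data. The last line compares the fitted line against an arbitrary alternative line to show that the fitted line has the smaller sum of squared differences.

```python
def ols_fit(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (closed form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean  # the OLS line passes through (x_mean, y_mean)
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Tiny made-up data set (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 4.0, 6.0]
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)  # 0.8 2.5
print(sum_squared_errors(xs, ys, slope, intercept))  # about 1.8
print(sum_squared_errors(xs, ys, 1.0, 2.0))  # 2.0: another candidate line fits worse
```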
Simple Linear Regression Algorithm — The Implementation in Python
In this section, we will use Python in a Jupyter notebook to predict a suitable salary for our candidate.
Importing required libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Importing the dataset from “https://s3.us-west-2.amazonaws.com/public.gamelab.fun/dataset/salary_data.csv”
dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values  # all columns except the last (years of experience), as a 2-D array
y = dataset.iloc[:, 1].values  # the column at index 1 (salary), as a 1-D array
Split the dataset into two sets: a training set for fitting the model and a testing set for evaluating it.
Here the dataset is divided so that two-thirds form the training set and one-third forms the testing set.
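Conceptually, a random train/test split shuffles the rows reproducibly and then holds out a fixed fraction for testing. Below is a minimal plain-Python sketch of that idea. The function simple_train_test_split and the toy data are hypothetical. This is not scikit-learn's implementation, and it will not select the same rows that train_test_split picks with random_state=0. It only mirrors the return order (X_train, X_test, y_train, y_test) and rounds the test share up.

```python
import math
import random


def simple_train_test_split(xs, ys, test_size=1/3, seed=0):
    """Shuffle indices with a fixed seed, then hold out a test_size fraction."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)     # reproducible shuffle for a given seed
    n_test = math.ceil(len(xs) * test_size)  # round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [xs[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_train = [ys[i] for i in train_idx]
    y_test = [ys[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Hypothetical toy data: 9 (experience, salary) pairs, split 2/3 train, 1/3 test.
years = [1, 2, 3, 4, 5, 6, 7, 8, 9]
salaries = [yr * 1000 for yr in years]
x_train, x_test, y_train, y_test = simple_train_test_split(years, salaries)
print(len(x_train), len(x_test))  # 6 3
```

Shuffling before splitting matters because data files are often sorted (here, by experience). Taking the last third unshuffled would test only on the most experienced people.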
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
Build the regression model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()  # create a linear regression model
regressor.fit(X_train, y_train)  # estimate the best-fit line from the training data
Now, predict the salary of an individual with 5 years of experience.
y_pred = regressor.predict(np.array([[5]]))  # scikit-learn expects a 2-D input: one row, one feature
print(y_pred)
Executing the above code prints the predicted salary.

Thus, the model suggests offering your candidate a salary of about Rs. 73,545.90, the value on the fitted line for 5 years of experience.
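The code above predicts a single value but does not measure how well the model performs on the held-out test set, which is the Evaluation stage from the introduction. Below is a minimal plain-Python sketch of two common regression metrics: root mean squared error (RMSE) and the coefficient of determination (R²). The function name regression_metrics and the numbers are hypothetical. With the article's variables, you would pass y_test and regressor.predict(X_test). scikit-learn's regressor.score(X_test, y_test) also returns R² directly.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, r2) for paired actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need equally long, non-empty sequences")
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # squared prediction errors
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)          # total variation around the mean
    rmse = math.sqrt(sse / n)
    r2 = 1 - sse / sst if sst else float("nan")              # R2 undefined if y_true is constant
    return rmse, r2


# Hypothetical actual values and predictions.
rmse, r2 = regression_metrics([3.0, 5.0, 4.0, 6.0], [3.3, 4.1, 4.9, 5.7])
print(f"RMSE = {rmse:.4f}, R2 = {r2:.4f}")  # RMSE = 0.6708, R2 = 0.6400
```

RMSE is in the same units as the target (rupees, for salary). R² is the fraction of the target's variation that the line explains, where 1.0 means a perfect fit.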
In summary, building a simple linear regression model involves a few key steps:
Importing the dataset
Splitting the dataset into training and testing sets. A common choice is to hold out 20% of the data for testing; this example holds out one-third.
Initializing the regression model
Fitting the model on the training set
Finally, predicting on new data. Happy predicting!
Simple Linear Regression Algorithm — The Practical Application
Predicting house prices based on a single feature such as size
Forecasting sales based on advertising expenditure
Forecasting temperature based on a single factor such as time of day or altitude
Predicting patient health outcomes based on a single relevant factor
Forecasting energy consumption based on a weather variable such as outdoor temperature


