
Demystifying Simple Linear Regression: A Beginner’s Guide to Machine Learning Basics


Machine Learning (ML) is the art of teaching machines to learn from data and make intelligent predictions or decisions without being explicitly programmed. This process involves four essential stages:

  • Training: In this step, massive datasets are provided as input to machine learning models. These models, fundamentally a collection of mathematical equations and computations, work to uncover patterns within the data.

  • Prediction: Once trained, the model is exposed to new, unseen data. It applies the learned patterns to make predictions or infer outcomes.

  • Evaluation: The predictions generated by the model are assessed for precision and accuracy to determine how effectively it performs.

  • Tuning: Finally, the model’s parameters are fine-tuned and optimized to enhance its performance, making it more reliable for future tasks.

This blog focuses on Linear Regression, a foundational algorithm in machine learning. We’ll explore how it works, its mathematical underpinnings, and its practical applications, providing you with a clear understanding of this key ML concept.


Simple Linear Regression Algorithm — The Theory


Regression is a technique for determining the statistical relationship between two or more variables where a change in a dependent variable is associated with, and depends on, a change in one or more independent variables. Independent variables are controlled inputs. Dependent variables represent the output or outcome resulting from altering these inputs.

A regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”. The method is mostly used for forecasting and for finding cause-and-effect relationships between variables.

As the name suggests, in simple linear regression the relationship between the dependent and independent variable is expressed as a straight line.

Remember the straight-line equation from our school mathematics classes?

y = mx + c

where,

y = dependent variable, i.e. the variable to be predicted

x = independent variable, i.e. the input variable

m = slope of the straight line

c = intercept constant, i.e. the value of y when x = 0
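
To make the equation concrete, here is a minimal Python sketch; the slope m = 2 and intercept c = 3 are made-up values purely for illustration:

m, c = 2.0, 3.0  # illustrative slope and intercept, not taken from any real dataset

def predict_y(x):
    """Return y for a given x using the straight-line equation y = mx + c."""
    return m * x + c

print(predict_y(0))  # 3.0  -> when x = 0, y equals the intercept c
print(predict_y(4))  # 11.0 -> each unit increase in x adds m = 2 to y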


Graphical representation of Linear Regression

Simple Linear Regression Algorithm — The Intuition


Let’s understand with an example where the salary is to be estimated based on the years of experience of an individual.

The straight-line equation can be translated into the salary and experience variables as follows:

Salary = s0 + (s1 * Years of experience)

where s0 is the intercept, i.e. the base salary for someone with zero years of experience, and s1 is the slope, i.e. the increase in salary for each additional year of experience.

Below is a plot depicting:

Red marks: actual data points, i.e. salary versus years of experience

Green mark: predicted salary for a given level of experience

Black regression line: the best-fit line through all the given data points


Graphical representation of data points and prediction value in SLR

The regression model is found using the ordinary least squares method: for a candidate line, take the sum of the squared differences between each actual value and the value predicted by that line. Conceptually, this is done for the many different lines that could “fit” through the data, and the line with the minimum sum of squared differences is the best-fitting line. The equation of that line is your simple linear regression model.
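
If you want to see the least-squares idea in action before touching scikit-learn, here is a minimal NumPy sketch that computes the best-fit slope and intercept from the closed-form formulas; the experience and salary numbers are made up for illustration:

import numpy as np

# Made-up data: years of experience vs. salary (illustrative values only)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([40000, 45000, 52000, 58000, 63000], dtype=float)

# Ordinary least squares: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(slope, intercept)                             # coefficients of the best-fit line
print(np.sum((y - (intercept + slope * x)) ** 2))   # the minimised sum of squared differences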


Simple Linear Regression Algorithm — The Implementation in Python


In this section, we will use Python in a Jupyter notebook to estimate the right salary for our candidate.

Importing required libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('salary_data.csv')
X = dataset.iloc[:, :-1].values  # feature matrix: every column except the last (years of experience)
y = dataset.iloc[:, 1].values    # target vector: the second column (salary)
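
Before splitting, it is worth taking a quick look at the data. The snippet below assumes salary_data.csv has two columns, YearsExperience and Salary; adjust if your copy of the file uses different column names:

print(dataset.head())   # first few rows of the dataset
print(dataset.shape)    # (number of rows, number of columns)
print(X[:5], y[:5])     # first few feature values and target values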

Split the dataset into two sets: a training set for training the model and a testing set for evaluating it.

Here the dataset is divided so that two-thirds form the training set and one-third forms the testing set.

from sklearn.model_selection import train_test_split  
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0) 

Build the Regression model

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()   # create the linear regression model
regressor.fit(X_train, y_train)  # estimate the best-fit line from the training data
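
Once fitted, the learned slope and intercept can be read straight from the model; they correspond to s1 and s0 in the salary equation above:

print(regressor.coef_)       # s1: estimated salary increase per year of experience (returned as an array)
print(regressor.intercept_)  # s0: estimated base salary at zero years of experience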

Now, predict the salary of an individual with 5 years of experience.

y_pred = regressor.predict(np.array([5]).reshape(1, 1))  # predict() expects a 2D array, hence the reshape

Executing the above code gives the following output:

Output Display

Thus, you can offer your candidate a salary of Rs. 73,545.90, the value the model predicts for 5 years of experience!
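
As an optional sanity check (the Evaluation stage from the introduction), you can also score the model on the held-out test set. A minimal sketch using scikit-learn's built-in metrics:

from sklearn.metrics import mean_squared_error, r2_score

y_test_pred = regressor.predict(X_test)
print(r2_score(y_test, y_test_pred))            # closer to 1 means a better fit
print(mean_squared_error(y_test, y_test_pred))  # average squared prediction error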

In summary, there are a few key steps to follow for a simple linear regression model:

  1. Import the dataset

  2. Split the dataset into training and testing sets; typically 20-33% of the data is held out for testing

  3. Initialize the regression model

  4. Fit the model using the training set

  5. Finally, happy predicting!


Simple Linear Regression Algorithm — The Practical Application


  1. Predicting house prices based on a single feature like size

  2. Sales forecasting based on advertisement expenditure

  3. Temperature forecasting based on a single factor such as time of day or altitude

  4. Predicting patient health outcomes based on a relevant factor

  5. Estimating energy consumption based on weather conditions

 
 
