Getting Visual with Healthcare Data: A Beginner’s Guide to Exploratory Data Analysis (EDA) and Descriptive Statistics in Python

krishnananavaty
Oct 11
3 min read

Introduction:

What is EDA and Descriptive Analysis in Data Analysis?

Before diving into machine learning or statistical modeling, it's crucial to understand your data. This is where Exploratory Data Analysis (EDA) and Descriptive Analysis come into play. Both help you uncover patterns, spot anomalies, test assumptions, and summarize key characteristics of the data before building any models or drawing conclusions.

EDA helps you:

Discover patterns and relationships
Detect outliers or missing values
Understand data distributions
Get insights to guide further analysis or modeling

Tools like histograms, scatter plots, box plots, and correlation matrices are commonly used in EDA.

Descriptive Analysis focuses on summarizing data quantitatively. There are three main categories:
1. Measures of Central Tendency – Mean, Median, Mode
2. Measures of Dispersion – Standard Deviation, Variance, Range, IQR
3. Distribution Shape – Skewness, Kurtosis
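To make the three categories concrete, here is a minimal sketch that computes each measure with pandas. The ages below are made-up illustrative values, not the dataset used later in this blog.

```python
import pandas as pd

# Hypothetical ages for eight patients (illustrative values only)
ages = pd.Series([34, 45, 45, 52, 57, 57, 57, 71])

# 1. Measures of central tendency
mean_age = ages.mean()
median_age = ages.median()
mode_age = ages.mode().iloc[0]   # mode() returns a Series; take the first value

# 2. Measures of dispersion
std_age = ages.std()             # sample standard deviation (ddof=1)
var_age = ages.var()             # sample variance (ddof=1)
age_range = ages.max() - ages.min()
iqr_age = ages.quantile(0.75) - ages.quantile(0.25)

# 3. Distribution shape
skew_age = ages.skew()           # sign shows which tail is longer
kurt_age = ages.kurt()           # excess kurtosis (about 0 for a normal distribution)

print(mean_age, median_age, mode_age, age_range, iqr_age)
```

Note that pandas reports sample (not population) standard deviation and variance by default, and excess kurtosis rather than raw kurtosis.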

Together, EDA and Descriptive Analysis form the foundation of any successful data analysis project by turning raw data into understandable insights, making it easier to decide what to do next.

In this blog, we will use healthcare data, some of the most meaningful and impactful data in the real world. It includes information such as a patient's age, gender, blood pressure, cholesterol levels, and diagnoses like diabetes or heart disease. This data helps healthcare professionals understand patterns, improve decisions, and enhance patient care.

For the purpose of this blog, I created a simulated healthcare dataset using Python. The dataset includes fictional patient records such as age, gender, blood pressure, cholesterol, diabetes, and heart disease status. No real patient data was used. This allows us to explore data analysis techniques without using any real or sensitive medical information.
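A dataset like this can be simulated in a few lines with NumPy and pandas. The sketch below is one possible way to do it, not the exact code behind this blog's dataset: the column names, sample size, value ranges, and probabilities are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)   # fixed seed so the data is reproducible
n_patients = 500                       # assumed sample size

# Every column is drawn independently; all parameters are illustrative assumptions
df = pd.DataFrame({
    "age": rng.integers(20, 90, size=n_patients),            # upper bound is exclusive
    "gender": rng.choice(["Male", "Female"], size=n_patients),
    "blood_pressure": rng.normal(loc=120, scale=15, size=n_patients).round(),
    "cholesterol": rng.normal(loc=200, scale=30, size=n_patients).round(),
    "diabetes": rng.binomial(n=1, p=0.2, size=n_patients),       # 1 = Diabetes
    "heart_disease": rng.binomial(n=1, p=0.15, size=n_patients),  # 1 = Heart disease
})

print(df.head())   # first five simulated patient records
```

Because each column is generated independently here, any relationship that shows up between columns in this sketch is due to chance.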

Random Data

Preview of the Simulated Dataset:

Here’s a glimpse of the first few rows of our simulated dataset, showcasing the kind of patient information included.

Descriptive Statistics: Summarizing the Data:

We start by calculating basic descriptive statistics to understand the central tendencies, variability, and distribution shapes in our dataset. This step gives a quick overview of the patient population.
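In pandas, the quickest way to get this overview is usually `describe()`. Here is a small sketch on a made-up five-row table; the column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mini-dataset (illustrative values only)
df = pd.DataFrame({
    "age": [34, 45, 52, 57, 71],
    "cholesterol": [180, 210, 195, 260, 205],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Numeric columns: count, mean, std, min, quartiles, max
numeric_summary = df.describe()

# A categorical column: count, unique, top (most frequent value), freq
gender_summary = df["gender"].describe()

print(numeric_summary)
print(gender_summary)
```

The `50%` row of the numeric summary is the median, so comparing it with the `mean` row gives a first hint about skew.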

Interpretation:

The median age is 53 years, very close to the mean. This similarity suggests the age distribution is fairly symmetrical (not heavily skewed).
The most frequently occurring age is 57 years. Mode can be particularly useful when looking at categorical data or ages clustered around certain values.
The most frequent gender in the dataset is Male. This means more patients in this dataset are male than female. This insight can be important when considering gender-specific health risks or tailoring healthcare interventions.
The diabetes column is encoded as 0 = No Diabetes, 1 = Diabetes:
- The most common status is 0, meaning most patients do not have diabetes.
- This is consistent with the roughly 19% diabetes prevalence from the descriptive statistics earlier.
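The mode and the prevalence figure are easy to check in code. In this sketch (a made-up five-patient table, with column names assumed), the mean of a 0/1 column is simply the share of 1s, which is why it can be read as prevalence.

```python
import pandas as pd

# Hypothetical mini-dataset (illustrative values only)
df = pd.DataFrame({
    "gender": ["Male", "Female", "Male", "Male", "Female"],
    "diabetes": [0, 0, 1, 0, 0],   # 0 = No Diabetes, 1 = Diabetes
})

most_common_gender = df["gender"].mode().iloc[0]
most_common_status = df["diabetes"].mode().iloc[0]

# Mean of a 0/1 column equals the fraction of rows that are 1
diabetes_prevalence = df["diabetes"].mean()

# Share of each status, as fractions that sum to 1
status_shares = df["diabetes"].value_counts(normalize=True)

print(most_common_gender, most_common_status, diabetes_prevalence)
```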

Exploratory Data Analysis:

Interpretation:

Age follows a near-normal distribution centered around 53 years — most patients are middle-aged.
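A histogram like the one described can be drawn with matplotlib. This is a sketch only: the ages are freshly simulated from a normal distribution with an assumed mean of 53 and spread of 12, so the figure will not match the blog's plot exactly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Assumed parameters: mean 53, standard deviation 12, clipped to a plausible age range
ages = rng.normal(loc=53, scale=12, size=500).clip(18, 90)

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(ages, bins=15, edgecolor="black")
ax.axvline(np.median(ages), linestyle="--", color="red", label="median")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Number of patients")
ax.set_title("Age distribution")
ax.legend()
# Call fig.savefig("age_histogram.png") or plt.show() to view the figure
```

A roughly bell-shaped histogram with the median line near the center is what "near-normal" looks like in practice.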

Interpretation:

No clear linear relationship; blood pressure varies widely across all ages.
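One way to back up a visual impression like this is to pair the scatter plot with a Pearson correlation coefficient. In the sketch below the two columns are simulated independently (an assumption made for illustration), so the coefficient should land close to zero.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.integers(20, 90, size=300),
    "blood_pressure": rng.normal(120, 15, size=300),  # drawn independently of age
})

# Series.corr uses Pearson correlation by default
r = df["age"].corr(df["blood_pressure"])

fig, ax = plt.subplots()
ax.scatter(df["age"], df["blood_pressure"], alpha=0.5)
ax.set_xlabel("Age (years)")
ax.set_ylabel("Blood pressure")
ax.set_title(f"Age vs. blood pressure (r = {r:.2f})")
```

A coefficient near 0 only rules out a linear relationship; a curved pattern could still be hiding in the scatter, which is why the plot is worth looking at too.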

Interpretation:

Cholesterol shows visible outliers above 250 mg/dL — potentially high-risk individuals.
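The points a box plot draws as outliers are, by default in matplotlib and seaborn, the values more than 1.5 × IQR beyond the quartiles. The same rule can be applied directly in pandas. The cholesterol readings below are made-up values for illustration.

```python
import pandas as pd

# Hypothetical cholesterol readings in mg/dL (illustrative values only)
cholesterol = pd.Series([180, 190, 195, 200, 205, 210, 215, 220, 265, 280])

q1 = cholesterol.quantile(0.25)
q3 = cholesterol.quantile(0.75)
iqr = q3 - q1

# Standard box plot fences: 1.5 * IQR beyond the first and third quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = cholesterol[(cholesterol < lower_fence) | (cholesterol > upper_fence)]
print(outliers.tolist())   # values the box plot would draw as individual points
```

Listing the flagged rows this way makes it easy to inspect the individual records behind the dots on the plot.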

Interpretation:

Heart disease appears slightly more prevalent in males in this dataset.
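A count plot shows this visually, but the underlying comparison is just a rate per group. Here is a minimal sketch with `groupby`, using a made-up eight-patient table (column names assumed).

```python
import pandas as pd

# Hypothetical mini-dataset (illustrative values only)
df = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"],
    "heart_disease": [1, 1, 0, 0, 1, 0, 0, 0],   # 1 = has heart disease
})

# Mean of the 0/1 column within each group = share of patients with heart disease
rate_by_gender = df.groupby("gender")["heart_disease"].mean()

# Cases and group sizes, to check the rates are not based on tiny groups
counts_by_gender = df.groupby("gender")["heart_disease"].agg(["sum", "count"])

print(rate_by_gender)
print(counts_by_gender)
```

Comparing rates rather than raw counts matters when the groups are different sizes, as they are when one gender is more common in the dataset.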

Interpretation:

In this simulated dataset, patients with diabetes are more likely to have heart disease than patients without it, which suggests an association between diabetes and cardiovascular risk.
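A row-normalized cross-tabulation is a simple way to quantify this kind of comparison between two binary columns. The ten records below are made-up values for illustration.

```python
import pandas as pd

# Hypothetical mini-dataset (illustrative values only); 1 = condition present
df = pd.DataFrame({
    "diabetes":      [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "heart_disease": [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# normalize="index" turns each row into proportions that sum to 1
table = pd.crosstab(df["diabetes"], df["heart_disease"], normalize="index")

rate_with_diabetes = table.loc[1, 1]      # share with heart disease among diabetics
rate_without_diabetes = table.loc[0, 1]   # share with heart disease among non-diabetics
risk_ratio = rate_with_diabetes / rate_without_diabetes

print(table)
print(round(risk_ratio, 2))
```

An association like this describes the data at hand; on its own it does not show that one condition causes the other.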

Interpretation:

The correlation heatmap shows that no single variable (age, gender, cholesterol, etc.) has a strong linear relationship with heart disease.
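A correlation heatmap can be built from `DataFrame.corr()`. The sketch below uses freshly simulated, independent columns and plain matplotlib; the column names and parameters are assumptions, and seaborn's `heatmap` function is a common alternative for the drawing step. Gender is converted to a 0/1 column first, since correlation needs numeric input.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 400
# Illustrative simulated data; every column is drawn independently
df = pd.DataFrame({
    "age": rng.integers(20, 90, size=n),
    "gender": rng.choice(["Male", "Female"], size=n),
    "cholesterol": rng.normal(200, 30, size=n),
    "heart_disease": rng.binomial(1, 0.15, size=n),
})
df["is_male"] = (df["gender"] == "Male").astype(int)   # encode gender as 0/1

columns = ["age", "is_male", "cholesterol", "heart_disease"]
corr = df[columns].corr()   # pairwise Pearson correlations

fig, ax = plt.subplots()
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(list(range(len(columns))))
ax.set_xticklabels(columns, rotation=45, ha="right")
ax.set_yticks(list(range(len(columns))))
ax.set_yticklabels(columns)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()
```

Fixing the color scale to the range -1 to 1 keeps weak correlations looking weak instead of stretching the colors to whatever range the data happens to cover.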

Conclusion

In this blog, we explored the basics of Exploratory Data Analysis (EDA) and Descriptive Analysis using a simulated healthcare dataset. We looked at how to summarize data using basic statistics and visualize patterns through charts like histograms, box plots, and scatter plots.

Key Takeaways from the Analysis:

EDA helps you understand your data before doing any advanced analysis or building models.
Descriptive statistics give you a quick snapshot of your data — like averages, ranges, and frequency counts.
Visualizations help uncover patterns, outliers, and relationships that aren’t obvious from numbers alone.
Using even a few simple tools like Python’s pandas, matplotlib, and seaborn, you can gain powerful insights from your data.

EDA is a crucial first step in any data project. It ensures that you don’t miss any important details and helps you make more informed decisions, whether you’re working with healthcare data, sales numbers, or anything else.

Welcome
to NumpyNinja Blogs
