
Welcome
to NumpyNinja Blogs


The Sepsis Dataset That Taught Us More Than We Expected: Complete Project Journey


Let's Begin The Journey

In this blog I am sharing my thoughts and our journey through our first Numpy DA project, Sepsis. Once my Tableau course was done, we registered for the Sepsis project as a team of six. When we started, we had no idea what sepsis was. We were completely blank.

When we first opened the PhysioNet Sepsis dataset, it felt less like a spreadsheet and more like sitting a public exam or walking into an ICU. After opening the reference link, it felt as if we had stepped into an ICU room where monitors are beeping and numbers are flashing, except that the patient data and laboratory values were all sitting in an Excel sheet.

After seeing the data, my first question was: are we entering the medical field or the data analysis field? Later, we studied what sepsis is and how it occurs in the human body, and went through the given dataset, which has biomarkers, demographics, and vital values.

This blog of my sepsis journey should make it easy to understand what lies behind the data and everything we learnt during the project. When we began our sepsis project, we weren’t just opening a dataset. We were stepping into a war of data, hour by hour, patient by patient, with any number of biomarkers.

First, let's walk through it.

What Is Sepsis?


                  Sepsis is the human body’s extreme reaction to an untreated infection. It is a life-threatening emergency. The body attacks its own organs and tissues, which can lead to tissue damage, organ failure, and death. It doesn’t arrive with a warning label.


Overview Analysis of Data Journey

The dataset is largely incomplete and uncleaned. It contains 34 biomarkers and 8 demographic fields, and most patients have large gaps in their measurements.

Total Patient Count

We first analyzed the total number of patients in the dataset: the count is 40,336 patients. But the dataset itself shows 1,048,576 rows, because the measurements are hourly and each patient appears on many rows. So we counted patients, including sepsis-onset patients, with a distinct count of Patient ID.


Percentage of Sepsis -> (COUNTD( IF [Sepsis Label] = 1 THEN [Patient ID] END) / COUNTD([Patient ID])) *100


Total Sepsis and Non-Sepsis Patients

Here we found the total number of sepsis and non-sepsis patients.


IF { FIXED [Patient ID] : MAX([Sepsis Label]) } = 1 THEN 'sepsis'

ELSE 'No sepsis'

END
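For readers who want to check the same logic outside Tableau, here is a minimal Python sketch of the two calculations above: the patient-level label (the maximum Sepsis Label across a patient's rows) and the distinct-count percentage. The sample rows and the function names `patient_labels` and `sepsis_percentage` are made up for illustration; they are not part of the dataset or our workbook.

```python
from collections import defaultdict

# Hypothetical hourly rows: (patient_id, sepsis_label), one row per patient per hour.
rows = [
    ("p1", 0), ("p1", 0), ("p1", 1),
    ("p2", 0), ("p2", 0),
    ("p3", 0), ("p3", 1), ("p3", 1),
    ("p4", 0),
]

def patient_labels(rows):
    """Return {patient_id: highest sepsis label seen on any of that patient's rows}."""
    labels = defaultdict(int)
    for patient_id, sepsis_label in rows:
        labels[patient_id] = max(labels[patient_id], sepsis_label)
    return dict(labels)

def sepsis_percentage(rows):
    """Distinct sepsis patients divided by distinct patients, times 100."""
    labels = patient_labels(rows)
    if not labels:
        return 0.0
    sepsis_patients = sum(1 for flag in labels.values() if flag == 1)
    return 100 * sepsis_patients / len(labels)

# Same grouping as the calculated field: one label per patient, not per row.
groups = {pid: ("sepsis" if flag == 1 else "No sepsis")
          for pid, flag in patient_labels(rows).items()}
print(groups)
print(sepsis_percentage(rows))
```

Counting rows instead of distinct patients would overweight patients with long ICU stays, which is why both calculations work at the patient level.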


   

    

  Irregular Time-Series Data & KPI

The dataset provides hourly ICU measurements, but not every patient has consistent timestamps, and Tableau does not handle sparse clinical data well without extra work. The data also has:

·        Missing hours

·        Irregular Sampling

·        Sudden gaps in Vitals

 

This made it difficult to build smooth time-based visualizations, and the data needed significant preprocessing before Tableau could interpret it correctly.
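One kind of preprocessing check that helps here is listing the hours that are missing between a patient's first and last record. The sketch below is only an illustration in Python, not the exact steps we ran: the function name `missing_hours` and the sample hour indexes are hypothetical.

```python
def missing_hours(observed_hours):
    """Return the sorted hours absent between a patient's first and last record."""
    if not observed_hours:
        return []
    present = set(observed_hours)
    return [h for h in range(min(present), max(present) + 1) if h not in present]

# Hypothetical hour indexes recorded for two patients.
records = {"p1": [0, 1, 2, 5, 6], "p2": [3, 4, 5]}
gaps = {pid: missing_hours(hours) for pid, hours in records.items()}
print(gaps)
```

A patient with an empty gap list can be plotted as a continuous line; a patient with gaps needs a decision first (leave the break visible, or fill it) before the chart is trusted.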

So we computed KPI values for sepsis patients, non-sepsis patients, MICU, and SICU.

The dataset records the ICU type in the Unit1 and Unit2 columns, so we mapped those columns to MICU and SICU.


IF [Unit1] = 1 THEN "MICU"

ELSEIF [Unit2] = 1 THEN "SICU"

ELSE "Other ICU"

END


Missing Values Across Biomarkers

PhysioNet’s dataset is realistic, which means the data is raw and uncleaned.

  • Key variables like lactate, bilirubin, platelets, and PaO₂/FiO₂ were missing for large portions of patients.

  • Generally, Tableau doesn’t handle sparse clinical data well.

  • We analyzed the ideal range of each biomarker.

  • As analysts, we should set the rules before visualization.

  • Because of the large number of biomarkers and the missing data, we correlated some of the biomarkers.

  • Some metrics require custom Calculated Fields to avoid misleading charts.



Demographics Analysis

The PhysioNet Sepsis Challenge dataset includes demographic information that helps contextualize sepsis patterns, risk factors, and outcome differences across patient groups.

  • Age is one of the strongest predictors of sepsis risk.

  • Age and gender are stored as numeric codes, so we checked the values. Because some patients had unrealistic ages, we created age bins for clinical relevance.

  • Created age bins at intervals of 20 years.

    IF [Age] < 20 THEN "<20"

    ELSEIF [Age] < 40 THEN "20–39"

    ELSEIF [Age] < 60 THEN "40–59"

    ELSEIF [Age] < 80 THEN "60–79"

    ELSE "80+"

    END
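The calculated field above places every age in some bin, so an implausible value still lands in "<20" or "80+". A small Python sketch of the same bins with an extra "Unknown" bucket is shown below. The bounds of 0 and 120 years and the function name `age_bin` are assumptions for illustration, not rules taken from the dataset, and the labels use plain hyphens.

```python
def age_bin(age, min_age=0, max_age=120):
    """Return the 20-year age bin label, or "Unknown" for missing or implausible ages.

    min_age and max_age are assumed plausibility bounds, not dataset rules.
    """
    if age is None or not (min_age <= age <= max_age):
        return "Unknown"
    if age < 20:
        return "<20"
    if age < 40:
        return "20-39"
    if age < 60:
        return "40-59"
    if age < 80:
        return "60-79"
    return "80+"

for sample in (15, 39.5, 60, 85, -3, None):
    print(sample, age_bin(sample))
```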

           


Organ Analysis in Sepsis

                

Analyzing organ dysfunction is central to early detection of sepsis, and it is one of the most complex parts of working with the PhysioNet Sepsis Challenge dataset. Each organ system behaves differently, has different biomarkers, and presents unique data quality issues. The organs analyzed in our project are the heart, liver, lungs, and kidneys. Translating this into accurate SOFA scoring and meaningful Tableau visuals required overcoming several challenges.

Our team analyzed the heart.


      Analysis

The cardiovascular component depends on biomarker values and their fluctuations, and indirectly on SOFA scores.

We wanted to understand how the heart behaves in sepsis.

But the dataset didn’t give us cardiac output, ejection fraction, or troponin for most patients.

Because of that, we relied on the indirect signs the data did show:

  • MAP values fluctuating rapidly hour‑to‑hour

  • Rising heart rate

  • High lactate

  • Low pH

  • Unstable oxygen saturation

We created calculated fields for the biomarkers that directly affect the heart, such as HR, troponin, BP, MAP, potassium, and calcium.


HR (HEART RATE)

IF [HR] < 60 THEN 'Bradycardia'

ELSEIF [HR] >= 60 AND [HR] <= 100 THEN 'Moderate Impact to heart'

ELSEIF [HR] > 100 AND [HR] <= 120 THEN 'Tachycardia'

ELSEIF [HR] > 120 THEN 'Sepsis Shock'

ELSE 'Unknown'

END



MAP (MEAN ARTERIAL PRESSURE)

Using the MAP biomarker, we estimated the mortality rate by checking cardiac perfusion and survival. We then charted how many patients fell into the heart muscle damage, mortality, and normal categories.


IF { FIXED [Patient ID] :

MIN(IF NOT ISNULL([MAP]) THEN [MAP] END) <= 65 }

THEN "Mortality"

ELSEIF { FIXED [Patient ID] :

MIN(IF NOT ISNULL([MAP]) THEN [MAP] END) >= 75 }

THEN "Heart Muscle Damage"

ELSEIF { FIXED [Patient ID] :

MIN(IF NOT ISNULL([MAP]) THEN [MAP] END) > 65 AND

MIN(IF NOT ISNULL([MAP]) THEN [MAP] END) < 75 }

THEN "Normal"

ELSEIF { FIXED [Patient ID] : COUNT(IF ISNULL([MAP]) THEN 1 END) = COUNT([Patient ID]) AND COUNT([Patient ID]) >0 }

THEN "Not Tested"

END
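The same patient-level rule can be written as a short Python sketch, which makes the order of the checks easier to see. It mirrors the calculated field above: take the lowest measured MAP per patient, then apply the thresholds. The category names and cut-offs are copied from our Tableau field and are not clinical guidance; the function name `map_category` and the sample readings are made up for illustration.

```python
def map_category(map_values):
    """Classify one patient from hourly MAP readings (None means not measured)."""
    measured = [value for value in map_values if value is not None]
    if not measured:
        # Every reading is missing, so there is no minimum to compare.
        return "Not Tested"
    lowest = min(measured)
    if lowest <= 65:
        return "Mortality"
    if lowest >= 75:
        return "Heart Muscle Damage"
    return "Normal"  # lowest MAP strictly between 65 and 75

# Hypothetical patients with hourly MAP readings.
patients = {
    "p1": [80, None, 60],
    "p2": [None, None],
    "p3": [70, 72, None],
    "p4": [90, 78],
}
for pid, readings in patients.items():
    print(pid, map_category(readings))
```

Because the rule uses the minimum, a single low reading decides the category for the whole stay, so one noisy hour can move a patient between groups.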



HOUR

In this analysis we found that the heart is one of the organs most affected in sepsis.

We also animated heart rate by hour.


Implemented SIRS, SOFA, and Sepsis.

SIRS (Systemic Inflammatory Response Syndrome)

Systemic Inflammatory Response Syndrome (SIRS) is an exaggerated, body-wide inflammatory reaction to infection or injury. It is identified when at least two of these signs are present: abnormal temperature, heart rate, respiratory rate, or white blood cell count. SIRS can quickly lead to organ dysfunction, and it becomes sepsis when it is caused by infection.

·       Checked SIRS/SOFA logic against clinical guidelines.

·       Used LODs to compute patient-level metrics.

·       Analyzed the trigger hour and transition hour in SIRS.

·       Generated an email alert using the trigger and transition hours with the SIRS level.

·       When this alerting system identifies a SIRS-positive patient, it can notify the responsible healthcare provider for timely intervention.
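As a rough illustration of the SIRS check, here is a minimal Python sketch that counts how many criteria are met in one hour and finds the first hour with two or more. It uses the commonly cited thresholds (temperature above 38 °C or below 36 °C, heart rate above 90, respiratory rate above 20, white cell count above 12 or below 4 ×10³/µL) and leaves out the PaCO₂ and band-count alternatives. Treating the "trigger hour" as the first SIRS-positive hour is an assumption made for this sketch, and the function names and sample values are hypothetical.

```python
def sirs_score(temp_c=None, heart_rate=None, resp_rate=None, wbc=None):
    """Count the SIRS criteria met in one hour; a missing value counts as not met."""
    criteria = [
        temp_c is not None and (temp_c > 38.0 or temp_c < 36.0),
        heart_rate is not None and heart_rate > 90,
        resp_rate is not None and resp_rate > 20,
        wbc is not None and (wbc > 12.0 or wbc < 4.0),  # 10^3 cells per microlitre
    ]
    return sum(criteria)

def trigger_hour(hourly):
    """Return the first hour whose score is 2 or more, or None if there is none.

    Assumption: "trigger hour" means the first SIRS-positive hour.
    """
    for hour in sorted(hourly):
        if sirs_score(**hourly[hour]) >= 2:
            return hour
    return None

# Hypothetical readings for one patient, keyed by hour.
hourly = {
    0: dict(temp_c=37.0, heart_rate=80, resp_rate=16, wbc=8.0),
    1: dict(temp_c=38.4, heart_rate=88, resp_rate=18, wbc=None),
    2: dict(temp_c=38.6, heart_rate=104, resp_rate=22, wbc=None),
}
print(trigger_hour(hourly))
```

Counting a missing value as "not met" is a conservative choice: with sparse labs, it can delay the trigger hour, which matters if the value feeds an alert.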


SOFA 

The SOFA score is designed to describe the sequence of organ failure over time rather than just a static snapshot. It assesses six organ systems (respiratory, cardiovascular, renal, hepatic, coagulation, and neurological) on a scale of 0 to 4 points each, for a maximum total of 24.
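The arithmetic is simple enough to sketch in a few lines of Python: six sub-scores, each from 0 to 4, added together. The sketch assumes the sub-scores have already been worked out from the biomarkers; it does not show how each one is derived. The names `ORGAN_SYSTEMS`, `sofa_total`, and `worst_organ`, and the sample scores, are hypothetical.

```python
ORGAN_SYSTEMS = (
    "respiratory", "cardiovascular", "renal",
    "hepatic", "coagulation", "neurological",
)

def sofa_total(sub_scores):
    """Sum the six organ sub-scores, each of which must be an integer from 0 to 4."""
    total = 0
    for organ in ORGAN_SYSTEMS:
        score = sub_scores[organ]
        if score not in range(5):
            raise ValueError(f"{organ} sub-score must be 0 to 4, got {score!r}")
        total += score
    return total

def worst_organ(sub_scores):
    """Return the organ system with the highest sub-score (first one on ties)."""
    return max(ORGAN_SYSTEMS, key=lambda organ: sub_scores[organ])

# Hypothetical sub-scores for one patient at one hour.
example = {
    "respiratory": 2, "cardiovascular": 3, "renal": 1,
    "hepatic": 0, "coagulation": 1, "neurological": 0,
}
print(sofa_total(example), worst_organ(example))
```

Running this per hour and comparing totals over time matches the idea that SOFA describes a sequence rather than a snapshot.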

I analyzed the biomarkers with respect to SOFA and found the maximum biomarker assessment.

That is the end of my part of the analysis in the sepsis journey. The whole sepsis team is working on visualizations of the other biomarkers and organs so that we can present the complete project in Tableau.



We also analyzed how the SOFA score is affected by each individual organ, using a coxcomb chart visualization.



What We Learned Beyond the Data

This is the overview and journey of my sepsis project.

This was not just a Tableau project. It was a beautiful journey through uncertainty, complexity, and the messy beauty of real clinical data.

  • ICU data is imperfect, but it’s real data.

  • Sepsis is unpredictable, but not invisible.

  • Every number in the dataset represents a human story.

  • We learned how to use Tableau wisely to create more visuals.

  • We improved our analytical skills.

  • We learned different types of dashboard creation.

  • We learnt how to work in a team, with commitment and coordination.

In the end, this project didn’t just make us better analysts. We also gained real knowledge of sepsis as a disease (sorry, not a disease if we detect it early).


End Of The Journey

Working on the sepsis project taught us far more than how to handle data, generate different visuals, or read physiological signals. It reminded us that behind every row in the dataset is a patient whose condition changed hour by hour, sometimes quietly, sometimes violently. The PhysioNet Sepsis dataset didn’t just challenge our technical skills. It pushed us to think like clinicians, and it felt like working in the medical field.

Reference Link


© Copyright 2025 by Numpy Ninja Inc.
