Data Analysis with PostgreSQL Window Functions: A Titanic Case Study
- Neetu Rathaur
- Jan 12
- 6 min read

When working with data analytics, one of the biggest challenges in SQL is answering questions like:
What is the rank of each record?
What is the running total over time?
How does today’s value compare with yesterdays?
What is each row’s value compared to the group average?
Traditional SQL aggregation using GROUP BY often not enough, because it collapses rows.
This is where window functions come in picture.
What Is a Window Function?
A window function performs a calculation across a set of related rows (called a window) while keeping each individual row visible in the result. Let’s understand with an example of Employee table with columns empId, empName, deptNo, salary.
empId | empName | deptNo | Salary |
1001 | Alex | 2 | 5000 |
1202 | Jose | 2 | 3000 |
1103 | George | 3 | 4000 |
1091 | Ethan | 1 | 5000 |
1201 | Charlie | 3 | 2000 |
1121 | Alice | 2 | 3000 |
1108 | Eddey | 1 | 4000 |
1015 | Jade | 1 | 2000 |
Question:
👉 Show each employee with the average salary of their department?
· Solution using GROUP BY
SELECT deptNo, AVG(Salary)
FROM employee
GROUP BY deptNo;deptNo | avg |
1 | 3666 |
2 | 3666 |
3 | 3000 |
Problem with above result is employee-level details (Name, Id) are lost.
· Solution using a Window Function
SELECT
Id,
Name,
deptNo,
Salary,
AVG(Salary) OVER (PARTITION BY deptNo) AS dept_avg_salary
FROM employee;Result will be
Name | deptNo | Salary | dept_avg_salary |
Ethan | 1 | 5000 | 3666 |
Eddey | 1 | 4000 | 3666 |
Jade | 1 | 2000 | 3666 |
Alex | 2 | 5000 | 3666 |
Jose | 2 | 3000 | 3666 |
Alice | 2 | 3000 | 3666 |
George | 3 | 4000 | 3000 |
Charlie | 3 | 2000 | 3000 |
From above result what we observed? Number of records are same from source to result. Here we can see employee names and their dept with average salaries.
We can say that average is calculated per row. Here you can observe that because there were 3 departments, and we wrote over clause as, OVER(PARTITION BY deptNo)
So data is divided into 3 windows (set of data) and each window (department wise) has all data belongs to it.
This is exactly, calculating across related rows while keeping each row visible.
There are following window function
1. ROW_NUMBER() - Assigns a unique number to each row.
2. RANK() and DENSE_RANK()
3. Running Totals - Calculates cumulative totals
4. LAG() and LEAD() — Comparing Rows
Lets understand more with a real data . I have a table with data of passengers were on titanic. We can see the table description with the query SELECT * FROM information_schema.columns WHERE table_name = 'passenger';

Table has 21 columns.
Ranking Passengers by Fare Within Each Class
SELECT
name,
pclass,
fare,
RANK() OVER (
PARTITION BY pclass
ORDER BY fare DESC
) AS fare_rank
FROM passenger;Understand What’s happening in query
PARTITION BY pclass → create a separate window for each class and assigning separate ranking for each class. Example class1 has ranks 1 to 80 and class 2 has rank 1 to 100.
ORDER BY fare DESC → highest fare gets rank 1
Rows are not grouped, only ranked

Observe above result from query here you can see fare_rank column has been created, that assign a rank according to fare. You must be wondering why there is rank 5 after 1. It is because rank() function assign rank row wise, there is 4 rows are with same fare has same rank as 1 and row 5 has less fair so assigned with 5 instead of 2.
Difference Between ROW_NUMBER, RANK, and DENSE_RANK
SELECT
name,
pclass,
fare,
ROW_NUMBER() OVER (PARTITION BY pclass ORDER BY fare DESC) AS row_num,
RANK() OVER (PARTITION BY pclass ORDER BY fare DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY pclass ORDER BY fare DESC) AS dense_rank
FROM passenger;
Now observe above query result, it has row_num, rank and dense_rank.
ROW_NUMBER assigns a unique, sequential number to each row based on the descending order of fare. Even if multiple rows have the same fare, each row still receives a different number.
RANK assigns the same rank to rows with the same fare, but the next rank reflects the total number of rows ranked so far. This causes gaps in the ranking sequence.
DENSE_RANK also assigns the same rank to rows with the same fare; however, it always assigns the next rank as the immediate next sequence number, regardless of how many rows share the previous rank. As a result, there are no gaps in the ranking.
If we want to retrieve the top 10 records for each class, we need to apply a WHERE condition. However, since window functions are evaluated after the WHERE clause, we cannot filter directly on a window function result.
To achieve this, we must first calculate the window function in a subquery and then apply the WHERE condition in the outer query. The example below demonstrates this approach.
With ranked_passenger AS
(SELECT
name,
pclass,
fare,
ROW_NUMBER() OVER (PARTITION BY pclass ORDER BY fare DESC) AS row_num,
RANK() OVER (PARTITION BY pclass ORDER BY fare DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY pclass ORDER BY fare DESC) AS dense_rank
FROM passenger)
select * from ranked_passenger where row_num<=10;
Let’s do some analysis
1. Passenger Class Impact on Ticket Fare
Let’s analyze how fare differs by class. Normally, we’d use GROUP BY. With window functions, we can keep each passenger row:
SELECT
name,
pclass,
fare,
AVG(fare) OVER (PARTITION BY pclass) AS avg_fare_in_class
FROM passenger;
This is extremely useful in analytics dashboards.
Insight:
First-class passengers paid significantly higher fares
Third-class fares cluster near the minimum
Window functions help us compare individual passenger’s vs class averages.
2. Average survival Rate by Gender (Window Function Style)
SELECT
name,
sex,
survived,
AVG(survived) OVER (PARTITION BY sex) AS survival_rate
FROM passenger;
Insight:
Females show a much higher survival rate
Confirms the “women and children first” policy
This query is useful for feature engineering, model input preparation and row-level reporting
3. Identifying High-Value Outliers Within Each Class
SELECT
name,
embarked,
fare,
SUM(fare) OVER (
PARTITION BY embarked
ORDER BY fare
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_fare_total
· Passengers paying the same fare receive the same rank
· If multiple passengers share a rank, the next rank is skipped
· This helps preserve the idea of relative standing within a class


Even though 69 is far less than 512, Passenger ‘Sage, Miss. Constance Gladys’ is still a top-paying outlier in 3rd class.
Why This Identifies High-Value Outliers
· Each passenger is evaluated relative to their peers
· Outliers identified naturally as those with top ranks per class
· This avoids bias caused by class-level price differences
Filtering Top Outliers (Example)
To fetch only the top 5 fare-paying passengers per class:
WITH ranked_passengers AS (
SELECT
name,
pclass,
fare,
RANK() OVER (
PARTITION BY pclass
ORDER BY fare DESC
) AS fare_rank_in_class
FROM passenger
)
SELECT *
FROM ranked_passengers
WHERE fare_rank_in_class <= 5;Analytical Use of the above query is to detect pricing outliers, Identify elite passengers, Useful for segmentation.
4. Age-Based Survival Patterns
Age was one of the most critical factors influencing survival on the Titanic. The famous rule “women and children first” suggests that younger passengers may have had a higher chance of survival. In this section, we analyze how survival rates vary across different age groups.
Before analyzing survival patterns, we group passengers into age bands, which makes trends easier to interpret.
Age based grouping Matters because raw ages are noisy and harder to compare. Age grouping reveals macro-level survival patterns, and this mirrors how analysts approach demographic data in real projects
Now, calculate survival statistics for each age group.

Above result shows survival rates per age group and comparison of each age group’s survival rate with the previous age group.
Insights from LAG Analysis
· The largest survival drop occurs when moving from children to teens
· Survival probability steadily decreases with age
· The decline becomes more gradual in older age groups
Similarly, we can analyze how each age group compares to the next one.

Insights from LEAD Analysis
· Children had a much higher survival advantage compared to teens
· Adults and middle-aged passengers faced consistently worsening odds
· Seniors had the lowest survival probability, with no further group to compare
Key Takeaways
· Window functions preserve row-level detail while enabling deep analytics
· PARTITION BY enables fair, segment-level comparisons
· Ranking and running totals uncover hidden patterns
· LAG() and LEAD() enable trend and transition analysis
· Window functions turn SQL into a true analytical language
Final Thought
If GROUP BY tells you what happened, window functions tell you why—row by row, segment by segment.
Reffrences
Sample data: https://github.com/neondatabase/postgres-sample-dbs

