
Is Java Too Strict? Is Python Too Forgiving? Data Errors Compared


Ever tried accessing a missing list element in Python, only to hit an IndexError? Or watched a NullPointerException crash your Java program? Let me share what I have learned about error handling, focusing on missing or null data and indexing errors, in both Python and Java.


Error handling is crucial in data science because the entire field revolves around data quality. Bad data and silent failures can corrupt insights and mislead model training. Missing values, malformed entries, and outliers are common in raw data; handling them prevents a model from learning the wrong patterns. An unhandled exception in one step can break the whole pipeline and produce bad outputs. In data science, error handling isn't just a best practice: it's a shield against flawed results, wasted time, and bad decisions.
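To make the pipeline point concrete, here is a minimal sketch of failing loudly per record instead of letting one bad value crash the whole run. The `parse_age` cleaning step is hypothetical, used only for illustration.

```python
# Minimal sketch: guard each pipeline step so a bad record is
# quarantined instead of crashing the run or slipping through silently.
# `parse_age` is a hypothetical cleaning step, not from any library.
def parse_age(raw):
    if raw is None or raw == "":
        raise ValueError("missing age")
    return int(raw)  # raises ValueError on malformed input like "abc"

def clean(rows):
    good, bad = [], []
    for row in rows:
        try:
            good.append(parse_age(row))
        except ValueError as err:
            bad.append((row, str(err)))  # keep the bad record for inspection
    return good, bad

good, bad = clean(["42", None, "abc", "7"])
print(good)      # [42, 7]
print(len(bad))  # 2
```

The quarantine list lets you report exactly which records failed and why, instead of discovering the damage downstream.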

 

Missing or Null Data

Python:

df['column'].mean()
Solution:
df['column'].dropna().mean()

df.isnull()                  # check for null values
df.dropna()                  # drop null values
df.fillna(0, inplace=True)   # fill null values with zero
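The three tools above in one small runnable example; a tiny made-up frame stands in for real data.

```python
import numpy as np
import pandas as pd

# Tiny demo frame with one missing value.
df = pd.DataFrame({'column': [10.0, np.nan, 30.0]})

print(df['column'].isnull().sum())              # 1 null value found
print(df['column'].dropna().mean())             # 20.0, null excluded
print(round(df['column'].fillna(0).mean(), 2))  # 13.33, null counted as zero
```

Note how dropna() and fillna(0) give different means: which one is right depends on whether a missing value really means "unknown" or "zero".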

Java:

String name = null;
int length = name.length();   // NullPointerException
Solution:
if (name != null) {
    int length = name.length();
}

// From a database:
ResultSet rs = stmt.executeQuery("SELECT name FROM users");
while (rs.next()) {
    String name = rs.getString("name");  // name could be null
}
Solution: check with getObject() first:
if (rs.getObject("name") != null) {
    String name = rs.getString("name");
}

Incorrect Data Types

Python:

df['date'] = pd.to_datetime(df['date'])
Solution:
df['date'] = pd.to_datetime(df['date'], errors='coerce')

Java:

1. Compile error: int x = "abc";
2. NumberFormatException: Integer.parseInt("abc")
3. ClassCastException: (Integer) list.get(0)
4. SQLException: getInt("name") on a non-numeric column
5. IllegalArgumentException: Day day = Day.valueOf("Funday");

 

Solution:

try {

    Day day = Day.valueOf("MONDAY");

} catch (IllegalArgumentException e) {

    // Handle unknown enum

}

 

 

 

Indexing Error

Python:

my_list = [10, 20, 30]
print(my_list[5])   # IndexError: list index out of range
Solution:
if len(my_list) > 5:
    print(my_list[5])

# KeyError
d = {'a': 1}
print(d['b'])
Solution:
print(d.get('b', 'default'))

 

# iloc[] is integer-position based; loc[] is label-based
df.iloc['row1']
Solution:
df.loc['row1']   # if the label is 'row1'
df.iloc[0]       # for the first row

# Slicing mistakes
my_list = [1, 2, 3]
print(my_list[3])   # IndexError
Solution:
print(my_list[-1])  # last element
print(my_list[:3])  # slicing never raises, even past the end
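The guards above can be wrapped into one helper; `safe_get` is a hypothetical name, not a built-in.

```python
# Safe element access without try/except. `safe_get` is a
# hypothetical helper, not part of the standard library.
my_list = [10, 20, 30]
d = {'a': 1}

def safe_get(seq, i, default=None):
    # valid indices run from -len(seq) to len(seq) - 1
    return seq[i] if -len(seq) <= i < len(seq) else default

print(safe_get(my_list, 5))    # None instead of IndexError
print(safe_get(my_list, -1))   # 30
print(d.get('b', 'default'))   # 'default' instead of KeyError
```

For dicts, the built-in dict.get() already does exactly this, so no helper is needed there.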

 

 

Java:

// ArrayIndexOutOfBoundsException
int[] arr = {1, 2, 3};
System.out.println(arr[3]);
Solution:
if (index >= 0 && index < arr.length) {
    System.out.println(arr[index]);
}

// StringIndexOutOfBoundsException
String s = "hello";
System.out.println(s.charAt(10));
Solution:
if (index < s.length()) {
    System.out.println(s.charAt(index));
}

// IndexOutOfBoundsException (ArrayList)
List<Integer> list = new ArrayList<>();
list.add(1);
System.out.println(list.get(2));
Solution:
if (index < list.size()) {
    System.out.println(list.get(index));
}

SettingWithCopyWarning

Python:

df = pd.DataFrame({'Age': [20, 25, 30], 'Gender': ['M', 'F', 'F']})
young = df[df['Age'] < 30]
young['Age'] = young['Age'] + 1   # SettingWithCopyWarning

Solution:
young = df[df['Age'] < 30].copy()
young['Age'] += 1
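A quick runnable check of the fix, using the same small frame:

```python
import pandas as pd

# With .copy(), `young` is an independent frame: the assignment is
# unambiguous and the original df is left untouched.
df = pd.DataFrame({'Age': [20, 25, 30], 'Gender': ['M', 'F', 'F']})
young = df[df['Age'] < 30].copy()
young['Age'] += 1

print(young['Age'].tolist())   # [21, 26]
print(df['Age'].tolist())      # [20, 25, 30], original unchanged
```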

Java:

List<Integer> original = new ArrayList<>(Arrays.asList(1, 2, 3));
List<Integer> subset = original.subList(0, 2);
subset.set(0, 99);   // subList() returns a view: this also modifies original

Solution:
List<Integer> safeCopy = new ArrayList<>(original.subList(0, 2));

Merging/Joining Errors

Python:

df1.merge(df2, on='id')   # KeyError if 'id' is missing from either frame
Solution:
df1.merge(df2, left_on='user_id', right_on='id')

# Duplicate keys
df1.merge(df2, on='id', how='inner')   # duplicated keys multiply rows
Solution:
1. Use drop_duplicates() before the merge.
2. Or aggregate the duplicates with groupby() first.

# Join produces an empty result
df1.merge(df2, on='id')
Solution:
# Inspect the key overlap
print(set(df1['id']) & set(df2['id']))
Use how='outer' or how='left' to preserve rows.

 

 

# Data type mismatch in keys

df1['id'].dtype  # int64
df2['id'].dtype  # object
Solution:
df1['id'] = df1['id'].astype(str)   # make both sides the same dtype

To debug, use indicator=True:
df1.merge(df2, on='id', how='outer', indicator=True)
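A small sketch of the indicator=True trick with two tiny hypothetical frames; the _merge column labels each row left_only, right_only, or both.

```python
import pandas as pd

# Two tiny frames with only a partial key overlap.
df1 = pd.DataFrame({'id': ['1', '2'], 'x': [10, 20]})
df2 = pd.DataFrame({'id': ['2', '3'], 'y': [5, 6]})

merged = df1.merge(df2, on='id', how='outer', indicator=True)
# The _merge column tells you which side each row came from.
print(merged[['id', '_merge']])
```

One glance at the _merge counts usually reveals whether the problem is missing keys, mismatched dtypes, or simply no overlap at all.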

 

Map<Integer, User> userMap = ...

List<Transaction> transactions = ...

 

for (Transaction t : transactions) {

   User u = userMap.get(t.getUserId());

   if (u == null) {

       System.out.println("Missing user!");

   }

}

 

 

If plain Java collections are used:

1. Check for missing keys before lookup.
2. Use Optional or Map.getOrDefault().
3. Use Java Streams or libraries (e.g., Apache Commons CSV, jOOQ) for complex data joins.

Memory Errors with Large Data

Python:

# Loading the full dataset into RAM
df = pd.read_csv("huge_file.csv")  # 10 GB CSV on a machine with 4 GB RAM

Solution:
# Chunked file reading
pd.read_csv(..., chunksize=100000)

# Use appropriate dtypes
astype('int32') for integers, 'category' for repeated strings

# Delete intermediate objects
del df_temp; gc.collect()
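A self-contained sketch of chunked reading; an in-memory CSV stands in for the huge file.

```python
import io

import pandas as pd

# An in-memory CSV (column 'x' holding 0..9) stands in for huge_file.csv.
csv_data = io.StringIO("x\n" + "\n".join(str(i) for i in range(10)))

total = 0
for chunk in pd.read_csv(csv_data, chunksize=4):  # 4 + 4 + 2 rows
    total += chunk['x'].sum()                     # aggregate per chunk

print(total)   # 45, the same answer as loading everything at once
```

Each chunk is an ordinary DataFrame, so any per-chunk cleaning or aggregation works unchanged; only whole-dataset operations (like a global sort) need a different strategy.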

Java:

// Loading large result sets into memory
ResultSet rs = statement.executeQuery("SELECT * FROM huge_table");

Solution:
1. JDBC result streaming: setFetchSize(n) or cursor-based fetching.
2. Use streaming APIs: Java 8 Streams or Apache Commons CSV.
3. Increase heap size: -Xmx8G in JVM options.
4. Process data in batches, e.g., 1000 rows at a time.

 

Incorrect Use of apply() and lambda

Python:

# Inefficient use instead of vectorization
df['new'] = df['col'].apply(lambda x: x + 1)
Best:
df['new'] = df['col'] + 1

# Complex logic
df['flag'] = df['score'].apply(lambda x: 'high' if x > 90 else 'low' if x > 50 else 'fail')

 

Best:
def classify(score):

    if score > 90:

        return 'high'

    elif score > 50:

        return 'low'

    else:

        return 'fail'

 

df['flag'] = df['score'].apply(classify)

 

# Using apply() on DataFrames instead of columns
df['result'] = df.apply(lambda row: row['a'] * row['b'], axis=1)
Best:

df['result'] = df['a'] * df['b']

 

# Silent errors with applymap() vs apply()

df.applymap(lambda x: x + 1)  # fails unless every value in the DataFrame is numeric
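A quick check that the two forms from the first example agree, and why the vectorized one is preferred:

```python
import pandas as pd

# Same result two ways; the vectorized form avoids a Python-level
# function call per row, which matters on large frames.
df = pd.DataFrame({'col': [1, 2, 3]})

via_apply = df['col'].apply(lambda x: x + 1)
vectorized = df['col'] + 1

print(via_apply.equals(vectorized))   # True
print(vectorized.tolist())            # [2, 3, 4]
```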

Java:

// Confusing map() vs flatMap()
List<List<String>> nested = ...
nested.stream().map(x -> x.stream())   // gives Stream<Stream<String>>, not flattened

Best:
nested.stream().flatMap(List::stream)

 

// Stateful or side-effect lambdas
List<String> names = ...
names.stream().forEach(name -> counter++);   // won't compile: counter must be effectively final
Best:
Avoid shared mutable state inside lambdas; collect or count via the stream itself.

 

// Complex filtering logic inline
list.stream().filter(x -> x.age > 18 && x.score < 50 || x.name.equals("Test"))
Best:
Prefer extracting the condition into a named method.

 

 

// Streams cannot be reused
stream.filter(...).map(...).collect(...)   // a stream can be consumed only once

File Handling Errors

Python:

# File Not Found
with open('data/input.csv') as f:
    lines = f.readlines()
Solutions:
1. Check os.path.exists('data/input.csv') before opening.
2. Print os.getcwd() to verify the working directory.

 

#Incorrect File Paths

df = pd.read_csv('dataset/train.csv')

Solutions:

import os

path = os.path.join(base_dir, 'dataset', 'train.csv')

 

# Encoding Issues
pd.read_csv('data.csv')   # UnicodeDecodeError on non-UTF-8 files
Solutions:
pd.read_csv('data.csv', encoding='utf-8')   # or try encoding='latin-1' if UTF-8 fails

 

# Reading Huge Files into RAM

pd.read_csv('big.csv')

Solutions

pd.read_csv('big.csv', chunksize=100000)
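Putting the Python file checks together in a self-contained sketch; a temp file stands in for the real CSV.

```python
import os
import tempfile

# A throwaway temp file stands in for the real input CSV.
path = os.path.join(tempfile.gettempdir(), 'input_demo.csv')
with open(path, 'w') as f:
    f.write("a,b\n1,2\n")

# Check existence before opening, and print cwd when debugging paths.
if os.path.exists(path):
    with open(path) as f:
        lines = f.readlines()
else:
    print(f"Not found: {path} (cwd: {os.getcwd()})")
    lines = []

print(len(lines))   # 2
```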

Java:

// File Not Found
File file = new File("data/input.csv");
Scanner scanner = new Scanner(file);
Solutions:

if (!file.exists()) { System.out.println("File not found!"); }

 

// Hardcoded File Paths
File file = new File("C:\\Users\\user\\input.csv");
Solutions:
Use System.getProperty("user.dir") or path joining logic.

// Reading Large Files into Memory
List<String> lines = Files.readAllLines(Paths.get("large.csv"));
Solutions:
Files.lines(Paths.get("large.csv")).forEach(System.out::println);

// Encoding Mismatch
BufferedReader br = new BufferedReader(new FileReader("data.csv"));
Solutions:
BufferedReader br = Files.newBufferedReader(Paths.get("data.csv"), StandardCharsets.UTF_8);

 

 

Ambiguous Column Names After Merge

Python:

pd.merge(df1, df2, on='id')   # shared columns become score_x / score_y
Solutions:
pd.merge(df1, df2, on='id', suffixes=('_original', '_comparison'))
df2 = df2.rename(columns={'score': 'score_2'})   # or rename before merging
merged.drop(columns=['score_y'])                 # or drop the duplicate afterwards
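How the suffixes fix plays out on two tiny hypothetical frames that share a 'score' column:

```python
import pandas as pd

# Both frames share a 'score' column; suffixes disambiguate them.
df1 = pd.DataFrame({'id': [1, 2], 'score': [90, 80]})
df2 = pd.DataFrame({'id': [1, 2], 'score': [85, 88]})

merged = pd.merge(df1, df2, on='id', suffixes=('_original', '_comparison'))
print(list(merged.columns))   # ['id', 'score_original', 'score_comparison']
```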

SQL:

SELECT * FROM employees e

JOIN departments d ON e.dept_id = d.id;

Solutions

SELECT e.id AS emp_id, e.name AS emp_name,

       d.id AS dept_id, d.name AS dept_name

FROM employees e

JOIN departments d ON e.dept_id = d.id;

 

 "The technology you use impresses no one. The experience you create with it is everything." - Sean Gerety 

 
 

+1 (302) 200-8320

NumPy_Ninja_Logo (1).png

Numpy Ninja Inc. 8 The Grn Ste A Dover, DE 19901

© Copyright 2025 by Numpy Ninja Inc.

  • Twitter
  • LinkedIn
bottom of page