Is Java Too Strict? Is Python Too Forgiving? Data Errors Compared
- SunAi Murugan
- Jun 19, 2025
- 5 min read
Ever tried accessing a missing list element in Python and hit an IndexError? Or watched a NullPointerException crash your Java program? Let me share what I've learned about error handling, focusing on missing or null data and indexing errors in both Python and Java.
Error handling is crucial in data science because the entire field revolves around data quality. Bad data and silent failures can corrupt insights and mislead model training. Missing values, malformed entries, and outliers are common in raw data, and handling them prevents a model from learning the wrong patterns. A single unhandled exception in one step can break the whole pipeline and produce bad outputs. In data science, error handling isn't just a best practice: it's a shield against flawed results, wasted time, and bad decisions.
1. Missing or Null Data

Python:

```python
df['column'].mean()            # NaN values are skipped silently; make the intent explicit:
df['column'].dropna().mean()

df.isnull()                    # check for null values
df.dropna()                    # drop rows with null values
df.fillna(0, inplace=True)     # fill null values with zero
```

Java:

```java
String name = null;
int length = name.length();    // NullPointerException

// Solution: guard the dereference
if (name != null) {
    int length = name.length();
}
```

A column read from a database can also come back null:

```java
ResultSet rs = stmt.executeQuery("SELECT name FROM users");
while (rs.next()) {
    String name = rs.getString("name"); // name could be null
}

// Solution: check with getObject() first
if (rs.getObject("name") != null) {
    String name = rs.getString("name");
}
```
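The same skip-the-missing-values idea can be shown without pandas at all. A minimal plain-Python sketch (the sample list is hypothetical): filter out `None` before aggregating instead of letting it crash the arithmetic.

```python
# Plain-Python analogue of dropna().mean(): skip missing (None) entries
# before aggregating; sum() would raise TypeError on None otherwise.
values = [10, None, 30, None, 20]  # hypothetical column with missing entries

clean = [v for v in values if v is not None]
mean = sum(clean) / len(clean) if clean else None
print(mean)  # 20.0
```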
2. Incorrect Data Types

Java:

```java
int x = "abd";                      // 1. compile error: incompatible types
Integer.parseInt("abc");            // 2. NumberFormatException
(Integer) list.get(0);              // 3. ClassCastException if the element isn't an Integer
rs.getInt("name");                  // 4. SQLException if the column isn't numeric
Day day = Day.valueOf("Funday");    // 5. IllegalArgumentException: no such enum constant

// Solution: catch the exception when parsing untrusted input
try {
    Day day = Day.valueOf("MONDAY");
} catch (IllegalArgumentException e) {
    // handle the unknown enum value
}
```
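The Python side of this row was empty in the original table, so here is a sketch of the Python counterpart (my addition, not from the source): invalid conversions raise ValueError or TypeError, and the same try/except pattern applies. `to_int` is a hypothetical helper name.

```python
# Hypothetical helper: Python raises ValueError ("abc") or TypeError (None)
# where Java would throw NumberFormatException; catch both and fall back.
def to_int(text, default=None):
    try:
        return int(text)
    except (ValueError, TypeError):
        return default

print(to_int("42"))     # 42
print(to_int("abc"))    # None
print(to_int(None, 0))  # 0
```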
3. Indexing Errors

Python:

```python
# IndexError
my_list = [10, 20, 30]
print(my_list[5])

# Solution: check the bound first
if len(my_list) > 5:
    print(my_list[5])

# KeyError
d = {'a': 1}
print(d['b'])
# Solution: use get() with a default
print(d.get('b', 'default'))

# iloc[] is integer-position based, loc[] is label-based
df.iloc['row1']     # wrong: iloc takes integer positions
df.loc['row1']      # correct if the label is 'row1'
df.iloc[0]          # correct for the first row

# Slicing mistakes
my_list = [1, 2, 3]
print(my_list[3])   # IndexError
print(my_list[-1])  # last element
print(my_list[:3])  # safe: slicing never raises, even past the end
```

Java:

```java
// ArrayIndexOutOfBoundsException
int[] arr = {1, 2, 3};
System.out.println(arr[3]);
// Solution:
if (index >= 0 && index < arr.length) {
    System.out.println(arr[index]);
}

// StringIndexOutOfBoundsException
String s = "hello";
System.out.println(s.charAt(10));
// Solution:
if (index < s.length()) {
    System.out.println(s.charAt(index));
}

// IndexOutOfBoundsException (ArrayList)
List<Integer> list = new ArrayList<>();
list.add(1);
System.out.println(list.get(2));
// Solution:
if (index < list.size()) {
    System.out.println(list.get(index));
}
```
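Besides checking bounds up front, Python also supports the EAFP idiom ("easier to ask forgiveness than permission"): attempt the access and catch the exception. A minimal sketch of that alternative, using the same sample list:

```python
# EAFP: try the lookup and handle the failure, instead of pre-checking len()
my_list = [10, 20, 30]
try:
    value = my_list[5]
except IndexError:
    value = None    # fall back when the index is out of range
print(value)  # None
```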
4. SettingWithCopyWarning

Python:

```python
df = pd.DataFrame({'Age': [20, 25, 30], 'Gender': ['M', 'F', 'F']})
young = df[df['Age'] < 30]
young['Age'] = young['Age'] + 1    # SettingWithCopyWarning: may or may not modify df

# Solution: take an explicit copy
young = df[df['Age'] < 30].copy()
young['Age'] += 1
```

Java has a similar trap: subList() returns only a view, so modifying it modifies the original list.

```java
List<Integer> original = new ArrayList<>(Arrays.asList(1, 2, 3));
List<Integer> subset = original.subList(0, 2);
subset.set(0, 99);                 // original is now [99, 2, 3]

// Solution: copy the sublist
List<Integer> safeCopy = new ArrayList<>(original.subList(0, 2));
```
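For contrast, plain Python lists behave the opposite way to Java's subList(): a slice is already a new list, while plain assignment is what creates the shared view. A runnable sketch (sample data is hypothetical):

```python
# A Python list slice is a shallow copy, so mutating it leaves the original alone.
original = [1, 2, 3]
subset = original[0:2]
subset[0] = 99
print(subset)    # [99, 2]
print(original)  # [1, 2, 3]  (unchanged)

# Plain assignment just binds another name to the SAME list:
alias = original
alias[0] = 99
print(original)  # [99, 2, 3]  (changed through the alias)
```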
5. Merging/Joining Errors

Python:

```python
# KeyError if 'id' is not a column in both frames
df1.merge(df2, on='id')
# Solution: name the key column on each side
df1.merge(df2, left_on='user_id', right_on='id')

# Duplicate keys multiply the row count
df1.merge(df2, on='id', how='inner')
# Solutions: drop_duplicates() before the merge, or aggregate with groupby() first

# Join produces an empty result
df1.merge(df2, on='id')
# Solution: inspect the key overlap, and use how='outer' or how='left' to preserve rows
print(set(df1['id']) & set(df2['id']))

# Data type mismatch in keys
df1['id'].dtype                    # int64
df2['id'].dtype                    # object
# Solution: cast both sides to the same type
df1['id'] = df1['id'].astype(str)

# To debug, indicator=True adds a _merge column showing each row's origin
df1.merge(df2, on='id', how='outer', indicator=True)
```

Java:

```java
Map<Integer, User> userMap = ...;
List<Transaction> transactions = ...;

for (Transaction t : transactions) {
    User u = userMap.get(t.getUserId());
    if (u == null) {
        System.out.println("Missing user!");
    }
}
```

If you join data with plain Java collections:
1. Check for missing keys before lookup.
2. Use Optional or Map.getOrDefault().
3. Use Java Streams or libraries (e.g., Apache Commons CSV, jOOQ) for complex data joins.
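The Java lookup loop above translates directly to Python, where dict.get() plays the role of Map.get() by returning None instead of raising KeyError. A minimal sketch with hypothetical sample data:

```python
# Dict-based join: collect matched rows and record keys with no counterpart,
# rather than crashing on the first missing user.
user_map = {1: "Alice", 2: "Bob"}
transactions = [(1, 100), (3, 250)]   # (user_id, amount) pairs

joined, missing = [], []
for user_id, amount in transactions:
    name = user_map.get(user_id)      # None instead of KeyError on a miss
    if name is None:
        missing.append(user_id)
    else:
        joined.append((name, amount))

print(joined)   # [('Alice', 100)]
print(missing)  # [3]
```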
6. Memory Errors with Large Data

Java:

```java
// Loading an entire large result set into memory
ResultSet rs = statement.executeQuery("SELECT * FROM huge_table");
```

Solutions:
1. Stream the JDBC result set with setFetchSize(n) or cursor-based fetching.
2. Use streaming APIs such as Java 8 Streams or Apache Commons CSV.
3. Increase the heap size, e.g., -Xmx8G in the JVM options.
4. Process the data in batches, e.g., 1000 rows at a time.
7. Incorrect Use of apply() and lambda

Python:

```python
# Inefficient: apply() instead of vectorization
df['new'] = df['col'].apply(lambda x: x + 1)
# Better:
df['new'] = df['col'] + 1

# Unreadable: complex logic crammed into a lambda
df['flag'] = df['score'].apply(lambda x: 'high' if x > 90 else 'low' if x > 50 else 'fail')
# Better: a named function
def classify(score):
    if score > 90:
        return 'high'
    elif score > 50:
        return 'low'
    else:
        return 'fail'

df['flag'] = df['score'].apply(classify)

# Row-wise apply() instead of column arithmetic
df['result'] = df.apply(lambda row: row['a'] * row['b'], axis=1)
# Better:
df['result'] = df['a'] * df['b']

# Silent errors with applymap() vs apply()
df.applymap(lambda x: x + 1)   # works only if every value in the DataFrame is numeric
```

Java:

```java
// Confusing map() with flatMap()
List<List<String>> nested = ...;
nested.stream().map(x -> x.stream());       // Stream<Stream<String>>, still nested
// Better:
nested.stream().flatMap(List::stream);      // Stream<String>, flattened

// Stateful or side-effect lambdas
List<String> names = ...;
names.stream().forEach(name -> counter++);  // mutates shared state
// Better: avoid shared mutable state inside lambdas

// Complex filtering logic inline
list.stream().filter(x -> x.age > 18 && x.score < 50 || x.name.equals("Test"));
// Better: extract the predicate to a named method

// Performance cost of repeated streams: a stream cannot be reused once consumed
stream.filter(...).map(...).collect(...);
```
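The map()-vs-flatMap() confusion has a Python analogue worth seeing side by side: mapping over a nested list keeps it nested, and flattening takes an explicit step. A small sketch using the standard library (sample data is hypothetical):

```python
from itertools import chain

nested = [['a', 'b'], ['c']]
mapped = [list(x) for x in nested]             # still nested: [['a', 'b'], ['c']]
flattened = list(chain.from_iterable(nested))  # flattened: ['a', 'b', 'c']
print(flattened)
```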
8. File Handling Errors

Python:

```python
# FileNotFoundError
with open('data/input.csv') as f:
    lines = f.readlines()
# Solutions: check os.path.exists() first; print os.getcwd() to debug the working directory

# Incorrect file paths
df = pd.read_csv('dataset/train.csv')
# Solution: build the path explicitly
import os
path = os.path.join(base_dir, 'dataset', 'train.csv')

# Encoding issues
pd.read_csv('data.csv')
# Solution: state the encoding explicitly
pd.read_csv('data.csv', encoding='utf-8')

# Reading huge files into RAM
pd.read_csv('big.csv')
# Solution: read in chunks
pd.read_csv('big.csv', chunksize=100000)
```

Java:

```java
// FileNotFoundException
File file = new File("data/input.csv");
Scanner scanner = new Scanner(file);
// Solution: check first
if (!file.exists()) {
    System.out.println("File not found!");
}

// Hardcoded file paths
File file = new File("C:\\Users\\user\\input.csv");
// Solution: use System.getProperty("user.dir") or path-joining logic

// Reading large files into memory
List<String> lines = Files.readAllLines(Paths.get("large.csv"));
// Solution: stream line by line
Files.lines(Paths.get("large.csv")).forEach(System.out::println);

// Encoding mismatch
BufferedReader br = new BufferedReader(new FileReader("data.csv"));
// Solution: specify the charset
BufferedReader br = Files.newBufferedReader(Paths.get("data.csv"), StandardCharsets.UTF_8);
```
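The Python checks above can be bundled into one defensive reader. A minimal standard-library sketch (the function name and path are my own, hypothetical): verify the path, state the encoding, and fall back gracefully instead of crashing the pipeline.

```python
from pathlib import Path

def read_lines_safely(path_str):
    """Return the file's lines, or [] with a diagnostic if the path is missing."""
    path = Path(path_str)
    if not path.exists():
        print(f"File not found: {path} (cwd is {Path.cwd()})")
        return []
    with path.open(encoding='utf-8') as f:           # encoding stated explicitly
        return [line.rstrip('\n') for line in f]

print(read_lines_safely('data/input.csv'))  # [] if the file is missing
```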
9. Ambiguous Column Names After Merge

Python:

```python
# Overlapping non-key columns get _x/_y suffixes by default; name them explicitly
pd.merge(df1, df2, on='id', suffixes=('_original', '_comparison'))
# Or rename the column before merging, then drop the leftover duplicate
df2 = df2.rename(columns={'score': 'score_2'})
merged = merged.drop(columns=['score_y'])
```
"The technology you use impresses no one. The experience you create with it is everything." - Sean Gerety

