Cleaning Data
Handle nulls with dropna/fillna, remove duplicates, cast dtypes, rename columns, and clean strings.
import pandas as pd
df = pd.read_csv('data.csv')
# Handle missing values (dropna/fillna return a new DataFrame; assign the result to keep it)
df.dropna() # drop rows with any null
df.dropna(subset=['email']) # drop only if email is null
df.dropna(thresh=3) # keep rows with 3+ non-null
df.fillna(0) # fill nulls with 0
df['age'] = df['age'].fillna(df['age'].mean()) # fill null ages with the column mean
df.fillna({'age': 0, 'name': 'Unknown'})
# Duplicates
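df.duplicated().sum() # count rows that repeat an earlier row
df.drop_duplicates() # returns a new DataFrame without fully duplicated rows
df.drop_duplicates(subset=['email'], keep='first') # one row per email, keeping the first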
# Data types
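df['age'] = df['age'].astype(int) # raises if nulls remain; fill or drop them first
df['created_at'] = pd.to_datetime(df['created_at']) # parse strings into datetimes
df['price'] = pd.to_numeric(df['price'], errors='coerce') # unparseable values become NaN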
# Rename columns
df.rename(columns={'old_name': 'new_name'}, inplace=True) # rename selected columns
df.columns = [c.lower().replace(' ', '_') for c in df.columns] # lowercase, spaces to underscores
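The summary mentions cleaning strings, which the snippet above does not show. The following is a minimal sketch, assuming a small made-up DataFrame with hypothetical `Full Name`, `Email`, and `Price` columns. It normalizes the column names, cleans string values with the `.str` accessor, coerces a text column to numbers, and then drops duplicate emails. The helper name `clean_strings` is an illustration, not part of pandas.

```python
import pandas as pd

# Hypothetical sample data; column names and values are made up for illustration.
raw = pd.DataFrame({
    "Full Name": ["  Ada Lovelace ", "ada lovelace", "Grace Hopper", None],
    "Email": ["ADA@example.com", "ada@example.com ", "grace@example.com", "x@example.com"],
    "Price": ["10.5", "9", "n/a", "3"],
})

def clean_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left untouched."""
    out = df.copy()
    # Normalize column names: trim, lowercase, spaces to underscores.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # Trim whitespace and unify case so equal values compare equal.
    out["full_name"] = out["full_name"].str.strip().str.title()
    out["email"] = out["email"].str.strip().str.lower()
    # Replace the missing name with a placeholder.
    out["full_name"] = out["full_name"].fillna("Unknown")
    # Text that cannot be parsed as a number becomes NaN.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Keep one row per email address, preferring the first occurrence.
    out = out.drop_duplicates(subset=["email"], keep="first")
    return out.reset_index(drop=True)

cleaned = clean_strings(raw)
print(cleaned)
```

Cleaning the strings before calling `drop_duplicates` matters here: the two Ada rows only count as duplicates once the email values have been trimmed and lowercased.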
Topic Quiz · 1 question
Test your understanding before moving on
1. Which method removes rows with any missing values?
💡 df.dropna() removes rows containing any NaN values; use subset= to target specific columns.
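To see how the `dropna` variants differ, here is a small sketch on made-up data (the `people` frame and its values are assumptions for illustration). It compares the row counts left by the default call, by `subset=`, and by `thresh=`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration.
people = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "email": ["ann@example.com", None, "cy@example.com", None],
    "age": [31.0, np.nan, 45.0, 28.0],
})

no_nulls = people.dropna()                    # rows with no nulls at all
has_email = people.dropna(subset=["email"])   # rows where email is present
at_least_two = people.dropna(thresh=2)        # rows with 2 or more non-null values

# dropna returns new frames, so `people` still has all 4 rows.
print(len(people), len(no_nulls), len(has_email), len(at_least_two))
# prints: 4 1 2 3
```

The default call is the strictest of the three. `subset=` and `thresh=` are useful when a row is still worth keeping despite a few missing fields.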