Cleaning Data

Handle nulls with dropna/fillna, remove duplicates, cast dtypes, rename columns, and clean strings.

import pandas as pd
df = pd.read_csv('data.csv')

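# Handle missing values (each call returns a new DataFrame; assign the result to keep it)
df.dropna()                        # drop rows with any null
df.dropna(subset=['email'])        # drop only if email is null
df.dropna(thresh=3)                # keep rows with 3+ non-null
df.fillna(0)                       # fill nulls with 0
df['age'] = df['age'].fillna(df['age'].mean())   # fill one column with its mean (avoid chained inplace=True)
df.fillna({'age': 0, 'name': 'Unknown'})         # per-column fill values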

# Duplicates
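df.duplicated().sum()                                   # count rows identical to an earlier row
df.drop_duplicates()                                    # drop exact duplicate rows
df.drop_duplicates(subset=['email'], keep='first')      # one row per email, keep the first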

# Data types
df['age'] = df['age'].astype(int)          # raises if age still has NaN; use 'Int64' to keep missing values
df['created_at'] = pd.to_datetime(df['created_at'])
df['price'] = pd.to_numeric(df['price'], errors='coerce')   # unparseable values become NaN

# Rename columns
df.rename(columns={'old_name': 'new_name'}, inplace=True)
df.columns = [c.lower().replace(' ','_') for c in df.columns]
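To see what each missing-value call actually does, here is a small sketch on an invented four-row DataFrame (the column names and values are hypothetical stand-ins for data.csv). It uses thresh=2 rather than 3 so the result differs from a plain dropna(), and it assigns results back instead of relying on inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv
df = pd.DataFrame({
    "name": ["Ann", None, "Cho", "Dev"],
    "email": ["a@x.com", "b@x.com", None, "d@x.com"],
    "age": [34.0, np.nan, 29.0, np.nan],
})

# dropna() returns a new DataFrame; df itself is unchanged
any_null_dropped = df.dropna()                 # only row 0 has no nulls
email_required = df.dropna(subset=["email"])   # drops row 2 (missing email)
at_least_2 = df.dropna(thresh=2)               # keeps rows with 2+ non-null values

# Fill one column with its own mean by assigning the result back
df["age"] = df["age"].fillna(df["age"].mean())
# Per-column fill values via a dict; email is not listed, so it stays missing
df = df.fillna({"name": "Unknown"})

print(df)
```

With these values the mean age is 31.5, so rows 1 and 3 receive 31.5, and row 1's missing name becomes "Unknown".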
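The keep parameter decides which copy of a duplicate survives. A minimal sketch on made-up data contrasting keep='first', keep='last', and keep=False:

```python
import pandas as pd

# Hypothetical rows: "a@x.com" appears three times, and row 3 repeats row 0 exactly
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com", "a@x.com"],
    "plan":  ["free",    "pro",     "pro",     "free"],
})

n_exact = df.duplicated().sum()   # rows matching an earlier row in every column

first = df.drop_duplicates(subset=["email"], keep="first")   # keeps rows 0 and 1
last = df.drop_duplicates(subset=["email"], keep="last")     # keeps rows 1 and 3
unique_only = df.drop_duplicates(subset=["email"], keep=False)  # drops every repeated email
```

Without subset, only row 3 counts as a duplicate: rows 0 and 2 share an email but differ in plan, so they are not identical rows.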
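Type conversion often fails on real CSV columns because of missing or malformed values. The sketch below uses invented text columns to show errors='coerce' and pandas' nullable "Int64" dtype, which, unlike plain int, can hold missing values.

```python
import pandas as pd

# Hypothetical raw columns as they often arrive from a CSV: everything is text
df = pd.DataFrame({
    "age": ["34", None, "29"],
    "price": ["9.99", "n/a", "12.50"],
    "created_at": ["2024-01-05", "2024-02-10", "2024-03-15"],
})

# errors="coerce" turns unparseable values such as "n/a" into NaN
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# astype(int) would raise on the missing age; "Int64" keeps it as <NA>
df["age"] = pd.to_numeric(df["age"]).astype("Int64")

# ISO date strings parse directly into a datetime column
df["created_at"] = pd.to_datetime(df["created_at"])

print(df.dtypes)
```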
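String columns usually need the same kind of normalization as column names. A minimal sketch, using hypothetical messy headers and values, that applies strip, lower, and replace through the vectorized .str accessor on both the column Index and the data:

```python
import pandas as pd

# Hypothetical messy input: padded headers and inconsistent text values
df = pd.DataFrame({
    " First Name ": ["  Ann", "BOB ", "cho"],
    "Email Address": ["Ann@X.com ", " bob@x.com", "CHO@x.com"],
})

# Normalize headers: trim whitespace, lowercase, spaces to underscores
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Trim and re-case text values column by column
df["first_name"] = df["first_name"].str.strip().str.title()
df["email_address"] = df["email_address"].str.strip().str.lower()

print(df)
```

The header line does the same job as the list comprehension in the lesson, with an added strip() so leading or trailing spaces do not become stray underscores.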
Topic Quiz · 1 question

Test your understanding before moving on

1. Which method removes rows with any missing values?
💡 df.dropna() removes rows containing any NaN values; use subset= to target specific columns.
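dropna() defaults to how='any'. When you only want to discard rows that are completely empty, pass how='all'. A small sketch on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is partly empty, row 2 is entirely empty
df = pd.DataFrame({
    "name": ["Ann", "Bob", np.nan],
    "email": ["a@x.com", np.nan, np.nan],
})

any_missing = df.dropna()            # default how="any": drops rows 1 and 2
all_missing = df.dropna(how="all")   # drops only row 2, empty in every column
```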