How to Clean and Process Data Online

Remove duplicates, fix formatting, standardise values, and prepare messy data for analysis with our free Data Cleaner tool. Supports CSV and JSON.

Steps

1. Upload your data

Upload a CSV, TSV, or JSON file, or paste your data directly. The tool parses the structure and gives you a preview of the data with its detected column types (text, number, date, boolean).
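Column type detection like this is usually heuristic: try the strictest interpretation of every non-empty value and fall back to text. A minimal sketch of that idea, assuming simple boolean/number/ISO-date checks (the tool's actual detection rules are not documented here, and `detect_column_type` is an illustrative name):

```python
import re

def detect_column_type(values):
    """Guess a column's type from its non-empty string values.

    Hypothetical heuristic: a type is assigned only if every
    non-empty value matches it; otherwise fall back to "text".
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "text"
    if all(v.lower() in ("true", "false") for v in non_empty):
        return "boolean"
    if all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in non_empty):
        return "number"
    if all(re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) for v in non_empty):
        return "date"
    return "text"
```

Note how a single stray value such as 'N/A' in a numeric column demotes the whole column to text, which is exactly the kind of issue the profiling step below surfaces.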

2. Review data quality issues

The data profiling panel shows detected issues: missing values by column (and what percentage of rows are affected), duplicate rows, inconsistent formatting (dates in mixed formats, phone numbers with and without country codes), leading or trailing whitespace, and outlier values.
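Two of those checks, missing-value percentages and duplicate rows, can be sketched with a few lines of stdlib Python. Rows are dicts keyed by column name; the `profile` function and its output fields are illustrative, not the tool's actual API:

```python
from collections import Counter

def profile(rows, columns):
    """Report missing-value percentage per column and duplicate row count.

    A row counts as missing a column when the value is absent or
    whitespace-only; a duplicate is any repeat of an identical row.
    """
    total = len(rows)
    missing = {
        col: round(100 * sum(1 for r in rows if not str(r.get(col, "")).strip()) / total, 1)
        for col in columns
    }
    counts = Counter(tuple(r.get(c, "") for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"missing_pct": missing, "duplicate_rows": duplicates}
```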

3. Select cleaning operations

Choose which cleaning operations to apply:

- Remove duplicate rows
- Trim whitespace from text columns
- Standardise date formats (convert all dates to YYYY-MM-DD)
- Normalise text case (all lowercase, title case)
- Remove rows with too many empty values
- Replace empty values with a specified default
- Remove specific columns
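Three of these operations, trimming, de-duplicating, and date standardisation, can be sketched as plain functions over rows of dicts. This is an illustration of the operations' behaviour, not the tool's implementation, and the list of accepted input date formats is an assumption:

```python
from datetime import datetime

def trim_whitespace(rows):
    """Strip leading/trailing whitespace from every string value."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]

def remove_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def standardise_date(value, formats=("%m/%d/%Y", "%d %B %Y", "%Y-%m-%d")):
    """Convert a date string to YYYY-MM-DD, trying each known format.

    Unparseable values are returned unchanged rather than guessed at.
    """
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            pass
    return value
```

Leaving unparseable dates untouched, rather than blanking or guessing them, keeps the operation reversible and makes failures visible in the preview step.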

4. Preview the cleaned data

Preview the result of your selected operations before applying. The diff view shows which rows changed and how. Verify that the operations produced the intended result without unintended side effects.
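The core of such a diff view is pairing old and new rows and listing the fields whose values differ. A minimal sketch, assuming rows are matched by position (the tool's diff view may match rows differently, e.g. after row removal, indices no longer line up):

```python
def diff_rows(before, after):
    """List per-row field changes between two row sets of equal length.

    Illustrative only: rows are paired by index, and each change
    records the (old, new) value pair for the affected field.
    """
    changes = []
    for i, (old, new) in enumerate(zip(before, after)):
        changed = {k: (old.get(k), new.get(k))
                   for k in old if old.get(k) != new.get(k)}
        if changed:
            changes.append({"row": i, "fields": changed})
    return changes
```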

5. Download the clean data

Download the cleaned data in the same format (CSV or JSON) or export to a different format. The tool also generates a cleaning summary report showing how many rows were affected by each operation.
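A cleaning summary report boils down to running the chosen operations in order and recording each one's effect. A sketch under the assumption that every operation is a function from rows to rows (the real report's format and field names are not specified here):

```python
def cleaning_summary(operations, rows):
    """Apply each (name, function) operation in turn, recording its effect.

    rows_removed is exact; rows_changed is an approximation based on
    positional comparison, so it is only indicative once rows have
    been removed by an earlier operation.
    """
    report = []
    for name, op in operations:
        new_rows = op(rows)
        removed = len(rows) - len(new_rows)
        changed = sum(1 for a, b in zip(rows, new_rows) if a != b)
        report.append({"operation": name,
                       "rows_removed": removed,
                       "rows_changed": changed})
        rows = new_rows
    return rows, report
```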

Common Data Quality Problems and Their Causes

Data quality problems fall into predictable categories.

Structural problems: inconsistent column names (some columns use underscores, some use camelCase, some have typos), mixed data types in a column (mostly numbers but some text like 'N/A' or 'unknown'), and dates in different formats (01/15/2024, 2024-01-15, 15th January 2024 all in the same column).

Content problems: duplicate records created by form submissions being processed twice, leading/trailing whitespace creating non-matching values ('London' ≠ ' London'), inconsistent categorical values ('UK', 'United Kingdom', 'England' all meaning the same thing), and values that are technically valid but logically impossible (negative ages, future birthdates, revenue totals that do not match line-item sums).

Knowing the common patterns helps you look for them systematically rather than discovering them when analysis produces unexpected results.
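The inconsistent-categories problem is worth singling out, because no tool can fix it automatically: whether 'England' should collapse into 'United Kingdom' is domain knowledge. A sketch of the usual fix, a hand-maintained mapping onto canonical labels (the function name and mapping here are illustrative):

```python
def normalise_categories(values, mapping):
    """Map inconsistent categorical labels onto canonical values.

    Values are trimmed first, since ' London' vs 'London' is the same
    problem; labels absent from the mapping pass through unchanged.
    """
    return [mapping.get(v.strip(), v.strip()) for v in values]
```

Values left unmapped are easy to find afterwards (anything not in the mapping's value set), which turns an invisible inconsistency into an explicit to-do list.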

Data Cleaning for Different Downstream Uses

The level and type of cleaning needed depends on what you plan to do with the data.

- For statistical analysis: ensure correct data types, handle outliers, verify that distributions make sense, and decide on a principled approach to missing values.
- For machine learning: more aggressive cleaning is typically needed. Handle missing values (most ML algorithms cannot handle nulls), encode categorical variables, normalise numeric ranges, and decide whether to remove or cap outliers.
- For database import: ensure values conform to the schema constraints, such as text lengths, required fields, foreign key relationships, and unique constraints.
- For reporting and visualisation: focus on aggregation errors, missing category labels, and date format consistency.
- For API integration: ensure data types match the API's expectations, particularly for dates, booleans, and numeric precision.
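Two of the ML-oriented steps mentioned above can be sketched together: filling missing numeric values and rescaling the column to [0, 1]. Mean imputation and min-max scaling are one common choice, not the only one, and `prepare_numeric` is an illustrative name:

```python
def prepare_numeric(values):
    """Fill missing numeric values with the column mean, then
    min-max scale the column to the range [0, 1].

    Missing means None or empty string; a constant column maps to 0.0.
    """
    nums = [float(v) for v in values if v not in (None, "")]
    mean = sum(nums) / len(nums)
    filled = [float(v) if v not in (None, "") else mean for v in values]
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0] * len(filled)
    return [(v - lo) / (hi - lo) for v in filled]
```

Note that imputing before scaling means the fill value never falls outside the scaled range, which is one reason the two steps are usually done in this order.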
