Data Analysis
Tags: data cleaning, data quality, ETL, analytics, data engineering
Data Cleaning Checklist
A dataset-specific cleaning checklist that catches structural errors, missing data, and outliers before they corrupt your analysis.
Prompt Template
Generate a comprehensive data cleaning checklist for the dataset described as: [DATASET_DESCRIPTION]. The dataset contains [NUMBER_OF_ROWS] rows and [NUMBER_OF_COLUMNS] columns. It will be used for: [INTENDED_ANALYSIS]. Known data quality issues: [KNOWN_ISSUES]. The checklist should cover: (1) Structural checks — column names (standardized, no spaces, consistent case), data types (verify each column's type matches its content), duplicate rows (detection and removal criteria), (2) Missing data — for each column, specify: what % missing is acceptable, and the imputation or removal strategy, (3) Outlier detection — which columns to check for outliers, the method to use (IQR, Z-score, domain-specific rules), and the threshold for flagging vs. removing, (4) Consistency checks — cross-column validation rules (e.g., "end_date must be after start_date"), (5) Domain-specific checks — [DOMAIN_SPECIFIC_RULES — e.g., email format, phone number format, valid country codes], (6) Documentation — what to log for each cleaning action (original value, new value, reason), (7) Final validation — 3 sanity checks to run after cleaning to confirm the dataset is ready. Format as a step-by-step checklist with checkboxes.
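As a rough illustration of the checks that items (1) through (4) of the prompt ask the assistant to plan, the sketch below runs a missing-value percentage, an exact-duplicate scan, an IQR outlier flag, and the "end_date must be after start_date" rule on a few rows. It is a minimal sketch using only the Python standard library. The function names, the column names (`amount`, `start_date`, `end_date`), the toy rows, and the `k=1.5` multiplier are assumptions made for the example and are not part of the prompt.

```python
from datetime import date
from statistics import quantiles


def missing_pct(rows, column):
    """Percentage of rows where `column` is absent or None."""
    missing = sum(1 for r in rows if r.get(column) is None)
    return 100.0 * missing / len(rows)


def duplicate_rows(rows):
    """Return rows that exactly repeat an earlier row."""
    seen, dups = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))  # order-independent, hashable row key
        if key in seen:
            dups.append(r)
        seen.add(key)
    return dups


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is a common default)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


def bad_date_order(rows):
    """Rows that violate the rule: end_date must be after start_date."""
    return [
        r for r in rows
        if date.fromisoformat(r["end_date"]) <= date.fromisoformat(r["start_date"])
    ]


# Toy data invented for illustration; real column names will differ.
rows = [
    {"id": 1, "amount": 10.0, "start_date": "2024-01-01", "end_date": "2024-01-05"},
    {"id": 2, "amount": None, "start_date": "2024-01-03", "end_date": "2024-01-02"},
    {"id": 1, "amount": 10.0, "start_date": "2024-01-01", "end_date": "2024-01-05"},
    {"id": 3, "amount": 12.0, "start_date": "2024-01-04", "end_date": "2024-01-09"},
]

print(missing_pct(rows, "amount"))
print(duplicate_rows(rows))
print(bad_date_order(rows))
print(iqr_outliers([10, 11, 11, 12, 12, 13, 13, 500]))
```

The functions only flag problems and never modify the data, which matches the prompt's distinction between flagging and removing and leaves room to log each cleaning action as item (6) requires.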
How to use this prompt
- Copy the prompt template above.
- Paste it into your preferred AI assistant (ChatGPT, Claude, Gemini, etc.).
- Replace all bracketed placeholders, such as [DATASET_DESCRIPTION], with your specific details.
- Send the prompt and refine the output as needed.
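If you reuse the template often, the placeholder step can be scripted. The following is a minimal sketch, assuming placeholders are written as upper-case names in square brackets, as they are in the template. The helper name `fill_placeholders` and the sample values are hypothetical. A placeholder that carries extra text inside the brackets, such as the domain-specific rules one with its inline examples, will not match this pattern and still needs to be edited by hand.

```python
import re

PLACEHOLDER = re.compile(r"\[([A-Z_]+)\]")


def fill_placeholders(template, values):
    """Fill [NAME] placeholders found in `values`; report names left unfilled."""
    def substitute(match):
        name = match.group(1)
        # Leave the placeholder untouched when no value was supplied.
        return values.get(name, match.group(0))

    filled = PLACEHOLDER.sub(substitute, template)
    remaining = PLACEHOLDER.findall(filled)
    return filled, remaining


# Short excerpt of the template, with made-up values for illustration.
excerpt = (
    "The dataset contains [NUMBER_OF_ROWS] rows and [NUMBER_OF_COLUMNS] columns. "
    "It will be used for: [INTENDED_ANALYSIS]."
)
filled, remaining = fill_placeholders(
    excerpt, {"NUMBER_OF_ROWS": "50000", "NUMBER_OF_COLUMNS": "12"}
)
print(filled)
print(remaining)
```

Checking `remaining` before sending the prompt is a cheap way to catch a placeholder you forgot to fill in.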