Quality of data for AI systems
Plain English Translation
Organizations must establish and document clear standards for the quality of data used in their artificial intelligence models. Because machine learning data quality directly impacts how accurately and fairly an AI system performs, organizations are required to define metrics for training, validation, testing, and production data. Monitoring AI data quality on an ongoing basis ensures the model continues to function safely and reliably as conditions change.
Technical Implementation
Required actions are listed below by organization size.
Required Actions (startup)
- Define baseline data quality rules, such as handling missing values, identifying duplicates, and validating data types (see the sketch after this list).
- Document the minimum data quality metrics a dataset must meet before it is accepted for initial model training.
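A minimal sketch of these baseline rules, assuming pandas, a hypothetical two-column schema, and an illustrative 5% missingness threshold (none of these values come from the standard itself):

```python
# Baseline quality checks for a tabular dataset (schema and threshold assumed).
import pandas as pd

EXPECTED_DTYPES = {"age": "int64", "label": "object"}  # assumed documented schema
MAX_MISSING_RATIO = 0.05                               # assumed acceptance threshold


def baseline_checks(df: pd.DataFrame) -> list[str]:
    """Return human-readable violations of the documented baseline rules."""
    violations = []

    # Missing values: flag any column above the documented threshold.
    for col, ratio in df.isna().mean().items():
        if ratio > MAX_MISSING_RATIO:
            violations.append(f"{col}: {ratio:.1%} missing exceeds {MAX_MISSING_RATIO:.0%}")

    # Duplicates: exact duplicate rows inflate the apparent sample size.
    dup_count = int(df.duplicated().sum())
    if dup_count:
        violations.append(f"{dup_count} exact duplicate rows")

    # Data types: every column must match the documented schema.
    for col, dtype in EXPECTED_DTYPES.items():
        if col not in df.columns:
            violations.append(f"missing expected column: {col}")
        elif str(df[col].dtype) != dtype:
            violations.append(f"{col}: dtype {df[col].dtype} != expected {dtype}")

    return violations
```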
Required Actions (scaleup)
- Automate data quality checks within the ML pipeline using testing frameworks to verify inputs (an example follows this list).
- Formalize data quality requirements for AI systems in a centralized data dictionary or governance platform.
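One common approach is to express the checks as tests that run as a pipeline gate. A sketch using pytest, where `project.data.load_training_frame` is a hypothetical loader and the schema, label domain, and threshold are assumptions:

```python
# Data quality gates runnable with `pytest` as a CI/pipeline step.
import pandas as pd
import pytest

from project.data import load_training_frame  # hypothetical loader module


@pytest.fixture(scope="module")
def df() -> pd.DataFrame:
    # Load the candidate dataset once per test module.
    return load_training_frame()


def test_dataset_not_empty(df):
    assert len(df) > 0


def test_required_columns_present(df):
    assert {"age", "income", "label"}.issubset(df.columns)  # assumed schema


def test_labels_in_documented_domain(df):
    assert set(df["label"].dropna().unique()) <= {"approve", "deny"}  # assumed domain


def test_missingness_within_threshold(df):
    assert float(df.isna().mean().max()) <= 0.05  # assumed documented threshold
```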
Required Actions (enterprise)
- Deploy data drift detection to automatically monitor production data against training baselines (see the sketch after this list).
- Integrate AI data governance and data quality management into the incident response plan so data anomalies are handled as incidents.
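A drift baseline comparison can be as simple as a two-sample statistical test per numeric feature. A sketch using SciPy's Kolmogorov-Smirnov test, with an assumed p-value alerting threshold; production deployments typically layer purpose-built drift tooling on top of this idea:

```python
# Drift check: compare a production feature sample to its training baseline.
import numpy as np
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # assumed alerting threshold


def has_drifted(baseline: np.ndarray, production: np.ndarray) -> bool:
    """True if the production sample differs significantly from the baseline."""
    _, p_value = ks_2samp(baseline, production)
    return p_value < DRIFT_P_VALUE


# A shifted production distribution triggers the check.
rng = np.random.default_rng(0)
train_sample = rng.normal(0.0, 1.0, 5_000)
prod_sample = rng.normal(0.5, 1.0, 5_000)  # simulated drift
print(has_drifted(train_sample, prod_sample))  # True
```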
ISO/IEC 42001 Annex A.7.4 requires organizations to explicitly define and document their requirements for data quality and verify that the data actually used to develop and operate the AI system meets those standards.
Organizations define these requirements by evaluating the system's intended purpose and setting criteria to ensure data characteristics satisfy stated needs under specified conditions, typically applying frameworks like ISO/IEC 5259.
Organizations should cover dimensions such as accuracy, completeness, consistency, and representativeness, tailoring the metrics to the specific requirements of the machine learning algorithms in use.
Thresholds are determined by evaluating acceptable error rates and performance metrics for the model. Organizations measure data against these thresholds using automated scripts, statistical analysis, and validation tools throughout the development pipeline.
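A sketch of scoring a dataset against documented thresholds, with three illustrative dimension metrics and an assumed domain rule for a hypothetical `age` column:

```python
# Score a candidate dataset against documented thresholds before acceptance.
import pandas as pd

THRESHOLDS = {              # assumed documented acceptance criteria
    "completeness": 0.98,   # share of non-null cells
    "uniqueness": 0.99,     # share of non-duplicate rows
    "validity": 0.97,       # share of rows passing domain rules
}


def score_dataset(df: pd.DataFrame) -> dict[str, float]:
    return {
        "completeness": 1.0 - float(df.isna().to_numpy().mean()),
        "uniqueness": 1.0 - float(df.duplicated().mean()),
        "validity": float(df["age"].between(0, 120).mean()),  # assumed rule
    }


def accept(df: pd.DataFrame) -> bool:
    scores = score_dataset(df)
    return all(scores[dim] >= floor for dim, floor in THRESHOLDS.items())
```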
Expected documentation includes formal data quality policies, specific criteria for acceptable datasets, data preparation methodologies, and continuous logs verifying that used data meets the documented thresholds. Tools like WatchDog Security's Policy Management can manage these documents with versioning and attestations, while WatchDog Security's Compliance Center can link them to ISO/IEC 42001 controls and associated evidence.
Data quality monitoring in production should be a continuous or highly frequent process, relying on automated alerts to detect sudden anomalies, drift, or degradation in incoming data streams.
Compliance involves implementing data drift detection tools to monitor production inputs and establishing procedures to retrain the model or update the data processing pipeline when drift exceeds acceptable thresholds.
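A minimal sketch of such a procedure using the population stability index (PSI); the 0.10/0.25 action thresholds below are common rules of thumb, not values from the standard:

```python
# Map a drift score to a documented response (thresholds are rules of thumb).
import numpy as np


def psi(baseline: np.ndarray, production: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two 1-D samples."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    b_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    p_pct = np.histogram(production, bins=edges)[0] / len(production)
    # Clip to avoid log(0) when a bin is empty in either sample.
    b_pct = np.clip(b_pct, 1e-6, None)
    p_pct = np.clip(p_pct, 1e-6, None)
    return float(np.sum((p_pct - b_pct) * np.log(p_pct / b_pct)))


def respond_to_drift(score: float) -> str:
    """Assumed procedure: retrain at PSI >= 0.25, investigate at >= 0.10."""
    if score >= 0.25:
        return "retrain"      # e.g., kick off the retraining pipeline
    if score >= 0.10:
        return "investigate"  # e.g., alert the data owner on call
    return "ok"
```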
By rigorously enforcing data quality requirements for AI systems, organizations can catch unrepresentative or historically biased datasets early and adjust the data to improve fairness before deployment.
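As an illustration, a simple representativeness check can compare subgroup shares in the training data against reference population shares; the `group` column, reference shares, and tolerance below are all assumptions:

```python
# Flag subgroups that fall short of their reference population share.
import pandas as pd

REFERENCE_SHARES = {"A": 0.50, "B": 0.30, "C": 0.20}  # assumed population shares
MAX_SHORTFALL = 0.05                                   # assumed tolerance


def underrepresented_groups(df: pd.DataFrame, column: str = "group") -> list[str]:
    observed = df[column].value_counts(normalize=True)
    return [
        group for group, expected in REFERENCE_SHARES.items()
        if observed.get(group, 0.0) < expected - MAX_SHORTFALL
    ]
```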
Organizations must document standardized data preparation procedures that specify how to impute missing values, smooth noisy data, and validate human or automated labels against defined quality benchmarks.
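A compact sketch of such a procedure, assuming median imputation for numeric gaps and an inter-annotator agreement check with an illustrative 0.9 benchmark:

```python
# Documented preparation steps: numeric imputation plus a label-quality gate.
import pandas as pd


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Impute missing numeric values with the column median (assumed procedure)."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


def label_agreement(labels_a: pd.Series, labels_b: pd.Series) -> float:
    """Share of items on which two annotators (or annotator vs. model) agree."""
    return float((labels_a == labels_b).mean())


# Accept labels for training only if agreement meets the documented benchmark,
# assumed here to be 0.9.
```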
Auditors require documented quality rules, logs of ongoing automated quality checks, records of any manual overrides or remediations, and formal sign-offs approving data sets for AI training or operational use. WatchDog Security's Compliance Center can centralize this evidence with an audit trail for exceptions and remediation, and WatchDog Security's Trust Center can help share approved evidence packages with stakeholders when appropriate.
Data quality requirements often end up scattered across tickets, notebooks, and pipeline code, making it hard to prove consistent governance. Tools like WatchDog Security's Policy Management can centralize data quality specifications with version control and approvals, while WatchDog Security's Compliance Center can map those requirements to ISO/IEC 42001 controls and track supporting evidence.
Data-quality failures (e.g., missing fields, labeling errors, drift) can become material operational and compliance risks if they recur or impact model outcomes. Tools like WatchDog Security's Risk Register can log, score, and assign treatment plans for these issues, and WatchDog Security's Compliance Center can attach monitoring evidence, remediation records, and review cadence for audit readiness.
| Version | Date | Author | Description |
|---|---|---|---|
| 1.0.0 | 2026-02-23 | WatchDog Security GRC Team | Initial publication |