Acquisition of Data
Plain English Translation
Organizations must clearly define and record where their AI system data comes from and how it was selected. This means keeping detailed records of AI data acquisition, including the categories, quantities, and specific sources of data used. Strict controls on AI training data sourcing help confirm that the organization has the legal right to use the data, understands its inherent biases, and handles privacy constraints properly.
Technical Implementation
Required Actions (startup)
- Track basic data sources, categories, and licensing terms in a centralized spreadsheet.
- Ensure development teams understand open data licensing constraints before downloading external datasets.
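For a startup, the centralized spreadsheet can be as simple as a CSV file with a fixed set of columns. The sketch below is one possible shape, not a prescribed format: the column names in `FIELDS`, the `write_register` helper, and the sample row are all hypothetical.

```python
import csv
import io

# Hypothetical column set; adapt it to what your register actually tracks.
FIELDS = ["dataset", "source", "category", "license", "reviewed_by"]


def write_register(rows, handle):
    """Write dataset rows as CSV, rejecting any row with an empty field."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        missing = [name for name in FIELDS if not row.get(name)]
        if missing:
            raise ValueError(f"{row.get('dataset', '<unnamed>')}: missing {missing}")
        writer.writerow(row)


# Illustrative entry only; the dataset and reviewer are made up.
rows = [{
    "dataset": "support-tickets-2024",
    "source": "internal",
    "category": "customer text",
    "license": "internal use",
    "reviewed_by": "j.doe",
}]
buffer = io.StringIO()
write_register(rows, buffer)
print(buffer.getvalue())
```

Rejecting incomplete rows at write time keeps the register from accumulating entries with no recorded license, which is usually the gap that surfaces later.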
Required Actions (scaleup)
- Implement a formal AI data acquisition policy with required sign-offs.
- Document dataset demographics, known biases, and privacy handling procedures in a structured data dictionary.
Required Actions (enterprise)
- Automate data lineage and licensing compliance checks within the machine learning pipeline.
- Integrate AI data acquisition reviews into the formal AI system impact assessment and risk treatment processes.
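An automated lineage and licensing check could run as a gate before training starts. The following is a minimal sketch under stated assumptions: the manifest layout (`name`, `license`, `upstream`), the `APPROVED_LICENSES` allowlist, and both function names are hypothetical, and the allowlist contents are an organizational decision rather than a recommendation.

```python
# Hypothetical allowlist; the real one comes from legal and compliance review.
APPROVED_LICENSES = {"CC0-1.0", "CC-BY-4.0", "internal", "vendor-contract"}


def lineage_violations(manifest):
    """Return one message per dataset entry that fails a check."""
    violations = []
    for entry in manifest:
        name = entry.get("name", "<unnamed>")
        if entry.get("license") not in APPROVED_LICENSES:
            violations.append(f"{name}: license {entry.get('license')!r} not approved")
        if not entry.get("upstream"):
            violations.append(f"{name}: no upstream source recorded")
    return violations


def gate(manifest):
    """Raise if any dataset fails, so the pipeline step stops."""
    violations = lineage_violations(manifest)
    if violations:
        raise RuntimeError("data acquisition gate failed:\n" + "\n".join(violations))
```

Calling `gate(manifest)` as the first pipeline step means an unapproved or untraceable dataset blocks the run instead of being discovered in a later audit.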
Organizations must determine and document the specific details regarding the acquisition and selection of data used in their AI systems. This documentation ensures transparency into where data comes from and how it is chosen.
Organizations should record data categories, required quantities, sources such as internal or purchased data, data characteristics, subject demographics, prior privacy handling, data rights, associated metadata, and provenance. Tools like WatchDog Security's Compliance Center can help standardize these fields and keep supporting evidence (e.g., licenses, provenance notes) linked to each dataset record.
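The fields listed above map naturally onto a structured record. This sketch assumes a Python dataclass as the record type; the class name, field names, and the `missing_fields` helper are illustrative choices, not a schema defined by the standard or by any tool.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetAcquisitionRecord:
    """One entry in a dataset inventory; field names are illustrative."""
    name: str
    categories: list[str] = field(default_factory=list)
    quantity: str = ""                # free text, e.g. row or file counts
    source: str = ""                  # e.g. "internal" or "purchased"
    characteristics: str = ""
    subject_demographics: str = ""
    prior_privacy_handling: str = ""
    data_rights: str = ""
    metadata: dict[str, str] = field(default_factory=dict)
    provenance: str = ""

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Made-up example: a purchased dataset with only part of the record filled in.
record = DatasetAcquisitionRecord(
    name="call-transcripts", source="purchased", data_rights="vendor contract"
)
print(record.missing_fields())
```

Listing the empty fields gives reviewers a concrete to-do list for each dataset before it is approved for use.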
Proof of rights is established by retaining explicit contractual agreements, licensing records, and data processing agreements from suppliers that clearly define permitted usages for AI training. Tools like WatchDog Security's Vendor Risk Management can help retain these supplier artifacts alongside assessment results and renewal dates to support ongoing compliance.
Legal or compliance teams must analyze open data licenses to ensure they permit commercial AI training and to identify any attribution requirements or restrictive copyleft clauses.
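Part of that analysis can be pre-screened in code so reviewers see the relevant conditions up front. The sketch below records the headline terms of four Creative Commons licenses by SPDX identifier; the `LicenseTerms` type and `review_flags` function are hypothetical, and the output is a prompt for legal review, not a determination that a license permits AI training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    commercial_use: bool
    attribution_required: bool
    share_alike: bool  # copyleft-style condition on derivatives


# Headline terms of four Creative Commons licenses, keyed by SPDX identifier.
# Whether any of them covers AI training for a given use case is a legal call.
KNOWN_LICENSES = {
    "CC0-1.0": LicenseTerms(True, False, False),
    "CC-BY-4.0": LicenseTerms(True, True, False),
    "CC-BY-SA-4.0": LicenseTerms(True, True, True),
    "CC-BY-NC-4.0": LicenseTerms(False, True, False),
}


def review_flags(license_id):
    """Return the issues a reviewer should look at for this license."""
    terms = KNOWN_LICENSES.get(license_id)
    if terms is None:
        return ["unknown license: escalate to legal"]
    flags = []
    if not terms.commercial_use:
        flags.append("non-commercial only")
    if terms.attribution_required:
        flags.append("attribution required")
    if terms.share_alike:
        flags.append("share-alike (copyleft) clause")
    return flags
```

Treating an unrecognized identifier as an escalation, rather than as a pass, keeps the default on the cautious side.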
Before acquiring personal data, organizations must conduct a privacy assessment, confirm conformity with privacy and security requirements, establish a documented lawful basis, and verify that the data was handled properly by prior holders.
Organizations address bias in acquired data by analyzing and recording data subject demographics and by identifying known or potential biases and systematic errors during the initial AI data acquisition process.
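A first pass at recording demographics can be a simple share-per-group calculation over a labeled attribute. This is a minimal sketch: the function names, the 10% cutoff, and the sample age bands are all assumptions for illustration, and a low share is only a signal to investigate, not proof of bias.

```python
from collections import Counter


def group_shares(labels):
    """Return each group's share of the records, as a fraction of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(labels, min_share=0.10):
    """List groups whose share falls below min_share (a hypothetical cutoff)."""
    return sorted(g for g, share in group_shares(labels).items() if share < min_share)


# Made-up age bands for illustration.
sample = ["18-29"] * 50 + ["30-49"] * 45 + ["50+"] * 5
print(group_shares(sample))
print(underrepresented(sample))
```

The resulting shares and flagged groups can be stored with the dataset record so the known limitations are documented at acquisition time.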
Data acquisition covers the initial selection, sourcing, and authorization to use the data, whereas data provenance tracks the data's history and transformations across its lifecycle.
Data acquisition records should be reviewed whenever new datasets are introduced, data sources change, or the intended purpose of the AI system shifts.
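The three review triggers can be detected by comparing two snapshots of an AI system's data setup. The sketch below assumes a hypothetical snapshot shape, a dict with a `purpose` string and a `datasets` mapping from dataset name to source; `review_reasons` is an illustrative name.

```python
def review_reasons(previous, current):
    """Compare two snapshots and return the reasons a review is due.

    Each snapshot is a dict with 'purpose' (str) and 'datasets'
    (dict mapping dataset name to source); the shape is hypothetical.
    """
    reasons = []
    if previous["purpose"] != current["purpose"]:
        reasons.append("intended purpose changed")
    old, new = previous["datasets"], current["datasets"]
    for name in sorted(new.keys() - old.keys()):
        reasons.append(f"new dataset: {name}")
    for name in sorted(new.keys() & old.keys()):
        if new[name] != old[name]:
            reasons.append(f"source changed: {name}")
    return reasons


# Made-up snapshots for illustration.
before = {"purpose": "ticket routing", "datasets": {"tickets": "internal"}}
after = {
    "purpose": "ticket routing",
    "datasets": {"tickets": "vendor-a", "chat-logs": "internal"},
}
print(review_reasons(before, after))
```

An empty result means none of the three triggers fired between the two snapshots.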
Auditors typically expect to review data management policies, comprehensive data inventories detailing sources and rights, vendor security reviews for purchased data, and records of bias or privacy evaluations. Tools like WatchDog Security's Compliance Center can consolidate this evidence and show control-to-artifact traceability during audits.
Synthetic data must be explicitly categorized as machine generated in the data inventory, with supporting documentation detailing the generation algorithms and any source data it was derived from.
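An inventory entry for synthetic data can be checked for these three elements before it is accepted. This sketch assumes hypothetical key names (`origin`, `generation_method`, `derived_from`) and a hypothetical `machine_generated` category value; adapt them to the inventory's actual schema.

```python
def synthetic_entry_problems(entry):
    """Return what is missing from a synthetic-data inventory entry."""
    problems = []
    if entry.get("origin") != "machine_generated":
        problems.append("origin must be 'machine_generated'")
    if not entry.get("generation_method"):
        problems.append("generation method not documented")
    # An empty list is acceptable: it records that no source data was used.
    if "derived_from" not in entry:
        problems.append("derived_from not recorded (use [] if no source data)")
    return problems


# Made-up entry for illustration.
entry = {
    "name": "synthetic-claims",
    "origin": "machine_generated",
    "generation_method": "rule-based sampler, v2 config",
    "derived_from": ["claims-2023"],
}
print(synthetic_entry_problems(entry))
```

Requiring `derived_from` to be present even when empty distinguishes "no source data" from "nobody recorded the source data".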
Data acquisition evidence often gets scattered across contracts, tickets, and shared drives, which makes audits and internal reviews slow and error-prone. Tools like WatchDog Security's Compliance Center can centralize dataset acquisition records, attach licensing/provenance evidence, and map each dataset entry to the ISO/IEC 42001:2023 Annex A.7.3 control for audit-ready reporting.
Third-party datasets introduce licensing, privacy, and security risks that require consistent assessment and documented approvals before use. Tools like WatchDog Security's Vendor Risk Management can standardize supplier assessments and retain contracts and DPAs, while WatchDog Security's Risk Register can track data sourcing risks, assigned owners, and treatment plans tied to the same dataset acquisition decision.
| Version | Date | Author | Description |
|---|---|---|---|
| 1.0.0 | 2026-02-23 | WatchDog Security GRC Team | Initial publication |