Acquisition of Data

Updated: 2026-02-23

Plain English Translation

Organizations must clearly define and record where their AI system data comes from and how it was selected. This means keeping detailed records of AI data acquisition, including the categories, quantities, and specific sources of data used. Implementing strict AI training data sourcing controls ensures that the organization has the legal rights to use the data, understands any inherent biases, and properly handles privacy constraints.

Executive Takeaway

Formalizing AI dataset sourcing prevents legal, privacy, and operational risks associated with undocumented or improperly licensed training data.

Impact: High
Complexity: Medium

Why This Matters

  • Mitigates intellectual property disputes and copyright infringement risks by enforcing data rights validation.
  • Improves AI model reliability and fairness by ensuring data sources are appropriate and thoroughly evaluated for biases.

What “Good” Looks Like

  • Maintaining a centralized inventory of all AI datasets that tracks source, licensing, quantity, and demographic characteristics; tools like WatchDog Security's Compliance Center can help keep this inventory evidence-linked and audit-ready.
  • Requiring a formal due diligence checklist and approval process before integrating new third-party or open data sources; tools like WatchDog Security's Policy Management can track the checklist workflow and capture approvals and acknowledgements.

Organizations must determine and document the specific details regarding the acquisition and selection of data used in their AI systems. This documentation ensures transparency into where data comes from and how it is chosen.

Organizations should record data categories, required quantities, sources such as internal or purchased data, data characteristics, subject demographics, prior privacy handling, data rights, associated metadata, and provenance. Tools like WatchDog Security's Compliance Center can help standardize these fields and keep supporting evidence (e.g., licenses, provenance notes) linked to each dataset record.
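The fields listed above could be captured in a structured inventory record. The following is a minimal sketch, assuming a Python-based inventory; the class name `DatasetAcquisitionRecord`, its field names, and the `missing_fields` helper are illustrative assumptions, not a schema prescribed by the standard or by any particular tool.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class DatasetAcquisitionRecord:
    """One inventory entry. Field names are hypothetical, chosen to
    mirror the items listed in the paragraph above."""
    name: str
    categories: List[str] = field(default_factory=list)
    required_quantity: str = ""
    source: str = ""                  # e.g. "internal" or "purchased"
    characteristics: str = ""
    subject_demographics: str = ""
    prior_privacy_handling: str = ""
    data_rights: str = ""
    metadata: Dict[str, str] = field(default_factory=dict)
    provenance: str = ""


def missing_fields(record: DatasetAcquisitionRecord) -> List[str]:
    """Return the names of fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical example entry, for illustration only.
record = DatasetAcquisitionRecord(
    name="support-tickets-2025",
    categories=["text"],
    source="internal",
    data_rights="owned by the organization",
)
print(missing_fields(record))
```

A completeness check like this could run before a dataset entry is approved, so that gaps surface at acquisition time rather than during an audit.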

Proof of rights is established by retaining explicit contractual agreements, licensing records, and data processing agreements from suppliers that clearly define permitted usages for AI training. Tools like WatchDog Security's Vendor Risk Management can help retain these supplier artifacts alongside assessment results and renewal dates to support ongoing compliance.

Legal or compliance teams must analyze open data licenses to ensure they permit commercial AI training and to identify any attribution requirements or restrictive copyleft clauses.
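A simple pre-screen can help route open datasets to legal review with the relevant questions already flagged. The sketch below assumes a lookup table maintained by the legal or compliance team; `LICENSE_FLAGS` and `review_flags` are hypothetical names, and the output is an input to human review, not a determination that AI training is permitted.

```python
from typing import Dict, List, Set

# Hypothetical lookup keyed by SPDX license identifier. The flags reflect
# the headline terms of each license; legal review still decides whether
# a given use (including AI training) is acceptable.
LICENSE_FLAGS: Dict[str, Set[str]] = {
    "CC0-1.0": set(),
    "CC-BY-4.0": {"attribution"},
    "CC-BY-SA-4.0": {"attribution", "share-alike"},
    "CC-BY-NC-4.0": {"attribution", "non-commercial"},
}


def review_flags(license_id: str) -> List[str]:
    """Return the flags legal review should look at for a license.

    Licenses missing from the table are never assumed to be safe.
    """
    if license_id not in LICENSE_FLAGS:
        return ["unknown-license"]
    return sorted(LICENSE_FLAGS[license_id])


print(review_flags("CC-BY-SA-4.0"))
print(review_flags("Custom-EULA"))
```

Treating an unlisted license as `unknown-license` rather than as unrestricted keeps the default conservative.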

Organizations must conduct strict privacy assessments, ensure conformity with privacy and security requirements, establish a documented lawful basis, and verify proper prior handling before acquiring personal data.

Organizations address bias in source data by analyzing and recording data subject demographics and by identifying known or potential biases and systematic errors during initial AI data acquisition.

Data acquisition deals with the initial selection, sourcing, and authorization to use the data, whereas data provenance involves tracking the continuous lifecycle history and transformations of the data over time.

Data acquisition records should be reviewed whenever new datasets are introduced, data sources change, or the intended purpose of the AI system shifts.
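The three review triggers can be checked mechanically by comparing two snapshots of the inventory. This is a minimal sketch under assumptions: the snapshot layout (a `purpose` string and a `sources` mapping of dataset name to source) and the function name `review_triggers` are hypothetical.

```python
from typing import Dict, List


def review_triggers(previous: Dict, current: Dict) -> List[str]:
    """Compare two inventory snapshots and list why a review is due."""
    triggers: List[str] = []
    old_sources = previous["sources"]  # dataset name -> source
    new_sources = current["sources"]

    # Trigger 1: a dataset appears that was not in the earlier snapshot.
    added = sorted(set(new_sources) - set(old_sources))
    if added:
        triggers.append("new dataset: " + ", ".join(added))

    # Trigger 2: an existing dataset now comes from a different source.
    changed = sorted(
        name for name in set(new_sources) & set(old_sources)
        if new_sources[name] != old_sources[name]
    )
    if changed:
        triggers.append("source changed: " + ", ".join(changed))

    # Trigger 3: the intended purpose of the AI system has shifted.
    if current["purpose"] != previous["purpose"]:
        triggers.append("intended purpose changed")
    return triggers


# Hypothetical snapshots, for illustration only.
before = {"purpose": "ticket routing", "sources": {"tickets": "internal"}}
after = {
    "purpose": "ticket routing",
    "sources": {"tickets": "vendor-a", "faq": "open data"},
}
print(review_triggers(before, after))
```

An empty result means none of the three conditions was met between the two snapshots.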

Auditors typically expect to review data management policies, comprehensive data inventories detailing sources and rights, vendor security reviews for purchased data, and records of bias or privacy evaluations. Tools like WatchDog Security's Compliance Center can consolidate this evidence and show control-to-artifact traceability during audits.

Synthetic data must be explicitly categorized as machine-generated in the data inventory, with supporting documentation that describes the generation algorithms and any source data from which it was derived.
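One way to enforce this in an inventory is a small validation step for synthetic entries. The sketch below is illustrative only; the key names `origin`, `generation_method`, and `derived_from`, and the function `synthetic_entry_problems`, are assumptions rather than fields defined by the standard.

```python
from typing import Dict, List


def synthetic_entry_problems(entry: Dict) -> List[str]:
    """List documentation gaps for a synthetic dataset entry.

    Entries whose origin is not "machine-generated" are out of scope
    for this check and return no problems.
    """
    if entry.get("origin") != "machine-generated":
        return []

    problems: List[str] = []
    if not entry.get("generation_method"):
        problems.append("missing generation_method")
    # An empty list is acceptable: it records that no source data was used.
    if "derived_from" not in entry:
        problems.append("missing derived_from")
    return problems


# Hypothetical entry, for illustration only.
entry = {"name": "synthetic-dialogues", "origin": "machine-generated"}
print(synthetic_entry_problems(entry))
```

Requiring `derived_from` to be present, even when empty, distinguishes "no source data" from "not yet documented".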

Data acquisition evidence often gets scattered across contracts, tickets, and shared drives, which makes audits and internal reviews slow and error-prone. Tools like WatchDog Security's Compliance Center can centralize dataset acquisition records, attach licensing/provenance evidence, and map each dataset entry to the ISO/IEC 42001:2023 Annex A.7.3 control for audit-ready reporting.

Third-party datasets introduce licensing, privacy, and security risks that require consistent assessment and documented approvals before use. Tools like WatchDog Security's Vendor Risk Management can standardize supplier assessments and retain contracts and DPAs, while WatchDog Security's Risk Register can track data sourcing risks, assigned owners, and treatment plans tied to the same dataset acquisition decision.

ISO/IEC 42001:2023 Annex A.7.3

"The organization shall determine and document details about the acquisition and selection of the data used in AI systems."

Version | Date | Author | Description
1.0.0 | 2026-02-23 | WatchDog Security GRC Team | Initial publication