ICO Publishes Final Response to Consultation on Generative AI and Data Protection

The Information Commissioner’s Office (ICO) has published its final response to the public consultation on the application of UK data protection law to generative AI. The response marks the culmination of an extensive consultation process aimed at clarifying key data protection principles in the context of generative AI systems.

The consultation invited input from organizations and individuals, with feedback submitted through surveys, direct correspondence, and stakeholder roundtables. A total of 192 organizations and 22 individuals participated, representing sectors such as technology, creative industries, civil society, and law firms. The ICO also engaged with the French Data Protection Authority (CNIL) to gain additional perspectives.

The ICO’s final response focuses on five key areas of uncertainty in applying data protection principles to generative AI:

  1. Lawful Basis for Web Scraping
    The ICO reaffirms that “legitimate interests” is likely to be the most appropriate lawful basis for processing publicly available web-scraped data for generative AI training. However, developers must demonstrate the necessity of this processing, explore alternative data collection methods, and justify why alternatives are unsuitable. Greater transparency is required, including providing clear information to individuals and enabling them to exercise their rights. Interestingly, as noted by Professor Theodore Christakis, the ICO now questions the necessity of web-scraping for training LLMs and points to an alternative means of data collection, one that has itself been challenged by the Italian Data Protection Authority (Garante). The ICO report highlights that:

    “The consultation responses clearly showed that the necessity of using web-scraped personal data to train generative AI is not a settled issue. The creative industries in particular challenged our initial position that web-scraping is necessary. We received evidence that other methods of data collection exist, for example where publishers collect personal data directly from people and license this data in a transparent manner. As a result, we encourage developers to seek out other sources of data where possible”.

    As Professor Christakis notes, this position comes only days after the Italian DPA, the Garante, issued a major decision regarding GEDI. In that case, the Garante challenged the use of legitimate interest as a legal basis for the transfer of personal data from publishers to OpenAI for training its AI models. According to the Garante, this communication of editorial content by publishers for AI training could be in breach of several GDPR provisions.

  2. Purpose Limitation
    The ICO maintains its position that data collected for one purpose cannot be reused for generative AI training unless the new purpose is compatible with the original purpose. Organizations must assess compatibility and document their reasoning.

  3. Accuracy of Training Data and Outputs
    Developers must mitigate risks associated with inaccurate outputs of generative AI systems. The ICO stresses that ensuring accuracy is critical for compliance with data protection law and for building trust in AI technologies.

  4. Engineering Individual Rights
    The ICO emphasizes the need for generative AI systems to be designed to respect individuals’ data rights, such as access, rectification, and erasure. Developers must establish clear processes to facilitate these rights, even when data is embedded in models. While Article 11 of the UK GDPR (processing without identification) may apply in certain cases, organizations must demonstrate that they cannot identify individuals and must provide mechanisms for individuals to supply additional information for identification purposes.

  5. Allocating Controllership
    The ICO affirms that clear allocation of data protection responsibilities throughout the generative AI supply chain is essential. Organizations must determine whether they act as controllers, joint controllers, or processors and ensure compliance accordingly.

The ICO’s response underscores the importance of data protection by design and by default in the development and deployment of generative AI systems. Developers and deployers are urged to:

  • Be transparent about the data used to train generative AI models;
  • Assess and mitigate risks of inaccurate or biased outputs;
  • Enable individuals to exercise their data protection rights effectively;
  • Clearly allocate responsibilities for data protection across the supply chain.

The ICO also acknowledges the evolving nature of generative AI and its associated challenges. It plans to continue monitoring developments, engaging with stakeholders, and updating guidance as needed. Additionally, the ICO will collaborate with the Competition and Markets Authority (CMA) to address intersections between data protection and competition law in the context of generative AI.

Lastly, it should be noted that the European Data Protection Board (EDPB) is expected to publish its opinion on generative AI under Article 64(2) of the GDPR on December 18, 2024. The opinion follows a request by the Irish Data Protection Commission (DPC). The EDPB also organized a remote stakeholder event on November 5, 2024, to gather input from selected stakeholders ahead of the opinion’s release.

You can read the full ICO response here.

P.R.
