AI & GDPR: CNIL’s New Recommendations!

On 19 June, the French data protection authority (Conseil National de l’Informatique et des Libertés – CNIL) published two new practical fact sheets designed to help actors developing AI systems to comply with the General Data Protection Regulation (GDPR). The CNIL shares its recommendations on the use of legitimate interest as a legal basis for the development of AI systems and focuses on the collection of data through web scraping.  

These fact sheets follow a public consultation opened on 10 June 2024, whose results are publicly available. They are part of a set of 10 practical AI fact sheets available on the CNIL website. They present solutions and methods illustrated by concrete examples to support stakeholders in understanding and implementing the recommendations. 

The CNIL stresses that if the development of AI systems does not process personal data, the GDPR is not applicable. It also points out that other legal bases can be used to develop AI systems. 

All AI fact sheets are available on the CNIL website. Here is an overview of the recommendations set out in the two new fact sheets.


Sheet 8: Using the legal basis of legitimate interest to develop an AI system 

Legitimate interest can often serve as an appropriate legal basis for the development of AI systems by private entities. The CNIL points out that public entities should use this legal basis only when the AI system is developed for activities that are not strictly necessary to perform their specific missions. 

Relying on legitimate interest as a legal basis for the development of AI systems requires compliance with three conditions: 

  1. The interest[1] pursued must be legitimate

      Legitimacy can be understood broadly, subject to compliance with three cumulative criteria: 

      • The interest is manifestly lawful
      • The interest is determined clearly and precisely
      • The interest is real and present for the organisation


      For example, a commercial interest may be considered legitimate. The development of an AI system based on one of the practices prohibited by the EU AI Act cannot be lawful, regardless of the legal basis used. 

  2. The processing must be necessary to pursue the interest

      This means the data controller must verify that no less intrusive means would achieve the legitimate interest pursued.

  3. The processing must not disproportionately affect the rights and freedoms of individuals, having regard to their reasonable expectations

      This condition requires the data controller to balance:

      • The rights and freedoms of the data subjects
      • The benefits expected from the processing
      • The impacts on individuals

      Negative impacts must be identified and assessed by taking into account the potential and actual consequences of the development and use of the AI system, based on their probability and severity. They may be linked to:

      • The development of the AI system: risk of invasion of privacy/rights guaranteed by the GDPR; loss of confidentiality; difficulty in exercising rights; difficulty in ensuring transparency, etc.
      • The use of the AI system: risk of memorization/extraction/regurgitation; damage to reputation; infringement of rights/secrecy; serious ethical risks, etc.

      The analysis must take account of the reasonable expectations of individuals, as determined by the data controller through a set of indicators.

      A list of additional concrete measures to limit the impact of the processing is drawn up in order to:

      • Limit the collection and storage of personal data
      • Allow individuals to retain control over their data
      • Limit the risks involved in using the AI system.

Sheet 8 bis: Measures to be taken when collecting data through web scraping

        Web scraping data accessible online is not prohibited, but additional measures must be taken and analysed individually, depending on the context.

        Web scraping is lawful only if a valid legal basis can be relied upon and certain conditions are met. The CNIL lists the measures that data controllers must implement when conducting web scraping: 

        • Defining precise collection criteria
        • Excluding certain categories of data from collection
        • Deleting irrelevant data immediately after collection or as soon as they are identified as such
        • Excluding websites that explicitly object to the web scraping of their content
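
As an illustration of the last measure, one common way for a website to signal an objection to scraping is its robots.txt file. The sketch below, which relies only on Python's standard library, shows how a crawler might check that file before collecting a page. The function name `may_collect`, the user agent string and the example URLs are hypothetical and do not come from the CNIL fact sheet; this is a minimal sketch of one possible technical check, not a complete compliance measure.

```python
from urllib.robotparser import RobotFileParser


def may_collect(url: str, robots_txt: str,
                user_agent: str = "example-ai-crawler") -> bool:
    """Return True if the site's robots.txt does not object to fetching `url`.

    `robots_txt` is the text of the site's robots.txt file, which the caller
    is assumed to have downloaded already (hypothetical workflow).
    """
    parser = RobotFileParser()
    # Parse the rules from memory so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative robots.txt content: the site objects to crawling /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for page in ("https://news.example/articles/1",
                 "https://news.example/private/profile"):
        print(page, may_collect(page, ROBOTS_TXT))
```

A check of this kind only covers objections expressed through robots.txt; objections expressed by other means, such as terms of use or a CAPTCHA, would need to be handled separately.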


        The data controller must implement additional safeguards, selected based on the intended use and the actual impact of the AI system on individuals, such as:

        • Drawing up a list of websites excluded by default
        • Excluding websites that object to web scraping or re-use of their data for AI systems training
        • Widely disseminating information about the data collection and individuals' rights
        • Providing for a discretionary and prior right to object
        • Anonymising or pseudonymising data after collection
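
To make the last safeguard more concrete, the following sketch shows one possible way to pseudonymise e-mail addresses found in scraped text by replacing each address with a keyed hash. The function name `pseudonymise_emails`, the simplified e-mail pattern and the token format are assumptions made for illustration; the CNIL fact sheet does not prescribe a particular technique. Pseudonymised data remain personal data under the GDPR, so this reduces risk but does not take the processing outside the regulation.

```python
import hashlib
import hmac
import re

# Simplified e-mail pattern, for illustration only; real-world addresses
# can be more varied than this expression allows.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymise_emails(text: str, secret_key: bytes) -> str:
    """Replace each e-mail address in `text` with a keyed-hash token.

    The same address always maps to the same token for a given key, so
    records can still be linked without exposing the address itself.
    """
    def _token(match: re.Match) -> str:
        address = match.group(0).lower().encode("utf-8")
        digest = hmac.new(secret_key, address, hashlib.sha256).hexdigest()
        return f"<email:{digest[:12]}>"

    return EMAIL_RE.sub(_token, text)


if __name__ == "__main__":
    sample = "Contact alice@example.org or ALICE@example.org for details."
    print(pseudonymise_emails(sample, secret_key=b"keep-this-key-secret"))
```

Using a keyed hash (HMAC) rather than a plain hash means that someone without the key cannot simply hash a known address to find its token; the key itself would need to be stored separately from the dataset.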


        Advice is given to website publishers on how to protect their content from web scraping. The fact sheet gives recommendations to data controllers on what to do in the event of incidental collection of sensitive data.

        To consult all the CNIL’s practical information sheets, click here

        To read our article on the two previous CNIL fact sheets, click here.

        To stay informed, visit our website at AI-Regulation.com and follow us on LinkedIn, Twitter and Facebook.

        S.P


        [1] The CNIL stresses that interest, i.e. the benefit that the data controller and third parties may derive, and purpose, i.e. the reason for which the data is processed, are two concepts that must be distinguished. 
