CNIL’s Public Consultation on the Development of AI Systems

On June 10th, 2024, the French Data Protection Authority (Commission Nationale de l’Informatique et des Libertés – CNIL) began a public consultation on a second series of seven practical information sheets regarding the development of Artificial Intelligence (AI) systems. These follow on from the first seven recommendations published in April.

This time round, the CNIL is seeking to provide legal certainty for developers of AI systems by anticipating the relationship between the EU AI Act and the GDPR, given that the long-awaited EU Regulation laying down harmonised rules on AI is due to be published in July. The CNIL’s practical information sheets address the following topics:


  1. Using legitimate interest as the legal basis for the development of AI systems


Legitimate interest is one of the six lawful bases for processing personal data under the GDPR, and the most flexible. Given the volume of data needed to train an AI system, it is expected to be the most widely used basis. According to the CNIL, however, three conditions must be met before it can be relied upon.

Firstly, the interest pursued must be legitimate (e.g., facilitating public access to certain information, or developing an AI system that detects fraudulent content or behaviour). Secondly, where the development of the AI system requires the use of personal data, the controller must ensure, on the basis of the information available to them, that developing the system is genuinely necessary to achieve the objective they have set (e.g., fraud prevention); this must also be examined against the principle of data minimisation. Finally, the developers of such a system must ensure that the processing does not disproportionately affect the rights and freedoms of the individuals whose data it captures.


  2. Legitimate interest and the dissemination of open-source models


Open-source dissemination would only be considered to fall under the legitimate interest legal basis if certain safeguards and additional guarantees are put in place. The provider of an open-source AI system would have to enable effective peer review and ensure that what it releases is sufficiently transparent and makes a genuine contribution to the open-source community (e.g., the model parameters, the source code and the training database). It would then have to implement (a) the necessary legal measures (e.g., restrictive licences) to limit how the model may be reused; (b) technical data security measures (e.g., anonymisation or pseudonymisation); and (c) measures guaranteeing that individuals are informed and able to exercise their rights.
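By way of illustration only, the sketch below shows one common pseudonymisation technique that a provider might apply before releasing a training database: replacing direct identifiers with keyed hashes. The field names and the HMAC-based scheme are assumptions made for the example, not measures prescribed by the CNIL.

```python
import hmac
import hashlib

# Illustrative only: the field names and the HMAC scheme are assumptions,
# not measures mandated by the CNIL.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # hypothetical schema

def pseudonymise(record: dict, secret_key: bytes) -> dict:
    """Replace direct identifiers with keyed hashes (pseudonyms).

    Because re-identification remains possible for whoever holds the key
    (which must be stored separately from the dataset), the result is
    pseudonymised, not anonymised, and stays personal data under the GDPR.
    """
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            digest = hmac.new(secret_key, str(value).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]
        else:
            out[field] = value
    return out

record = {"name": "Alice Martin", "email": "alice@example.com", "comment": "great post"}
print(pseudonymise(record, secret_key=b"store-me-separately"))
```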


  3. Legitimate interest and web scraping


Collecting publicly available data online through web scraping must be accompanied by measures that guarantee the rights of the data subjects. Care must be taken to delete any irrelevant data that may have been collected, if those data were not previously defined as necessary; this is already obligatory under the GDPR’s principle of data minimisation. According to CJEU case law, if, despite taking these measures, an organisation accidentally processes sensitive data that it had not intended to collect, the processing is not considered unlawful. If, on the other hand, the organisation becomes aware that it has processed sensitive data, in particular because the data subject intervenes, it must delete the data as soon as possible.
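As a rough sketch of what data minimisation can look like in a scraping pipeline, the example below keeps only fields that were defined as necessary before collection began and discards everything else, with a deletion hook for the case where a data subject intervenes. The field names and record structure are hypothetical.

```python
# A minimal sketch of minimisation at collection time, assuming scraped
# pages have already been parsed into dicts. The allowlist is hypothetical:
# it stands in for the fields defined as necessary *before* collection.
NECESSARY_FIELDS = {"title", "body", "language"}

def minimise(scraped_record: dict) -> dict:
    """Drop any field that was not defined as necessary in advance."""
    return {k: v for k, v in scraped_record.items() if k in NECESSARY_FIELDS}

def delete_on_request(dataset: list[dict], subject_match) -> list[dict]:
    """Remove records relating to a data subject, e.g. after the subject
    intervenes (the CJEU scenario described above).
    `subject_match` is a caller-supplied predicate."""
    return [r for r in dataset if not subject_match(r)]

page = {"title": "Post", "body": "text", "author_email": "x@example.com"}
print(minimise(page))  # {'title': 'Post', 'body': 'text'}
```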


  4. Informing data subjects


Any organisation that uses personal data to build or use a training database for developing an AI system must provide data subjects with: the identity and contact details of the organisation and of its data protection officer; the purpose and legal basis of the processing; the recipients, or at least the categories of recipients, of the data; the period for which the data will be stored; their rights (e.g., access, rectification, erasure, etc.); and the right to lodge a complaint with the national data protection authority.

If the data were collected indirectly, organisations must also disclose the categories of personal data gathered (e.g., identities, contact details, images, posts on social networks, etc.) and the source(s) of the data (indicating in particular whether these are publicly accessible sources).

Information provided to data subjects should be as concise and clear as possible (simple vocabulary, short sentences, a direct style, etc.) and adapted to the context, so that it reaches the person concerned in the most effective way.


  5. Respecting and facilitating data subjects’ rights


Individuals whose data are collected, used or re-used to develop an AI system have rights over those data that enable them to retain control over them. Data controllers are responsible for respecting these rights and for enabling data subjects to exercise them. Requests to exercise these rights fall into two categories: those relating to the training data and those relating to the model itself.


  6. Annotating data


The data annotation phase is a decisive stage in the development of a quality AI model, in terms of both performance and respect for individuals’ rights. The data being annotated, and the annotations themselves, may contain personal information. The CNIL therefore invites stakeholders to draw up an annotation protocol and to involve a referent or an ethics committee. Individuals must be informed of annotation operations and must be able to exercise their rights of access, rectification, erasure, objection and portability.
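To make the idea of an annotation protocol concrete, the sketch below shows one hypothetical record structure that keeps annotations traceable and reviewable; the CNIL sheet asks for a documented protocol and oversight, but does not prescribe any particular format, so every field here is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical record structure for an annotation protocol: traceable
# (who labelled what, and when) and reviewable by a referent or ethics
# committee, as the sheet recommends.
@dataclass
class Annotation:
    item_id: str                 # which training example was labelled
    label: str                   # the annotation itself
    annotator: str               # pseudonym: traceable without exposing the annotator
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    reviewed: bool = False       # flipped by the referent / ethics committee

def review(a: Annotation) -> Annotation:
    """Mark an annotation as checked against the protocol's label definitions."""
    a.reviewed = True
    return a

review(Annotation(item_id="doc-42", label="fraudulent", annotator="ann-07"))
```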


  7. Guaranteeing that AI systems are secure


Ensuring the security of AI systems is a further obligation, one that safeguards data protection both during the development of the system and ahead of its deployment. The security areas to take into account when developing an AI system are data confidentiality, system performance and integrity, and the general security of the information system. To assess the level of risk associated with an AI system, account must be taken of factors such as the nature of the data, the degree of control over it, the models and tools used, the ways in which the system can be accessed, the content of the system’s outputs and the context in which the system is intended to be used. The CNIL’s practical information sheet provides a detailed list of the measures to consider in order to secure an AI system’s training data, development and operation (e.g., versioning, security audits, the means available to stop the system, etc.).
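As a small illustration of one of the listed measures, versioning, the sketch below records checksums of training data and model artefacts so that tampering or silent drift can be detected at audit time. The file names are hypothetical, and this is only one possible way of implementing the measure.

```python
import hashlib
import json
from pathlib import Path

# A minimal versioning sketch: record a checksum per artefact, then
# re-compute and compare during security audits. File names are hypothetical.
def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(artifacts: list[Path], manifest: Path) -> None:
    """Store one checksum per artefact in a JSON manifest."""
    manifest.write_text(json.dumps({p.name: sha256_of(p) for p in artifacts}, indent=2))

# Assumes these files exist in the working directory.
write_manifest([Path("train.csv"), Path("model.bin")], Path("manifest.json"))
```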


These new recommendations are open to public consultation until 1 September 2024. The public can contribute via the ‘recommendations for AI systems’ and/or ‘application of the GDPR to AI models’ forms. Contributions will be analysed once the consultation closes, so that the final recommendations can be published on the CNIL website over the course of 2024. Other publications on AI are expected to complement this second set of recommendations later this year.


T. K.
