EDPB report highlights the compliance challenges for ChatGPT

On May 23, 2024, the European Data Protection Board (EDPB) released a report containing preliminary views on the data protection issues related to OpenAI’s ChatGPT.

This report was produced by the Board’s ChatGPT Taskforce. This Taskforce was established in April 2023 to foster cooperation and exchange information on possible enforcement actions on the processing of personal data in the context of ChatGPT.

The report states that it offers only preliminary views and does not prejudge ongoing investigations into ChatGPT. It highlights compliance challenges, particularly concerning the model’s accuracy and transparency.

Large language models (“LLMs”) like ChatGPT are trained and enhanced using huge amounts of data, including personal data, some of which is obtained through web scraping.

The ChatGPT service, the first consumer-facing model, was launched on November 30, 2022. Since then, several Supervisory Authorities have opened data protection investigations into OpenAI OpCo as controller of the processing operations carried out in the context of the ChatGPT service.

The Taskforce report focuses on ChatGPT’s compliance with the General Data Protection Regulation (GDPR) principles of lawfulness, fairness, transparency, and data accuracy, as well as data subject rights.

The GDPR requires that all processing of personal data be based on one of the legal bases set out in Article 6(1) (consent, contract, legal obligation, vital interests, public task, or legitimate interests) and, where applicable, meet the additional requirements laid out in Article 9(2).

Additionally, it is necessary to distinguish between the different stages of personal data processing:

  • collection of training data (including the use of web scraping data or reuse of datasets),
  • pre-processing of the data (including filtering),
  • training,
  • prompts and outputs (responses) of ChatGPT, as well as
  • ChatGPT training via prompts.

The report particularly addresses the lawfulness of web scraping, that is, the collection of personal data from publicly available sources on the internet, potentially including sensitive personal data under Article 9(1) of the GDPR. Web scraping is no exception to the requirement of a legal basis: even when personal data is publicly available, its processing by LLMs still requires one.

OpenAI invokes Article 6(1)(f) of the GDPR (legitimate interests) as the legal basis for this processing. The assessment of this legal basis must consider: i) the existence of a legitimate interest; ii) the necessity of the processing, ensuring the data is adequate, relevant, and limited to what is necessary for its intended purposes; and iii) a careful balance between the fundamental rights of data subjects and the legitimate interests of the controller. The reasonable expectations of data subjects must also be taken into account in this evaluation.

In addition, safeguards such as precise collection criteria and data anonymization are important to mitigate the impact on data subjects. The burden of proving the effectiveness of such measures lies with OpenAI as the controller.

The principle of fairness in Article 5(1)(a) of the GDPR requires that personal data not be processed in a way that is unjustifiably detrimental, unlawfully discriminatory, unexpected, or misleading to the data subject.

The responsibility for ensuring fairness therefore falls on OpenAI, and not on the data subjects, even when individuals input personal data themselves.

Controllers should not shift risks to data subjects, such as by placing a clause in the Terms and Conditions stating that data subjects are responsible for their “inputs” in the chat.

When collecting large amounts of data through web scraping, it is not practicable to inform each data subject about the web scraping. Therefore, the exemption under Article 14(5)(b) of the GDPR could apply as long as all requirements are met. 

However, when personal data is collected while directly interacting with ChatGPT, Article 13 of the GDPR applies: it’s important to notify individuals that any data they input directly could be utilized for training purposes.

The EDPB distinguishes between input data and output data. 

  • Input data can include data collected through methods like web scraping, as well as content provided by data subjects when using ChatGPT. 
  • Output data refers to the responses generated during interactions with ChatGPT. 

The principle of data accuracy in Article 5(1)(d) of the GDPR applies to both. Therefore, OpenAI is encouraged to clearly explain how ChatGPT’s outputs are generated, acknowledging their probabilistic nature and potential biases. This transparency is essential under the GDPR for both input and output data.

The EDPB states that the purpose of data processing is to train ChatGPT and not necessarily to provide factually accurate information. In this context, the EDPB warns that the responses given by ChatGPT may be interpreted as factually accurate by users, including information about individuals, regardless of its actual accuracy. In any case, the principle of accuracy must be upheld.

The report stresses that data subjects must be able to exercise their rights effectively and easily, including the rights to access their personal data, to be informed about how it is processed, to have it erased or rectified, to restrict its processing, and to lodge a complaint with a Supervisory Authority.

Furthermore, the report states that OpenAI should keep improving the methods available for exercising data subject rights. The concern is that OpenAI currently advises users to opt for erasure rather than rectification when rectification is impractical due to the technical complexity of ChatGPT.

While the EDPB report is not formal guidance, it provides a useful framework for AI developers seeking to comply with the GDPR. It highlights the responsibility of AI developers such as OpenAI, and of companies that deploy AI, to ensure data protection by complying with the GDPR. Developers must align their practices with the principles of lawfulness, fairness, transparency, and data accuracy, and make data subject rights accessible, when using LLMs such as ChatGPT.

The report questions the practice of using web scraping for training datasets and emphasizes the importance of strong safeguards to protect personal data, particularly sensitive information.

Although preliminary, the Taskforce’s findings are relevant to anyone using LLMs. As organizations work to understand their data protection responsibilities regarding LLMs, further EDPB guidance will be valuable. In the meantime, organizations should build fairness, transparency, and data accuracy into their LLM deployment policies.

Created by:

Borneo
