ava's blog

legal justifications for a default opt-in?

It recently came to my attention that LinkedIn is planning to begin AI training on the content on their platform, just like Meta decided to do earlier this year, and that there is only a limited time to opt out (until the 2nd of November). That means: Users are automatically opted into the training, even the ones in the EU.

I wondered how that could possibly be in alignment with current EU law.

In their notice, they say:

On November 3, 2025, we’ll start to use some data from members in these regions [EU, EEA, Switzerland, Canada, and Hong Kong] to train content-generating AI models that enhance your experience and better connect our members to opportunities. This may include data like details from your profile, and public content you post on LinkedIn; it does not include your private messages. We rely on legitimate interest to process your data for this purpose. You can opt out anytime in your settings if you’d prefer not to have your data used in this way.

The legitimate interest they mention here is Article 6(1)(f) GDPR, which says

"Processing shall be lawful only if and to the extent that at least one of the following applies: [...] processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child."

Legitimate interest is sort of a catch-all for when the previous options ((a) to (e)) do not apply. It is vague on purpose, needs a lot of context to explain, and a lot more case law to explore further. One thing that helps in making sense of it is looking into the recitals, the reasoning and thoughts behind a provision.
More specifically named legitimate interests in those recitals include, for example, the processing of personal data strictly necessary for the purposes of preventing fraud, for direct marketing purposes[1], or for the purposes of ensuring network and information security[2].

In general, Recital 47 explains:

"The legitimate interests of a controller [...] may provide a legal basis for processing, provided that the interests or the fundamental rights and freedoms of the data subject are not overriding, taking into consideration the reasonable expectations of data subjects based on their relationship with the controller.
Such legitimate interest could exist for example where there is a relevant and appropriate relationship between the data subject and the controller in situations such as where the data subject is a client or in the service of the controller.
"

The company is the controller, the user is the data subject. It shows there needs to be a sort of relationship, a transactional and usually beneficial exchange between the user and the company. It would make no sense to have absolutely no ties to a company while they claim a legitimate interest in your data. If you have an account on LinkedIn, you obviously use their service, so this relationship is a given.

A more important key word is: reasonable expectations. Legitimate interest isn't limitless, as the recital clarifies:

"At any rate, the existence of a legitimate interest would need careful assessment, including whether a data subject can reasonably expect at the time and in the context of the collection of the personal data that processing for that purpose may take place.
The interests and fundamental rights of the data subject could in particular override the interest of the data controller where personal data are processed in circumstances where data subjects do not reasonably expect further processing."

Reasonable expectations are, once again, vague and leave room for interpretation - on purpose, as the circumstances can be wildly different and should be taken into account accordingly. This legal concept actually originated in US case law and centers on the subjective expectation of someone who is not trained in legal matters, applying common sense ("knowledge of the general public"). It's acknowledged that there is a justified expectation of privacy, though it's unclear whether subjective misunderstandings or sensitivities are included in the assessment.

In the cases of automatic opt-in into AI training, I've seen reasonable expectations applied to the starting point of data collection. The argument is that when you signed up for LinkedIn years ago (let's say, 2016) for better job opportunities and networking, you did not have to expect that in 2025, your public data on the platform would be used for AI training. You could not have foreseen this development, so you could not have created the account and accumulated all this data with that knowledge in mind, implicitly agreeing to this future use by using the website (if you see it that way).
At the same time, it is said that the specific purposes for which personal data are processed should be determined at the time of collection, and any change in purpose requires a sort of re-validation, like renewed consent. If you signed up years ago and produced data on the platform for years, you only signed up for the purposes they had listed back then - not AI training.

This is why some legal experts argue that, at the very least, the training data should not include anything from before 2024, as during 2024, it should have become clear that this is where these platforms are headed.

☁️☁️☁️

It's pretty interesting that these types of automatic opt-ins into training always have a basis in legitimate interest, when they could, you know, actually ask for your consent directly. Article 6(1)(a) GDPR is right there, saying processing is lawful if the data subject has given consent to the processing of their personal data for one or more specific purposes.

LinkedIn, for example, is actually pretty specific about what they want their GenAI to do:

"[...]enhance your experience and better connect our members to opportunities. [...] help hirers find and reach you more easily, and assist members in creating content such as profile updates, messages, and posts."

A good way to do this is very common and even mentioned in Recital 32: ticking a box when visiting an internet website[3].

My assumption as to why they don't use this is that they would have to hope for a clearly given agreement, instead of hiding an automatic one in the settings and hoping people don't opt out before the deadline.

The GDPR actually has a lot of standards for consent: Article 7 details that the controller has to be able to demonstrate that the data subject has consented, that the request for consent needs to be presented in a manner which is clearly distinguishable from other matters, in an intelligible and easily accessible form, using clear and plain language, and that the consent needs to be freely given. Otherwise, the agreement is not binding.

The last few years have shown that companies offering these types of online services are absolutely allergic to presenting things in a straightforward manner: Just think about other settings like tracking or cookies. They use difficult language, hide other options behind more clicks or in an area you have to flip open, let you consent to 1000 things with one click while requiring multiple clicks to reject just to make it more annoying, and use other dark patterns like different sizes and colors for buttons to sway your decision.

So imagine if they'd offered a notice somewhere asking you for consent. The risk would be many people not giving it, and the consent they do get might be void due to the manipulative presentation. It would be a nightmare for them, and rather than dropping the dark patterns that inflate consent numbers, they'd try anything else.

But I actually think that these default opt-ins violate the GDPR anyway, regardless of legitimate interest, even if just in principle or ethos.

Firstly, in my view, it violates the principles of Privacy by Design and Privacy by Default, and in terms of AI, also the principles around data minimization and storage limitation.

Article 25 of the GDPR explains the first two, which mean implementing appropriate technical and organisational measures in an effective manner, integrating safeguards to protect the rights of data subjects, and only collecting what is strictly necessary.
Also, it says

”[…] by default personal data are not made accessible without the individual’s intervention to an indefinite number of natural persons.”

In practice, this article is used to demand privacy-friendly default profile settings for EU users and no automatic opt-ins or pre-checked boxes. It's not clear to me why this seems not to apply to the AI training default, especially when you could argue that your data is 'internalized' and offered to a huge number of users by these AI models.

Article 5(1)(c) and (e) GDPR say the personal data shall be limited to what is necessary in relation to the purposes for which they are processed, and kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed. Unfortunately, AI training poses an interesting problem here: Is the data still identifiable in the training set (probably), and how would it be deleted when the person deletes their account or revokes consent later?

I wrote more about the unique challenges of AI and the GDPR here, going further in-depth on these issues.

Secondly, the way the opt-in works is that your consent is presumed unless you actively make it known you disagree. That is still consent. Recital 32 Sentence 3 says:

"Silence, pre-ticked boxes or inactivity should not therefore constitute consent."

How is the default opt-in not a pre-ticked box? Effectively, it means you give consent not verbally or in writing, and not by conducting yourself in a way that implies consent (konkludentes Handeln); you give it by inactivity and silence. Also: How is consent considered freely given if you may not even have been informed of this change in settings? As I don't use LinkedIn, I am not sure how they inform you of this, but I hope they at least sent an e-mail or had a pop-up notice when you logged in.

It's important to note that the recitals are not the law itself; they just give context on how to interpret the law, so you cannot base a claim on them directly. But: It shows that these opt-ins go directly against the spirit of the law, and recitals are often still referenced in court decisions.

Thirdly, due to the above, I think the fairness and transparency of the processing could (and should) be called into question (Article 5(1)(a) GDPR).
Recital 39 goes more in-depth about this, saying, for example:

"The principle of transparency requires that any information and communication relating to the processing of those personal data be easily accessible and easy to understand, and that clear and plain language be used." [...] Natural persons should be made aware of risks, rules, safeguards and rights in relation to the processing of personal data."

Finding this setting in LinkedIn requires so many clicks that our IT department had to include a visual guide: Profile > Settings > Data Privacy > Data for Generative AI Improvement, and only after that last click are you finally on the page where you can flip the switch. How easily accessible is that? Decide for yourself, but we all know they could do better if they wanted to, and the nesting has an effect.

Do you think LinkedIn did a great job teaching everyone about the risks, rules and safeguards of their AI? Also, on the topic of understanding and clear and plain language: Do you think people know that their opt-out in these settings is only for the data collected moving forward?

That's right: Your opt-out only covers the data that will be stored and collected about you in the future. Anything up until now is apparently up for grabs or was potentially already used for other training, and if you don't want that, they want you to send in a separate Data Processing Objection Form[4]. This is often not mentioned in the media coverage, and I doubt most LinkedIn users know it. Even I would have assumed that opting out covers all of your existing content.

In any case, claiming legitimate interest does not free you from complying with the rest of the law. After all, a weighing of the interest against the fundamental rights and freedoms of the data subjects (users) is warranted and explicitly written into the provision. In general, legitimate interest is judged by a so-called three-part test, asking for purpose, necessity and balancing: Is there a legitimate interest, is the processing necessary for that purpose, and is it overridden by the individual's interests, rights or freedoms?
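
To make the structure of that test a little more tangible, here is a tiny illustrative sketch in Python. The fields and the example values are my own assumptions for the sake of the example; the real balancing is a legal judgment, not something you can compute.

```python
# A minimal sketch of the Art. 6(1)(f) three-part test as a decision helper.
# Purely illustrative: the fields and the example values are assumptions,
# and in reality the balancing is a legal assessment, not a boolean.
from dataclasses import dataclass

@dataclass
class ProcessingScenario:
    purpose: str                    # what the controller wants to achieve
    has_legitimate_interest: bool   # part 1: is there a legitimate interest?
    is_necessary_for_purpose: bool  # part 2: is the processing necessary for it?
    subjects_rights_override: bool  # part 3: do the data subjects' rights prevail?

def legitimate_interest_test(s: ProcessingScenario) -> bool:
    """Processing passes only if all three parts of the test are satisfied."""
    return (
        s.has_legitimate_interest
        and s.is_necessary_for_purpose
        and not s.subjects_rights_override
    )

# Example values reflecting the arguments in this post, not any court's finding:
scenario = ProcessingScenario(
    purpose="train generative AI on public member content",
    has_legitimate_interest=True,     # economic interests can qualify
    is_necessary_for_purpose=False,   # arguably not: less invasive options exist
    subjects_rights_override=True,    # arguably yes: no reasonable expectation
)
print(legitimate_interest_test(scenario))  # False under these assumptions
```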

Now, what are these fundamental rights and freedoms?

These include data protection and privacy rights, but also more general human rights and interests. That means not only your rights under the GDPR and more, but also constitutional rights, rights under the Charter of Fundamental Rights of the European Union, local and national laws and so on.

Recital 75 clarifies a bit what the risks and impact to these rights are:

"The risk to the rights and freedoms of natural persons, of varying likelihood and severity, may result from personal data processing which could lead to physical, material or non-material damage, in particular: where the processing may give rise to discrimination, identity theft or fraud, financial loss, damage to the reputation, loss of confidentiality of personal data protected by professional secrecy, unauthorised reversal of pseudonymisation, or any other significant economic or social disadvantage [...]

After that, it follows up with more concrete examples, like:

The risks are increased where personal data of vulnerable people are concerned, in particular of children, or where processing involves a large amount of personal data and affects a large number of people. This is clearly the case with the training of AI models on large datasets.

It is said that these concerns outweigh the legitimate interest (in the third part of the three-part test, aka the balancing) when the users would not reasonably expect the processing, would be likely to object to it, or when the processing would have a significant impact on them or would prevent them from exercising their rights, especially when the data is sensitive[5].

This balancing act is not for me to decide; it is to be decided by courts. However, I'd like to point out that LinkedIn is inseparably tied to employment, income, credibility in the field and more, which I find pretty high-stakes. Add to that the difficulty of exercising your GDPR rights with AI, as detailed here, and there is room to argue that a significant number of people would like to object.

If you wanna take a break reading this, now is a good point.

☁️☁️☁️

There's actually a legal precedent for all of this now: This year, Meta did the same thing, and won at the OLG Köln in Germany against all odds.

Starting from the beginning:

On the 14th of April this year, Meta announced AI training on all publicly available user data, starting on the 27th of May. Following that, the German consumer rights organisation Verbraucherzentrale NRW brought an interim injunction case against Meta before the Higher Regional Court of Cologne (OLG Köln), aiming to prevent the processing of publicly provided user data. The reasons given were that, as outlined above, it might not fall under the legitimate interest clause, and it might violate Article 5(2)(b) of the Digital Markets Act (DMA).
That one says:

"The gatekeeper shall not do any of the following: [...] combine personal data from the relevant core platform service with personal data from any further core platform services or from any other services provided by the gatekeeper or with personal data from third-party services".

Meta is considered the gatekeeper, and the concern is that if you have both Facebook and Instagram, Meta will combine the data from both in its AI training.

An interim injunction is when a court orders a party to refrain from doing something for a specific time period. Essentially, in simple terms, this case was supposed to get the court to tell Meta to knock off the AI training for a couple more months, in an attempt to delay it and use the meantime to further hash out legal issues, build case law, allow more research into AI, and so on. It would have given us all more time to decide, research, and wait for more courts' views on the challenges around AI and the GDPR, for example.

The court decided on the 23rd of May, four days before the training started, that the processing was valid and that no interim injunction was necessary, meaning Meta was able to go ahead with it all.

The Dutch consumer rights organisation Stichting Onderzoek Marktinformatie (SOMI) chose a similar path at the end of June 2025, but failed as well: Meta had already been using the data for a month at that point, so the court did not see the urgency needed for such an interim injunction.

I was lucky to be able to participate in a lecture held by David Wagner of Spirit Legal, who represented the Verbraucherzentrale NRW against Meta in the above-mentioned case (OLG Köln - 15 UKl 2/25[6]), so I was able to get a bit more insight into the proceedings and arguments.

Basically, the court mostly trusted the assessment of the Irish Data Protection Authority, which had released a statement shortly before the decision saying the processing was now okay, and heard the Hamburg Commissioner for Data Protection and Freedom of Information, who also gave the okay.

To be fair, the Irish authority had previously delayed the start of the AI training (it was originally planned for 2024) and got Meta to adjust some things, like the opt-out and measures for de-identification, which is generally good. But: They have a reputation for being biased towards the companies, and we will likely see more of that.

The Irish DPA hired Niamh Sweeney, a former senior Meta lobbyist, in October this year. She spent more than 6 years at Meta, and for 3.5 of those years, she was Head of Public Policy at Facebook in Ireland before becoming Director of Public Policy for Europe at WhatsApp. This practice of staffing regulatory agencies with (former) industry employees is a form of corruption and is also called regulatory capture. Make of that what you will. Either way:

The court concluded that there was indeed a legitimate interest, that there is no less invasive way for Meta to train their AI, that the six weeks of decision time were long enough, that it sees no combination of data from different core platform services, and that Meta has offered enough proof of security measures.

It bases this on the fact that the term "combination" is not legally defined in the DMA, and as long as the EU courts don't define it further (via case law, for example), the court considers that partially de-identified data[7] from two platforms in an unstructured training dataset is not a combination of data, as that would, in their view, require the data of the same person to be put together. This does not make sense to me, or to anyone else I've seen or heard talking about it. It further argues that the mere applicability of the protective purpose of a norm does not yet mean that the case is also covered by the norm[8], which reads like complete mental gymnastics to me.

On legitimate interest, they say it can include not only legal and non-material ("ideal") interests, but also economic interests, referencing another court decision. They actually don't engage much with the concerns about violations of principles such as data minimization and others; they just state that they see these principles as complied with.

The court even rejected the suggestion that there is a less invasive way for Meta to train their AI (like anonymized or synthetic data, or so-called flywheel data, where AI models continuously improve through model customization and learning from the latest user feedback[9]), based on the fact that Meta had plausibly demonstrated to them that this would not be practicable. Meta insisted that this would result in worse AI performance, which would negatively affect them in the market competition around AI (= economic interests).
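
For anyone who has never come across the flywheel idea: below is a rough, purely illustrative sketch of what such a feedback-driven loop could look like. All function names are hypothetical stubs of my own, not any real Meta or LinkedIn system.

```python
# A minimal sketch of a "data flywheel" loop: the model improves from fresh,
# consented user feedback on its own outputs instead of bulk-ingesting years
# of historical posts. Every name here is a hypothetical stub.
from dataclasses import dataclass

@dataclass
class FeedbackItem:
    prompt: str      # what the user asked the assistant
    response: str    # what the model answered
    accepted: bool   # explicit user signal (thumbs up, suggestion accepted, ...)

def collect_feedback() -> list[FeedbackItem]:
    """Stub: a real system would read recent, consented feedback events."""
    return [FeedbackItem("Draft a post about my promotion", "Draft text ...", True)]

def fine_tune(model_version: str, data: list[FeedbackItem]) -> str:
    """Stub: fine-tune the current model only on accepted interactions."""
    print(f"Fine-tuning {model_version} on {len(data)} accepted interactions")
    return model_version + ".1"

def flywheel_iteration(model_version: str) -> str:
    feedback = collect_feedback()
    accepted = [f for f in feedback if f.accepted]
    return fine_tune(model_version, accepted)

print(flywheel_iteration("assistant-v1"))  # -> "assistant-v1.1"
```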

I find that competition argument pretty laughable, as Meta is such a huge part of online services now that there is not even much competition to consider. And even if there were: Your competition cannot implant itself onto your platforms. If your users want an integrated chatbot, AI suggestions for their posts while writing them, or whatever else you want to offer directly on the platform, they can't get these things from the competition, because that is not the same platform.

In the same paragraph, the court also indulges in the idea of security by obscurity, saying that the amount of data is so high (at least about 40 million Facebook users and 31 million Instagram users in Germany) that an individual piece of data is inconsequential and meaningless. I find that a pretty disappointing view - at least one Meta chatbot has previously spit out a real address when asked where it lived, which resulted in a man traveling there (and unfortunately dying along the way).

The court later acknowledges this, saying that while the trained models contain only probability parameters, there is a risk of "memory" where the AI can reconstruct the original data, but it ultimately doesn't consider this a risk that outweighs Meta's interests. It also disagrees with the idea that data from before 2024 should not be used, essentially relying on the argument that the information intended for training is and has been public either way, and that by using the services, you know and accept this, and would otherwise not have used the service or would have set everything to private.
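
To make that "memory" risk a bit more concrete, here is a minimal sketch of how one could probe a small open model for verbatim regurgitation: prompt it with the start of a string that may have been in its training data and check whether it completes it word for word. The model name and the probe text are assumptions of mine, not anything from the court case.

```python
# A minimal sketch (not Meta's or LinkedIn's actual setup) of probing a small
# open model for verbatim "memory" using the Hugging Face transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small open causal LM works for this demo

def continues_verbatim(prefix: str, expected: str) -> bool:
    """True if greedy decoding reproduces the expected continuation verbatim."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    inputs = tokenizer(prefix, return_tensors="pt")
    n_expected_tokens = len(tokenizer(expected)["input_ids"])
    output = model.generate(
        **inputs,
        max_new_tokens=n_expected_tokens + 5,
        do_sample=False,  # greedy: we only care about the most likely continuation
    )
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    generated = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return generated.strip().startswith(expected.strip())

# Hypothetical probe: does the model complete a profile-style line verbatim?
print(continues_verbatim("Jane Doe is a Senior Engineer at", "Example Corp in Berlin."))
```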

Another big part of the overall reasoning is that the court decided the users have enough ways to opt out, like via the GenAI opt-out settings or by making individual posts or entire profiles private. It mentions that depending on your circumstances, you might not want to limit your discoverability just to evade training (influencers, restaurants etc.), and that you have no way of doing this if someone else has your information on their profile, but ultimately it gives this no real weight in the discussion. The court doesn't see any of the risks as something that could truly happen.

An interesting focus was the AI Act Recital 105.

"General-purpose AI models, in particular large generative AI models, capable of generating text, images, and other content, [...] The development and training of such models require access to vast amounts of text, images, videos, and other data. Text and data mining techniques may be used extensively in this context for the retrieval and analysis of such content, which may be protected by copyright and related rights. [...] Where the rights to opt out has been expressly reserved in an appropriate manner, providers of general-purpose AI models need to obtain an authorisation from rightsholders if they want to carry out text and data mining over such works."

It's mentioned very little in the actual court document, but Mr. Wagner said it was a rather big talking point during the court proceedings, because it essentially gave the court the option to point to this recital as a sign of the EU lawmakers' intent on how to handle AI and consent, and to say: Training with large amounts of available data is necessary and normal for these models, and enough opt-out options and control are possible, together with apparent safety measures.

All in all, the court left some of the very interesting questions open, did some remarkable mental gymnastics, and a lot of its reasoning boiled down to "Meta convinced us they'll be nice and safe."

I think what annoys me the most aside from that is that Meta submitted confidential proof as to why they are allowed to do this. The proof in question is a decision of the Commission of 23 April 2025 (C(2025) 2091), which, due to the confidentiality still in place, was only submitted in a limited, excerpted form. Isn't that great? You can just go to court, saying you have a secret document you cannot fully show, but it totally proves you're right. ;)


Anyway, these are the thoughts, concerns, and some of the insights from the court case as to why automatic opt-in for AI training seems to go ahead right now and why that might not be so great. We'll have to see what the future holds.

Reply via email
Published 24 Oct, 2025

  1. Recital 47 Sentence 6-7.

  2. Recital 49.

  3. Recital 32 Sentence 2.

  4. Here, under: "Can I opt out of the use of my personal data and the content I create to train LinkedIn's or its affiliates GAI models?"; "Opting out means that LinkedIn and its affiliates (including Microsoft) won’t use your LinkedIn data to improve models that generate content going forward, but does not affect training that has already taken place. In addition to the setting above, members can also object to the use of their personal data and content they create for training non-content generating AI models (e.g. models used to personalize your LinkedIn experience or models used for security, trust, or anti-abuse purposes) using the LinkedIn Data Processing Objection Form."

  5. I think the Guidelines of the EDPB are good to know and read, if all of this interests you further.

  6. Original text here, GDPRhub summary here.

  7. De-identified data here means that Meta has promised to remove complete name, e-mail addresses, phone numbers, national ID numbers, usernames, credit card numbers, license plates, IP addresses and physical addresses, while also offering the data in a tokenized way. They say they want to train the AI for local conventions and practices so it is a better cultural fit in each location. Important: This data is still not considered anonymized, especially because faces and other identifying characteristics are not removed from images.

  8. "Die bloße Einschlägigkeit des Schutzzwecks bedeutet noch nicht, dass der Fall von der Norm auch erfasst wird."

  9. Data flywheel by NVIDIA.

#2025 #data protection