Good cookies and bad cookies: What’s next for online match keys?

What will marketers eat if the third party cookie (“TPC”) crumbles? If using non-cookie-based technology, will those match keys also come under threat? There are insights in a recent update report by the UK Competition and Markets Authority (“CMA”). The CMA is reviewing the retirement of the third party cookie from the Google Chrome browser and supervising proposed alternatives which Google claims will provide equivalence (e.g., the Topics, Protected Audience, and Attribution APIs).

This was always going to prove challenging: there are many helpful uses of identifiers, which allow for a personalized web experience and bring benefits to all involved in tailored systems (e.g., fewer but more relevant adverts driving higher value for content). Yet it was also possible for bad actors to abuse the system through intrusive or distasteful practices (e.g., advertising based on protected categories). The difficult task is to preserve responsible and valuable data use while addressing the bad actors, and not to throw out the data baby with the bathwater.

This note provides highlights on the state of play and identifies areas for urgent engagement by those concerned about these developments.

Testing, Testing, Testing

Unusually, the CMA is involved in reviewing the new Google technologies before they are deployed. The CMA has been at pains to ask for rival third party testing data. Assembling a market-wide picture is difficult. CalTech researchers attempted it in early 2022, by statistically modelling the loss of TPCs across a number of AdTech vendors. Their report pre-dated the details of the current Privacy Sandbox proposals and therefore did not model their impact. Creating a similar study, updated for those proposals, is critical, and work needs to start very soon to meet Google’s desired Q2 2024 deadline for input.

The key to such a study will be to identify whether the replacements for the TPC allow others to compete with Google. There are serious concerns that this will not be so. For example, a range of niche data uses will be undermined by the loss of rich data sets. As these do not currently cause any apparent harm, but allow much better tailoring of data-driven services and advertising, a significant concern arises that the true reason for data restriction is to undermine competition from Google’s rivals.

Some readers will be familiar with the very interesting 2021 Alibaba study highlighting the extremely high value of personalization:

[Figure: distribution of product exposures with and without personalization. Source: Alibaba (2021)]

As rich data was removed, the top 1000 ranked items received 90% of all exposures (red). Personalization (blue) delivered a much more diverse internet experience, with product views widely distributed according to tastes.

Google’s essential position is that rivals should be denied the broad, rich and deep data needed to compete for the blue portion of the traffic, while Google itself retains significant insight, since it will still be able to see the signals needed for personalization via other routes.

Such a study should also identify areas where the proposals cannot deliver a rich personalization experience, and document customer demand for these services (e.g., from advertisers). This would avoid a scenario where the CMA reviews the Privacy Sandbox and, in the absence of competing evidence, simply gives it the thumbs up.

With the right study in hand, Google could be required to open up its proposals before they are approved. But the study must be done. Without it, Google will pick up the ball and run largely unopposed. There are concerning indications in the current report that this is already happening:

  1. The CMA’s update table says that, despite a growing list of industry concerns, the CMA has “no concerns” as of today regarding the future review of equivalence. This refers to the crucial CMA review of whether to allow Google to retire the TPC. Unless opposed, this will undermine a key lever in the review package: that Google must show data on equivalence, for others to comment on, before the review takes place. Seen in this way, the CMA’s statement of a need for testing data by a deadline (Q2 2024) is a statement of urgent need for alternative evaluations of Google’s proprietary APIs.
  2. There is an odd statement that the purpose of earlier testing was not to show equivalence (in that case, of Topics) or even the effectiveness of Google’s new proposals (para. 18, CMA update). Yet equivalence of the replacements is precisely what the Commitments are supposed to assess. It suggests that there is not sufficient information to assess equivalence, and thus that the CMA does not yet have evidence as to the potential impact on rival stakeholders as required under Google’s Commitments (Commitments, para. 17(c)(5)).
  3. There is a warning shot from the CMA to Google on the need for transparent and fair methodology (para. 19, CMA update). A warning shot is all well and good, but the CMA needs rivals and – especially – affected customers to come in with the cavalry or it will not mean anything.

Are there discriminatory impacts from removing support for open standards from web-enabled software?

The core point in the entire Commitments package is to avoid so-called competitive discrimination. This exists where rivals cannot compete as well because of changes such as the removal of TPCs.

Google has long played a sophisticated game here: by saying that it is losing data as well, it can argue that, nominally, there would be no discrimination. Everyone lost the TPC data, including Google. The glaring issue is that, factually, Google will still have access to significant data sources, including those that it is restricting from rivals (e.g., restrictions on retargeting within Fledge/Protected Audiences and restrictions on cross-site matching of data within its Attribution APIs).

This is why looking at the impacts on Google is a “sleeveless errand”. Such an analysis will likely show only that, just as Google had good data sources before the loss of TPCs, it also has good data sources afterwards. Unless the CMA can get under the hood (bonnet!) of the whole Googleplex – which is unrealistic – information on what Google can do will say little about what competing vendors can do. There are just too many unknowns.

Moreover, a myopic analysis that looks only across rival publisher properties – while ignoring the competing advertising within Google’s Search, YouTube and the 13 other properties that Google advertises as attracting more than 500m unique users – would miss the distortion to competition from this traffic becoming more valuable as competition in the open web is impaired. The likely result is a shift in spend to search, which would be a significant vertical foreclosure concern: competition from open web advertising (OpenRTB) is impaired in order to drive traffic to increasingly valuable, and scarce, search advertising. This significant competitive relationship was recently highlighted by the BVDW (IAB Germany) in its submission to the German competition authority (Bundeskartellamt).

It is essential that the analysis focuses not on Google’s capabilities, but on what others can do. This is the only way to see whether Google is constrained by competition with them, so that users of the technology benefit from a range of rich data-driven products with competitive pricing.

However, experience to date with the CMA bringing issues to Google’s attention has not inspired confidence. One of the most arbitrary data handling limits in the Privacy Sandbox is First Party Sets (“FPS”). FPS limits the scope for data handling to a declared set of domains. This makes little sense where low-risk data handling is at play: adverts for sweaters can appear across 100, or 1,000, domains without any harm.
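For orientation, an FPS declaration is essentially a short list of domains that the browser agrees to treat as one party. A minimal sketch is below; the field names follow the public First-Party Sets explainer (since renamed Related Website Sets) and should be read as illustrative rather than definitive, since the format has continued to evolve.

```typescript
// Illustrative sketch of a First-Party Sets style declaration, following the
// public explainer; field names are indicative, not definitive.
const sweaterStoreSet = {
  primary: "https://sweater-store.example",
  associatedSites: [
    "https://sweater-checkout.example",
    "https://sweater-blog.example",
    // The contested "numeric limit" caps how many domains may appear here,
    // regardless of whether the data handling across them is harmful.
  ],
};
```

Note that nothing in this structure distinguishes harmful from harmless data use: the limit bites on the count of domains, not on what is done across them.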

FPS is, essentially, an answer to a question no one was asking. Rather, Google has asked the question: “Please may I restrict data handling by others?”

Shrewdly, the CMA has pushed back. There is no clear consumer benefit from FPS and every reason to suspect competitive foul play. For several reporting cycles, Google has said it is “evaluating” revisions here. Yet despite the CMA report asking for this to be evaluated yet again (CMA report, para. 32), Google says in its accompanying report only that it is “evaluating the numeric limit” – that is, the number of domains (p.25, Google Report). It is not considering whether to have any such limit at all.

No justifiable rationale for this has ever been provided. Instead, Google insists that there must be consumer control over the “plumbing” of the internet, well beyond any reasonable specification of consumer interests or risks of harm. FPS is the core example: what does it matter if an innocuous sporting goods advert is shown across the FPS domain boundary, or not? If there is a concern about some adverts, e.g., those based on sensitive categories, then this does not depend on the domain boundary and is a global property of adverts wherever they appear – including on pure first-party systems. This must raise a suspicion: is the FPS domain boundary not simply a mightily convenient excuse to restrict rivals’ effectiveness by a vendor largely unaffected by it, given Google’s large range of first-party websites and data handling systems? It is unclear why the FPS limit is needed at all.

This highlights the urgent need for engagement: if a rival were to come in with data on the value of the data handling proposed to be restricted, similar to the Alibaba or CalTech studies but updated for the Privacy Sandbox, it would provide ammunition in the fight against arbitrary data handling restrictions. Without it, the ball just keeps rolling downhill despite the growing list of concerns Google is publishing in each of its quarterly reports.

You should have come to the first party

The FPS experience is part of a wider debate about so-called “first-” and “third-party” data use. This is the argument that a direct relationship with the consumer is required for consent-based data handling to be valid (that is, a first party relationship).

This will be a familiar concept to fans of the silver screen: who could forget Groucho and Chico Marx haggling over “the party of the first part”, and whether that party should be known as “the party of the first part”, in A Night at the Opera? Truly there is nothing new under the sun.

In the movie, Groucho memorably says: “You should have come to the first party.” So it is with proposals relating to data handling. There is an argument that only first-party data should be used, and that it should only be combined where there is direct customer consent. However, such a world would lose significant insights from data combinations, even where the data use causes no harm, as well as the considerable benefit of improved access to business-facing solution providers that can help smaller businesses compete with vertically integrated rivals.

From the consumer point of view, this restriction seems as arbitrary as Groucho and Chico’s negotiation. What does it matter if tiny small print enables five newspapers to combine data processing, or if, absent consent, de-identified data is used to create insights across more vendors? The issue for the consumer is whether there is harm, and whether the indirect benefit of free content is preserved. It is well documented that rich third party data sets add more value: no fewer than five studies from 2011-2020 – including one from the CMA itself – found 50-70% marginal value from access to interoperable match keys. So, if a harmless use (the sweater advert) is restricted, the consumer loses out, indirectly, through worse advertising. Lower value advertising in turn means more adverts per piece of content and fewer resources for publishers.

However, some publishers benefit from a data poverty scenario. This is because they have relatively strong brands and compete with the automated, interoperable data-rich systems. Some have woken up to the possibility that large investments in first-party data systems will now have to compete with Google’s proposal for synthesised third-party data handling known as Topics.

Topics is, as the name implies, a means by which websites are coded by topic (a sketch of the caller-facing API follows the list below). There are some significant developments here:

  1. Topics is not seen as adequate for some uses. Google is reported to have abandoned a Topics API classifier which would have used web address information. This leaves the proposals between a rock and a hard place: website data is needed for accurate coding, but it is being withheld from rivals.
  2. Publishers wishing to move to first-party data complained loudly that Topics amounts to unfair competition, as it provides richer data than first-party systems. In a sense, it makes third party data into a monopoly – that of the Topics API. Those wishing for data poverty would say that is unfair, as the first-party systems become less attractive in consequence. In pushing back, Google appears to have made a major concession: that the ability to combine data across websites is “highly valuable”, and that if this devalues first-party systems then it is simply par for the course (p. 7, Google Report). The obvious, unanswered, question is why Google’s Topics API should then be the only one to do it. Essentially, Google has admitted that third-party data handling is “highly valuable” and that, in competition terms, it sits in its own relevant market. This is a major, and quite possibly inadvertent, concession arising from an unrelated bun fight with publishers.
  3. Those wishing to use third-party data should note this opening and provide examples of the marginal value of their own third-party data use so as to avoid a Google monopoly on – in Google’s words – a “highly valuable” asset. This is especially so as Google has now conceded that it is possible to combine first-party data with Topics (p.8, Google Report): why not allow third-party combinations, as well? No reason is given.
  4. Publishers voiced concerns about the loss of control over how sites are coded (p.9, Google Report). Google’s report cannot be faulted for a lack of gumption: it says that (a) a misclassified website can always sell contextual adverts and (b) that if there is misclassification it will average out across different websites (!). There are obvious concerns, especially for high quality publishers, from the loss of input into how advertising on their sites works.
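As promised above, here is a minimal sketch of the caller-facing Topics API, based on Google’s public explainer. The return shape has varied across iterations, so the field names should be treated as indicative; the cast reflects that the API is not yet in standard DOM typings.

```typescript
// Minimal sketch of the caller-side Topics API, per Google's public explainer.
// The return shape has varied across iterations; field names are indicative.
async function logBrowsingTopics(): Promise<void> {
  const doc = document as any; // not yet in standard DOM typings
  if (typeof doc.browsingTopics !== "function") return; // unsupported browser

  // Returns coarse topics from a fixed taxonomy for recent epochs,
  // without revealing which sites the user actually visited.
  const topics = await doc.browsingTopics();
  for (const t of topics) {
    console.log(`topic ${t.topic} (taxonomy v${t.taxonomyVersion})`);
  }
}
```

The key design point for the debate above is that the caller receives only taxonomy codes: the rich cross-site signal lives inside the browser, and only Google’s classifier decides how sites map to it.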

Where is the party?

Where processing takes place is a major practical question. Early Privacy Sandbox proposals placed it on-device, which could limit competition by preventing the use of competing systems on remote servers, not to mention harming consumers by diminishing battery life. No reason was ever given why consumers needed advertising processing to take place on their phones.
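For readers unfamiliar with what “on-device” means in practice, the FLEDGE/Protected Audience design runs the ad auction inside the browser itself. A simplified sketch follows, based on the public explainer; the URLs are placeholders, the parameter sets are heavily abbreviated, and field-name casing has varied across explainer versions.

```typescript
// Simplified sketch of the on-device Protected Audience flow, per the public
// explainer. URLs are placeholders; real calls take many more parameters.
async function protectedAudienceSketch(): Promise<void> {
  const nav = navigator as any; // not yet in standard DOM typings

  // 1. On an advertiser's site: ask the browser to remember an interest group.
  await nav.joinAdInterestGroup(
    {
      owner: "https://dsp.example",
      name: "sweater-shoppers",
      biddingLogicURL: "https://dsp.example/bid.js", // runs on the device
      ads: [{ renderURL: "https://dsp.example/ad.html" }],
    },
    7 * 24 * 60 * 60 // membership duration, in seconds
  );

  // 2. Later, on a publisher's page: run the auction locally in the browser.
  const result = await nav.runAdAuction({
    seller: "https://ssp.example",
    decisionLogicURL: "https://ssp.example/score.js", // also on the device
    interestGroupBuyers: ["https://dsp.example"],
  });

  // `result` is an opaque handle for rendering the winning ad. The bidding
  // data never leaves the device, which is what constrains competing
  // server-side systems.
  console.log(result);
}
```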

The Protected Audience API has opened this to a degree of off-device processing, but only via two approved vendors: Amazon Web Services and… Google! It is time for your best Claude Rains impression – I am shocked, shocked to find that processing is allowed at these two big tech providers! Is this a not-so-subtle message to Microsoft to align with Google here, to allow Azure to be blessed as well?

The reason given for this continuing restriction is very weak: Google argues that it would be necessary to visit every server farm to verify on-premises security (para. 28, CMA Report; Google Report, p.10). But this is just a general property of web servers. It is not a principled reason to prevent the use of competing servers.

Essentially, the proposal is to tie the Privacy Sandbox to certain cloud providers. It is doubtful that this is legal under competition law principles on technological tying. The concern about premises visits also contrasts with Google’s thematic position on the Privacy Sandbox, which is that it is only providing APIs and then letting others do as they wish with them.

Beyond the concern about server location, there is also a significant concern about the interoperation of data sets. It was pointed out to Google – although, perhaps, not flagged to the CMA, as the concern is not in the CMA report – that there is no clear opt-in signal in the Protected Audience API. This contrasts with industry initiatives to include preference signals, e.g., the IAB’s GPP and MSPA proposals.
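To illustrate the gap, the sketch below shows what a preference-signal gate could look like if the Protected Audience API interoperated with IAB-style signals. The `readGppConsent` helper is hypothetical: the `__gpp` command function is the IAB GPP CMP API entry point, but the Protected Audience API itself offers no such hook, which is precisely the point.

```typescript
// Hypothetical sketch only: what gating on an IAB-style preference signal
// could look like. The Protected Audience API itself offers no such hook.

// Illustrative stand-in for reading a consent signal from a GPP-capable CMP.
function readGppConsent(): Promise<boolean> {
  return new Promise((resolve) => {
    const gpp = (window as any).__gpp; // IAB GPP CMP API entry point
    if (typeof gpp !== "function") return resolve(false);
    // A real integration would inspect the applicable GPP section(s) here;
    // this sketch simply treats a responsive CMP as a placeholder signal.
    gpp("ping", () => resolve(true));
  });
}

// Hypothetical wrapper: join an interest group only where permitted.
async function joinIfPermitted(group: object): Promise<void> {
  if (await readGppConsent()) {
    await (navigator as any).joinAdInterestGroup(group, 86400);
  }
}
```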

This gap leaves the Protected Audience API as an island of data that cannot be used with other systems. It will be important for competing data users to pick up this thread with the CMA, in response to its invitation for feedback on the Protected Audience API, so that this important aspect of interoperability is addressed in the next Report.

Other unexplained restrictions

Several other restrictions call for engagement by those affected:

  • Cross device data use – This is said to be a privacy concern (Google Report, p.14), but there are many examples of useful cross-device handling. Any synced login service provides this, and it is sometimes even a selling point (Apple iPhone, iPad and MacBook integration springs to mind). As with single-device use, the question is always whether there is harm. The lack of engagement from Google on important cross-device use cases is concerning given the significant benefits (e.g., a desktop search for a restaurant followed by a phone advert for the same cuisine when out and about).
  • Bounce tracking — This is a very significant restriction, as it prevents rivals using URLs to identify data points. It is functionally equivalent to many of the other restrictions and ought to be analysed on a consistent basis with them. Yet there is little analysis to date.
  • Aggregation Service — Google is launching a “safe” attribution reporting service, in competition with other vendors. This is welcome, but not on the basis of arbitrary restriction of data to those other vendors. The time delay has been decreased in the report to 0-10 minutes, but real-time data is still restricted from rivals’ ad solutions (Google Report, p.20) – and with that restriction, interoperation becomes very difficult (a sketch of the reporting flow follows this list). Competing providers of attribution, of whom there are many, should step forward to point out that these data restrictions unduly prevent competition with Google over the Aggregation Service.
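As referenced in the Aggregation Service bullet above, the sketch below shows the rough shape of Attribution Reporting registration as described in the public explainer: sources and triggers are registered via HTTP response headers, and rivals then receive only noised, delayed, aggregated output. The header names follow the explainer; the server framing is illustrative.

```typescript
// Rough shape of Attribution Reporting registration, per the public explainer.
// Header names follow the explainer; the server framing is illustrative.
import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url === "/click") {
    // Register an attribution source when the ad is clicked.
    res.setHeader(
      "Attribution-Reporting-Register-Source",
      JSON.stringify({
        source_event_id: "12345",
        destination: "https://advertiser.example",
      })
    );
  } else if (req.url === "/convert") {
    // Register a trigger on the advertiser side after a conversion.
    res.setHeader(
      "Attribution-Reporting-Register-Trigger",
      JSON.stringify({
        event_trigger_data: [{ trigger_data: "1" }],
      })
    );
  }
  // The browser later sends aggregatable reports for processing; rival
  // vendors only ever see noised, aggregated output after the service's
  // delay, which is the real-time restriction discussed above.
  res.end();
}).listen(8080);
```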

Who else is at the party?

An important question for any engagement with regulators is who else is speaking up. The CMA Report provides interesting insights:

  • Advertisers are now speaking up. They are voicing concerns about serving niche content, about Attribution Reporting not aligning with other measurement, and about the loss of measurement of reach across platforms and devices (para. 47, CMA Report). These are all crucial advertiser technologies and a major focus of commercial activity, e.g., audience analysis. Indeed, that is the original attribution service: a Nielsen panel! Google simply says it is “exploring features” (p.22, Google Report). The CMA should not permit another “dog ate my homework” response – but to get there, advertisers will need to speak up more and provide concrete evidence of harm and proposals for mitigation (e.g., abolition of the cross-site and cross-device restrictions).
  • SSPs have spoken up about major concerns: the loss of frequency caps, the time and cost of API implementation, self-preferencing of Google Ad Manager, and — most significantly of all — the loss of interoperation with OpenRTB (para. 48, CMA Report). As with advertisers, evidence and concrete proposals for change will be key.
  • Non-Google cookie successor providers: Some data restrictions would harm innovation in responsible, high-quality data handling systems, including alternatives to the Privacy Sandbox. A clear solution here would be to allow any system meeting objective criteria to work in Chrome. Now is the time to speak up, before the horse has left the stable.

Reporting

There is a concern that Google does not always make the significance of the Commitments clear, despite obligations to do so in the Commitments package. It would be helpful to have more pushback here from the CMA, not least because otherwise the Reporting could be seen to bless Google’s approach in any review.

Significantly, there is an innovation in the latest Google report: about half of it now provides considerable detail about what Google has done. This may well be designed to provide points in defence of TPC withdrawal in the event that the CMA and Google disagree and the matter goes to court. Some very significant points are hiding in plain sight, e.g., providers of alternatives to the Privacy Sandbox have argued that they are foreclosed. Google says it “welcomes efforts to develop alternatives” but that it “will always keep in mind the privacy, safety, and security of its users” (p.38, Google Report).

Unless this is challenged, it could be taken as giving the CMA notice that Google regards alternatives as – for some unspecified reason – not private, unsafe, or insecure. This is untrue even today – a sweater advert on OpenRTB is hardly “insecure” in any meaningful sense – so why the prejudicial statement? There is a need for comment against these sneaky leading statements, especially where they lack an evidence base.

Takeaways

  • It is a critical time to engage with the process surrounding Third Party Cookie withdrawal.
  • This needs to come in the form of quantitative tests, to ensure that the CMA has the full picture of the Privacy Sandbox and its impacts. If there is concern about the cost to individual firms of testing, the work can be done on a cross-industry basis using a shared expert report.
  • This can be provided with full whistleblower protections, preventing reprisals.
  • The most critical element is expert reporting on the commercial impact of the loss of rich data sets and the impact this will have on serving customers.

However, if the work is not done, then the Privacy Sandbox will become a reality through simple inertia – and a precedent will be set for withdrawing other identifier-based technologies, notably the Android MAID (mobile advertising ID).
