I retired from Google and published this article in 2017, in the Stanford humor magazine, the Chaparral. The “humor” is that, supposedly, Google did it to itself voluntarily! Larry Page gloats, in 2019, at how well it worked. Pretty funny, huh? I thought so. I guess the editors did, too.

The idea was that in 2017 Google voluntarily split itself into Google Data and many other baby Googles (Ads, YouTube, Mail, Maps, etc.) that use Google Data, and pay for it. The data is also available to any other company willing to pay for it on the same terms that the baby Googles receive. Thus, rather than breaking up Alphabet “vertically” (separating out YouTube, the Ad Exchange, etc.) Google made its priceless collection of the history of the Internet available to everyone. It was a “horizontal” breakup.

The idea came from my own experience working with Google’s data, when I was in the Ads group. You can read about that here and here.

In writing about this seven years later, I came up with a less radical idea that anyone can understand. I’ll describe that one first.

Google’s Production Infrastructure

Within Google, there are two major networks: Corporate and Production (I’ll refer to them like Googlers do, as “corp” and “prod”). All services that a user accesses are in prod, while every employee’s personal machine is in corp. The separation between them is maintained rigidly.

When I joined in 2005, the count of machines Google owned was a closely guarded secret; “one million” was a ballpark estimate. Wikipedia’s article gives a public overview of the Google Data Centers, including a 2016 Gartner estimate of 2.5 million.

Nowadays, multicore machines are standard (up to 96 cores per server), a single server can run many processes at once, and capacity is sometimes quoted in “cores,” not “servers.” It’s actually even more complicated: there are GPUs and TPUs (Graphics Processing Units and Tensor Processing Units), which are especially useful in machine learning and AI applications (article). The data centers are literally all over the world.

The Wikipedia article also discloses a lot about the Google network, which would make it the third-largest Internet Service Provider in the world. Data is stored or pre-positioned in various places around the world, so that it tends to be close to the user wherever they are. That’s why users in Europe tend to access servers in European data centers and not in the US.

In addition to the hardware, there are the people. Prod employs thousands of people to design the system and keep it running. Google is well-known for using “cheap, unreliable, redundant” instead of “expensive, bullet-proof” hardware, and that hardware is expected to fail often. The Site Reliability Engineers (SREs) are the people who attend to those failures, as well as the frequent software failures, around the clock.

Sometimes engineers in corp (like me!) are asked to be on “pager duty,” where they can be paged any time of the day or night if the SREs need something. Fortunately, I was never on pager duty.

Hundreds of engineers design those gigantic, 96-core servers I mentioned, and the Ethernet switches that keep the machines all connected. Statistical analysts crunch the numbers about failures, utilization, delays, power consumption, and every other production factor.

The Cloud

“Cloud computing” is a buzzword in corporate IT. Whereas years ago, nearly all corporations ran their own data centers with giant IBM mainframes in locked, freezing-cold rooms with raised floors, nowadays it’s common to outsource to a cloud vendor, or several. Amazon Web Services (AWS) is the leader in cloud computing, and it evolved because Amazon realized that it had already created a cloud infrastructure, so why not get some return by renting it to others?

It’s working spectacularly well; The Register says:

Amazon Web Services is on track to earn $100 billion of revenue in FY2024, has improved its margins, and provides the bulk of its parent company's operating income.

Google Cloud

Google offers a “cloud” as well, so in a sense, Google prod already is a separate business; it’s just that Google Cloud doesn’t own all of it. According to this report, it recorded $9.1 billion in revenue during its fourth quarter 2023, and was profitable.

You might ask what “profitability” means when they’re still part of Alphabet (Google’s parent company). Do they pay the real cost of the prod resources they’re using? The best guess is that they account for it using transfer pricing. This is an imprecise, arbitrary method that’s prone to error, either accidental or deliberate. From that article:

Google runs a regional headquarters in Singapore and a subsidiary in Australia. The Australian subsidiary provides sales and marketing support services to users and Australian companies. The Australian subsidiary also provides research services to Google worldwide. In FY 2012-13, Google Australia earned around $46 million as profit on revenues of $358 million. The corporate tax payment was estimated at AU$7.1 million, after claiming a tax credit of $4.5 million. When asked about why Google did not pay more taxes in Australia, Ms. Maile Carnegie, the former chief of Google Australia, replied that Singapore’s share in taxes was already paid in the country where they were headquartered. Google reported total tax payments of US $3.3 billion against revenues of $66 billion. The effective tax rates come to 19%, which is less than the statutory corporate tax rate of 35% in the US.

Google Prod would have immense economies of scale, so that the current Google Cloud would enjoy a massive infusion of resources.

The Spinoff

The proposal, then, is for Google Prod to be spun off as a separate company. Google Search, Ads, Maps, Mail, YouTube, etc. (call them “the baby Googles” in homage to the 1982 AT&T breakup) would be corporations that purchase services from Google Prod. The current Google Cloud would become the public-facing part of Google Prod.

I could refer to the new entity as just “Google Cloud 2.0” but that might get confusing, so I’ll continue calling it Google Prod.

Accounting for this spinoff would be handled like any other spinoff: shareholders in Alphabet would receive shares in both Google Prod and all the baby Googles.

Although as far as I know, no major Google service depends on the current Google Cloud, in the future they all would. Instead of “transfer pricing” fictions, they would be paying hard cash. They would also be free to purchase resources from any other cloud vendor, should they choose.

We should note here that unprofitable baby Googles would be exposed to the merciless scrutiny of the market. If they’re paying more to Google Prod than they’re taking in, they either go out of business or get acquired.

The current Google Cloud already has non-Google customers; in fact, that’s almost all of its customers. In the spinoff, Cloud would inherit all of Google prod’s infrastructure, which means nearly all Google capital assets. Theoretically, they would then be able to offer dramatically better terms to their customers than they currently do.

Exclusivity: Could Google Prod offer any sort of exclusive deal with Google Ads or Search? Emphatically not. Any antitrust lawyer worth their salt would insist on prohibiting sweetheart deals or exclusivity.

Idea #2: Make the Data Its Own Company

This is what I wrote about seven years ago. It’s considerably more radical.

Inside Google, the data is already treated as though it were a separate company, except there’s no money transferred. It already uses agreements very much like contracts. You have to get permission to use any log, and if it’s not plausibly related to your job, permission is denied. Access to any Personally Identifiable Information (PII) requires a VP’s permission. All accesses are logged, so if you’re fishing for information on your ex-spouse or even yourself, you’re going to be found out and fired.

When I first joined Ads, I had to fill out a special form to get permission, and take it in person to the woman in charge. I don’t recall her name, but I still remember she had her dog in the office with her.
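The access rules described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's actual system: the names `LogInfo` and `AccessGate`, the notion of "allowed teams" as a stand-in for "plausibly related to your job," and the sample logs are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LogInfo:
    """Hypothetical description of one log."""
    name: str
    contains_pii: bool
    allowed_teams: set  # teams whose work plausibly needs this log (assumption)


@dataclass
class AccessGate:
    """Hypothetical gatekeeper: checks permission and records every attempt."""
    audit_trail: list = field(default_factory=list)

    def request(self, user, team, log, vp_approved=False):
        job_related = team in log.allowed_teams
        pii_ok = vp_approved or not log.contains_pii
        granted = job_related and pii_ok
        # Every attempt is recorded, whether or not it succeeds.
        self.audit_trail.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "log": log.name,
            "granted": granted,
        })
        return granted


gate = AccessGate()
ad_clicks = LogInfo("ad_clicks", contains_pii=False, allowed_teams={"ads"})
profiles = LogInfo("account_profiles", contains_pii=True, allowed_teams={"ads"})

print(gate.request("alice", "ads", ad_clicks))                   # job-related, no PII
print(gate.request("bob", "maps", ad_clicks))                    # not job-related
print(gate.request("alice", "ads", profiles))                    # PII, no VP sign-off
print(gate.request("alice", "ads", profiles, vp_approved=True))  # PII with sign-off
```

The point of the sketch is that denied requests land in the audit trail too, which is what makes "fishing" detectable after the fact.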

The data consists of hundreds of separate logs. There are the obvious ones (who queried what, what ads were shown, and the responses) and lots of others you’d never even imagine.

What’s In There?

Virtually everything important about the Internet is in there, and thus, it’s something an Artificial Intelligence system can be trained on. Even before the term “AI” was on everyone’s lips (i.e. in 2017 when I wrote the paper), it was everything you’d need to train a search / ads engine. If the logs were removed, Google would cease to exist.

Google Data is the condensed history of the Internet. If you did a search for “foreign banks that accept US citizens” five years ago, that’s in there, along with the results and why they were shown (did they broaden your query to include brokerages and mutual funds?), the ads, why they were shown, whether you clicked on the ads, and, quite often, whether you bought the things they advertised. That last one is called a “conversion” which I worked on, a lot.

Google Search, Ads, and all its other products are trained on all that data, and that’s a big reason why Google remains dominant. Now that Artificial Intelligence has reached the public consciousness and popular services like Reddit strike deals for training on their data, it’s become easier for the general public to understand this: AI requires data and lots of processing time to create intelligence. That’s why Microsoft and Google are arranging their own nuclear power plants: it takes city-level power to crunch all that data.

Moreover, Google’s own data is only available to Google. That means that anyone else seeking to build a competitive search engine or ads engine has an insurmountable handicap: they can’t possibly know what’s worked before as well as Google can.

What Are They Doing With It?

With my example of the “foreign banks that accept US citizens” query, you might be concerned about who gets to see that data. Maybe your spouse didn’t know you were investigating that and now there are divorce proceedings. “Did you actually open an account in one of those banks?” their lawyer would want to know.

Law Enforcement can see it, of course, depending on what they want and whether they have a subpoena or search warrant (explanation). This would not change under this breakup proposal: police can always demand whatever they can convince a judge is relevant.

But what about other uses, beyond just “improving your search experience” as they like to say? What’s to stop any future customer of Google Data from cross-correlating all your data: not just searches and ad clicks, but your browsing history (via Chrome), photos, car trips, store visits, and hotel stays (Maps), the YouTube videos you watched, the mail you’ve gotten, etc.? Google can swear that they don’t do that, but if we let anyone with the money get at that data, they might.

The answer to that is: so could Google, and we’d probably never know. By making all data access contractual, backed up by legislation and regulations, we have the best chance of heading it off. Google Data would be required to log every access (as they already do), and to promptly terminate any customer violating their terms of access.

It’s worth reiterating that Google already does most of this! All accesses by Google employees are already checked for permissions and logged. Just because you’re an employee on the inside, it doesn’t mean you can look at anything.

The Value of Your Data

I titled that Stanford Chaparral article “You Have My Data; Now Where’s My Check?” to emphasize that your data is personal to you, and you should be paid for it. But what’s it worth? Looked at narrowly: not much.

One thing people don’t realize about Google’s data and whether you, personally, are in it:

Most of the time, no one cares about you. It’s the millions of people like you that matter.

I did countless scans of the logs for my job. In every case, it was to count the queries or users or ads that matched some criteria, or classify them in some way. It was never to research any particular person. The “person id” attached to the queries was a numeric identifier, called a Gaia ID. The file with their identity and what Google knows about them was highly classified and I would not be able to access it.
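To make "count the queries or users or ads that matched some criteria" concrete, here is a toy version of such a scan in Python. The log rows, their three-field layout, and the function name `summarize` are invented for illustration. Real logs are vastly larger and would be processed by distributed jobs, not a list in memory.

```python
# Hypothetical log rows: (numeric user id, query text, whether an ad was clicked).
SAMPLE_LOG = [
    (1001, "foreign banks that accept US citizens", True),
    (1002, "cheap flights to lisbon", False),
    (1001, "foreign banks minimum deposit", False),
    (1003, "foreign banks that accept US citizens", True),
]


def summarize(rows, keyword):
    """Count matching queries, distinct users, and ad clicks for a keyword."""
    matching = [row for row in rows if keyword.lower() in row[1].lower()]
    return {
        "queries": len(matching),
        "distinct_users": len({user_id for user_id, _, _ in matching}),
        "clicks": sum(1 for _, _, clicked in matching if clicked),
    }


print(summarize(SAMPLE_LOG, "foreign banks"))
# {'queries': 3, 'distinct_users': 2, 'clicks': 2}
```

Note that the scan only ever sees the numeric identifier. It produces totals, and nothing in it needs to know who user 1001 actually is.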

But suppose I, as a third-party customer of Google Data, want to refine my ads-placement software, and I need to analyze all the ads that Google has ever run for foreign banks offering accounts to US citizens. Let’s say Google Data, Inc. charges me $100,000 to do that analysis. This is a great business for them: they have all this data sitting around and customers pay good money to use it!

Chances are, if your data wasn’t in there, the price would still be $100,000. If you took it to some competitor of Google Data and offered it, they wouldn’t offer you more than $10 or so.

What if there were a better way to capture the inherent value of all your data, and pay you regularly for it?

Google Data as a REIT

One way, of course, is to give out common shares in Google Data, Inc. to everyone whose data they have. This would be a fine solution, but many people don’t know what to do with stock, so they would probably sell it right away and sharp businessmen would hoover it up, like Russian oligarchs did after the fall of communism.

The key insight here is that data is an asset class, like land.

Data is the new real estate

The list of all searches that Internet users have performed in a given year, or all the ads that were brought to users’ attention and which they clicked (if any), is a finite resource, just like the land under the Empire State Building. Like that land, it can be monetized by renting it out to various tenants who can use it for their own purposes. Although you might object that data can be copied, unlike land, it still carries maintenance costs: servers, software, staff, tools, buildings, power and cooling for all those servers.

Data can be improved with summaries, cleaning and editing, analysis tools, and visualization tools, just as land can be improved with streets, buildings, and utilities.

Real estate has network effects: having other businesses and services nearby adds immeasurably to its value. Similarly, data has network effects, in that adding more data and more types of data adds to the value of all of it.

Unlike real estate, though, the value of data tends not to increase with time. Just the opposite, if anything.

Why Do REITs Exist?

Real Estate Investment Trusts may be unfamiliar to even some sophisticated investors, but basically they were created in 1960 as a way to securitize real estate so that ordinary people can participate in it. That was the thought, anyway.

Unlike most stocks, a REIT is required to distribute 90% of its profits to its shareholders (it was originally 95% but the REIT Modernization Act of 1999 reduced it to 90%). If it does this, it avoids corporate taxes, and the shareholders instead pay income tax on the dividends.

I asked ChatGPT why this policy exists. I know people hate AI, so I’ll just give this one part of its answer:

Investor Protection and Steady Income: For individual investors, especially those seeking income, this rule guarantees a steady flow of dividends. It ensures that REITs return most of their profits to shareholders, rather than reinvesting heavily in new projects or accumulating excess cash reserves.

How does a REIT get created? One way, of course, is for some real estate operators to create it. But another, more interesting for our purposes, is the spinoff. A company discovers it has an asset (real estate, in the classic example) which would be worth more as a separate company, and especially, could attract other parties better able to utilize it.

Here’s one recent example: in 2015, Darden Restaurants, Inc. spun off the Four Corners Property Trust, Inc. (FCPT) as a REIT. You might not know what Darden Restaurants is, but it owns or has owned Olive Garden, LongHorn Steakhouse, Yard House, Ruth’s Chris Steak House, Cheddar’s Scratch Kitchen, The Capital Grille, Seasons 52, Eddie V’s and Bahama Breeze. Some of the Olive Garden and LongHorn Steakhouse restaurants went to FCPT.

In the same way, separating out Google Data would actually increase the value of the combined companies. You’re delusional if you doubt that there are companies and individuals out there who can make better use of Google’s data than Google itself has time to do.

This is true of almost any giant company. Railroads have hundreds of square miles of right-of-way that was granted a long, long time ago. Instead of developing that land themselves, they can spin off a REIT to do it.

What’s In It For You?

That distribution of 90% of the profits to the shareholders: that’s what’s in it for you. You don’t have to sell your shares (although you can); you just get a regular dividend check.

The plan is, if you have any data in Google Data, you get shares in proportion to how much they know about you.
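As a rough sketch of the arithmetic, suppose "how much they know about you" is measured by a simple count of log records per person. That measure is my assumption for the example, not a settled definition, and the profit figure and user names below are made up. The 90% payout pool is then split pro rata:

```python
PAYOUT_PERCENT = 90  # the 90% distribution requirement described above


def dividends(profit_dollars, records_per_user):
    """Split 90% of profit among users in proportion to their record counts.

    Works in whole dollars; fractions of a dollar are dropped by the
    integer division.
    """
    total_records = sum(records_per_user.values())
    if total_records == 0:
        raise ValueError("no records, so there is nothing to apportion")
    pool = profit_dollars * PAYOUT_PERCENT // 100
    return {
        user: pool * count // total_records
        for user, count in records_per_user.items()
    }


# Hypothetical numbers: $1,000,000 profit and three users.
print(dividends(1_000_000, {"heavy_user": 6000, "light_user": 3000, "rare_user": 1000}))
# {'heavy_user': 540000, 'light_user': 270000, 'rare_user': 90000}
```

With hundreds of millions of users instead of three, each check would be far smaller, but the mechanism is the same: the more data they hold on you, the larger your share of the pool.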

Conclusions

I’ve laid out two independent divestiture plans:

1. Spin off the production infrastructure. Let the current Google Cloud business be vastly expanded to include it. All services that Google currently runs become its customers, and they pay real money.

2. Spin off the data. Pay the customers whose data it’s using.

These are not mutually exclusive. We could do both.

Spin off the data

Option #2 is the more radical. Considering “data” to be “real estate” might require legislation (I’ll leave that to the lawyers). It has the advantage that ordinary people who are the customers receive payment. Recall that saying, “If you can’t tell, on a free service, what the product is, YOU are the product.”

Google Search and Ads would no longer have a self-sustaining monopoly. Competitors would have access to the same history that Google has. This, too, would open up endless legal wrangling about what data is included, what is kept proprietary, and who gets to update it.

Spin off the production infrastructure

Option #1 is the easier to understand. The “baby Googles” would be put on the same footing as the thousands of Web businesses that buy their computing from AWS. Google’s monopoly power, which enables them to build a better cloud infrastructure than anyone else has, would be broken. Anyone with the money could have access to the same data centers that Google now enjoys.

Intel is reportedly considering exactly this sort of breakup.

Asked at an investment conference Wednesday whether Intel’s semiconductor design and factory businesses might eventually be entirely separate companies, Chief Financial Officer David Zinsner said it’s important that they operate separately. He said each side of the business will have internal systems that maintain that independence.

There are many, many other considerations to this breakup. However, it’s straightforward and easy for anyone to understand.