The Billion User Table

The login is the gateway to the internet. And it’s about to get decentralized.


Jon Stokes

Jul 23, 2021

I'm a localist and an anti-monopolist, so I've been thinking for a few years about the hows and whys of breaking up Big Tech via anti-monopoly legislation. Because I've been pondering the nuts-and-bolts mechanics of how to break up Big Tech and then cap its size, I've lately come to realize that the entire *aaS application stack is about to be massively disrupted, in a very specific and under-appreciated way, by the blockchain.

Every web-scale software player, from small B2B IaaS products to big consumer-facing SaaS megaplatforms (Facebook, Google, Amazon), is about to get its eggs scrambled.

Here's what's coming: the public blockchain amounts to a single, massive users table for the entire Internet, and the next wave of distributed applications will be built on top of it. This has all kinds of market and political implications that I'm just now starting to get my head around.

Why the users table matters

As a society, we realized a long time ago that if we let banking go entirely unregulated, we end up with mammoth, rickety entities that lurch from crisis to crisis and drag us all down with them. So when we set about putting regulatory limits on banks, we settled on a few simple, difficult-to-game numbers to serve as proxies for size and systemic risk.

While there's still plenty of debate about what these numbers — e.g., tier 1 capital ratio, total assets, total revenues, etc. — really tell us about a bank's overall systemic risk, the fact that we have some key numerical targets for regulation is what once made it possible for us to limit Big Finance back in the bygone era when we used to think it was important to do that.

What would an equivalent size number for a social network or other *aaS product be? I think the answer is clearly the size of the platform's users table.

Every web application, from the most embryonic MVP all the way up to Google, has a table called users (a few apps give it a different name) whose job is to hold the information that lets users log into and use the software. An email address or phone number and a (hashed) password are the basic columns you'll find in this table, but you'll also find a ton more proprietary data in there and in linked tables.
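For readers who haven't built one, here is a minimal sketch of such a table in SQLite. The schema and helper names are hypothetical, and PBKDF2 is just one reasonable choice of slow, salted hash; the point is that the table stores a hash of the password, never the password itself.

```python
import hashlib
import hmac
import os
import sqlite3

# Hypothetical minimal schema; real apps add many more columns and linked tables.
SCHEMA = """
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    email         TEXT UNIQUE,
    phone         TEXT UNIQUE,
    password_salt BLOB NOT NULL,
    password_hash BLOB NOT NULL
)
"""

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is a slow, salted one-way function: the server never stores the password.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def register(conn: sqlite3.Connection, email: str, password: str) -> int:
    salt = os.urandom(16)
    cur = conn.execute(
        "INSERT INTO users (email, password_salt, password_hash) VALUES (?, ?, ?)",
        (email, salt, hash_password(password, salt)),
    )
    return cur.lastrowid

def check_login(conn: sqlite3.Connection, email: str, password: str) -> bool:
    row = conn.execute(
        "SELECT password_salt, password_hash FROM users WHERE email = ?", (email,)
    ).fetchone()
    if row is None:
        return False
    salt, stored_hash = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_password(password, salt), stored_hash)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
register(conn, "ada@example.com", "correct horse battery staple")
print(check_login(conn, "ada@example.com", "correct horse battery staple"))  # True
print(check_login(conn, "ada@example.com", "hunter2"))  # False
```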

Here’s how important this one table is: If you went around the right Silicon Valley neighborhood at 3 AM on a Tuesday morning knocking on doors at random, every product owner you managed to stir from a deep slumber could look at the clock and then immediately tell you the approximate size of their users table at that moment. That's because the size of the users table is the main number everyone in the software game lives and dies by. Adding rows to your platform's users table is how you win at software.

Even if your users are registering via a social sign-on button — i.e., they sign in with Google, Facebook, Amazon, etc. — you've still got a users table with information that lets you track users and market to them. In fact, let's list some user-facing things you can do with your users table:

  • Record app-specific details, like transaction history, reputation markers (karma, badges earned, user ratings), social graph links, activity feeds, etc.
  • Support ticketing
  • Account recovery
  • Marketing and remarketing

Then there are the business- and investor-facing things you can do with this data, like sell ads, raise new investment, and so on.

The most critical thing your users table does for you, though, is it gives you access to network effects, and these network effects give you some lock-in. The more users your platform has, the more value it has to any one user (cf. Metcalfe's law), and the more value it has to any one user, then A) the easier it is to attract new users, and B) the harder it is for any one user to leave (because to leave is to give up all that value).
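Metcalfe's law is usually read as "network value grows with the square of the number of users," because that's roughly how the number of possible pairwise connections grows. As a rough worked example (a simplification, since real networks never realize every possible link), this sketch counts those links and how many a single user forfeits by leaving:

```python
def possible_links(n: int) -> int:
    """Distinct pairs of users who could connect: n(n-1)/2."""
    return n * (n - 1) // 2

def links_lost_by_leaving(n: int) -> int:
    """Connections one user gives up by leaving a network of n users (equals n - 1)."""
    return possible_links(n) - possible_links(n - 1)

for n in (10, 1_000, 2_000):
    print(n, possible_links(n), links_lost_by_leaving(n))
# 10 45 9
# 1000 499500 999
# 2000 1999000 1999
```

Doubling the user count roughly quadruples the possible links, while the cost of leaving grows with every user added, which is the lock-in described above.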

So the entire online attention economy is built around proprietary users tables that different apps jealously guard and are constantly trying to grow.

This being the case, the size of this table is a direct measure of the size of a tech platform. It's not a proxy measure, either. It's truly direct, because it's literally the same number that the platforms are using internally and that partners and investors are using externally.

If you wanted to cap the size of a platform, then, you'd do things to limit the size of its users table.

So if I were trying to determine if a platform is big enough to warrant regulatory intervention, I'd subpoena the integer result of this one query and go from there:

SELECT COUNT(*) FROM users;
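Here's that query run against a toy SQLite database, assuming the table is literally named users. At web scale the table is typically split across many database shards, so the hypothetical platform_size helper sums the per-shard counts to get the one integer a regulator would want.

```python
import sqlite3
from typing import Iterable

def users_table_size(conn: sqlite3.Connection) -> int:
    """Run the query above against one database and return the integer."""
    (count,) = conn.execute("SELECT COUNT(*) FROM users;").fetchone()
    return count

def platform_size(shards: Iterable[sqlite3.Connection]) -> int:
    """Hypothetical: if the users table is sharded, the platform's size is the sum."""
    return sum(users_table_size(conn) for conn in shards)

def make_shard(emails: list) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    conn.executemany("INSERT INTO users (email) VALUES (?)", [(e,) for e in emails])
    return conn

shards = [make_shard(["a@example.com", "b@example.com"]), make_shard(["c@example.com"])]
print(platform_size(shards))  # 3
```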

Similarly, if I were going to break up, say, Alphabet into Google and YouTube components, I'd make the two companies maintain totally separate users tables. I'd then forbid these two entities from accessing each other's user data via non-public interfaces, so that if YouTube wants to host Google users then it has to enable social logins for Google just like every other service.

What it means that everyone has a separate users table

In the present incarnation of the web, all these separate users tables enforce a specific kind of interoperability architecture even on apps that want to be fairly open with their own user data. Specifically, everyone's proprietary user data sits behind an API that any third-party app has to call if it wants access.

The result is a decentralized network of siloed tables connected by access-controlled APIs. Depending on the relationship between any two nodes in that network, they’ll have different kinds of access to each others’ data through this ad-hoc, decentralized system.

For a practical look at what this means, let's take the example of LinkedIn. If a user has endorsements for JavaScript and WordPress and registers with my new website via a LinkedIn button, my site isn't automatically going to know that they have these endorsements. In order for me to learn about that user's LinkedIn endorsements, I have to hit a LinkedIn-hosted API to get that information. (I’m not sure if LinkedIn actually exposes this data via API, but that’s beside the point.)

This arrangement has a few key implications:

  • I can only access the user data that LinkedIn elects to publish via its API.
  • LinkedIn can revoke API access for any data at any time, so it's not safe for me to build a business that's premised on access to, say, LinkedIn endorsement data.
  • It's also not safe for me to anger LinkedIn if I do go ahead and develop a critical dependence on their endorsement data, because they may just revoke my access to that data while leaving my competitors alone.
  • If something bad happens to LinkedIn, like maybe Google buys them and then closes them down in typical Google fashion, then that API and all the endorsements behind it are gone.
  • I will never be able to write my own endorsements to LinkedIn’s tables for others to see unless LinkedIn gives me write access to that table. And again, even if it does, I can still lose that access because it gets revoked (for everyone or just for me) or because LinkedIn is gone.
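The leverage in these bullets can be sketched as a tiny access-controlled API. EndorsementAPI is a hypothetical stand-in, not LinkedIn's real API: the owner grants access per client, can revoke one client while leaving its competitors alone, and can shut everything down at once.

```python
from __future__ import annotations

from dataclasses import dataclass, field

class AccessRevoked(Exception):
    """Raised when the data owner has cut off a client."""

@dataclass
class EndorsementAPI:
    """Hypothetical proprietary API in front of a siloed endorsements table."""
    endorsements: dict[str, list[str]] = field(default_factory=dict)  # user -> skills
    allowed_clients: set[str] = field(default_factory=set)

    def grant(self, client_id: str) -> None:
        self.allowed_clients.add(client_id)

    def revoke(self, client_id: str) -> None:
        # Selective revocation: one client loses access, everyone else keeps it.
        self.allowed_clients.discard(client_id)

    def shutdown(self) -> None:
        # The platform goes away: every client loses the data at once.
        self.allowed_clients.clear()
        self.endorsements.clear()

    def get_endorsements(self, client_id: str, user: str) -> list[str]:
        if client_id not in self.allowed_clients:
            raise AccessRevoked(client_id)
        return list(self.endorsements.get(user, []))

api = EndorsementAPI({"jon": ["JavaScript", "WordPress"]})
api.grant("my-startup")
api.grant("competitor")
api.revoke("my-startup")
print(api.get_endorsements("competitor", "jon"))  # ['JavaScript', 'WordPress']
```

Note that no write path exists for clients at all; that asymmetry is the last bullet above.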

The bottom line of the above points is: LinkedIn’s ongoing control of its proprietary endorsement data gives that company leverage over all users of that data, and it gives all users of that data exposure to various LinkedIn-specific business risks.

This combination of leverage and risk exposure massively constrains every entrepreneur's ability to build their applications in ways that depend on data LinkedIn is curating.

More generally speaking, the pervasive leverage/risk dynamics that swirl around the users tables of the tech megaplatforms are the dark matter that gives shape to all the visible aspects of the Internet as we currently experience it.

When that dark matter disappears, everything is going to fly apart.

The blockchain as the Internet's users table

I don't want to get into a bunch of detail on how the blockchain can use public key cryptography to form what is, in effect, a public users table that operates at Internet scale. But not only is this possible, it's already being done. Maybe in a later post I can drill down on this.
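For the curious, here's a toy sketch of the core idea, under loudly stated assumptions: a row in a public users table holds a public key instead of a password hash, and "logging in" means signing a fresh challenge that any service can check against that public key. Python's standard library has no signature primitives, so this uses textbook RSA with a small key and no padding, which is insecure and purely illustrative; chains like Bitcoin and Ethereum use elliptic-curve signatures (ECDSA over secp256k1) instead. PublicUsersTable and its methods are hypothetical names, not BitClout's or any chain's API.

```python
import hashlib
import math
import secrets

def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a witnesses that n is composite
    return True

def random_prime(bits: int) -> int:
    while True:
        # Set the top bit (full length) and the bottom bit (odd).
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

E = 65537

def generate_keypair(bits: int = 256):
    """TOY textbook-RSA keypair: returns (public, private) = ((n, e), (n, d))."""
    while True:
        p, q = random_prime(bits), random_prime(bits)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(E, phi) == 1:
            n = p * q
            return (n, E), (n, pow(E, -1, phi))

def digest(message: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(private_key, message: bytes) -> int:
    n, d = private_key
    return pow(digest(message, n), d, n)

def verify(public_key, message: bytes, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)

class PublicUsersTable:
    """Hypothetical public users table: rows hold public keys, never passwords."""

    def __init__(self):
        self.rows = {}  # user_id -> public key

    @staticmethod
    def user_id(public_key) -> str:
        # Derive a short identifier from the public key itself.
        n, e = public_key
        return hashlib.sha256(f"{n}:{e}".encode()).hexdigest()[:16]

    def register(self, public_key) -> str:
        uid = self.user_id(public_key)
        self.rows[uid] = public_key
        return uid

    def login(self, uid: str, challenge: bytes, signature: int) -> bool:
        # Any service holding a copy of the table can check this; no shared secret.
        public_key = self.rows.get(uid)
        return public_key is not None and verify(public_key, challenge, signature)

public, private = generate_keypair()
table = PublicUsersTable()
uid = table.register(public)
challenge = secrets.token_bytes(16)  # issued fresh by whichever service wants to log the user in
print(table.login(uid, challenge, sign(private, challenge)))  # True
```

Deriving the identifier from a hash of the public key is loosely analogous to how many chains derive addresses.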

In place of a decentralized network of user data silos connected by APIs, there’s a single decentralized user data store accessible via an open protocol and a decentralized network of storage nodes. So the identity-hosting blockchain represents decentralization at the datastore implementation layer, and recentralization at the datastore access layer.
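A minimal sketch of that split, assuming the simplest possible design: every node keeps a full copy of one hash-chained, append-only log, and all nodes apply the same verification rule to any copy they receive. It deliberately leaves out consensus (who may append, and how competing forks are resolved), which is where real chains spend most of their complexity; Node and sync_from are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64

def entry_hash(prev_hash: str, record: dict) -> str:
    # Each entry commits to the one before it, so history can't be silently rewritten.
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class Node:
    """Hypothetical storage node holding a full copy of the shared log."""

    def __init__(self):
        self.log = []  # list of (hash, record) pairs

    def head(self) -> str:
        return self.log[-1][0] if self.log else GENESIS

    def append(self, record: dict) -> str:
        h = entry_hash(self.head(), record)
        self.log.append((h, record))
        return h

    def verify(self) -> bool:
        # The same rule runs on every node: this is the shared "access protocol."
        prev = GENESIS
        for h, record in self.log:
            if entry_hash(prev, record) != h:
                return False
            prev = h
        return True

    def sync_from(self, other: "Node") -> bool:
        # Adopt another node's copy only if it verifies and extends what we already have.
        if other.verify() and other.log[: len(self.log)] == self.log:
            self.log = list(other.log)
            return True
        return False

a, b = Node(), Node()
a.append({"user": "jon", "public_key": "pk-jon-placeholder"})
a.append({"user": "ada", "public_key": "pk-ada-placeholder"})
print(b.sync_from(a), b.head() == a.head())  # True True
```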

I’ll use BitClout’s blockchain as an example of what I'm talking about because I’ve been thinking about it a lot lately for various reasons. But I could use others, as well.

So with BitClout, I have an identity on that blockchain that's publicly accessible to anyone running an open-source BitClout node. To continue with the example of LinkedIn, if they were to rebuild themselves on top of BitClout, and they were to write their entire users table along with all its endorsements to the BitClout blockchain, then that data would be there for every other BitClout node to read or append to.

If anything were to happen to LinkedIn or to its internal assessment of the value of providing public endorsement data on-chain, the public would still have unfettered, permanent access to the old data. Furthermore, I could easily stand up a service that lets BitClout users (who are also LinkedIn users now) continue to endorse each other for various skills under my own identity, and I could even add new skills and endorsement options that others could recognize (or ignore) alongside LinkedIn's.
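To make the endorsement scenario concrete, here's a sketch assuming on-chain endorsements are simple (issuer, subject, skill) records in one shared list. Records LinkedIn already wrote stay readable even if LinkedIn stops writing, anyone can add records under their own identity, and each reader decides which issuers to recognize. The issuer names and the Endorsement record shape are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class Endorsement:
    issuer: str   # identity of the service (or person) that wrote the record
    subject: str  # identity being endorsed
    skill: str

# Hypothetical shared, append-only records standing in for on-chain data.
chain = [
    Endorsement("linkedin", "jon", "JavaScript"),
    Endorsement("linkedin", "jon", "WordPress"),
    Endorsement("my-service", "jon", "Rust"),          # a skill added by a newcomer
    Endorsement("spammy-issuer", "jon", "Everything"),
]

def skills_for(subject: str, trusted_issuers: set[str]) -> set[str]:
    """Each reader picks which issuers' endorsements to recognize and which to ignore."""
    return {e.skill for e in chain if e.subject == subject and e.issuer in trusted_issuers}

print(sorted(skills_for("jon", {"linkedin", "my-service"})))  # ['JavaScript', 'Rust', 'WordPress']
```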

Sure, LinkedIn could continue to keep its endorsement data off-chain behind a proprietary API even if its users table was effectively the BitClout blockchain. But a decision to keep data siloed this way is a decision to forgo on-chain network effects for that data. My guess is that the temptation to take advantage of blockchain-sized network effects will be so great that companies will default to putting data on-chain rather than keeping it siloed.

This point about network effects is why people will build things using on-chain identities. It's so critical that I'm going to blockquote what I wrote about network effects earlier, so you can read it again:

The most critical thing your users table does for you, though, is it gives you access to network effects, and these network effects give you some lock-in. The more users your platform has, the more value it has to any one user (cf. Metcalfe's law), and the more value it has to any one user, then A) the easier it is to attract new users, and B) the harder it is for any one user to leave (because to leave is to give up all that value).

Imagine that LinkedIn, Reddit, and GitHub all port their users tables (along with much of their proprietary data, like endorsements, karma, and activity history) to BitClout. Immediately, here's what happens: every GitHub user is also a Reddit user and a LinkedIn user and a BitClout user. Likewise, every Reddit user is also a GitHub user and a LinkedIn user and a BitClout user. I could go on, but you get the point.

Every company that builds on the same virtual users table has immediate access to the network effects of every other startup on that table. Every time an on-chain company onboards a new user, your service has a new user, as well. (In a manner of speaking. They may not be actively using your service yet, but they effectively have an account on it.)
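This "your competitor's signup is your signup" effect can be sketched with plain sets, assuming one shared identity registry plus per-app bookkeeping of which identities have actually used the app. IdentityChain and App are hypothetical names for illustration only.

```python
from __future__ import annotations

class IdentityChain:
    """Hypothetical shared identity registry that every app reads from."""

    def __init__(self) -> None:
        self.identities: set[str] = set()

    def onboard(self, identity: str) -> None:
        self.identities.add(identity)

class App:
    def __init__(self, name: str, chain: IdentityChain) -> None:
        self.name = name
        self.chain = chain
        self.active: set[str] = set()  # identities that have actually used this app

    def users(self) -> set[str]:
        # Every on-chain identity effectively has an account here already.
        return set(self.chain.identities)

    def activate(self, identity: str) -> None:
        if identity not in self.chain.identities:
            raise KeyError(identity)
        self.active.add(identity)

    def inactive_users(self) -> set[str]:
        return self.chain.identities - self.active

chain = IdentityChain()
github, reddit = App("github", chain), App("reddit", chain)
chain.onboard("ada")
github.activate("ada")
chain.onboard("bob")  # onboarded by some other on-chain app
print(sorted(reddit.users()), sorted(github.inactive_users()))  # ['ada', 'bob'] ['bob']
```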

Conversely, users-table-based network effects cease being an economic moat that can give anyone any kind of defensible or investable edge.

The game theory of this arrangement is as follows:

  1. All of the big incumbents will want to keep their users tables proprietary in the beginning, because that's their moat and it gives them the network effects and lock-in that markets and investors reward them for.
  2. At some point, though, the blockchain users table (plus the ecosystem built on it) will grow to the size where incumbents can't compete on their own. When BitClout's virtual users table has surpassed, say, Twitter's proprietary users table in size, then Twitter will either join the blockchain or get left behind. And likewise on up the size leaderboards until we get to Google and Facebook.
  3. Eventually, there will be a blockchain users table that's bigger than the next two or three proprietary users tables combined. When that happens, the megaplatforms will either ditch their proprietary tables and reconfigure their businesses around the chain, or they'll enter a doom loop.

Imagine a world where every startup begins life with as many users as the largest incumbent. There's no onboarding or sign-up friction, or at least friction isn’t mandatory as it is now (you could reintroduce it). If you're built on the big identity blockchain, then every blockchain wallet is already an (inactive) user of your service, with a seamless way to transact with you for money and digital goods.

It will cost companies some money to reach their users and flip them from inactive to active for their particular service. I think any chain that's going to survive will definitely have paid inboxes, because otherwise spam will make the whole thing unusable. But companies can query the chain, identify a subset of users to target, and then market to them or serve them in whatever way they can afford.
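Here's a hedged sketch of that query-then-target flow, assuming each on-chain identity publishes a per-message inbox fee and some public tags a company can filter on; neither field is specified in the post, and ChainUser and plan_campaign are hypothetical. The greedy loop reaches as many matching users as the budget allows by buying the cheapest inboxes first.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass(frozen=True)
class ChainUser:
    address: str
    inbox_fee: int        # hypothetical per-message price set by the user
    tags: frozenset[str]  # hypothetical public attributes a company can filter on

def plan_campaign(users: list[ChainUser], wanted_tag: str, budget: int) -> tuple[list[str], int]:
    """Query for matching users, then buy the cheapest inboxes first until the budget runs out."""
    matches = sorted((u for u in users if wanted_tag in u.tags), key=lambda u: u.inbox_fee)
    chosen: list[str] = []
    spent = 0
    for u in matches:
        if spent + u.inbox_fee > budget:
            break  # everything after this costs at least as much, so nothing else fits
        chosen.append(u.address)
        spent += u.inbox_fee
    return chosen, spent

users = [
    ChainUser("0xa", 5, frozenset({"dev"})),
    ChainUser("0xb", 1, frozenset({"dev"})),
    ChainUser("0xc", 2, frozenset({"artist"})),
    ChainUser("0xd", 3, frozenset({"dev"})),
]
print(plan_campaign(users, "dev", budget=4))  # (['0xb', '0xd'], 4)
```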

If you want to build a set of network effects that benefit your company specifically, it won't be enough to simply cultivate a large users table or email list — no, you'll have to offer something on-chain that others are also incentivized to use, so that the thing you're uniquely offering spreads and becomes a kind of currency.

For example, if I'm the only one offering a particular credential (e.g., user verification, like Twitter Verified but on-chain), then the more users employ that credential in their transactions, the larger my on-chain network is.

How would I, the credential provider, get value from this? I dunno, but it's not too hard to come up with random possibilities. For instance, if you accept my credential, then maybe you have to accept marketing messages from me at some steep discount to your normal inbox rate; I can either use this discounted access directly or resell it. That's one example I just randomly thought up, and I'm just one dude; I'm sure the market will come up with a lot more.
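The credential-for-inbox-access idea boils down to a pricing rule. This sketch assumes each inbox lists the credential issuers its owner relies on and that those issuers get a fixed 90% discount; the percentage, the Inbox type, and the issuer names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DISCOUNT_PERCENT = 90  # hypothetical "steep discount" for credential issuers

@dataclass
class Inbox:
    normal_rate: int
    accepted_credentials: set[str] = field(default_factory=set)  # issuers this user relies on

def price_to_message(inbox: Inbox, sender: str) -> int:
    if sender in inbox.accepted_credentials:
        # Integer math (rounding down) avoids floating-point surprises in token prices.
        return inbox.normal_rate * (100 - DISCOUNT_PERCENT) // 100
    return inbox.normal_rate

def campaign_cost(inboxes: list[Inbox], sender: str) -> int:
    return sum(price_to_message(i, sender) for i in inboxes)

inboxes = [Inbox(100, {"verify-co"}), Inbox(100), Inbox(50, {"verify-co"})]
print(campaign_cost(inboxes, "verify-co"), campaign_cost(inboxes, "other-co"))  # 115 250
```

The gap between those two totals is the discounted access the credential issuer could use itself or resell.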

The bottom line: Moving identity on-chain, and thereby removing the possibility of users-table-centric network effects, completely up-ends the entire landscape of API-based, access-controlled interoperability that the present Internet is built on. All of the non-technical market and political dynamics around users table size, leverage, and risk suddenly go out the window.

I don't know what the results will look like, but I'm pretty sure it will be unrecognizable to us right now. There are whole conversations we're currently having around tech consolidation and censorship that won't even make sense in a single-users-table world.

To put it another way, there’s now another large, growing entity on the scene that, in addition to the government, can face off against Big Tech and break it up by force. Of course, the government may try to murder that entity before it gets big enough not only to do in Big Tech but also to turn its monopoly-busting powers on the government itself.
