From the Tragedy of the Commons to the Ecstasy of the Collective

In my last post, I presented the idea of a data network effect (DNE) and explained why we pay so much attention to it at super{set}. We established a DNE as

any computational process that, with trust and privacy, harnesses the power of data from a collective to generate better results for the individual.

When it comes to a DNE, your objective shouldn’t be to luck your way into it, but rather to design it intentionally into every aspect of your business. In this note I’ll unpack the three central design elements of a DNE (Trust, Contracts, and Architecture), illustrate them by drawing on a few personal lessons, and describe their successful implementation at Spectrum Labs, the first company to emerge from super{set}’s studio.

Trust

Trust is the root of any DNE. As I shared in my last post, I flubbed the DNE attempt in my first company. As we got underway with the next project, Krux, my Co-Founder Vivek and I started with a blueprint (but by no means a completed product) for catalyzing and exploiting data network effects. Our goal was to deploy our infrastructure ubiquitously across our customer base of media and marketer customers, at a time when data access patterns and internet norms still permitted such a buildout. It was a different time for sure, when third-party cookies were still active and Facebook and Google hadn’t yet erected the impenetrable barriers around their platforms we see today.

Our customers were justifiably worried that, if we amassed such a data position, there was nothing stopping us from creating our own advertising platform to compete with our media customers and to gouge our marketer customers. (This is, of course, exactly what Google does today, which is why it is the subject of overdue scrutiny by regulators and lawmakers.)

Job 1 was to commit to a neutral, Switzerland-like posture. We committed to all of our early customers that we would never build an ad network that would compete with or undercut their businesses. Our lawyers hated it, but we even wrote the ‘no ad network’ promise into a few of our early contracts. Many of our early employees were adtech veterans who had seen the run-up in ad network valuations during the previous decade, so to them, it seemed close-minded and even a little priggish to foreclose extra revenue opportunities by making such a categorical commitment.

With the benefit of hindsight, today it all seems obvious, but at the time, it was a contrarian thing to do.

Spectrum has built AI that recognizes and removes toxicity from user-generated content, with multiple successful implementations in several leading dating, gaming, and marketplace web destinations. It’s a fascinating and timely offering: their AI can perceive, in multiple languages, a broad range of nefarious human behaviors (misogyny, racism, terrorist grooming, sexual harassment, among others). Delivering this capability at scale, across billions of interactions and hundreds of millions of users, requires the labeling and pooling of vast troves of user-generated data and the ability to parse and process them to detect all the behaviors of interest to Spectrum’s trust and safety users.

For the Founders, Justin and Josh, the mission is personal. While they didn’t face the same business model issues we confronted at Krux, Spectrum took steps early to ensure the safe processing and stewardship of all of the data flowing through their systems. They became the only provider in the cybersafety space to offer a performance guarantee, backed by a prominent re-insurer, covering the fidelity and efficacy of their solution. This establishes the gold standard for AI-powered trust and safety platforms, in an emerging and pertinent space looking to establish trust.

This incredibly prescient move has paid massive trust dividends across all of Spectrum’s customers. Finally, by sitting at the center of an industry-wide consortium to advance safe online experiences for all players, whether they’re Spectrum customers or not, Spectrum has solidified its position as the Switzerland of online trust and safety.

Contracts

Economists have identified a failure pattern in collective action, called the tragedy of the commons, wherein individuals act in their own self-interest by consuming and depleting a shared resource (atmosphere, grazing lands, fishing stocks). The concept also applies to the maintenance or monetization of a shared resource. If we all go to dinner in a group, for example, and everybody assumes that somebody else will cover the tip, the waiter goes home with nothing. If a subgroup ponies up for the tip, it’s locally optimal for individuals who dodge their contribution (economists refer to them as “free riders”), but of course unfair to those carrying their fair share.

The creation of a DNE necessarily entails a contractual structure that staves off free riders and the tragedy of the commons.

Early on, Spectrum created a contract structure that gives their customers comfort and confidence that their data belongs to them, as it must. In the context of their customer implementations, however, they deploy classification algorithms that consume data from a secure, ever-growing data collective, what Spectrum calls the Data Vault.

Spectrum’s customers understand the premise and the value of the Data Vault, and they trust Spectrum to serve as its dependable steward. The more Spectrum deploys its solution, the greater the depth and breadth of the data in the Data Vault. Consistent with Norvig’s Principle (see my last post), this, in turn, increases Spectrum’s ability to reliably detect toxic behaviors.

Figure 1: The DNE Flywheel

Spectrum’s contract isn’t a lucky means of ensuring the virtuous cycle in the DNE Flywheel. It is, rather, its most necessary condition. Without fanfare, well before its competitors even understood what was afoot, Spectrum assembled the world’s largest reservoir of labeled data for toxicity detection across multiple languages. Spectrum’s Data Vault gives the company a self-reinforcing advantage: the ability to more accurately see and classify every toxic behavior of interest in real time across multiple languages, an unassailable, durable benefit for all of Spectrum’s customers. The Data Vault expands rapidly every day and represents the secure pooling of hundreds of billions of labeled consumer interactions and many petabytes of text and audio data across tens of languages.

As a result of these technical advantages, Spectrum helps its customers protect billions of people online every day.

Architecture

If Trust and Contracts are the necessary conditions for a potential DNE, architecture is the surest means of defending it over time.

Prior attempts at Trust and Safety approached the problem with an exclusive focus on workflow for content moderators. While it’s undoubtedly important to make life better for moderators with easy-to-use tools, before Spectrum no one had approached Trust and Safety as a data-first AI challenge.

At a time when their competitors were doubling down on workflow, Spectrum was making heavier investments in data collection and pipelining. They refined a cloud-based architecture for ingesting and processing billions of rows of conversational data and audio data. They streamlined the orchestration of that data across a large and growing farm of models, all of them hungry for data, and continued to build the tooling required to prevent model drift and ossification. While others were trumpeting the thin veneer of AI for Trust and Safety, Spectrum was quietly building, testing, and refining a durable technology architecture that could support the ingestion, pipelining, and processing of very large data sets to power the DNE Flywheel.

Takeoff

In Superintelligence, Nick Bostrom examines the analytic conditions under which an AI could surpass human intelligence and the possibility of such an AI taking what he calls ‘the Treacherous Turn’ — the AI’s subjugation of its former human overlords as a perhaps inevitable consequence of its prime directive to grow and learn.

Bostrom identifies two stopping points: One he calls Takeoff, wherein the AI, after an early phase, begins to learn exponentially; the second is what he calls the Decisive Strategic Advantage, wherein it becomes impossible for any comparable alternative (human, biological, or machine-based) to catch up.

Whether you subscribe to Bostrom’s larger concept of superintelligence or not, the pattern of his logical argument explains Spectrum’s journey and serves as a template for any DNE design.

In the latter half of 2020, after 24 months of learning, refinement, and delayed gratification, Spectrum’s detection and classification algorithms started to significantly outperform all the available alternatives in side-by-side contests. Specifically, the precision and recall of its algorithms for toxicity detection — the most critical statistical measures of accuracy for any classification scheme — showed a steady upward climb from September through March.
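For readers less familiar with the two measures named above, here is a minimal sketch of how precision and recall are computed for a binary toxic/non-toxic classifier. The labels are made up for illustration and the function name is hypothetical; nothing here reflects Spectrum’s actual data or code.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'toxic'."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)   # toxic, flagged
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, missed

    # Precision: of everything flagged, how much was really toxic?
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0

    # Recall: of everything really toxic, how much was flagged?
    actual_toxic = true_pos + false_neg
    recall = true_pos / actual_toxic if actual_toxic else 0.0
    return precision, recall


# Hypothetical example: 8 messages, 4 of them truly toxic.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A classifier can trade one measure for the other (flagging everything yields perfect recall and poor precision), which is why a sustained climb in both at once is the meaningful signal.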

As a direct result of the DNE designed into its business over two years ago, Spectrum has achieved takeoff for toxicity detection.

A recent experience underscores the claim. A well-resourced public company with a large team of data scientists and machine learning experts was recently on the cusp of launching their own internal effort to build a toxicity detection AI. One of their executives, worried about its projected cost, insisted on a consideration of external alternatives. After winnowing out other content moderation alternatives, a Kaggle-style contest between their internal team and Spectrum ensued.

The results were decisive: Spectrum outperformed their internal classification algorithms, which were uniquely tuned (and possibly overfitted) to their own data, by a significant margin. Importantly, this had nothing to do with the IQ and experience of their internal machine learning and data science experts. We know first-hand how talented Spectrum’s team is, but we should allow for the possibility that the customer’s group was even smarter and better trained.

None of that matters.

Consistent with Norvig’s Principle, the data in Spectrum’s Data Vault, further enriched in the DNE Flywheel, has given rise to a decisive advantage that alternatives can never match.

For completeness, we should observe that others may find temporary advantages in smaller nooks and crannies of the general problem space. In just the same way that hedge funds occasionally stumble upon information asymmetries to sustain an ephemeral trading ‘edge,’ it’s possible that, for more rarefied behaviors where the data is sparse, highly fitted model-based methods could fill the gap. Such advantages rapidly succumb, however, to the effects of Spectrum’s DNE Flywheel.

It’s not just that Spectrum has successfully staved off free riders in the development of its DNE. In fact, the company and its customers together have achieved the inverse of the tragedy of the commons, what we might call the “ecstasy of the collective”: a better-together model wherein each new customer incrementally increases the depth and scale of the Data Vault, which in turn ensures higher accuracy in the detection and elimination of online toxicity.

That’s obviously a good thing for Spectrum and its customers. But it’s also good for citizens, governments, and societies — anyone who wants to see an internet that is free of noxiousness, a place for the better angels of our nature to thrive and grow.
