Data Network Effects: Why We Make It a Design-Time Decision at super{set}

super{set}
Apr 10, 2021
[Image: light bulb on top of a bowl filled with binary code]

At super{set}, our central commitment is to found, fund, and form data-driven companies. Our thesis is that AI without data is an engine on blocks in a garage, dormant and unmoving. To harness its power, you need a car, a driver, and, most critically, fuel. Data is that fuel: the essential precondition for AI and all of the possibilities it portends.

Data-driven companies are all variations on a central theme. While the exterior and market focus of a data-driven company can vary widely, at its core, any data-driven company is in the business of

  • Capturing
  • Generating
  • Orchestrating
  • Pipelining
  • Unifying
  • Analyzing
  • Protecting, and/or
  • Activating

data. Notably absent from the preceding list is data processing, which was enterprise software’s remit throughout the 1990s and 2000s. The enterprise systems of those decades were Systems of Record: slow-moving applications built on relational databases that recorded the names and addresses of customers (CRM), how many widgets they bought (ERP), or the names, roles, and salaries of employees (HCM).

The 2010s saw the emergence of Systems of Engagement, which catalyzed a shift away from bookkeeping toward smarter, real-time handling of activities and events across new surfaces and touchpoints (web, mobile, chat). Targeting and personalization of ads and content by companies such as Google, Facebook, and LinkedIn offered indisputable evidence of these new systems’ effectiveness and scale. To create the network effects that drive their business models, these companies had to crack the Perfect-Personal-Now challenge: how to give billions of us exactly what we want, tuned to our individual needs, possibly before we ourselves know we want it.

[Figure: icons representing Systems of Engagement (web, mobile, chat) and Systems of Record (CRM, HCM, ERP, ITSM)]

They nailed it.

But amid the excitement over the social platforms’ unbreakable network effects, it’s easy to overlook that a similar dynamic has taken root in enterprise software. Data network effects (DNEs) in enterprise software offer scale and data advantages similar to those of social networks, but without the monopoly and surveillance effects.

DNEs are subtle because they happen behind the screen, in systems that help businesses respond to consumer needs. Some have termed this “B2B2C” because it doesn’t unfold in a direct-to-consumer interface à la Facebook or Google; rather, it follows an ‘action at a distance’ pattern. A DNE can be understood as

any computational process that, with trust and privacy, harnesses the power of data from a collective to generate better results for the individual.

Examples of DNEs include

  • Aggregating log data from consumer interactions to calculate benchmarks and baselines regarding consumer trends, affinities, and behaviors (e.g., “this group of users indexes highly for household income over $150K and propensity to purchase cream cheese”)
  • Recommending a product (a movie to be streamed, a coupon, a hairbrush) based on your belonging to a cohort of people who exhibit similar behavior
  • Distilling input from workers in content and communication systems to surface employee sentiment, motivations, and talent potential to HR leaders and executives
  • Pooling identity and attribute data from third-party data providers to measure the accuracy of internet advertising and marketing spend
  • Probing user-generated content to detect and flag toxic behaviors (misogyny, racism, sexual harassment) to improve the trust and safety of online experiences
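To make the cohort-based recommendation example concrete, here is a minimal sketch of how pooling data from a collective can produce a better result for an individual. It is an illustration only, not a description of any super{set} system: the tenant names, baskets, and the helper functions `co_occurrence` and `recommend` are all hypothetical, and a real DNE would pool data only under the trust and privacy conditions described above.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase baskets, keyed by the business (tenant) that observed them.
# All names and data here are made up for illustration.
BASKETS_BY_TENANT = {
    "grocer_a": [{"bagels", "cream cheese"}, {"bagels", "coffee"}],
    "grocer_b": [{"bagels", "cream cheese", "lox"}, {"cream cheese", "lox"}],
    "grocer_c": [{"bagels", "cream cheese"}],
}


def co_occurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, owned, k=1):
    """Rank items not yet owned by how often they co-occur with owned items."""
    scores = Counter()
    for item in owned:
        for other, n in counts.get(item, {}).items():
            if other not in owned:
                scores[other] += n
    return [item for item, _ in scores.most_common(k)]


# One tenant's data alone versus the pooled collective.
alone = co_occurrence(BASKETS_BY_TENANT["grocer_a"])
pooled = co_occurrence(
    [basket for baskets in BASKETS_BY_TENANT.values() for basket in baskets]
)

print(recommend(alone, {"lox"}))   # grocer_a has never seen lox: []
print(recommend(pooled, {"lox"}))  # the collective has: ['cream cheese']
```

In this toy data, grocer_a on its own has nothing to offer a shopper who buys lox, while the pooled counts do. That gap between the isolated and the collective result is the effect the definition above describes.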

The idea of a ‘collective’ here requires qualification. All of us who’ve developed these systems have internalized Norvig’s Principle, attributed (at least anecdotally) to Peter Norvig of Google. In colloquial terms, it asserts that

99% of the time, {More data + simpler algorithms} beats {Less data + more complicated algorithms}

You can design fancier algorithms and model-based approaches in cases where the data is sparse, if that’s your only course. The better-stronger-smarter path, however, is to (1) industrialize the simplest possible algorithm (think of Google’s super-scaling of indexing, a well-established technique from computer science), and (2) nourish it with the largest reservoir of data possible (90% of the planet’s search queries).

Which is to say, if you’re designing a DNE, very large data collectives trump small data collectives every time. You want data, gobs of it.
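A toy sketch can illustrate one half of this claim: that a deliberately simple algorithm improves just by being fed more data. The example below uses plain bigram counting to guess the next word of a query. The corpus, the test pairs, and the function names are invented for illustration, and the sketch does not attempt the other half of the comparison (a more complicated algorithm on less data).

```python
from collections import Counter, defaultdict

# Made-up "query log" and held-out test pairs, for illustration only.
CORPUS = [
    "cream cheese bagel",
    "cream cheese frosting",
    "ice cream cone",
    "peanut butter sandwich",
    "ice cream sandwich",
    "peanut butter cookie",
]
TEST_PAIRS = [("cream", "cheese"), ("peanut", "butter"), ("ice", "cream")]


def train_bigrams(sentences):
    """Count, for every word, which words follow it."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = following.get(word)
    return followers.most_common(1)[0][0] if followers else None


def accuracy(following, test_pairs):
    """Fraction of test pairs whose next word is predicted correctly."""
    hits = sum(predict_next(following, w) == nxt for w, nxt in test_pairs)
    return hits / len(test_pairs)


small = train_bigrams(CORPUS[:1])  # same algorithm, one query
large = train_bigrams(CORPUS)      # same algorithm, all six queries

print(accuracy(small, TEST_PAIRS))  # 1 of 3 test pairs correct
print(accuracy(large, TEST_PAIRS))  # 3 of 3 test pairs correct
```

Nothing about the algorithm changes between the two runs. The model trained on one query simply has no counts for "peanut" or "ice", and the larger log fills those gaps.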

One of our core principles at super{set}, born from personal experience, hard knocks, and flubs, is that when you’re forming a new data-driven company from scratch, you don’t want to luck your way into a DNE. Like a watchmaker, your objective should be to design its gears at the very beginning so that the resulting apparatus is self-perpetuating, the resulting DNE inexorable.

We learned this lesson the way good empiricists do: by comparing what happens when you design a DNE in deliberately with what happens when you try to staple one on later. In my first company, hatched at the beginning of the 2000s, I understood the possibility of a DNE too late. Our arrangements with customers were hamstrung by the old data processing mindset.

We hadn’t anticipated the possibility of generating new results for our customers from the data flowing through our systems. We tried to correct course by striking new arrangements and re-architecting our software, but it was too late. The concrete had been poured; we were stuck with the implications of decisions made many years before.

We carried the lesson forward in our second company, obsessed over these aspects at design time, and achieved a much stronger, better result in the end. We’ve honed it into a framework we apply to DNE design for any new company at super{set}, centered on three key elements:

  1. Trust
  2. Contracts
  3. Architecture

I’ll unpack each of these with an example from super{set}’s portfolio in my next post.

super{set}

We found, fund, and build data-driven start-ups. Learn more at: superset.com