Databases for Serverless & Edge

2023-01-25

There's been massive innovation in the database and backend space for developers building applications with serverless and edge compute. There are new tools, companies, and even programming models that simplify how developers store data.

This post will be an overview of databases that pair well with modern application and compute providers.

Criteria

The “backend” space is vast: search, analytics, data science, and more – so I'll niche down here. The primary criteria of this overview is:

  1. Services that work exceptionally well when paired with serverless and edge compute
  2. Services that work with JavaScript and TypeScript codebases

A new programming model

Relational databases have been around for 25+ years.

While there are new companies creating serverless-first storage solutions, a new programming model is required for workloads to be compatible with serverless compute and modern runtimes.

These solutions must be:

  • Connectionless: Developers don't want to think about manual connection management. Traditional database protocols are stateful, whereas HTTP is mostly stateless, making it easier to use with scale to zero compute. Exposed through an SDK or HTTP API, “connectionless” solutions provide an abstraction over connection pooling.
  • Web native: Browser data-fetching APIs (e.g. Web fetch) and protocols are eating the world. New databases use HTTP APIs or WebSockets, rather than opening direct connections to the database. This makes them compatible with all forms of compute (including the lighter runtime used in edge compute).
  • Lightweight: Client libraries (and drivers) are becoming thin. Complexity is shifting to the database vendor, taking on the burden as part of their global infrastructure. For example, their gateway might handle connection pooling or provide caching infrastructure. This has led to a new wave of ORMs (i.e. abstractions to query data) both as standalone libraries and as integrated database SDKs.
  • (Bonus) Type-safe: TypeScript developers are favoring databases or libraries which provide tooling to enable type-safe access to your data. For example: Prisma, Kysely, Drizzle, Contentlayer, and Zapatos.

Consider databases like Postgres. New solutions like Neon and Supabase abstract connection management, providing you with a simple way to query and mutate data. In the case of Supabase, there's a client library that uses an HTTP API built on PostgREST:

import { createClient } from '@supabase/supabase-js';
let supabase = createClient('https://<project>.supabase.co', '<your-anon-key>');
let { data } = await supabase.from('countries').select();

And for Neon:

import { Client } from '@neondatabase/serverless';
let client = new Client(env.DATABASE_URL);
let {
  rows: [{ now }],
} = await client.query('select now();');

Neon's solution is particularly interesting.

The basic premise is simple: our driver redirects the PostgreSQL wire protocol via a special proxy. Our driver connects from the edge function to the proxy over a WebSocket, telling the proxy which database host and port it wants to reach. The proxy opens a TCP connection to that host and port and relays traffic in both directions.

Neon architecture diagram for connection pooling
💡

Using WebSockets, instead of HTTP, does have tradeoffs. There might be additional latency on the first request setup, but subsequent requests are faster. There's an RFC for WebSockets with HTTP/3 which would remove that extra network roundtrip.

The connection management isn't going away – it's just being handled by the vendor now.

There's even solutions like PlanetScale which can handle up to a million connections, which also allows you to effectively never think about managing connections.

This new programming model has created emerging trends for database companies:

  • Databases are increasingly becoming data platforms, including other adjacent solutions like full-text search and analytics.
  • The decoupling of storage and compute, popularized by Snowflake (and more), is enabling new players (e.g. Neon et al.) to massively reduce the cost of a “database at rest”. This pairs well with frontend git branch-based workflows, where you want to scale to zero when not being used.
  • Increasingly developers don't want to “dial the knobs”. Solutions like DynamoDB (and in some ways S3) provided infinite scale without needing to tweak memory, storage, CPU, clusters, and instances.
  • The dream of global data is here, but not how it was predicted. Trying to replicate all data to every network edge is probably not the correct solution most times. Instead, we're seeing specialized data APIs and the emergence of user-specific data stores (e.g. for shopping cart data).
  • More databases are embracing serverless, but what “serverless” means to them varies. There are different vectors of autoscaling: connections, storage, compute, and more.

Databases

I'll bucket these into “established” and “rising” categories, serverless/serverful, as well as generally available (GA) and pre-GA. I'll also mostly talk about managed vendors.

For example, you can of course run MySQL or Postgres on major cloud providers like AWS. There's a long tail of niche data storage solutions, so some will definitely be missing that I haven't heard of or used.

Established serverless solutions

  • DynamoDB
    • AWS DynamoDB: Fully managed, serverless, key-value NoSQL database.
  • Firebase
    • Firestore: Over 10 years old now, Firestore is a well-adopted document database. One unique advantage Firebase is its built-in support for authentication, real-time workloads, and cross-platform support for mobile.
  • MongoDB
    • Atlas Serverless: MongoDB has an entire data platform, including search / analytics / etc. Their recent investment into serverless is very exciting, with Atlas Serverless now generally available. Their Data API makes them a fantastic pair for serverless / edge.
  • MySQL
  • PostgreSQL
    • AWS Aurora: One of the first serverless Postgres offerings.
    • CockroachDB: Autoscales and distributes data across multiple nodes. Focused on high data consistency and integrity. Supports most of Postgres but not stored procedures and extensions.
  • Redis
    • Upstash: Offers durable/consistent Redis, global replication options, and Kafka. Fantastic @upstash/redis HTTP/REST client library.

Rising serverless database solutions

  • Generally Available (GA)
    • Fauna: They were an early mover in the serverless data space. I've had success with their GraphQL APIs before, but feel mostly indifferent on FQL.
  • Pre-GA
    • Convex: Fantastic for real-time workloads, but also a very simple, type-safe interface for querying/mutating data. You write (and think) in functions. Pairs well with React's mental model.
    • Grafbase: If you love GraphQL, Grafbase is worth exploring. Designed to integrate into branch-based workflows + fast reads globally. Realtime and full-text search are in the works.
    • Neon: Postgres with separation of storage and compute. Uniquely designed for serverless and works with the native Postgres driver + supports database branching workflows.
    • Supabase: Open-source, built-on pure Postgres. Database + Auth + Storage and more. Scales up on pay-as-you-go, and working on scale to zero.
    • Xata: Not only a database, but search / analytics as well. I'm a fan of their spreadsheet-like UI, which is approachable for a wider audience.
  • + Long tail of new providers that can do DB hosting (Railway, Render, Fly, etc).

Stateful backends and other solutions

  • Generally Available (GA)
    • AlloyDB: Very exciting innovation coming from Google. AlloyDB is unique because it can handle both transactional¹ and analytical² workloads.
    • Ably: Realtime infrastructure with queues support and other message durability options.
    • Crunchy Postgres: Crunchy Postgres is "just Postgres". Focused on performance and availability, cost-effectiveness, and supporting all native Postgres features.
    • Hasura: Makes it easy to connect data from many different sources and expose it as a GraphQL API. Can use Neon under-the-hood as the database provider.
    • Liveblocks: Realtime collaboration infrastructure, they offer persistent conflict-free data along with APIs for document browsing, permissions management, database sync. Beautiful design and documentation.
    • Replicache: More of a database synchronization engine that can be coupled with other solutions with no real-time solution. We use this for comments on Vercel previews.
    • TimeScale: For both transactional and analytical workloads. Adds many features found in NoSQL databases to Postgres, and also integrates with S3 for storage.
  • Pre-GA
    • ChiselStrike: Write your TypeScript class, generate an API. Really leaning into the “infrastructure from code” approach, you write and think in functions, somewhat similar to Convex.
    • EdgeDB: EdgeDB is challenging the status quo. Particularly the “merger” between ORM / database, more than just a query builder, but a way to optimize queries.
    • Nhost: Firebase, but with GraphQL. Built on Hasura, Postgres, and S3.
    • Rowy: Low-code, spreadsheet-like backend (almost like an open-source Airtable) but they also have the ability to write functions to mutate data.
    • SurrealDB: Has its own SQL-flavored syntax, also trying to provide the spreadsheet-like UI for viewing data (still waitlist access, haven't used it).
    • Tigris: Document database, focused on real-time and also includes full-text search.
    • WunderGraph: Trying to lead with an excellent ORM / client library (end-to-end type safety, GraphQL support) and wedge into a cloud product (still waitlist access, haven't used it).
  • Other Solutions
    • Caching Engines: Stellate, Prisma Accelerate, ReadySet.
    • Cloud Provider Offerings: Azure SQL, Azure CosmosDB, Google Cloud SQL, Google BigTable, and many more.
    • Content Management (Headless CMS): These can still act as a database, just a different domain-specific solution. Sanity, Contentful, Sitecore, and more.

¹ Commonly referred to as OLTP (Online Transactional Processing). These are for CRUD operations, most commonly the MySQL and Postgres databases of the world.

² Commonly referred to as OLAP (Online Analytical Processing). These are for your real-time data workloads, like Clickhouse (also Tinybird), SingleStore, TimeScale, and ElasticSearch.

Thanks to Guillermo Rauch and Bayo Coker for reviewing this post.