In the last post, I discussed end-to-end encryption and why we’re focusing instead on designing a permissioned data system. That post narrowed the problem space. In this post, I’m going to start building up an actual solution. I’ll be introducing a new concept that is core to how we’re thinking about permissioned data: buckets.

What we’re trying to do here

Before diving into solutions, let’s first ground ourselves in what we’re trying to achieve. 

Permissioned data should feel like a natural extension of how public data already works in atproto. That doesn’t mean that we need to reuse the exact data structures and sync protocols as the public data system. However, the general shape should be familiar: users publish records, those records are canonically stored in the user’s PDS, applications crawl PDSes and sync data to build their own views, users own their data and can move between hosts, authority rests in the DID that publishes the data, the system works at scale, and users shouldn’t feel they’re dealing with weird behavior just because they’re on a decentralized protocol.

Ideally we handle permissioned data with one coherent protocol rather than a patchwork of different systems for different use cases, though we shouldn’t be dogmatic about that if the design demands otherwise.

Groups are the hard case

Let’s revisit the modalities that I called out last time (setting messaging aside for now):

  • Personal Data: mutes, bookmarks, drafts

  • Gated content: Patreon, Substack, paid newsletters

  • Socially shared: private posts, stories

  • Groups: Facebook groups, private forums, private subreddits

These are roughly ordered by complexity. Personal data is simple: it’s just you, your PDS, and maybe an application or two acting on your behalf. Gated content is one-to-many with a clear gatekeeper. Social sharing introduces some dynamism around who is able to view your stuff, how they interact with it, and who can see their interactions.

Groups are many-to-many. They have dynamic membership - people join & leave, admins change, ownership of the group changes. Many users are contributing content to a shared context. Users in these groups may want to view their groups in any number of different apps.

My hunch is that if we can design a system that works for groups, the simpler modalities will fall out naturally. Groups force you to confront the hardest questions about ownership, membership, and access control. So that’s the modality I’ll focus on in this post.

I’ll sketch out two possible solutions and then introduce the concept of a “bucket”, which resolves the issues with both.

Attempt 1: App-controlled access (realms)

Think about the role an application plays in deciding what a user is able to see. Even in public social modalities like Bluesky (as in apps built on the app.bsky.* lexicons), the application prevents users from seeing posts that violate thread gates and post gates. Blocks prevent the blocked user from viewing certain posts and also prevent third parties from viewing block-violating posts. Of course, for public content you can always go directly to the protocol. Still, this is a basic form of access control being applied by the application.

One clean solution to permissioned data may be to say “let applications handle it”. If an application has access to all permissioned data for some particular social modality, then it can apply arbitrarily complex access control rules around which users are able to see which content in which context. Applications are in the best position to do this because they fully understand the business logic behind who has access to what. Keep the protocol logic coarse and give applications full flexibility around access control logic.
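Here’s a rough sketch of what “let applications handle it” looks like in practice: the app holds every record for the modality and decides at read time who sees what. The `Post` shape, the `AppIndex` class, the `members_only` flag, and the two rules it checks (blocks in either direction, plus a membership gate) are all hypothetical illustrations, not anything defined by an atproto lexicon. The point is only that the rules live entirely in app code, so they can be as complex as the app likes.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    author: str          # DID of the author
    text: str
    members_only: bool   # hypothetical app-level flag, not a real lexicon field


@dataclass
class AppIndex:
    """Hypothetical app-side index holding ALL content for one modality."""
    posts: list[Post] = field(default_factory=list)
    blocks: set[tuple[str, str]] = field(default_factory=set)  # (blocker, blocked)
    members: set[str] = field(default_factory=set)

    def can_view(self, viewer: str, post: Post) -> bool:
        # A block in either direction hides the post
        if (post.author, viewer) in self.blocks or (viewer, post.author) in self.blocks:
            return False
        # Arbitrary business logic: members-only posts require membership
        if post.members_only and viewer not in self.members:
            return False
        return True

    def feed_for(self, viewer: str) -> list[Post]:
        return [p for p in self.posts if self.can_view(viewer, p)]


index = AppIndex(
    posts=[Post("did:example:alice", "hello", False),
           Post("did:example:bob", "members only", True)],
    blocks={("did:example:alice", "did:example:carol")},
    members={"did:example:bob", "did:example:dan"},
)
print([p.text for p in index.feed_for("did:example:dan")])    # ['hello', 'members only']
print([p.text for p in index.feed_for("did:example:carol")])  # []
```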

For applications to fulfill this function, they need access to all the content for a particular modality. Let’s call this a “realm”. A realm is an abstract content partition in the network intended for a particular “type” or “use” of permissioned data. Realms can be identified by an NSID defined by publishing a lexicon. When a user creates a permissioned record, they specify the realm that it is being posted into.

To make this concrete, consider a private forum application called “AtmoBoards”. AtmoBoards can define a new realm by creating a lexicon with an NSID, something like com.atmoboards.forum. A realm is network-wide and heterogeneous. The AtmoBoards realm contains posts, comments, profiles, votes, and more. It contains all AtmoBoards forum content from all users.

Access to the realm naturally translates into an authorization scope that can be displayed on the OAuth consent screen. Something like “Your content within AtmoBoards forums”. The app then simply syncs the data from the user’s PDS using the given OAuth credential. Applications don’t intrinsically “get access to the whole realm”. However, when a user logs into an application that works with a particular realm, that application requests access to all of that user’s content in that realm.
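A toy model makes the shape of a realm grant concrete. Each permissioned record carries the realm NSID it was posted into, and a grant is keyed by (app, realm). `ToyPDS`, `PermissionedRecord`, the `forum` field, and the app identifiers are hypothetical stand-ins, not real PDS APIs. Note that the grant has no way to name a single forum: syncing a realm returns every record the user has in it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionedRecord:
    realm: str   # realm NSID the record was posted into
    forum: str   # app-level detail: which forum inside the realm
    text: str


class ToyPDS:
    """Hypothetical PDS holding one user's permissioned records and realm-scoped grants."""

    def __init__(self) -> None:
        self.records: list[PermissionedRecord] = []
        self.grants: set[tuple[str, str]] = set()  # (app id, realm NSID)

    def grant(self, app_id: str, realm: str) -> None:
        # Stands in for the user approving "Your content within AtmoBoards forums"
        self.grants.add((app_id, realm))

    def sync(self, app_id: str, realm: str) -> list[PermissionedRecord]:
        if (app_id, realm) not in self.grants:
            raise PermissionError(f"{app_id} has no grant for {realm}")
        # The grant names a realm, not a forum, so every forum's records come along
        return [r for r in self.records if r.realm == realm]


REALM = "com.atmoboards.forum"
bob = ToyPDS()
bob.records += [
    PermissionedRecord(REALM, "protocol-nerds", "hi nerds"),
    PermissionedRecord(REALM, "support-group", "private note"),
]
bob.grant("atmoboards", REALM)
print(sorted(r.forum for r in bob.sync("atmoboards", REALM)))
# ['protocol-nerds', 'support-group']
```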

This is conceptually pretty elegant. We get to simply reuse existing auth infrastructure. The consent flow is legible to users. Applications can offer arbitrarily complex access control rules on a user-by-user basis. Users maintain the canonical copy of their data which enables users to choose their application and therefore migrate their community between applications.

However, problems start to emerge when users in the same group use different applications.

Say Alice, Bob, and Carol are all in a private forum together. Alice and Bob use AtmoBoards, but Carol uses a new app called ForumBrowser. Bob posts something in the group. Ideally Carol should be able to see it because she’s in the group! But ForumBrowser can’t access Bob’s post unless Bob has separately authorized ForumBrowser through an OAuth flow. Bob probably hasn’t; he might not even know ForumBrowser exists!

Maybe we introduce a programmatic way to give access to a given application without going through the OAuth flow. If Bob and Alice both decide that Carol should be able to access this forum through ForumBrowser, they could choose to grant it access to their content in the com.atmoboards.forum realm. However, remember this realm contains all content from all forums. So in granting ForumBrowser access to this particular forum, they also grant ForumBrowser access to all of their forum content from across all forums.

This highlights the basic problem with realms: the protocol-level access boundary is too coarse. You either give all of your AtmoBoards content to an app or not. If a group wants to support more than one application, every member of the group needs to give access to every application that the group supports. 
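The arithmetic of that coarseness can be sketched in a few lines. Under realms, supporting a set of apps means every member grants every app, and each grant exposes all of that member’s forums rather than only the shared one. The function names and forum names below are hypothetical, chosen to mirror the Alice, Bob, and Carol example.

```python
def realm_grants_needed(members: set[str], apps: set[str]) -> set[tuple[str, str]]:
    """Every member must grant every app the group wants to support."""
    return {(member, app) for member in members for app in apps}


def overexposed_forums(member_forums: dict[str, set[str]], member: str, shared_forum: str) -> set[str]:
    """Forums a realm grant leaks beyond the one forum the member meant to share."""
    return member_forums[member] - {shared_forum}


members = {"alice", "bob", "carol"}
apps = {"atmoboards", "forumbrowser"}
print(len(realm_grants_needed(members, apps)))  # 6

bob_forums = {"bob": {"protocol-nerds", "support-group", "book-club"}}
print(sorted(overexposed_forums(bob_forums, "bob", "protocol-nerds")))
# ['book-club', 'support-group']
```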

Some users may feel comfortable with this if there are just one or two big applications in the ecosystem. If someone in my forum wants to use a weird little niche app, I might even feel comfortable sharing that particular forum with them. But I don’t want to give every experimental app access to all of my private forum content.

This becomes a centralizing force. Applications aren’t that useful if they don’t have access to the full set of content for any given forum. Each application needs a critical mass of users to individually authorize it. Big apps might eventually get there. But the long tail of apps - the experimental ones, the niche ones, the apps that make an open ecosystem interesting & engaging - are completely iced out. This grates against the core value proposition of building on an open protocol: the data should be interoperable, composable, and remixable across applications.

This suggests that we need a more fine-grained mechanism to manage access. And we can’t just riff off of OAuth; our mechanism needs to be programmatically expressible at the protocol layer.

(That said, we might not have seen the last of realms. Keep an eye out for them in a later post 👀)

Attempt 2: Granular user-controlled access

The immediate follow-up question is: what’s the unit of access control? What is the ACL actually attached to? And how does that ACL get updated?

Let’s walk through a couple scenarios and watch as the complexity escalates.

Note: As we go through these scenarios, I’ll discuss them in terms of user-to-user access grants. Applications need to sync on behalf of users, which is a whole ’nother problem that we’ll address in a later blog post. However, the same basic logic applies to user-to-app access grants.

Simple case: Alice wants to share some permissioned posts with Bob. She puts an ACL on her posts collection granting Bob read access. Great!

A bit harder: Alice has different types of permissioned posts meant for different people - posts for close friends, posts for her mutuals, posts for a paid community. Now she needs separate ACLs for each category. We can’t express these at the collection level anymore, but maybe we can attach ACLs at the record level. Okay, more bookkeeping but still tractable.

Harder still: Alice shares a permissioned post with Bob & Carol, and Bob replies. Carol wants to read the thread. She now needs access to both Alice’s and Bob’s permissioned data. Who coordinates that? Alice and Bob each have their own ACLs on their own PDS. 
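Here’s a minimal sketch of per-record ACLs and why the thread case hurts. Each record carries its own reader set, and each PDS checks only its own records. The `Record` and `UserPDS` classes and the `alice/post1`-style identifiers are hypothetical simplifications, not real atproto record URIs or PDS endpoints. Bob’s reply lists only Alice because nothing in the system tells him Carol is part of the conversation, so Carol sees half a thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    uri: str
    text: str
    readers: set[str] = field(default_factory=set)  # hypothetical per-record ACL


@dataclass
class UserPDS:
    owner: str
    records: dict[str, Record] = field(default_factory=dict)

    def put(self, record: Record) -> None:
        self.records[record.uri] = record

    def read(self, viewer: str, uri: str) -> Optional[str]:
        record = self.records[uri]
        # The owner can always read; anyone else must be on this record's own ACL
        if viewer == self.owner or viewer in record.readers:
            return record.text
        return None


def read_thread(viewer: str, thread: list[tuple[UserPDS, str]]) -> list[Optional[str]]:
    # Each entry lives on a different PDS and is checked against a different ACL
    return [pds.read(viewer, uri) for pds, uri in thread]


alice, bob = UserPDS("alice"), UserPDS("bob")
alice.put(Record("alice/post1", "permissioned post", {"bob", "carol"}))
# Bob replies but only lists Alice on his ACL; nothing tells him to include Carol
bob.put(Record("bob/reply1", "reply", {"alice"}))
print(read_thread("carol", [(alice, "alice/post1"), (bob, "bob/reply1")]))
# ['permissioned post', None]
```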

And the real kicker: Alice creates a private community group. She adds Bob and Carol. A bit later, she adds Dan and Eve. Over time, Alice adds fifty more people. Every time someone new joins, every existing member’s ACL on every piece of content in the group needs to be updated to reflect that the new member can now see their contributions to the group. Carol may not even know Dan or Eve, much less the fifty other members. Is she individually responsible for updating her ACL when the group owner adds someone? How does she get notified? Even if we invent some sub-protocol for managing ACL updates, can we really expect all group members to update the ACL on every single record meant for the group each time the group boundary changes?
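To put a rough number on it, here’s a small simulation of the group above. It assumes (hypothetically, for simplicity) that every member already has ten records in the group whenever the next person joins, and that each join forces every existing member to edit the ACL on each of their records. The function name and the ten-records figure are illustrative, not measurements.

```python
def per_record_acl_updates(join_order: list[str], records_per_member: int) -> int:
    """Count ACL edits when every existing member must update every group record
    they own each time someone joins.

    Simplifying assumption: each member already has `records_per_member`
    records in the group when the next person joins.
    """
    updates = 0
    existing: list[str] = []
    for newcomer in join_order:
        # Each existing member edits the ACL on each of their records
        updates += len(existing) * records_per_member
        existing.append(newcomer)
    return updates


# Alice, then Bob and Carol, then Dan and Eve, then fifty more: 55 members
group = ["alice", "bob", "carol", "dan", "eve"] + [f"member{i}" for i in range(50)]
print(per_record_acl_updates(group, records_per_member=10))  # 14850
```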

The fundamental issue here is that when access control is attached to individual pieces of data, every social interaction creates a coordination problem. As the number of participants increases, each new member means more ACLs to update across more members’ records, so the coordination overhead grows much faster than the group itself and quickly spins out of control.

Social interaction revolves around a shared context. What we really want is a way to say: “Here’s a space. Here’s who has access to it. Everything in this space inherits that access”.

We need a bucket. 

Buckets

A bucket is a named container that holds records and has a single authoritative ACL. It’s the protocol-level primitive for “these people can access this stuff”. When you post into a bucket, your post inherits the ACL of that bucket. When someone is added to a bucket, they get access to everything in it. When they’re removed, they lose it.

A bucket is a bit like a repository. But it isn’t the public repository and (spoiler) probably doesn’t use an MST. Users will have many buckets, one for each social context that they are creating content in. A bucket holds records of many different types and from many members. For instance, a single bucket may contain all content for a given AtmoBoards forum.
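A bucket can be sketched as a container with exactly one membership list. Records of any type, from any member, go in without an ACL of their own; read access is derived from the bucket. The `Bucket` class, its methods, and the `com.atmoboards.*` record types are hypothetical (only the `$type` key follows the existing atproto record convention), and this sketch deliberately ignores storage, addressing, and sync, which are still open questions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Bucket:
    """Hypothetical bucket: one authoritative ACL over records of many types from many members."""
    name: str
    owner: str
    members: set[str] = field(default_factory=set)
    records: list[dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.members.add(self.owner)

    def add_member(self, did: str) -> None:
        # One ACL write; the newcomer can now read everything already in the bucket
        self.members.add(did)

    def remove_member(self, did: str) -> None:
        if did == self.owner:
            raise ValueError("this sketch does not support removing the owner")
        self.members.discard(did)

    def post(self, author: str, record: dict[str, Any]) -> None:
        if author not in self.members:
            raise PermissionError(f"{author} is not a member of {self.name}")
        # Records carry no ACL of their own; access comes from the bucket
        self.records.append({**record, "author": author})

    def read_all(self, viewer: str) -> list[dict[str, Any]]:
        if viewer not in self.members:
            raise PermissionError(f"{viewer} is not a member of {self.name}")
        return list(self.records)


nerds = Bucket("Protocol Nerds", owner="did:example:alice")
nerds.add_member("did:example:bob")
nerds.post("did:example:bob", {"$type": "com.atmoboards.post", "text": "MSTs, anyone?"})
nerds.post("did:example:alice", {"$type": "com.atmoboards.vote", "subject": 0})
nerds.add_member("did:example:dan")  # joins later, still sees both records
print(len(nerds.read_all("did:example:dan")))  # 2
```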

To quickly compare buckets back to realms: realms were abstract data partitions that described the “type” or intended use of some data. For example, “AtmoBoards forums content”. A bucket describes a particular shared social context with defined owner, admins, and members. For example, “the Protocol Nerds forum on AtmoBoards”.

Buckets give us a few things we’ve been missing. They provide a natural unit of access control that is neither too granular (per-record) nor too coarse (per-app). They handle dynamic membership. And they give applications something concrete to sync and index.
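The dynamic-membership point can be made concrete by comparing write counts under the same simplified model as the fifty-member group: each member holds a fixed number of records, and every join with per-record ACLs makes every existing member rewrite each of their records. Both functions are hypothetical back-of-the-envelope helpers, not a cost model of any real implementation.

```python
def per_record_updates(group_size: int, records_per_member: int) -> int:
    # Per-record ACLs: the k-th join makes k-1 existing members rewrite the ACL
    # on each of their records, so the total is r * (0 + 1 + ... + (n-1)).
    return records_per_member * group_size * (group_size - 1) // 2


def bucket_updates(group_size: int) -> int:
    # Bucket: every join is a single write to the bucket's one ACL,
    # no matter how much content the group already holds.
    return group_size - 1


for n in (5, 55, 500):
    print(n, per_record_updates(n, records_per_member=10), bucket_updates(n))
# 5 100 4
# 55 14850 54
# 500 1247500 499
```

The per-record count grows with the square of the group size, while the bucket count grows linearly and is independent of how much content has been posted.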

One thing worth flagging upfront: a bucket doesn’t necessarily imply a physical container sitting on one PDS. Consider how threads work in the Bluesky app today. Each post lives on its author’s PDS and the thread is compiled by the application by virtue of the fact that each post references the same root. A bucket could work similarly, with its contents distributed across members’ PDSes rather than centralized on one host. There are real tradeoffs here and we’ll dig into them in the next post.
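One way to picture a distributed bucket is the thread analogy taken literally: every member keeps their own records on their own PDS, each record points at a shared bucket identifier, and an application assembles the bucket by merging matching records. The `BucketRecord` shape and the `assemble_bucket` helper are hypothetical, and this sketch skips the hard parts (discovering which PDSes to query, enforcing the ACL at each one) that the tradeoffs discussion will have to cover.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BucketRecord:
    bucket: str      # hypothetical identifier every member's record points at
    author: str
    created_at: str  # ISO 8601 UTC timestamp in one fixed format, so string order == time order
    text: str


def assemble_bucket(bucket_id: str, pds_contents: Iterable[list[BucketRecord]]) -> list[BucketRecord]:
    """Merge one bucket's records from several members' PDSes, much as a thread
    view is built from posts that all reference the same root."""
    merged = [r for records in pds_contents for r in records if r.bucket == bucket_id]
    return sorted(merged, key=lambda r: r.created_at)


alice_pds = [BucketRecord("nerds", "alice", "2024-05-01T10:00:00Z", "welcome!")]
bob_pds = [
    BucketRecord("nerds", "bob", "2024-05-01T10:05:00Z", "thanks"),
    BucketRecord("book-club", "bob", "2024-05-01T09:00:00Z", "other context"),
]
print([r.text for r in assemble_bucket("nerds", [alice_pds, bob_pds])])
# ['welcome!', 'thanks']
```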

And there’s still a lot more to figure out. How is bucket data actually stored/represented in the PDS? How is bucket data addressed? How is bucket ownership managed and can buckets be transferred? How is access to a bucket represented and enforced? Who has access to buckets: users or applications? How do applications sync buckets in real-time?

We’ll get into some of those in the next post. For now, the key insight is this: permissioned data needs a shared context with a perimeter. And that space needs to exist, not just in the application and not scattered across individual users’ PDSes, but at the protocol layer.