In the last post, I discussed end-to-end encryption and why we’re focusing instead on designing a permissioned data system. That narrowed down the problem space. In this post, I’m going to start building up an actual solution. I’ll be introducing a new concept that is core to how we’re thinking about permissioned data: buckets.
What we’re trying to do here
Before diving into solutions, let’s first ground ourselves in what we’re trying to achieve.
Permissioned data should feel like a natural extension of how public data already works in atproto. That doesn’t mean we need to reuse the exact data structures and sync protocols of the public data system. However, the general shape should be familiar: users publish records, those records are canonically stored in the user’s PDS, applications crawl PDSes and sync data to build their own views, users own their data and can move around, authority rests in the DID that publishes the data, the system works at scale, and users shouldn’t feel they’re dealing with weird behavior just because they’re on a decentralized protocol.
Ideally we handle permissioned data with one coherent protocol rather than a patchwork of different systems for different use cases, though we shouldn’t be dogmatic about that if the design demands otherwise.
Groups are the hard case
Let’s revisit the modalities I called out last time (this time excluding messaging):
Personal Data: mutes, bookmarks, drafts
Gated content: Patreon, Substack, paid newsletters
Socially shared: private posts, stories
Groups: Facebook groups, private forums, private subreddits
These are roughly ordered by complexity. Personal data is simple: it’s just you, your PDS, and maybe an application or two acting on your behalf. Gated content is one-to-many with a clear gatekeeper. Social sharing introduces some dynamism around who can view your stuff, how they interact with it, and who can see their interactions.
Groups are many-to-many. They have dynamic membership - people join & leave, admins change, ownership of the group changes. Many users are contributing content to a shared context. Users in these groups may want to view their groups in any number of different apps.
My hunch is that if we can design a system that works for groups, the simpler modalities will fall out naturally. Groups force you to confront the hardest questions about ownership, membership, and access control. So that’s the modality I’ll focus on in this post.
I’ll sketch out two possible solutions and then introduce the concept of a “bucket”, which resolves the issues with both of them.
Attempt 1: App-controlled access (realms)
Think about the role an application plays in deciding what a user is able to see. Even in public social modalities like Bluesky (as in apps built on the app.bsky.* lexicons), the application prevents users from seeing posts that violate thread gates and post gates. Blocks prevent the blocked user from viewing certain posts and also prevent third parties from viewing block-violating posts. Of course, for public content you can always go directly to the protocol. Still, this is a basic form of access control being applied by the application.
One clean solution to permissioned data may be to say “let applications handle it”. If an application has access to all permissioned data for some particular social modality, then it can apply arbitrarily complex access control rules around which users are able to see which content in which context. Applications are in the best position to do this because they fully understand the business logic behind who has access to what. Keep the protocol logic coarse and give applications full flexibility around access control logic.
For applications to fulfill this function, they need access to all the content for a particular modality. Let’s call this a “realm”. A realm is an abstract content partition in the network intended for a particular “type” or “use” of permissioned data. Realms can be identified by an NSID defined by publishing a lexicon. When a user creates a permissioned record, they specify the realm that it is being posted into.
To make this concrete, consider a private forum application called “AtmoBoards”. AtmoBoards can define a new realm by creating a lexicon with an NSID, something like com.atmoboards.forum. A realm is network-wide and heterogeneous. The AtmoBoards realm contains posts, comments, profiles, votes, and more. It contains all AtmoBoards forum content from all users.
Access to the realm naturally translates into an authorization scope that can be displayed on the OAuth consent screen, something like “Your content within AtmoBoards forums”. The app then simply syncs the data from the user’s PDS using the given OAuth credential. Applications don’t intrinsically “get access to the whole realm”. Rather, when a user logs into an application that works with a particular realm, that application requests access to all of that user’s content in that realm.
This is conceptually pretty elegant. We get to simply reuse existing auth infrastructure. The consent flow is legible to users. Applications can offer arbitrarily complex access control rules on a user-by-user basis. Users maintain the canonical copy of their data, which lets them choose their application and therefore migrate their community between applications.
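To make the realm model concrete, here is a minimal TypeScript sketch. It is hypothetical throughout: the `PermissionedRecord` shape, the `realm` field, the `realm:<nsid>` scope format, and the `did:example` identifiers are invented for illustration and are not part of atproto. Each record names the realm it was posted into, and an app may sync exactly those records whose realm the user approved on the consent screen.

```typescript
// Hypothetical realm model: field names and the scope format are assumptions,
// not part of any atproto specification.
interface PermissionedRecord {
  uri: string; // placeholder location of the record in the author's PDS
  realm: string; // NSID of the realm the record was posted into
  text: string;
}

// The scope an app would request on the OAuth consent screen for one realm.
function realmScope(nsid: string): string {
  return `realm:${nsid}`;
}

// Records an app may sync from one user's PDS, given the scopes that user approved.
// Only the realm is checked; nothing finer-grained is visible at this level.
function syncable(records: PermissionedRecord[], approvedScopes: Set<string>): PermissionedRecord[] {
  return records.filter((r) => approvedScopes.has(realmScope(r.realm)));
}

const aliceRecords: PermissionedRecord[] = [
  {
    uri: "at://did:example:alice/com.atmoboards.forum.post/1",
    realm: "com.atmoboards.forum",
    text: "First post!",
  },
  {
    uri: "at://did:example:alice/com.example.journal.entry/1",
    realm: "com.example.journal",
    text: "Dear diary",
  },
];

// Alice logged into AtmoBoards and approved only the AtmoBoards realm.
const approved = new Set<string>([realmScope("com.atmoboards.forum")]);
console.log(syncable(aliceRecords, approved).map((r) => r.text));
```

Because the grant is keyed on the realm alone, the app receives all of Alice’s AtmoBoards content at once, which becomes a problem once a group spans more than one app.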
However, problems start to emerge when users in the same group use different applications.
Say Alice, Bob, and Carol are all in a private forum together. Alice and Bob use AtmoBoards, but Carol uses a new app called ForumBrowser. Bob posts something in the group. Ideally Carol should be able to see it because she’s in the group! But ForumBrowser can’t access Bob’s post unless Bob has separately authorized ForumBrowser through an OAuth flow. Bob probably hasn’t; he might not even know ForumBrowser exists!
Maybe we introduce a programmatic way to give access to a given application without going through the OAuth flow. If Bob and Alice both decide that Carol should be able to access this forum through ForumBrowser, they could choose to grant it access to their content in the com.atmoboards.forum realm. However, remember this realm contains all content from all forums. So in granting ForumBrowser access to this particular forum, they also grant ForumBrowser access to all of their forum content from across all forums.
This highlights the basic problem with realms: the protocol-level access boundary is too coarse. You either give an app all of your AtmoBoards content or none of it. If a group wants to support more than one application, every member of the group needs to give access to every application that the group supports.
Some users may feel comfortable with this if there are just one or two big applications in the ecosystem. If someone in my forum wants to use a weird little niche app, I might even feel comfortable sharing that particular forum with them. But I don’t want to give every experimental app access to all of my private forum content.
This becomes a centralizing force. Applications aren’t that useful if they don’t have access to the full set of content for any given forum, and each application needs a critical mass of users to individually authorize it. Big apps might eventually get there. But the long tail of apps - the experimental ones, the niche ones, the apps that make an open ecosystem interesting & engaging - are completely iced out. This grates against the core value proposition of building on an open protocol: the data should be interoperable, composable, and remixable across applications.
This suggests that we need a more fine-grained mechanism to manage access. And we can’t just riff off of OAuth; our mechanism needs to be programmatically expressible at the protocol layer.
(That said, we might not have seen the last of realms. Keep an eye out for them in a later post.)
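A back-of-the-envelope sketch of that coarseness, again with hypothetical names (`RealmGrant`, `grantsNeeded`, `exposedForums`) and made-up forum names: supporting a set of apps takes one grant per member per app, and each realm grant exposes every forum the member posts in, not just the one the group wanted to share.

```typescript
// Hypothetical grant model: a grant lets `app` read everything `member`
// has posted in `realm`, across every forum in that realm.
type RealmGrant = { member: string; app: string; realm: string };

// Every member must individually grant every app the group wants to support.
function grantsNeeded(members: string[], apps: string[], realm: string): RealmGrant[] {
  const grants: RealmGrant[] = [];
  for (const member of members) {
    for (const app of apps) {
      grants.push({ member, app, realm });
    }
  }
  return grants;
}

// What a single realm grant exposes: all of the member's forums in the realm.
function exposedForums(grant: RealmGrant, forumsByMember: Record<string, string[]>): string[] {
  return grant.member in forumsByMember ? forumsByMember[grant.member] : [];
}

const members = ["alice", "bob", "carol"];
const apps = ["AtmoBoards", "ForumBrowser"];
const grants = grantsNeeded(members, apps, "com.atmoboards.forum");
console.log(grants.length); // 6: 3 members x 2 apps

// Bob only meant to share the forum he is in with Carol...
const forumsByMember: Record<string, string[]> = {
  bob: ["protocol-nerds", "knitting", "book-club"],
};
const bobToForumBrowser: RealmGrant = { member: "bob", app: "ForumBrowser", realm: "com.atmoboards.forum" };
// ...but the realm grant hands ForumBrowser all three of his forums.
console.log(exposedForums(bobToForumBrowser, forumsByMember));
```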
Attempt 2: Granular user-controlled access
The immediate follow-up question is: what’s the unit of access control? What is the ACL actually attached to? And how does that ACL get updated?
Let’s walk through a couple of scenarios and watch as the complexity escalates.
Note: As we go through these scenarios, I’ll frame things in terms of user-to-user access grants. Applications need to sync on behalf of users, which is a whole ’nother problem that we’ll address in a later blog post. However, the same basic logic applies to user-to-app access grants.
Simple case: Alice wants to share some permissioned posts with Bob. She puts an ACL on her posts collection granting Bob read access. Great!
A bit harder: Alice has different types of permissioned posts meant for different people - posts for close friends, posts for her mutuals, posts for a paid community. Now she needs separate ACLs for each category. We can’t express these at the collection level anymore, but maybe we can attach ACLs at the record level. Okay, more bookkeeping but still tractable.
Harder still: Alice shares a permissioned post with Bob & Carol, and Bob replies. Carol wants to read the thread. She now needs access to both Alice’s and Bob’s permissioned data. Who coordinates that? Alice and Bob each have their own ACLs on their own PDS.
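One way to picture this escalation is a collection-level default ACL with optional record-level overrides. This is a hypothetical sketch, not a proposed format: `Collection`, `PostRecord`, `canRead`, the `com.example.post` NSID, and the rule that a record’s own ACL replaces the collection’s are all assumptions made for illustration (the owner’s own access is omitted for brevity).

```typescript
// Hypothetical ACL model: a set of identities allowed to read.
type Acl = Set<string>;

interface PostRecord {
  rkey: string;
  text: string;
  acl?: Acl; // optional record-level ACL; when present it replaces the collection ACL
}

interface Collection {
  nsid: string;
  acl: Acl; // collection-level default ACL
  records: PostRecord[];
}

function canRead(viewer: string, collection: Collection, record: PostRecord): boolean {
  const acl = record.acl ?? collection.acl;
  return acl.has(viewer);
}

const alicePosts: Collection = {
  nsid: "com.example.post",
  acl: new Set(["bob"]), // simple case: Bob can read the whole collection
  records: [
    { rkey: "1", text: "for Bob" },
    { rkey: "2", text: "close friends", acl: new Set(["bob", "carol"]) },
    { rkey: "3", text: "paid community", acl: new Set(["dan"]) },
  ],
};

for (const record of alicePosts.records) {
  // Every record now has to be checked, and maintained, against its own ACL.
  console.log(record.rkey, canRead("bob", alicePosts, record), canRead("carol", alicePosts, record));
}
```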
And the real kicker: Alice creates a private community group. She adds Bob and Carol. A bit later, she adds Dan and Eve. Over time, Alice adds fifty more people. Every time someone new joins, every existing member’s ACL on every piece of content in the group needs to be updated so that the new member can see their contributions to the group. Carol may not even know Dan or Eve, much less the fifty other members. Is she individually responsible for updating her ACLs when the group owner adds someone? How does she get notified? Even if we invent some sub-protocol for managing ACL updates, can we really expect all group members to update the ACL on every single record meant for the group each time the group boundary changes?
The fundamental issue here is that when access control is attached to individual pieces of data, every social interaction creates a coordination problem. As the number of participants grows, that coordination overhead grows even faster and quickly spins out of control.
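A rough count makes the difference in scale visible. This sketch uses a deliberately simple, assumed model: each member has `postsPerMember` records in the group at the time of every join, nobody leaves, and one ACL update is needed per existing record per join. The function names are hypothetical. Under this model the per-record total grows roughly with the square of the group size, while a single shared ACL needs one update per join.

```typescript
// Assumed model: every existing record in the group must have its ACL updated
// each time someone joins, and each member has `postsPerMember` records.
function perRecordUpdates(joins: number, initialMembers: number, postsPerMember: number): number {
  let members = initialMembers;
  let total = 0;
  for (let i = 0; i < joins; i++) {
    total += members * postsPerMember; // every record by every current member
    members += 1; // the newcomer is now a member too
  }
  return total;
}

// With one shared ACL for the whole group, each join is a single update.
function sharedAclUpdates(joins: number): number {
  return joins;
}

// Alice, Bob, and Carol to start; then Dan, Eve, and fifty more people (52 joins).
// Assume ten records per member in the group.
console.log(perRecordUpdates(52, 3, 10)); // 14820
console.log(sharedAclUpdates(52)); // 52
```

The raw count understates the problem: those updates are spread across every record author in the group, each of whom has to notice the join and act on it.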
Social interaction revolves around a shared context. What we really want is a way to say: “Here’s a space. Here’s who has access to it. Everything in this space inherits that access.”
We need a bucket.
Buckets
A bucket is a named container that holds records and has a single authoritative ACL. It’s the protocol-level primitive for “these people can access this stuff”. When you post into a bucket, your post inherits the ACL of that bucket. When someone is added to a bucket, they get access to everything in it. When they’re removed, they lose it.
A bucket is a bit like a repository. But it isn’t the public repository and (spoiler) probably doesn’t use an MST. Users will have many buckets, one for each social context that they are creating content in. A bucket holds records of many different types and from many members. For instance, a single bucket may contain all content for a given AtmoBoards forum.
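The core bucket semantics can be sketched in a few lines of TypeScript. This is an in-memory illustration only, with hypothetical names (`Bucket`, `BucketRecord`, `addMember`, `read`), invented record NSIDs, and placeholder DIDs; it says nothing about storage, addressing, or sync, which are still open. What it shows: records carry no ACL of their own, so a membership change applies to every record at once.

```typescript
// Hypothetical in-memory sketch of a bucket: one ACL, many records.
interface BucketRecord {
  author: string; // DID of the member who wrote it
  type: string; // lexicon NSID of the record (post, comment, vote, ...)
  text: string;
}

class Bucket {
  readonly name: string;
  private readonly members = new Set<string>(); // the single authoritative ACL
  private readonly records: BucketRecord[] = [];

  constructor(name: string, owner: string) {
    this.name = name;
    this.members.add(owner);
  }

  addMember(did: string): void {
    this.members.add(did);
  }

  removeMember(did: string): void {
    this.members.delete(did);
  }

  // Only members may post; the record inherits the bucket's ACL.
  post(record: BucketRecord): void {
    if (!this.members.has(record.author)) {
      throw new Error(`${record.author} is not a member of ${this.name}`);
    }
    this.records.push(record);
  }

  // Access is checked against the bucket ACL at read time, so adding or
  // removing a member takes effect for every record without touching them.
  read(viewer: string): BucketRecord[] {
    return this.members.has(viewer) ? [...this.records] : [];
  }
}

const nerds = new Bucket("protocol-nerds", "did:example:alice");
nerds.addMember("did:example:bob");
nerds.post({ author: "did:example:bob", type: "com.atmoboards.forum.post", text: "Hello!" });
nerds.post({ author: "did:example:alice", type: "com.atmoboards.forum.comment", text: "Welcome" });

console.log(nerds.read("did:example:carol").length); // 0: not a member yet
nerds.addMember("did:example:carol");
console.log(nerds.read("did:example:carol").length); // 2: sees everything, older posts included
nerds.removeMember("did:example:carol");
console.log(nerds.read("did:example:carol").length); // 0: access revoked
```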
To quickly compare buckets back to realms: realms were abstract data partitions that described the “type” or intended use of some data, for example “AtmoBoards forums content”. A bucket describes a particular shared social context with a defined owner, admins, and members, for example “the Protocol Nerds forum on AtmoBoards”.
Buckets give us a few things we’ve been missing. They provide a natural unit of access control that is neither too granular (per-record) nor too coarse (per-app). They handle dynamic membership. And they give applications something concrete to sync and index.
One thing worth flagging upfront: a bucket doesn’t necessarily imply a physical container sitting on one PDS. Consider how threads work in the Bluesky app today. Each post lives on its author’s PDS and the thread is compiled by the application by virtue of the fact that each post references the same root. A bucket could work similarly, with its contents distributed across members’ PDSes rather than centralized on one host. There are real tradeoffs here and we’ll dig into them in the next post.
And there’s still a lot more to figure out. How is bucket data actually stored/represented in the PDS? How is bucket data addressed? How is bucket ownership managed and can buckets be transferred? How is access to a bucket represented and enforced? Who has access to buckets: users or applications? How do applications sync buckets in real-time?
We’ll get into some of those in the next post. For now, the key insight is this: permissioned data needs a shared context with a perimeter. And that space needs to exist, not just in the application and not scattered across individual users’ PDSes, but at the protocol layer.
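To illustrate the distributed option, here is a hypothetical sketch in which each record stays on its author’s PDS and names the bucket it belongs to, the way a Bluesky reply references its thread root. The `StoredRecord` shape, the `bucket` reference format, and the host names are invented; the sketch deliberately ignores how each PDS would enforce the bucket’s ACL, which is one of the open questions.

```typescript
// Hypothetical: records live on their authors' PDSes and point at a bucket,
// much as replies in a Bluesky thread point at the same root post.
interface StoredRecord {
  pds: string; // host the record is stored on
  author: string;
  bucket: string; // reference to the bucket (format invented for this sketch)
  text: string;
}

// An app assembles the bucket by collecting matching records from each PDS it syncs.
function assembleBucket(hosts: StoredRecord[][], bucket: string): StoredRecord[] {
  const view: StoredRecord[] = [];
  for (const host of hosts) {
    for (const record of host) {
      if (record.bucket === bucket) view.push(record);
    }
  }
  return view;
}

const NERDS = "at://did:example:alice/bucket/protocol-nerds";
const alicePds: StoredRecord[] = [
  { pds: "pds.alice.example", author: "did:example:alice", bucket: NERDS, text: "Welcome!" },
];
const bobPds: StoredRecord[] = [
  { pds: "pds.bob.example", author: "did:example:bob", bucket: NERDS, text: "Thanks!" },
  { pds: "pds.bob.example", author: "did:example:bob", bucket: "at://did:example:bob/bucket/knitting", text: "Yarn haul" },
];

console.log(assembleBucket([alicePds, bobPds], NERDS).map((r) => `${r.author}: ${r.text}`));
```

A centralized variant would keep the whole assembled view on one host instead; which shape is better is exactly the tradeoff deferred to the next post.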