In the last post, I discussed end-to-end encryption and why we’re focusing instead on designing a permissioned data system. That was a narrowing down of the problem space. In this post, I’m going to start building up an actual solution. I’ll be introducing a new concept that is core to how we’re thinking about permissioned data: buckets.
What we’re trying to do here
Before diving into solutions, let’s first ground ourselves in what we’re trying to achieve.
Permissioned data should feel like a natural extension of how public data already works in atproto. That doesn’t mean that we need to re-use the exact data structures and sync protocols as the public data system. However, the general shape should be familiar: users publish records, those records are canonically stored in the user’s PDS, applications crawl PDSes and sync data to build their own views, users own their data and can move around, authority rests in the DID that publishes the data, the system works at scale, and users shouldn’t feel they’re dealing with weird behavior just because they’re on a decentralized protocol.
Ideally we handle permissioned data with one coherent protocol rather than a patchwork of different systems for different use cases, though we shouldn’t be dogmatic about that if the design demands otherwise.
Groups are the hard case
Let’s revisit the modalities that I called out last time (excluding messaging this time):
Personal Data: mutes, bookmarks, drafts
Gated content: Patreon, Substack, paid newsletters
Socially shared: private posts, stories
Groups: Facebook groups, private forums, private subreddits
These are roughly ordered by complexity. Personal data is simple, it’s just you, your PDS, and maybe an application or two acting on your behalf. Gated content is one-to-many with a clear gatekeeper. Social sharing introduces some dynamism around who is able to view your stuff, how they interact with it, and who can see their interactions.
Groups are many-to-many. They have dynamic membership - people join & leave, admins change, ownership of the group changes. Many users are contributing content to a shared context. Users in these groups may want to view their groups in any number of different apps.
My hunch is that if we can design a system that works for groups, the simpler modalities will fall out naturally. Groups force you to confront the hardest questions about ownership, membership, and access control. So that’s the modality I’ll focus on in this post.
I’ll sketch out two possible solutions and then introduce the concept of a “bucket” which resolves the issues from each of the preceding solutions.
Attempt 1: App-controlled access (realms)
Think about the role an application plays in deciding what a user is able to see. Even in public social modalities like Bluesky (as in apps built on the app.bsky.* lexicons), the application prevents users from seeing posts that violate thread gates and post gates. Blocks prevent the blocked user from viewing certain posts and also prevent third parties from viewing block-violating posts. Of course, for public content you can always go directly to the protocol. Still, this is a basic form of access control being applied by the application.
One clean solution to permissioned data may be to say “let applications handle it”. If an application has access to all permissioned data for some particular social modality, then it can apply arbitrarily complex access control rules around which users are able to see which content in which context. Applications are in the best position to do this because they fully understand the business logic behind who has access to what. Keep the protocol logic coarse and give applications full flexibility around access control logic.
For applications to fulfill this function, they need access to all the content for a particular modality. Let’s call this a “realm”. A realm is an abstract content partition in the network intended for a particular “type” or “use” of permissioned data. Realms can be identified by an NSID defined by publishing a lexicon. When a user creates a permissioned record, they specify the realm that it is being posted into.
To make this concrete, consider a private forum application called “AtmoBoards”. AtmoBoards can define a new realm by creating a lexicon with an NSID, something like com.atmoboards.forum. A realm is network-wide and heterogeneous. The AtmoBoards realm contains posts, comments, profiles, votes, and more. It contains all AtmoBoards forum content from all users.
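As a rough illustration of what “network-wide and heterogeneous” means, here is a minimal Python sketch. Everything in it is hypothetical: the `PermissionedRecord` class, its field names, the `did:example:` identifiers, and the collection NSIDs such as `com.atmoboards.post` are made up for illustration and are not part of any published lexicon. Only the realm NSID `com.atmoboards.forum` comes from the example above.

```python
from dataclasses import dataclass

# Hypothetical record shape; the field names are assumptions, not an atproto schema.
@dataclass(frozen=True)
class PermissionedRecord:
    author_did: str   # DID of the user who published the record
    realm: str        # NSID of the realm the record was posted into
    collection: str   # record type (hypothetical NSIDs)
    forum: str        # which forum the record belongs to
    body: str

ATMOBOARDS_REALM = "com.atmoboards.forum"

records = [
    PermissionedRecord("did:example:alice", ATMOBOARDS_REALM, "com.atmoboards.post", "protocol-nerds", "Hello"),
    PermissionedRecord("did:example:bob", ATMOBOARDS_REALM, "com.atmoboards.comment", "protocol-nerds", "Hi Alice"),
    PermissionedRecord("did:example:alice", ATMOBOARDS_REALM, "com.atmoboards.vote", "gardening", "+1"),
    PermissionedRecord("did:example:alice", "com.example.otherapp", "com.example.note", "n/a", "unrelated"),
]

def records_in_realm(all_records, realm):
    """Return every record posted into the realm, whatever its type, forum, or author."""
    return [r for r in all_records if r.realm == realm]

realm_view = records_in_realm(records, ATMOBOARDS_REALM)
```

The realm view mixes record types, forums, and authors. The only thing the records share is the realm they were posted into.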
Access to the realm naturally translates into an authorization scope that can be displayed on the OAuth consent screen. Something like “Your content within AtmoBoards forums”. The app then simply syncs the data from the user’s PDS using the given OAuth credential. Applications don’t intrinsically “get access to the whole realm”. However, when a user logs into an application that works with a particular realm, that application requests access to all of that user’s content in that realm.
This is conceptually pretty elegant. We get to simply reuse existing auth infrastructure. The consent flow is legible to users. Applications can offer arbitrarily complex access control rules on a user-by-user basis. Users maintain the canonical copy of their data which enables users to choose their application and therefore migrate their community between applications.
However, problems start to emerge when users in the same group use different applications.
Say Alice, Bob, and Carol are all in a private forum together. Alice and Bob use AtmoBoards, but Carol uses a new app called ForumBrowser. Bob posts something in the group. Ideally Carol should be able to see it because she’s in the group! But ForumBrowser can’t access Bob’s post unless Bob has separately authorized ForumBrowser through an OAuth flow. Bob probably hasn’t, he might not even know ForumBrowser exists!
Maybe we introduce a programmatic way to give access to a given application without going through the OAuth flow. If Bob and Alice both decide that Carol should be able to access this forum through ForumBrowser, they could choose to grant it access to their content in the com.atmoboards.forum realm. However, remember this realm contains all content from all forums. So in granting ForumBrowser access to this particular forum, they also grant ForumBrowser access to all of their forum content from across all forums.
This highlights the basic problem with realms: the protocol-level access boundary is too coarse. You either give all of your AtmoBoards content to an app or not. If a group wants to support more than one application, every member of the group needs to give access to every application that the group supports.
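The coarseness can be seen in a toy model. The sketch below assumes a grant is just an (app, realm) pair stored per user. The names `grant_realm_access` and `visible_to_app`, the forum names, and the tuple layout are all hypothetical and only stand in for whatever the real mechanism would be.

```python
from collections import defaultdict

REALM = "com.atmoboards.forum"

# Hypothetical permissioned records: (author, realm, forum, text)
records = [
    ("bob", REALM, "protocol-nerds", "Thoughts on sync"),
    ("bob", REALM, "family-forum", "Holiday plans"),
    ("bob", REALM, "support-group", "Something personal"),
]

# grants[user] is the set of (app, realm) pairs that user has authorized.
grants = defaultdict(set)

def grant_realm_access(user, app, realm):
    """Realm-level grant: there is no way to name a single forum here."""
    grants[user].add((app, realm))

def visible_to_app(app):
    """Records an app may sync: everything in any realm its author granted it."""
    return [r for r in records if (app, r[1]) in grants[r[0]]]

# Bob only wants Carol to read the Protocol Nerds forum through ForumBrowser,
# but the only grant he can express covers the whole realm.
grant_realm_access("bob", "ForumBrowser", REALM)
exposed_forums = {forum for _, _, forum, _ in visible_to_app("ForumBrowser")}
```

After the single grant, `exposed_forums` contains all three of Bob's forums, not just the one he meant to share.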
Some users may feel comfortable with this if there are just one or two big applications in the ecosystem. If someone in my forum wants to use a weird little niche app, I might even feel comfortable sharing that particular forum with them. But I don’t want to give every experimental app access to all of my private forum content.
This becomes a centralizing force. Applications aren’t that useful if they don’t have access to the full set of content for any given forum. Each application needs a critical mass of users to individually authorize each app. Big apps might eventually get there. But the long tail of apps - the experimental ones, the niche ones, the apps that make an open ecosystem interesting & engaging - are completely iced out. This grates against the core value proposition of building on an open protocol: the data should be interoperable, composable, and remixable across applications.
This suggests that we need a more fine-grained mechanism to manage access. And we can’t just riff off of OAuth; our mechanism needs to be programmatically expressible at the protocol layer.
(That said, we might not have seen the last of realms. Keep an eye out for them in a later post.)
Attempt 2: Granular user-controlled access
The immediate followup question is: what’s the unit of access control? What is the ACL actually attached to? And how does that ACL get updated?
Let’s walk through a couple scenarios and watch as the complexity escalates.
Note: As we go through these scenarios, I’ll discuss in terms of user-to-user access grants. Applications need to sync on behalf of users which is a whole ’nother problem that we’ll address in a later blogpost. However, the same basic logic applies to user-to-app access grants.
Simple case: Alice wants to share some permissioned posts with Bob. She puts an ACL on her posts collection granting Bob read access. Great!
A bit harder: Alice has different types of permissioned posts meant for different people - posts for close friends, posts for her mutuals, posts for a paid community. Now she needs separate ACLs for each category. We can’t express these at the collection level anymore, but maybe we can attach ACLs at the record level. Okay, more bookkeeping but still tractable.
Harder still: Alice shares a permissioned post with Bob & Carol, and Bob replies. Carol wants to read the thread. She now needs access to both Alice’s and Bob’s permissioned data. Who coordinates that? Alice and Bob each have their own ACLs on their own PDS.
And the real kicker: Alice creates a private community group. She adds Bob and Carol. A bit later, she adds Dan and Eve. Over time, Alice adds fifty more people. Every time someone new joins, every existing member’s ACL on every piece of content in the group needs to be updated to reflect that the new member can now see their contributions to the group. Carol may not even know Dan or Eve, much less the 50 other members. Is she individually responsible for updating her ACL when the group owner adds someone? How does she get notified? Even if we invent some sub-protocol for managing ACL updates, can we really expect all group members to update the ACL on every single record meant for the group each time the group boundary changes?
The fundamental issue here is that when access control is attached to individual pieces of data, every social interaction creates a coordination problem. Every membership change touches every record from every existing member, so as the number of participants increases, that coordination overhead starts to spin out of control.
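A back-of-the-envelope count makes the overhead tangible. This is a toy model under loud assumptions: every member already has the same number of records in the group (100 here, an arbitrary figure), each record carries its own ACL, and nobody posts anything new while the group grows. The function names are invented for this sketch.

```python
def per_record_acl_updates(records_per_member, existing_members):
    """ACL writes needed when ONE person joins, if every record carries its own ACL."""
    return records_per_member * existing_members

def total_updates_while_growing(records_per_member, final_size):
    """Total ACL writes as a group grows one member at a time from 1 to final_size.

    Assumes (unrealistically) a fixed number of records per member throughout.
    """
    return sum(
        per_record_acl_updates(records_per_member, existing)
        for existing in range(1, final_size)
    )

# Alice, Bob, Carol, Dan, Eve, plus fifty more people: 55 members in total.
one_join = per_record_acl_updates(records_per_member=100, existing_members=54)
growth = total_updates_while_growing(records_per_member=100, final_size=55)
```

Under these assumptions, the last join alone requires 5,400 ACL writes spread across 54 different people's PDSes, and growing the group to 55 members costs 148,500 writes in total. In this model the total scales with the number of records times the square of the group size.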
Social interaction revolves around a shared context. What we really want is a way to say: “Here’s a space. Here’s who has access to it. Everything in this space inherits that access”.
We need a bucket.
Buckets
A bucket is a named container that holds records and has a single authoritative ACL. It’s the protocol-level primitive for “these people can access this stuff”. When you post into a bucket, your post inherits the ACL of that bucket. When someone is added to a bucket, they get access to everything in it. When they’re removed, they lose it.
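Those three rules can be sketched in a few lines of Python. This is an in-memory illustration only, not a proposal for how buckets are stored or enforced: the `Bucket` class, its methods, and the rule that only the owner may change membership are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Bucket:
    name: str
    owner: str
    members: set = field(default_factory=set)    # the single authoritative ACL
    records: list = field(default_factory=list)  # (author, body) pairs

    def __post_init__(self):
        self.members.add(self.owner)  # the owner is always a member

    def add_member(self, actor, user):
        # Assumption for this sketch: only the owner manages membership.
        if actor != self.owner:
            raise PermissionError("only the owner may change membership")
        self.members.add(user)

    def remove_member(self, actor, user):
        if actor != self.owner:
            raise PermissionError("only the owner may change membership")
        if user == self.owner:
            raise ValueError("the owner cannot be removed")
        self.members.discard(user)

    def can_read(self, user):
        """Access is decided by the bucket's ACL, never by an individual record."""
        return user in self.members

    def post(self, author, body):
        if author not in self.members:
            raise PermissionError(f"{author} is not a member of {self.name}")
        self.records.append((author, body))  # the record carries no ACL of its own

    def read(self, reader):
        if not self.can_read(reader):
            raise PermissionError(f"{reader} is not a member of {self.name}")
        return list(self.records)

bucket = Bucket("protocol-nerds", owner="alice")
bucket.add_member("alice", "bob")
bucket.post("bob", "First post")

bucket.add_member("alice", "carol")        # one ACL change...
carol_sees = bucket.read("carol")          # ...and Carol can read Bob's earlier post

bucket.remove_member("alice", "carol")     # one ACL change revokes everything
carol_still_has_access = bucket.can_read("carol")
```

Bob never touches his post when Carol joins or leaves. The membership change happens once, on the bucket, which is the coordination problem from the previous section collapsing into a single write.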
A bucket is a bit like a repository. But it isn’t the public repository and (spoiler) probably doesn’t use an MST. Users will have many buckets, one for each social context that they are creating content in. A bucket holds records of many different types and from many members. For instance, a single bucket may contain all content for a given AtmoBoards forum.
To quickly compare buckets back to realms: realms were abstract data partitions that described the “type” or intended use of some data. For example, “AtmoBoards forums content”. A bucket describes a particular shared social context with defined owner, admins, and members. For example, “the Protocol Nerds forum on AtmoBoards”.
Buckets give us a few things we’ve been missing. They provide a natural unit of access control that is neither too granular (per-record) nor too coarse (per-app). They handle dynamic membership. And they give applications something concrete to sync and index.
One thing worth flagging upfront: a bucket doesn’t necessarily imply a physical container sitting on one PDS. Consider how threads work in the Bluesky app today. Each post lives on its author’s PDS and the thread is compiled by the application by virtue of the fact that each post references the same root. A bucket could work similarly, with its contents distributed across members’ PDSes rather than centralized on one host. There are real tradeoffs here and we’ll dig into them in the next post.
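The distributed variant might look something like the sketch below, by analogy with how a thread is assembled from posts that share a root. The per-PDS lists, the `bucket` and `created` fields, and the `compile_bucket` function are hypothetical. How bucket data is actually stored and addressed is still an open question.

```python
# Each hypothetical PDS holds only its own user's records, tagged with a bucket id.
pds_alice = [
    {"bucket": "protocol-nerds", "author": "alice", "created": 1, "text": "Welcome"},
    {"bucket": "gardening", "author": "alice", "created": 2, "text": "Tomatoes"},
]
pds_bob = [
    {"bucket": "protocol-nerds", "author": "bob", "created": 3, "text": "Glad to be here"},
]
pds_carol = [
    {"bucket": "protocol-nerds", "author": "carol", "created": 4, "text": "Same"},
]

def compile_bucket(bucket_id, hosts):
    """Gather the records that reference bucket_id from every host, oldest first."""
    found = [rec for host in hosts for rec in host if rec["bucket"] == bucket_id]
    return sorted(found, key=lambda rec: rec["created"])

view = compile_bucket("protocol-nerds", [pds_alice, pds_bob, pds_carol])
texts = [rec["text"] for rec in view]
```

No single host holds the whole bucket here. The application builds the view by collecting every record that points at the same bucket id, and Alice's gardening record stays out of it.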
And there’s still a lot more to figure out. How is bucket data actually stored/represented in the PDS? How is bucket data addressed? How is bucket ownership managed and can buckets be transferred? How is access to a bucket represented and enforced? Who has access to buckets: users or applications? How do applications sync buckets in real-time?
We’ll get into some of those in the next post. For now, the key insight is this: permissioned data needs a shared context with a perimeter. And that space needs to exist, not just in the application and not scattered across individual users’ PDSes, but at the protocol layer.