Can you explain how federation works on protocol level?

akkajdh999@programming.dev · 4 days ago

Can you explain how federation works on protocol level?

Draconic NEO@lemmy.world · 3 days ago

Think of it this way: when you make a post, your server automatically distributes it to every subscriber. Depending on the platform, that means subscribers to the community or, on something like Mastodon, followers of your user account. When the post is received, it is copied and re-hosted on every server that has subscribers.

The exceptions are a banned user or a defederated server: in those cases the request is denied and the post isn’t re-hosted by the instance that issued the ban or defederation against the user or server who made the post. Bans and defederation typically happen only in extreme cases, such as defending against spam, hate speech, or abusive users.
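To make the ban/defederation check a bit more concrete, here is a minimal Python sketch of the kind of filter a receiving server could apply to an incoming activity. The function name, the two block lists and the hostnames are made up for illustration; real servers do a lot more than this (verifying HTTP signatures, for a start).

```python
from urllib.parse import urlparse


def should_accept(activity: dict, blocked_instances: set[str], banned_actors: set[str]) -> bool:
    """Return True if an incoming activity should be processed and re-hosted."""
    actor = activity.get("actor", "")
    host = urlparse(actor).hostname or ""
    if host in blocked_instances:  # the sending server is defederated
        return False
    if actor in banned_actors:     # this particular user is banned
        return False
    return True


post = {"type": "Create", "actor": "https://a/u/alice"}
print(should_accept(post, blocked_instances={"spam.example"}, banned_actors=set()))
```

Everything that passes the filter gets stored locally; everything else is simply dropped by the receiving side.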

This might be an oversimplified explanation, but I’m trying to keep it simple since that helps people understand the process.

flamingos-cant@feddit.uk · edit-2 4 days ago

alice from A makes a post “Hello, world” to B

Alice can’t make a post to B, but I assume you mean a community on B, let’s call it foo. When Alice makes a post it first goes through A’s local API and creates the local (and canonical) version of Alice’s post. Once A has finished processing Alice’s post, it will create an ActivityPub representation of Alice’s post to send to B.

ActivityPub is basically a bunch of assumptions laid on top of JSON. An ActivityPub ‘file’ can be broadly divided into 3 types: Object, Activity and actors.^[1] These types then have subtypes; for example, both Alice and foo are actors but Alice is a Person while foo is a Group.

A second important assumption of ActivityPub is the concept of inboxes and outboxes, but, for Lemmy, only inboxes matter. An inbox is just a URL where Lemmy can send activities and it’s something all actors have.
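As a rough illustration of actors and inboxes, here is a small Python sketch. The five actor type names come from the ActivityStreams vocabulary; the inbox URL paths and the `inbox_of` helper are assumptions for the example, not something taken from Lemmy.

```python
# The five actor types defined by the ActivityStreams vocabulary.
ACTOR_TYPES = {"Application", "Group", "Organization", "Person", "Service"}

# Hypothetical actor documents; the inbox paths are illustrative.
alice = {"type": "Person", "id": "https://a/u/alice", "inbox": "https://a/u/alice/inbox"}
foo = {"type": "Group", "id": "https://b/c/foo", "inbox": "https://b/c/foo/inbox"}


def inbox_of(actor: dict) -> str:
    """Return the URL that activities addressed to this actor are delivered to."""
    if actor.get("type") not in ACTOR_TYPES:
        raise ValueError(f"not an actor: {actor.get('type')!r}")
    return actor["inbox"]


print(inbox_of(alice))
print(inbox_of(foo))
```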

So when instance A is finished processing Alice’s post, it will turn it into a Page object, wrap that in a Create activity and send it to foo’s inbox.

Roughly what the JSON would look like:

{
  "@context": [
    "https://join-lemmy.org/context.json",
    "https://www.w3.org/ns/activitystreams"
  ],
  "actor": "https://a/u/alice",
  "type": "Create",
  "to": ["https://www.w3.org/ns/activitystreams#Public"],
  "cc": ["https://b/c/foo"],
  "id": "https://a/activities/create/19199919009100",
  "object": {
    "type": "Page",
    "id": "https://a/post/1",
    "attributedTo": "https://a/u/alice",
    "to": [
      "https://b/c/foo",
      "https://www.w3.org/ns/activitystreams#Public"
    ],
    "audience": "https://b/c/foo",
    "name": "Hello world",
    "attachment": [],
    "sensitive": false,
    "language": {
      "identifier": "en",
      "name": "English"
    },
    "published": "2024-12-29T15:10:51.557399Z"
  }
}
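
For anyone who prefers code to raw JSON, this is a minimal Python sketch of how an activity like that could be assembled. The `build_create` helper and its parameters are hypothetical; only the field names mirror the JSON in this comment, and several fields are left out for brevity.

```python
import json

PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def build_create(actor: str, community: str, post_id: str, title: str, activity_id: str) -> dict:
    """Wrap a Page object in a Create activity addressed to a community."""
    page = {
        "type": "Page",
        "id": post_id,
        "attributedTo": actor,
        "to": [community, PUBLIC],
        "audience": community,
        "name": title,
    }
    return {
        "@context": [
            "https://join-lemmy.org/context.json",
            "https://www.w3.org/ns/activitystreams",
        ],
        "type": "Create",
        "id": activity_id,
        "actor": actor,
        "to": [PUBLIC],
        "cc": [community],
        "object": page,
    }


create = build_create(
    actor="https://a/u/alice",
    community="https://b/c/foo",
    post_id="https://a/post/1",
    title="Hello world",
    activity_id="https://a/activities/create/19199919009100",
)
print(json.dumps(create, indent=2))
```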


Instance B will then receive this and do the same kind of processing A did when Alice created the post via the API. Once it has finished, it will turn the post back into a Page but this time wrap it in an Announce activity. B will then look at all the actors that follow foo (i.e. are subscribed to it) and send this Announce to all of their inboxes. Assuming a user on instance C follows foo, C will receive this Announce and process it like A and B before it, creating the local version of Alice’s post.

Edit: I made a small mistake, I said that foo wrapped the Page in an Announce, when it actually wraps the Create in an Announce.
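
A minimal Python sketch of that Announce step, under stated assumptions: the `wrap_in_announce` and `delivery_targets` helpers, the announce id and the follower inbox URLs are all invented for illustration, and the stub `create` stands in for the full Create activity.

```python
def wrap_in_announce(community: str, create: dict, announce_id: str) -> dict:
    """The community (a Group actor) re-broadcasts the whole Create activity."""
    return {
        "type": "Announce",
        "id": announce_id,
        "actor": community,
        "to": ["https://www.w3.org/ns/activitystreams#Public"],
        "object": create,  # the Create, not the bare Page
    }


def delivery_targets(follower_inboxes: list[str]) -> list[str]:
    """Drop duplicate inbox URLs while keeping order, so each is contacted once."""
    return list(dict.fromkeys(follower_inboxes))


create = {"type": "Create", "actor": "https://a/u/alice",
          "object": {"type": "Page", "id": "https://a/post/1"}}
announce = wrap_in_announce("https://b/c/foo", create, "https://b/activities/announce/1")

# Hypothetical inbox URLs of the actors following foo.
inboxes = ["https://c/inbox", "https://a/inbox", "https://c/inbox"]
for inbox in delivery_targets(inboxes):
    print("would POST", announce["id"], "to", inbox)
```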

[1] Technically, Activity and actors are themselves objects, but they’re treated differently. There are also Collections, which are their own type, but Lemmy doesn’t really utilise them.

akkajdh999@programming.dev · 4 days ago

Thank you, very clear.

So B will list all users subscribed to foo, look at their instances, and send the update to them.

I assume that if someone from a new instance (D) subscribes to foo, then D will need to request all the old posts from foo, since they weren’t pushed to D?

flamingos-cant@feddit.uk · 4 days ago

I assume that if someone from a new instance (D) subscribes to foo, then D will need to request all the old posts from foo, since they weren’t pushed to D?

Lemmy is pretty bad about backfilling content. Communities do have outboxes, but these only list the last 50 posts and you can’t get the votes or comments on any of them. See GitHub issues #5283, #3448 and #2004.
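
To show what backfilling from an outbox would roughly involve, here is a Python sketch that pulls post ids out of an outbox document. It assumes the outbox is an OrderedCollection whose `orderedItems` are activities wrapping Page objects; the sample document is hand-written, so treat the exact shape as an assumption and check a real community outbox before relying on it.

```python
def posts_in_outbox(outbox: dict) -> list[str]:
    """Return the ids of the Page objects listed in an outbox document."""
    ids = []
    for item in outbox.get("orderedItems", []):
        # Unwrap Announce -> Create -> Page until we reach a non-activity.
        while isinstance(item, dict) and item.get("type") in ("Announce", "Create"):
            item = item.get("object")
        if isinstance(item, dict) and item.get("type") == "Page":
            ids.append(item["id"])
    return ids


sample_outbox = {
    "type": "OrderedCollection",
    "id": "https://b/c/foo/outbox",
    "orderedItems": [
        {"type": "Announce", "object": {
            "type": "Create", "object": {"type": "Page", "id": "https://a/post/1"}}},
        {"type": "Create", "object": {"type": "Page", "id": "https://b/post/7"}},
    ],
}
print(posts_in_outbox(sample_outbox))
```

Note that nothing in such a document carries the votes or comments, which is the gap described above.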

Kichae@lemmy.ca · 4 days ago

ActivityPub works like a magazine subscription. They don’t send you back issues for subscribing.

Burstar@sopuli.xyz · 3 days ago

Why does a Mastodon user get completely different profiles and history when viewed from different Lemmy instances? They look like 2 completely different users when compared, except for having the same @address. In fact this makes them immune from moderation if they comment from a different instance than the mod is on.

flamingos-cant@feddit.uk · 3 days ago

Mastodon doesn’t have Group support (fep-1b12), so when they reply to a post, they don’t send it to the community’s inbox (only to the inbox of the Person they’re replying to), thus breaking Lemmy’s model of federation.

Burstar@sopuli.xyz · 3 days ago

Okay, thanks.

AbouBenAdhem@lemmy.world · 4 days ago

Does ActivityPub really send copies of all activities to www.w3.org?

flamingos-cant@feddit.uk · 4 days ago

No, the https://www.w3.org/ns/activitystreams#Public is just there to indicate that it’s ok for receiving instances to display this publicly, nothing actually gets sent to it. See the spec for more details.
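
In practice a receiving server just checks the addressing fields for that special value and never delivers to it. A minimal Python sketch (the helper names are mine; the `as:Public` and `Public` short forms are the compacted spellings the ActivityPub spec says are also valid):

```python
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
PUBLIC_ALIASES = {PUBLIC, "as:Public", "Public"}


def _addressees(obj: dict) -> list[str]:
    """Collect everything listed in `to` and `cc`, accepting a string or a list."""
    out = []
    for field in ("to", "cc"):
        value = obj.get(field, [])
        out.extend([value] if isinstance(value, str) else value)
    return out


def is_public(obj: dict) -> bool:
    """True if the object is addressed to the special Public collection."""
    return any(a in PUBLIC_ALIASES for a in _addressees(obj))


def real_recipients(obj: dict) -> list[str]:
    """Addressees that are actual actors, i.e. everything except Public."""
    return [a for a in _addressees(obj) if a not in PUBLIC_ALIASES]


create = {"type": "Create", "to": [PUBLIC], "cc": ["https://b/c/foo"]}
print(is_public(create))
print(real_recipients(create))
```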

AbouBenAdhem@lemmy.world · 4 days ago

Why not a binary flag or something? Is it just to avoid making it a formal part of the protocol?

JackbyDev@programming.dev · edit-2 2 days ago

Because it is JSON-LD and that’s how JSON-LD works. It’s an extensible format. Similar to XML namespaces.

akkajdh999@programming.dev · 2 days ago

So overengineered bullshit

JackbyDev@programming.dev · 2 days ago

I don’t understand the comment. It’s like calling the fact that firstName is in the JSON {"firstName": "Bob"} “over engineered bullshit” when they should’ve made some application specific protocol instead of using JSON. ActivityStreams and ActivityPub are built on top of JSON-LD to utilize existing libraries to represent linked data (that’s what the LD is). To specify what schemas are used there is a “context” field. There are other schemas as well. Take a look at https://schema.org/ to see them.
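
Here is a toy Python sketch of what the context buys you. It is not real JSON-LD processing (actual expansion does far more, and you would use an existing library for it); it only shows the idea that the context maps short keys to globally unique IRIs. The mapping of `firstName` to `https://schema.org/givenName` is chosen for illustration.

```python
def expand_keys(doc: dict, context: dict) -> dict:
    """Toy stand-in for JSON-LD expansion: replace short keys with full IRIs."""
    return {context.get(key, key): value for key, value in doc.items() if key != "@context"}


# Illustrative context: the short key on the left means the IRI on the right.
context = {"firstName": "https://schema.org/givenName"}

print(expand_keys({"firstName": "Bob"}, context))
```

Two applications that never heard of each other can then agree that they mean the same property, even if they picked different short names for it.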

If it feels over engineered it’s because it’s meant to be able to represent a wide variety of types of social media and typical interactions with them. I seriously doubt Mastodon (micro blogging) and Lemmy (link aggregation forum) would be able to interact easily if they weren’t “over engineered”.

akkajdh999@programming.dev · edit-2 2 days ago

I don’t care, JSON-LD is itself overengineered, i.e. bloating every JSON that you send with 300 useless http:// links without an actual purpose (instead of a boolean flag or whatever). This bloated protocol doesn’t even… work properly.

flamingos-cant@feddit.uk · 4 days ago

I actually don’t know, you’d need to ask someone privy to design decisions made with ActivityPub, like Prodromou or Lemmer-Webber. It’s definitely not to avoid making it part of the protocol, because it already is (see the link in the last comment).

JackbyDev@programming.dev · 2 days ago

It’s because it’s JSON-LD.

flamingos-cant@feddit.uk · 10 hours ago

What about JSON-LD makes it so they have to include the “this is public” declaration in the to field instead of having an as:public property on the object? (I don’t know a whole lot about JSON-LD or RDF more broadly)

AbouBenAdhem@lemmy.world · edit-2 4 days ago

Thanks—I meant “formal” as in “formal grammar”, not that it wasn’t described in the published protocol. As in, there’s nothing in the protocol’s explicit form that distinguishes between this implied meaning and a real extra recipient—so it simplifies the parsing but adds an extra post-parsing step.

JackbyDev@programming.dev · 2 days ago

It helps when you understand that you only ever directly interact with your instance.

Alice posts to A (in some community hosted on B)
B is federated with A so will eventually receive the post
C is federated with B so will eventually get the post

h4x0r@lemmy.dbzer0.com · 4 days ago

https://www.w3.org/TR/activitypub/

https://en.m.wikipedia.org/wiki/ActivityPub

FrostyTrichs@walledgarden.xyz · 4 days ago

The easiest way to explain it is that the instances have no native ability to crawl other instances for communities or content. For all intents and purposes, a fresh Lemmy server is on an island and all other instances are their own island until someone builds a bridge to them.

The ability of an instance to receive content is dependent on the subscriptions users add to the database. Once the instance is aware of these other places it will begin checking them for updates and you’ll see them regularly whether you interact with them or not.

This goes completely against what the average person is expecting and causes a lot of confusion.

jollyroberts@jolly-piefed.jomandoa.net · edit-2 4 days ago

Piefed instances now do have a form of this for instance admins to populate new instances.

Admins can:
- pull the lemmyverse data and subscribe to a bunch of communities at once
or
- target a single Lemmy or Mbin instance, get the list of communities that instance hosts, and subscribe to a bunch of communities on that instance.

Both have some tunable settings to allow admins control over how many communities are followed.

It’s not an end-user thing, but it should help with setting up new instances and them not being so ‘empty’.

edit: typo

FrostyTrichs@walledgarden.xyz · 4 days ago

That sounds like a much better implementation of community discovery.

JubilantJaguar@lemmy.world · 4 days ago

This goes completely against what the average person is expecting and causes a lot of confusion.

But this is only true if the user looks at the All feed, correct?

FrostyTrichs@walledgarden.xyz · 4 days ago

But this is only true if the user looks at the All feed

It impacts what content is available to users at all. The All feed is just the visual representation of what’s actively federating.

Let’s say you join a new instance for whatever reason with no outside awareness of how the fediverse works. If you try to search the instance for “sportball” and get zero results the natural assumption is going to be that there are no communities and no interest in that topic. The user has no idea that lemmyserver5000.com has a sportball community with thousands of users because no one with those interests ever did the work to get the content flowing in a way that they could access it intuitively. It’s a poor design IMO.

The reason I brought it up has more to do with starting a new instance or using a smaller instance. Communities that the instance isn’t aware of (via someone previously subscribing) won’t show up at all which causes places to appear non-existent or dead by default. Someone trying a federating website for the first time isn’t going to know this, so to them, that’s all the fediverse has to offer.

JubilantJaguar@lemmy.world · 4 days ago

OK, I see that problem. In fact I remember having the same issue myself. (Presumably this will create a secondary confusion problem for “All” subscribers, who will see the content of their feed gradually expand without explanation as other users subscribe to other foreign servers, correct? Whatever, I don’t care much about them, someone who subscribes to “All” apparently doesn’t know what they want anyway!)

So the optimal solution here would be for each instance to preemptively connect to a whitelist of known foreign communities, perhaps? Or maybe each instance could regularly ping other servers in order to update its search database with popular communities.

Kichae@lemmy.ca · 4 days ago

It’s a poor design if what you want to do is emulate a centralized social media service.

But maybe we should stop trying to do that.

FrostyTrichs@walledgarden.xyz · 4 days ago

Maybe.

But I’d counter that it’s prohibitive to growth. People aren’t used to turning up at a domain name only to find out 90% of the content can’t be accessed without jumping through a bunch of hoops.

Zak@lemmy.world · 4 days ago

instances have no native ability to crawl other instances for communities or content

That’s not quite true. They don’t do it automatically or routinely, but a user can cause a server to read a post from another server by putting its URL into the search box. This can be useful for an end user to manually address a federation glitch.

Here’s a concrete example. I was trying to post a comment via lemmy.world, but lemmy.world sits behind Cloudflare, and Cloudflare flagged its content as potentially malicious. I then posted that comment via my own Mastodon server, but push federation to lemmy.world also failed, for the same reason. I could, however, cause lemmy.world to pull the comment using the search.
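
The same pull can be triggered through the HTTP API rather than the search box. The sketch below only builds the request URL; it assumes the `resolve_object` endpoint of Lemmy's v3 API with its `q` parameter, which is my understanding of the API and worth checking against the docs for your Lemmy version (newer versions may also require being logged in). The remote URL is a made-up example.

```python
from urllib.parse import urlencode


def resolve_url(instance: str, remote_url: str) -> str:
    """Build the URL asking `instance` to fetch and store `remote_url`."""
    return f"https://{instance}/api/v3/resolve_object?" + urlencode({"q": remote_url})


print(resolve_url("lemmy.world", "https://mastodon.example/@zak/1"))
```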

Scipitie@lemmy.dbzer0.com · 4 days ago

Does that mean that an “all” view is “only” all of the subscriptions/places people from my server have?

That’s quite interesting.

And thanks!

FrostyTrichs@walledgarden.xyz · 4 days ago

Does that mean that an “all” view is “only” all of the subscriptions/places people from my server have?

Correct.

Rikudou_Sage@lemmings.world · 4 days ago

Note that many instances either have a bot subscribed to other communities to force federation, or use something like https://lemmy-federate.com/

FrostyTrichs@walledgarden.xyz · 4 days ago

Note that many instances either have a bot subscribed to other communities to force federation, or use something like https://lemmy-federate.com/

FWIW this approach can be helpful but is flawed in its own ways.

Firstly, since not all instances participate you still aren’t getting the “complete” fediverse so to speak. This becomes less of an issue as more instances join the bot program, but it’s another step that roadblocks what should be an easy and organic process.

Secondly, the bot can pose a potential security risk depending on how it’s configured. If you use it to federate in both directions you’re subject to malicious actors spinning up tons of new communities on instances that don’t restrict user registration. This will in turn hammer the database an instance uses for EVERYTHING and eventually cause slowdowns, crashes, etc. The solution to this is to only seed your communities outwardly but if everyone only does that the bot is rather useless…

I don’t have a solution for any of this, I’m just pointing out some rather frustrating problems this platform has in its current state.

Rikudou_Sage@lemmings.world · 4 days ago

Well, you can always defederate if an instance starts abusing it. Not that much different to the normal flow, really.

FrostyTrichs@walledgarden.xyz · 4 days ago

you can always defederate if an instance starts abusing it

Sure, but potentially after at least one of the instances subscribed to the bot goes down and someone realizes what’s happening. It’s incredibly easy to overwhelm a small server’s database just by subscribing to a lot of communities the normal way. The difference here is potentially any instance federating the bot in both directions is susceptible to this.

Not that much different to the normal flow, really.

The impact across the fediverse vs just one instance would be the main difference. Plenty of people are using that bot having no real idea of what it’s doing.

Rikudou_Sage@lemmings.world · 4 days ago

That’s just a part of the learning process, IMO. My instance crashed many times, I’ve fixed it every time and now it’s better than before. And I don’t think I’ve had my last fuck up with the instance.

FrostyTrichs@walledgarden.xyz · 4 days ago

And that’s fine for you, I’m not knocking the experimenting and learning process. That was the whole reason I spun up an instance myself.

What I’m saying is that to the other users that would be impacted by these things, it sucks. People are patient to a point but the fediverse has a lot of odd quirks that make it more difficult than it should be to use for a lot of people. Things have gotten better in the last year or so but it still feels like we’re asking people to know more than they should have to just to figure out that Lemmy isn’t empty. Many people will get frustrated and leave long before they start making excuses for a site they don’t know anything about.

It’s easy to sit around proclaiming that reddit sucks but the fact of the matter is that it’s easy to use and everything they have to offer is covered under one domain. Again, I don’t have the solution to these things for Lemmy, but we can’t deny that this platform is harder to use than most and a lot of people aren’t going to handle that well.

Illecors@lemmy.cafe · 4 days ago

A makes a post to B
B federates that post to all instances that have at least 1 user subbed to the community of the post

All users from all instances get the post from their home instance.
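
A toy in-memory simulation of that flow in Python, in case it helps. The `Instance` class and its methods are invented for the example and ignore everything real servers deal with (HTTP, signatures, retries); it only models who ends up holding a copy.

```python
class Instance:
    def __init__(self, name: str):
        self.name = name
        self.posts = {}      # post id -> title: this instance's own stored copy
        self.followers = {}  # local community name -> set of subscribed Instances

    def follow(self, host: "Instance", community: str) -> None:
        """Some user on this instance subscribes to a community hosted on `host`."""
        host.followers.setdefault(community, set()).add(self)

    def receive(self, post_id: str, title: str) -> None:
        self.posts[post_id] = title

    def submit(self, host: "Instance", community: str, post_id: str, title: str) -> None:
        """A local user posts to a community hosted on `host`."""
        self.receive(post_id, title)  # canonical copy on the home instance
        host.announce(community, post_id, title)

    def announce(self, community: str, post_id: str, title: str) -> None:
        """The hosting instance stores the post and fans it out to subscribers."""
        self.receive(post_id, title)
        for follower in self.followers.get(community, set()):
            follower.receive(post_id, title)


a, b, c, d = (Instance(n) for n in "ABCD")
a.follow(b, "foo")
c.follow(b, "foo")
a.submit(b, "foo", "https://a/post/1", "Hello world")
print(sorted(i.name for i in (a, b, c, d) if i.posts))  # ['A', 'B', 'C']
```

D has no subscriber to foo, so it never receives the post.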

akkajdh999@programming.dev · edit-2 4 days ago

Thanks but this is quite high-level.

Okay, so Alice makes a request to A. A makes a request to B. B makes requests to all other instances.

If you get posts from your home instance, does it mean that all instances duplicate the same database?

Zak@lemmy.world · 4 days ago

They don’t duplicate the database in a technical sense, but when things go right, they each have a copy of the same post and comment text, and the same votes.

akkajdh999@programming.dev · edit-2 4 days ago

Do you mean that the database is not identical, but still duplicates all data, basically? (you said “they each have a copy”, I assume it’s persistent on disk). So if we have 100 lemmy instances, they all save the same post.

Zak@lemmy.world · 4 days ago

Correct. Each server that shows the post to its users stores a copy of the post. It does not necessarily store attached media (IIRC Mastodon usually does and Lemmy usually hotlinks media).

4 days ago

If you get posts from your home instance, does it mean that all instances duplicate the same database?

Your home instance only has a database of posts from communities that at least 1 user has subscribed to.