I’ve been thinking about what to do about cross-posts (e.g. where the same link is uploaded to both fediverse@lemmy.world and fediverse@lemmy.ml).
In terms of them being annoying, I don’t yet know what to do about that.
My progress so far, and what it requires:
The Community table has an extra field (xp_indicator), which determines what gets compared to decide whether something is a cross-post. It defaults to URL, but it could be the title for communities like AskLemmy.
The Post table has an extra field (cross_posts), which is an array of other post ids. (Note: this would lock PieFed into using PostgreSQL.)
New posts, both local and via ActivityPub, are checked to see if they are a cross-post, and the relevant posts are updated. This also happens for local edits and AP Update. In the DB, the posts in the screenshot look like:
-[ RECORD 1 ]----------------------------------------------------------
id | 27
title | Springtime Ministrone
url | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {28,29,30}
-[ RECORD 2 ]----------------------------------------------------------
id | 28
title | Springtime Ministrone
url | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {27,29,30}
-[ RECORD 3 ]----------------------------------------------------------
id | 29
title | Springtime Ministrone
url | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {27,28,30}
-[ RECORD 4 ]----------------------------------------------------------
id | 30
title | Springtime Ministrone
url | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {27,28,29}
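For anyone who wants to see the bookkeeping spelled out, here is a rough sketch of how the records above could end up in that state. It is not the actual PieFed code: a plain dict stands in for the Post table, `link_cross_posts` is a hypothetical name, and it assumes only the URL is compared.

```python
def link_cross_posts(posts, new_id):
    """Link post `new_id` with every other post that shares its URL.

    `posts` maps post id -> {"url": str, "cross_posts": list[int]} and is
    only a stand-in for the Post table (an assumption for illustration).
    """
    new_post = posts[new_id]
    if not new_post["url"]:
        return  # posts without a URL are never treated as cross-posts here
    for other_id, other in posts.items():
        if other_id == new_id or other["url"] != new_post["url"]:
            continue
        # Record the relationship in both directions, avoiding duplicates.
        if other_id not in new_post["cross_posts"]:
            new_post["cross_posts"].append(other_id)
        if new_id not in other["cross_posts"]:
            other["cross_posts"].append(new_id)


url = "https://www.bbcgoodfood.com/recipes/springtime-minestrone"
posts = {}
for post_id in (27, 28, 29, 30):
    # Each post arrives in turn and is linked against what is already stored.
    posts[post_id] = {"url": url, "cross_posts": []}
    link_cross_posts(posts, post_id)
```

Every post ends up listing the other three, so whichever one a user opens, the full set of cross-posts is available without a second lookup table.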
In the UI, posts with cross-posts get an extra icon, which when clicked brings you to another screen (similar to ‘other discussions’ in Reddit).
In terms of hiding duplicate posts from the feed, I don’t yet know. If it were left to the back-end, it would require some extra DB activity that might be unacceptably slow. This update would mean, though, that a future API could provide a response similar to Lemmy’s for posts, so apps/frontends could merge duplicates the same way some of them do for Lemmy. Likewise, if there was a ‘Hide posts marked as read’ feature, it could regard any post ids in the cross_posts field as also being Read.
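As a rough idea of what a front-end could do with that data, here is a minimal sketch that merges duplicates and also treats cross-posts of read posts as read. The `collapse_feed` name and the dict shape are assumptions for illustration, not an existing PieFed or Lemmy API.

```python
def collapse_feed(feed, read_ids=frozenset()):
    """Drop cross-post duplicates and posts whose cross-post group was read.

    `feed` is a list of dicts with "id" and "cross_posts" keys, in display
    order (a hypothetical shape for an API response).
    """
    seen = set(read_ids)
    visible = []
    for post in feed:
        group = {post["id"], *post["cross_posts"]}
        if group & seen:
            # This post, or one of its cross-posts, was already shown or read.
            continue
        visible.append(post)
        seen.update(group)
    return visible


feed = [
    {"id": 27, "cross_posts": [28, 29, 30]},
    {"id": 28, "cross_posts": [27, 29, 30]},
    {"id": 31, "cross_posts": []},
]
unread_view = collapse_feed(feed)
after_reading_29 = collapse_feed(feed, read_ids={29})
```

The first post of each group in feed order is the one that stays visible; a real client might prefer the one from a subscribed community instead.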
I have to wait a few days until the quota on my ngrok account resets (something in the Fediverse went crazy, I’d guess), so I thought I’d share here in the meantime. Also, it means the PR doesn’t come out of the blue, and it can be discussed beforehand.
(also: it turns out I can’t spell ‘minestrone’)
Oh, okay. I was only thinking of using ‘title’ for very few communities, like AskLemmy or ShowerThoughts, but I see how it could produce false positives even for those (I may also have been misled by the recent Issue into thinking title-based cross-posts happen more often than they do).
Speaking of that Issue, maybe the search for URL-based cross-posts could also happen in Redis - would be quicker, and would only be for recent stuff (depending on the expiry for how recent, of course).
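To make the Redis idea a bit more concrete, here is a sketch under some assumptions: one Redis set per URL, an `xp:` key prefix, and a three-day expiry are all made up for illustration. It only relies on `sadd`, `smembers` and `expire`, which redis-py clients provide; the `FakeRedis` class is a tiny in-memory stand-in so the example runs without a server.

```python
import time


class FakeRedis:
    """In-memory stand-in for the three Redis calls used below."""

    def __init__(self):
        self._sets = {}
        self._expires = {}

    def _purge(self, name):
        # Drop the key if its expiry time has passed.
        if name in self._expires and self._expires[name] <= time.monotonic():
            self._sets.pop(name, None)
            self._expires.pop(name, None)

    def sadd(self, name, *values):
        self._purge(name)
        members = self._sets.setdefault(name, set())
        before = len(members)
        members.update(str(v) for v in values)
        return len(members) - before

    def smembers(self, name):
        self._purge(name)
        return set(self._sets.get(name, set()))

    def expire(self, name, seconds):
        if name not in self._sets:
            return False
        self._expires[name] = time.monotonic() + seconds
        return True


def find_recent_cross_posts(client, url, post_id, ttl_seconds=3 * 24 * 3600):
    """Return ids of recent posts sharing `url`, then remember `post_id`."""
    key = f"xp:{url}"  # hypothetical key naming scheme
    # int() accepts both str and bytes, so this works with or without
    # decode_responses on a real client.
    others = {int(m) for m in client.smembers(key)} - {post_id}
    client.sadd(key, post_id)
    client.expire(key, ttl_seconds)  # refreshes the window on every new post
    return sorted(others)


client = FakeRedis()
url = "https://www.bbcgoodfood.com/recipes/springtime-minestrone"
first = find_recent_cross_posts(client, url, 27)
second = find_recent_cross_posts(client, url, 28)
third = find_recent_cross_posts(client, url, 29)
```

Because the expiry is reset each time a post with the same URL arrives, a link that keeps being re-posted stays in the cache; a fixed window would need a different structure.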
Anyway, I’ll share here how I eventually got DB arrays to work, in case anyone considers it for anything else:
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.ext.mutable import MutableList
...
cross_posts = db.Column(MutableList.as_mutable(ARRAY(db.Integer)))
(they need to be mutable, because otherwise SQLAlchemy doesn’t notice in-place changes such as appending to the list, and the DB won’t be updated)
Fetching them is this code (called when the ‘layers’ icon is clicked):
@bp.route('/post/<int:post_id>/cross_posts', methods=['GET'])
def post_cross_posts(post_id: int):
    post = Post.query.get_or_404(post_id)
    cross_posts = Post.query.filter(Post.id.in_(post.cross_posts)).all()
    return render_template('post/post_cross_posts.html', post=post, cross_posts=cross_posts)
This isn’t as bad as that Stack Overflow post, because it’s not Joining those values with another table. The values in the array are sort-of self-references, rather than foreign keys, I think, so I assumed it’d be quicker than using another table (which would then refer back to the Post table again)
Oh, well, if we can use Post.id.in_(), that’s quite elegant! That goes a long way to mollifying my concerns. Let’s do it!
Okay. I’ll nix the xp_indicator idea (which’ll also make the code clearer), and keep plodding on.