I’ve been thinking about what to do about cross-posts (e.g. where the same link is posted to both fediverse@lemmy.world and fediverse@lemmy.ml).

In terms of them being annoying, I don’t yet know what to do about that.

My progress so far, and what it requires:
The Community table has an extra field (xp_indicator), which determines which field of a post is used to decide whether it’s a cross-post. It defaults to URL, but it could be the title for communities like AskLemmy.
The Post table has an extra field (cross_posts), which is an array of other post ids (note: this would lock PieFed into using PostgreSQL).
New posts, both local and from ActivityPub, are checked to see whether they are cross-posts, and the related posts are updated. The same happens for local edits and ActivityPub Updates. In the DB, the posts in the screenshot look like this:

-[ RECORD 1 ]----------------------------------------------------------
id          | 27
title       | Springtime Ministrone
url         | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {28,29,30}
-[ RECORD 2 ]----------------------------------------------------------
id          | 28
title       | Springtime Ministrone
url         | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {27,29,30}
-[ RECORD 3 ]----------------------------------------------------------
id          | 29
title       | Springtime Ministrone
url         | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {27,28,30}
-[ RECORD 4 ]----------------------------------------------------------
id          | 30
title       | Springtime Ministrone
url         | https://www.bbcgoodfood.com/recipes/springtime-minestrone
cross_posts | {27,28,29}
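A minimal sketch of the check-and-update step described above, assuming URL matching only and using an in-memory dict as a stand-in for the Post table (the `Post` dataclass and the `unlink_cross_posts` / `relink_cross_posts` helpers are hypothetical, not PieFed code). Every post in a group gets the group’s ids minus its own, which reproduces the records shown; for an edit or AP Update, the post is first removed from its old group in case its URL changed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Post:
    # Hypothetical stand-in for the Post model: only the fields cross-posting needs
    id: int
    url: Optional[str]
    cross_posts: List[int] = field(default_factory=list)


def unlink_cross_posts(post: Post, posts_by_id: Dict[int, Post]) -> None:
    """Remove `post` from every other post's cross_posts, then clear its own list."""
    for other_id in post.cross_posts:
        other = posts_by_id.get(other_id)
        if other is not None and post.id in other.cross_posts:
            other.cross_posts.remove(post.id)
    post.cross_posts = []


def relink_cross_posts(post: Post, posts_by_id: Dict[int, Post]) -> None:
    """Call after a post is created or edited (locally or via an AP Update)."""
    posts_by_id[post.id] = post
    # Drop any stale links first, in case an edit changed the URL
    unlink_cross_posts(post, posts_by_id)
    if not post.url:
        return  # posts without a link are never cross-posts under URL matching
    # Stand-in for a DB query like "all posts with this URL"
    group = sorted(p.id for p in posts_by_id.values() if p.url == post.url)
    for member_id in group:
        # Each member lists every other member of the group, never itself
        posts_by_id[member_id].cross_posts = [i for i in group if i != member_id]
```

In the real thing, the scan over `posts_by_id.values()` would be a query filtering on `Post.url`, which is where the extra DB cost of this approach comes from.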

In the UI, posts with cross-posts get an extra icon which, when clicked, brings you to another screen (similar to ‘other discussions’ on Reddit).

In terms of hiding duplicate posts from the feed, I don’t know yet. Doing it in the back-end would need some extra DB activity that might be unacceptably slow. This update would mean, though, that a future API could return posts in a form similar to Lemmy’s, so apps and frontends could merge duplicates the same way some of them do for Lemmy. Likewise, if there was a ‘Hide posts marked as read’ feature, it could treat any post ids in the cross_posts field as also being read.
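If an API did return cross_posts with each post, a client could do the merging itself. A rough sketch, assuming a hypothetical `FeedPost` shape and a set of read post ids held by the client: the first post of a group in the feed stands in for the whole group, and with `hide_read` on, the whole group disappears once any member of it has been read.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class FeedPost:
    # Hypothetical shape of a post as returned by a future API
    id: int
    title: str
    cross_posts: List[int] = field(default_factory=list)


def is_read(post: FeedPost, read_ids: Set[int]) -> bool:
    """A post counts as read if it, or any of its cross-posts, has been read."""
    return post.id in read_ids or not read_ids.isdisjoint(post.cross_posts)


def merge_feed(posts: Iterable[FeedPost], read_ids: Set[int],
               hide_read: bool = False) -> List[FeedPost]:
    """Keep the first post of each cross-post group, optionally dropping read groups."""
    seen: Set[int] = set()
    merged: List[FeedPost] = []
    for post in posts:
        if post.id in seen:
            continue  # an earlier post already stood in for this group
        seen.add(post.id)
        seen.update(post.cross_posts)
        if hide_read and is_read(post, read_ids):
            continue  # the whole group is skipped, since its ids are now in `seen`
        merged.append(post)
    return merged
```

One limitation of doing it client-side: `seen` only covers the posts fetched so far, so a cross-post could still turn up again on a later page.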

I have to wait a few days until the quota on my ngrok account resets (something in the Fediverse went crazy, I’d guess), so I thought I’d share here in the meantime. Also, it means the PR doesn’t come out of the blue, and it can be discussed beforehand.

(also: it turns out I can’t spell ‘minestrone’)

  • Andrew@piefed.social (OP) · 8 months ago

    Oh, okay. I was only thinking of using ‘title’ for very few communities, like AskLemmy or ShowerThoughts, but I see how it could produce false positives even for those (I may also have been misled by the recent Issue into thinking title-based cross-posts happen more often than they do).

    Speaking of that Issue, maybe the search for URL-based cross-posts could also happen in Redis. It would be quicker, and would only cover recent posts (how recent would depend on the expiry, of course).
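    A rough sketch of the Redis idea, assuming redis-py style sadd/smembers/expire calls, plus a hypothetical key scheme (`xp:url:` followed by a SHA-256 of the URL) and a hypothetical three-day TTL. Each URL maps to a set of recent post ids, and the expiry is refreshed on every new post, so the set lapses a fixed time after the latest one. Passing a `redis.Redis()` client as `store` should work, since smembers returns bytes by default and `int()` accepts those.

```python
import hashlib
from typing import Set

# Hypothetical expiry: how long a URL counts as "recent" for cross-post detection
RECENT_URL_TTL_SECONDS = 3 * 24 * 60 * 60


def url_key(url: str) -> str:
    """Build a fixed-length key for a URL (hashing keeps long links short)."""
    return "xp:url:" + hashlib.sha256(url.encode("utf-8")).hexdigest()


def register_url(store, post_id: int, url: str) -> Set[int]:
    """Record post_id under its URL; return ids of other recent posts with that URL.

    `store` is anything with redis-py style sadd/smembers/expire methods.
    """
    key = url_key(url)
    # Members come back as bytes (e.g. b"27") by default; int() handles bytes or str
    others = {int(member) for member in store.smembers(key)}
    store.sadd(key, post_id)
    # Refresh the TTL so the set lives on for a while after the newest post
    store.expire(key, RECENT_URL_TTL_SECONDS)
    others.discard(post_id)
    return others
```

    The smembers/sadd pair isn’t atomic, so two posts arriving at the same moment could miss each other; a pipeline with MULTI/EXEC or a small Lua script would close that gap.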


    Anyway, I’ll share here how I eventually got DB arrays to work, in case anyone considers it for anything else:

    from sqlalchemy.dialects.postgresql import ARRAY
    from sqlalchemy.ext.mutable import MutableList
    ...
    # On the Post model: ids of the other posts in this post's cross-post group
    cross_posts = db.Column(MutableList.as_mutable(ARRAY(db.Integer)))
    

    (They need to be mutable: without MutableList, SQLAlchemy doesn’t notice in-place changes such as append(), so the DB isn’t updated when they’re added to.)


    Fetching them is done by this code (called when the ‘layers’ icon is clicked):

    from flask import render_template

    @bp.route('/post/<int:post_id>/cross_posts', methods=['GET'])
    def post_cross_posts(post_id: int):
        post = Post.query.get_or_404(post_id)
        # cross_posts is NULL for posts that were never matched; in_() needs a list, not None
        cross_posts = Post.query.filter(Post.id.in_(post.cross_posts or [])).all()
        return render_template('post/post_cross_posts.html', post=post, cross_posts=cross_posts)
    

    This isn’t as bad as that Stack Overflow post, because it’s not joining those values with another table. The values in the array are sort of self-references rather than foreign keys, I think, so I assumed it’d be quicker than using another table (which would then refer back to the Post table again).

    • Rimu@piefed.social (M) · 8 months ago

      Oh, well, if we can use Post.id.in_(), that’s quite elegant! That goes a long way to mollifying my concerns. Let’s do it!

      • Andrew@piefed.social (OP) · 8 months ago

        Okay. I’ll nix the xp_indicator idea (which’ll also make the code clearer), and keep plodding on.