RoundSparrow

“Finnegans Wake is the greatest guidebook to media study ever fashioned by man.” - Marshall McLuhan, Newsweek Magazine, page 56, February 28, 1966.

I have never done LSD or any other illegal drugs, but I have read FInnegans Wake: www.LazyWake.com

Lemmy tester, “RocketDerp” is my username on GitHub

  • 248 Posts
  • 631 Comments
Joined 1 year ago
cake
Cake day: June 2nd, 2023

help-circle




  • It’s just so unexpected… the turn Reddit took in 2023 and how Lemmy has responded to success in 2023. The SQL code is obviously performing badly and the Rust community hasn’t really taken Lemmy as something to help out… it could be a showcase of how improving and optimizing is easy with Rust…

    Instead Lemmy.world started crashing all July and August and nobody with Rust background made it an effort to fix the pretty obvious problems or add some cool new feature to show off their coding.

    Weird, I have to keep looking back at Elon Musk 2023 and Reddit and say it isn’t just Lemmy. It’s just odd, like pandemic, to see issues spread across so many areas and low-budget vis high-budget Twitter, etc.




  • who would have predicted that Elon Musk would do all the wild things he did with Twitter. Reddit pissing everyone off in June… pretty odd how audiences are behaving in 2023 towards all this. Oh yha, Threads, that coming on the scene too. 2023 has really been odd for audiences.

    The SQL speaks for itself, but I don’t know what’s going on in terms of why people are treating social media platforms like Lemmy, Twitter, Threads, Reddit this year so unusually. This SQL statement kind of thing has been covered in so many books, conferences, etc. It’s like forgotten history now in the era of Elon Musk X and Reddit Apollo times.

    I don’t know what to say other than I can try to hire a translator or teacher to explain how this SQL problem is obvious and well understood 13 years ago. I mean, there was a whole “NoSQL movement” because of this kind of thing. But I clearly can’t get people to hear past all the Elon Musk, Threads, Lemmy from Reddit … and I’m left describing it as ‘social hazing’ or whatever is gong on with social media.

    Lemmy has like 5 different Rust programming communities, but nobody fixing Lemmy. It’s surreal in 2023 the Elon Musk X days. I think it’s making all of us uncomfortable. The social movement underway.


  • Ok, so let’s look at recent changes that they have deployed… https://github.com/LemmyNet/lemmy/issues/3886

    One of which makes entire tree of comments disappear. Do you see developers fretting over this and fixing it? Or do you see them ignoring the May 27 PostgreSQL JOIN problem.

    How did such a bug go out? Do you see Lemmy developers actually using Lemmy to test things and notice these crashes and problems? Do you look at their posting and comment history? Do they actually go login over at Beehaw and Lemmy.world and see just how terrible the code performance is?

    If it isn’t hazing, what is it?

    It’s as if they build a product only for other people to use… and they don’t notice any of the constant crashes, incredibly slow performance etc - and they act like nobody in the computer industry ever heard of Memcache or Redis to solve performance problems. If it isn’t extreme hazing going on, then what is it?


  • Here, you can dig into what posted days before the pull request you read:

    https://github.com/LemmyNet/lemmy/issues/2877#issuecomment-1685314733

     

    June 4:

    joins are better than in queries with potentially thousands of inserted IDs.

    Given that more than 8 JOIN statements is something PostgreSQL specifically concerns itself with (join_collapse_limit). I hand-edit the query with a single IN clause and the performance problem disappears. 8 full seconds becomes less than 200ms against 5,431,043 posts. And that 200ms is still high, as I was extremely over-reaching with “LIMIT 1000” in case the end-user went wild with blocking lists or some other filtering before reaching the final “LIMIT 10”. When I change it to “LIMIT 20” in the subquery, it drops almost in half to 115ms… still meeting the needs of the outer “LIMIT 10” by double. More of the core query filtering can be put into the IN subquery, as we aren’t dealing with more than 500 length pages (currently limited to 50).

    SELECT 
       "post"."id" AS post_id, "post"."name" AS post_title,
       -- "post"."url", "post"."body", "post"."creator_id", "post"."community_id", "post"."removed", "post"."locked", "post"."published", "post"."updated", "post"."deleted", "post"."nsfw", "post"."embed_title", "post"."embed_description", "post"."thumbnail_url",
       -- "post"."ap_id", "post"."local", "post"."embed_video_url", "post"."language_id", "post"."featured_community", "post"."featured_local",
         "person"."id" AS p_id, "person"."name",
         -- "person"."display_name", "person"."avatar", "person"."banned", "person"."published", "person"."updated",
         -- "person"."actor_id", "person"."bio", "person"."local", "person"."private_key", "person"."public_key", "person"."last_refreshed_at", "person"."banner", "person"."deleted", "person"."inbox_url", "person"."shared_inbox_url", "person"."matrix_user_id", "person"."admin",
         -- "person"."bot_account", "person"."ban_expires",
         "person"."instance_id" AS p_inst,
       "community"."id" AS c_id, "community"."name" AS community_name,
       -- "community"."title", "community"."description", "community"."removed", "community"."published", "community"."updated", "community"."deleted",
       -- "community"."nsfw", "community"."actor_id", "community"."local", "community"."private_key", "community"."public_key", "community"."last_refreshed_at", "community"."icon", "community"."banner",
       -- "community"."followers_url", "community"."inbox_url", "community"."shared_inbox_url", "community"."hidden", "community"."posting_restricted_to_mods",
       "community"."instance_id" AS c_inst,
       -- "community"."moderators_url", "community"."featured_url",
         ("community_person_ban"."id" IS NOT NULL) AS ban,
       -- "post_aggregates"."id", "post_aggregates"."post_id", "post_aggregates"."comments", "post_aggregates"."score", "post_aggregates"."upvotes", "post_aggregates"."downvotes", "post_aggregates"."published",
       -- "post_aggregates"."newest_comment_time_necro", "post_aggregates"."newest_comment_time", "post_aggregates"."featured_community", "post_aggregates"."featured_local",
       --"post_aggregates"."hot_rank", "post_aggregates"."hot_rank_active", "post_aggregates"."community_id", "post_aggregates"."creator_id", "post_aggregates"."controversy_rank",
       --  "community_follower"."pending",
       ("post_saved"."id" IS NOT NULL) AS save,
       ("post_read"."id" IS NOT NULL) AS read,
       ("person_block"."id" IS NOT NULL) as block,
       "post_like"."score",
       coalesce(("post_aggregates"."comments" - "person_post_aggregates"."read_comments"), "post_aggregates"."comments") AS unread
    
    FROM (
       ((((((((((
       (
    	   (
    	   "post_aggregates" 
    	   INNER JOIN "person" ON ("post_aggregates"."creator_id" = "person"."id")
    	   )
       INNER JOIN "community" ON ("post_aggregates"."community_id" = "community"."id")
       )
       LEFT OUTER JOIN "community_person_ban"
           ON (("post_aggregates"."community_id" = "community_person_ban"."community_id") AND ("community_person_ban"."person_id" = "post_aggregates"."creator_id"))
       )
       INNER JOIN "post" ON ("post_aggregates"."post_id" = "post"."id")
       )
       LEFT OUTER JOIN "community_follower" ON (("post_aggregates"."community_id" = "community_follower"."community_id") AND ("community_follower"."person_id" = 3))
       )
       LEFT OUTER JOIN "community_moderator" ON (("post"."community_id" = "community_moderator"."community_id") AND ("community_moderator"."person_id" = 3))
       )
       LEFT OUTER JOIN "post_saved" ON (("post_aggregates"."post_id" = "post_saved"."post_id") AND ("post_saved"."person_id" = 3))
       )
       LEFT OUTER JOIN "post_read" ON (("post_aggregates"."post_id" = "post_read"."post_id") AND ("post_read"."person_id" = 3))
       )
       LEFT OUTER JOIN "person_block" ON (("post_aggregates"."creator_id" = "person_block"."target_id") AND ("person_block"."person_id" = 3))
       )
       LEFT OUTER JOIN "post_like" ON (("post_aggregates"."post_id" = "post_like"."post_id") AND ("post_like"."person_id" = 3))
       )
       LEFT OUTER JOIN "person_post_aggregates" ON (("post_aggregates"."post_id" = "person_post_aggregates"."post_id") AND ("person_post_aggregates"."person_id" = 3))
       )
       LEFT OUTER JOIN "community_block" ON (("post_aggregates"."community_id" = "community_block"."community_id") AND ("community_block"."person_id" = 3)))
       LEFT OUTER JOIN "local_user_language" ON (("post"."language_id" = "local_user_language"."language_id") AND ("local_user_language"."local_user_id" = 3))
       )
    WHERE 
      post_aggregates.id IN (
         SELECT id FROM post_aggregates
         WHERE "post_aggregates"."creator_id" = 3
         ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
         LIMIT 1000
      )
      AND
      (((((((
      (
      (("community"."deleted" = false) AND ("post"."deleted" = false))
      AND ("community"."removed" = false))
      AND ("post"."removed" = false)
      )
      AND ("post_aggregates"."creator_id" = 3)
      )
      AND ("post"."nsfw" = false))
      AND ("community"."nsfw" = false)
      )
      AND ("local_user_language"."language_id" IS NOT NULL)
      )
      AND ("community_block"."person_id" IS NULL)
      )
      AND ("person_block"."person_id" IS NULL)
      )
    ORDER BY "post_aggregates"."featured_local" DESC , "post_aggregates"."published" DESC
    LIMIT 10
    OFFSET 0
    ;
    

     

    If it isn’t social hazing, then what is going on here? Why has this issue gone on since May and servers are crashing every day?


  • . However, I’m far from an expert,

    Funny, because I’m a published author and expert on messaging systems… like Lemmy. Iv’e been building them since 1986 professionally.

    There was a massive thread I posted dozens of comments on that came before today’s pull request… I suggest you read that too.

    Did you notice them even acknowledge server crashes are happening? Do you think developers ever suggest Memcache or Redis? Or discuss how Reddit solved their scaling in 2010 with PostgreSQL?

    but perhaps they themselves felt attacked. I know that wasn’t your intention, but misunderstanding happen, especially over text.

    I don’t have any trouble understanding a bad SQL statement that has 14 JOINs and being told “JOIN is a distraction” after posting tons of examples.

    Do we really need to spoon fed the stuff I did post?

    Have you never seen social hazing in action? is it possible that I might be on to something going on psychologically besides my autism?

    I can’t believe anyone thinks a server should be crashing with 1 user on it.


  • may I voice my opinion on the exchange? This is coming from a place of trying to help, since I really do appreciate all the work you’ve put in and are putting in, and the fediverse can really use your talents, so I hope I don’t offend you.

    Can you explain to me why it isn’t social hazing?

    it didn’t appear that you were being ignored/hazed

    Do you know how to read a SQL statement? I just can’t grasp how it isn’t social hazing. I’ve been reading SQL statements for decades, this is obviously a problematic one.

    Can you offer alternate explanations of how 3 people could think that SQL statement isn’t … poor performing and gong to cause problems? And how an SQL statement without a WHERE clause took them months to discover and fix?

    Extreme hazing is my best answer. I just can’t accept that the SQL statements don’t speak for themselves along with the server crashes. 57K users for 1300 servers is very… taking several seconds to load 10 posts…

    Look at the date… May… this has been going on since May. If it isn’t social hazing … what is it? I keep asking myself that.



  • Why is there a lack of gifs/videos on Lemmy?

    Lemmy’s internal data performance is so horribly slow and crash-causing that I think the last thing they want is even more popular data.

    Video is simply the most superior type of media there is, and I think that not having easy access to it on Lemmy is hurting it.

    Video is more data, popularity is more data. For whatever reason, at every turn, I’ve seen developers turn away from scaling options like Memcache, Redis, or just abandoning ORM data management and rewriting the data interfaces by hand…

    since the sites on which the videos are hosted can track you.

    That’s already true for images that are hot linked routinely, so I don’t think video really changes it.

    I’ve been baffled since June why data and fixing lemmy’s data coding hasn’t been front and center. It’s pretty wild to witness so many come to Lemmy and then turn away… Elon Musk has been flocking people, Reddit, etc. It’s as if the project wants to make code that won’t work on any data. It’s baffeling.



  • ild rants. i’ve seen you do this many times. you need to step back, relax, and not take technical feedback so personally

    I’ve stepped back and watched them ignore the issue since May when all the servers were crashing. Every single Lemmy server was falling over while they ignored the PostgreSQL problems.

    The mistakes are obvious and huge. These are not minor topics.

    our comments there are exceptionally aggressive. you accuse the developers of “hazing” you

    I think they are hazing the entire World Wide Web, Reddit users, etc. How else can you explain such basic SQL problems that they have allowed to go on for so long?

    Its’ as if the mere concept of Redis or Memcache never existed… and that nobody ever heard of JOIN performance problems. If it isn’t extreme social hazing, what is it?


  • having a meltdown on github doesn’t help anybody.

    I’m glad for you that mental control is so trival and you aren’t near death in your life from your brain damage.

    Go outside and take a breath

    I just got back from dinner ant the months of hazing I’ve witnessed hasn’t gone away. The level of social games being played with PostgreSQL in this project are levels beyond anything I’ve encountered in my 50+ years alive. And I’ve first hand seen Bill Gates and his team do all kinds of odd things to groups.

    I am at a total loss to explain why such fundamentals of basic relational database are avoided in this project. If it isn’t social hazing, what is it?


  • Of course, that would be an insane amount of work, especially if it would get ignored, but something to consider!

    I already did an insane amount of work to populate a Lemmy database with over 10 million posts. It is so incredibly slow out of the box that the normal API would take days to accomplish this. i had to rewrite the SQL TRIGGER logic to allow bulk inserts.

    Here is my work on that:

    DROP TRIGGER site_aggregates_post_insert ON public.post;
    
    
    /*
    TRIGGER will be replaced with per-statement INSERT only
    */
    CREATE TRIGGER site_aggregates_post_insert
       AFTER INSERT ON public.post
       REFERENCING NEW TABLE AS new_rows
       FOR EACH STATEMENT
       EXECUTE FUNCTION site_aggregates_post_insert();
    
    
    DROP TRIGGER community_aggregates_post_count ON public.post;
    
    
    /*
    TRIGGER will be replaced with per-statement INSERT only
    */
    CREATE TRIGGER community_aggregates_post_count
       AFTER INSERT ON public.post
       REFERENCING NEW TABLE AS new_rows
       FOR EACH STATEMENT
       EXECUTE FUNCTION community_aggregates_post_count();
    
    
    DROP TRIGGER person_aggregates_post_count ON public.post;
    
    
    /*
    TRIGGER will be replaced with per-statement INSERT only
    */
    CREATE TRIGGER person_aggregates_post_count
       AFTER INSERT ON public.post
       REFERENCING NEW TABLE AS new_rows
       FOR EACH STATEMENT
       EXECUTE FUNCTION person_aggregates_post_count();
    
    
    
    /*
    TRIGGER will be replaced with per-statement INSERT only
    no Lemmy-delete or SQL DELETE to be performed during this period.
    */
    CREATE OR REPLACE FUNCTION public.site_aggregates_post_insert() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
    BEGIN
       UPDATE site_aggregates SET posts = posts +
          (SELECT count(*) FROM new_rows WHERE local = true)
          WHERE site_id = 1
          ;
    
       RETURN NULL;
    END
    $$;
    
    
    CREATE OR REPLACE FUNCTION public.community_aggregates_post_count() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
    BEGIN
            UPDATE
                community_aggregates ca
            SET
                posts = posts + p.new_post_count
            FROM (
                SELECT count(*) AS new_post_count, community_id
                FROM new_rows
                GROUP BY community_id
                 ) AS p
            WHERE
                ca.community_id = p.community_id;
    
        RETURN NULL;
    END
    $$;
    
    
    /*
    TRIGGER will be replaced with per-statement INSERT only
    no Lemmy-delete or SQL DELETE to be performed during this period.
    */
    CREATE OR REPLACE FUNCTION public.person_aggregates_post_count() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
    BEGIN
            UPDATE
                person_aggregates personagg
            SET
                post_count = post_count + p.new_post_count
            FROM (
                SELECT count(*) AS new_post_count, creator_id
                FROM new_rows
                GROUP BY creator_id
                 ) AS p
            WHERE
                personagg.person_id = p.creator_id;
    
        RETURN NULL;
    END
    $$;
    
    
    /*
    ***********************************************************************************************
    ** comment table
    */
    
    
    DROP TRIGGER post_aggregates_comment_count ON public.comment;
    
    
    /*
    TRIGGER will be replaced with per-statement INSERT only
    */
    CREATE TRIGGER post_aggregates_comment_count
       AFTER INSERT ON public.comment
       REFERENCING NEW TABLE AS new_rows
       FOR EACH STATEMENT
       EXECUTE FUNCTION post_aggregates_comment_count();
    
    
    -- IMPORTANT NOTE: this logic for INSERT TRIGGER always assumes that the published datestamp is now(), which was a logical assumption with general use of Lemmy prior to federation being added.
    CREATE OR REPLACE FUNCTION public.post_aggregates_comment_count() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
    BEGIN
    
            UPDATE
                -- per statement update 1
                post_aggregates postagg
            SET
                comments = comments + c.new_comment_count
            FROM (
                SELECT count(*) AS new_comment_count, post_id
                FROM new_rows
                GROUP BY post_id
                 ) AS c
            WHERE
                postagg.post_id = c.post_id;
    
    
            UPDATE
                -- per statement update 2
                post_aggregates postagg
            SET
                newest_comment_time = max_published
            FROM (
                SELECT MAX(published) AS max_published, post_id
                FROM new_rows
                GROUP BY post_id
                 ) AS c
            WHERE
                postagg.post_id = c.post_id;
    
            UPDATE
                -- per statement update 3
                post_aggregates postagg
            SET
                newest_comment_time_necro = max_published
            FROM (
                SELECT MAX(published) AS max_published, post_id, creator_id
                FROM new_rows
                WHERE published > ('now'::timestamp - '2 days'::interval)
                GROUP BY post_id, creator_id
                 ) AS c
            WHERE
                postagg.post_id = c.post_id
                AND c.creator_id != postagg.creator_id
                ;
    
        RETURN NULL;
    END
    $$;
    
    
    DROP TRIGGER community_aggregates_comment_count ON public.comment;
    
    CREATE TRIGGER community_aggregates_comment_count
       AFTER INSERT ON public.comment
       REFERENCING NEW TABLE AS new_rows
       FOR EACH STATEMENT
       EXECUTE FUNCTION public.community_aggregates_comment_count();
    
    
    CREATE OR REPLACE FUNCTION public.community_aggregates_comment_count() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
    BEGIN
    
            UPDATE
                community_aggregates ca
            SET
                comments = comments + p.new_comment_count
            FROM (
                SELECT count(*) AS new_comment_count, community_id
                FROM new_rows AS nr
                JOIN post AS pp ON nr.post_id = pp.id
                GROUP BY pp.community_id
                 ) AS p
            WHERE
                ca.community_id = p.community_id
                ;
    
        RETURN NULL;
    
    END
    $$;
    
    
    DROP TRIGGER person_aggregates_comment_count ON public.comment;
    
    CREATE TRIGGER person_aggregates_comment_count
       AFTER INSERT ON public.comment
       REFERENCING NEW TABLE AS new_rows
       FOR EACH STATEMENT
       EXECUTE FUNCTION public.person_aggregates_comment_count();
    
    
    CREATE OR REPLACE FUNCTION public.person_aggregates_comment_count() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
    BEGIN
    
            UPDATE
                person_aggregates personagg
            SET
                comment_count = comment_count + p.new_comment_count
            FROM (
                SELECT count(*) AS new_comment_count, creator_id
                FROM new_rows
                GROUP BY creator_id
                 ) AS p
            WHERE
                personagg.person_id = p.creator_id;
    
        RETURN NULL;
    END
    $$;
    
    
    DROP TRIGGER site_aggregates_comment_insert ON public.comment;
    
    CREATE TRIGGER site_aggregates_comment_insert
       AFTER INSERT ON public.comment
       REFERENCING NEW TABLE AS new_rows
       FOR EACH STATEMENT
       EXECUTE FUNCTION public.site_aggregates_comment_insert();
    
    
    CREATE OR REPLACE FUNCTION public.site_aggregates_comment_insert() RETURNS trigger
        LANGUAGE plpgsql
        AS $$
    BEGIN
    
       UPDATE site_aggregates
          SET comments = comments +
             (
                SELECT count(*) FROM new_rows WHERE local = true
             )
          WHERE site_id = 1
          ;
    
        RETURN NULL;
    END
    $$;
    

    With this in place, 300,000 posts a minute can be generated and reaching levels of 5 million or 10 million don’t take too long.





  • As I said, I could see one of the communities I subscribed on the stdout

    but only one, right? See federation is kind of automagic in how if one single post came in for that community it could very well go create the community itself. On an empty database.

    Now did you upgrade lemmy-ui and maybe run into problems there?

    I think the safe thing to do at this point is work with postgresql and try to make sense of the data in 15 and deduce why we think Lemmy started as a virgin instance. But it’s going to take some time. And I need to take a couple breaks… I will be around for the next 5 or 6 hours, but need a 15 minute break and then about 30 or 40 minutes break to travel to dinner (but I’ll be online once I arrive).

    pg_dumpall against 15 will give us EVERYTHING - and we grep through that and see if we can figure out if somehow two different databases got created. That’s what I think might have happened. I normally create a half-dozen different databases in 15 for testing federation locally (lemmy-alpha, beta, gamma, etc).

    I guessing the “/lemmy” at the end isn’t exactly how the .config worked prior to us shifting over to the URL scheme.



  • It looks to me like Lemmy found an empty database and issued all the migrations of a new install…

    So that database URL we gave it was wrong, or the restore we did was wrong parameters, etc.

    And, like I mentioned, some confusion already happens with your federation status as I think it rushes out to register itself as a new server with the Lemmy network. And some data got in…

    So… I’m not sure what to do figure this out. We could do a pg_dumpall of your PostgreSQL 15 data and then sift through it and see if we can make sense of how this happened?