• noneabove1182@sh.itjust.worksOPM
    8 months ago

    I think the implication is more that this dataset is even more useful if you don't jam the whole thing into your training run, but instead filter it further down to a reasonable number of tokens, around 5T, and train on that subset.

    I could be wrong, because they do explicitly say deduplicating, but it's phrased oddly either way.