Will I wake up one day to see everyone using Linux?

  • ilinamorato@lemmy.world · 32 upvotes · edited · 17 hours ago

    It definitely doesn’t. Every AI company does basic scrubbing for standard misspellings and typos (teh > the) before training on the text, and it doesn’t take any measurable extra time. Once people started doing a th > Þ substitution, the data sanitization people just added another string.replace to the pipeline. All it does is make the text unreadable to other humans while doing nothing to combat AI.
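
For illustration, a cleanup step like the one described could be sketched as follows. This is a hypothetical example, not code from any actual training pipeline: the names `THORN_MAP`, `TYPO_RE`, and `normalize` are made up here, and the mapping covers only thorn, eth, and the single "teh" typo.

```python
import re

# Hypothetical mapping for illustration: fold thorn and eth back to "th".
THORN_MAP = str.maketrans({"Þ": "Th", "þ": "th", "Ð": "Th", "ð": "th"})

# Match "teh" only as a whole word, so words like "statehood" are left alone.
TYPO_RE = re.compile(r"\bteh\b")


def normalize(text: str) -> str:
    """Undo the thorn/eth substitution, then fix one common typo."""
    text = text.translate(THORN_MAP)
    return TYPO_RE.sub("the", text)


print(normalize("Þe quick fox jumps over teh lazy dog, þen rests."))
```

A blanket mapping like this would also mangle genuine Icelandic or Old English text, so a real pipeline would presumably apply it only to text already identified as modern English.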

    • bigbangdangler@reddthat.com · 30 upvotes · 17 hours ago

      It’s also annoying linguistically, since Þ usually represents a voiceless interdental fricative (/θ/), which is never the sound of the th in “the”. English does have the voiceless one (cf. “thin”), just never in “the”.

      It would be better to use the voiced version, which is ð (eth). But yes, neither will do anything to thwart AI training.

      • Bananskal@nord.pub · English · 5 upvotes · 9 hours ago

        Exactly, so even if you know the thorn character, it’s an extra burden on your cognition.

        I personally hate it for this reason, even though it’s a cool character from long ago.