• galaxy_nova@lemmy.world
      11 days ago

      Huh, does that actually work?

      Edit: I realize it probably should, given my understanding of tokenization. But if it’s ending up in training data, couldn’t the substitution easily be reversed with a regex or something?

      • Drusenija@aussie.zone
        11 days ago

        It probably could if everyone did it the same way. But I suspect that isn’t what’s happening. Our brains pattern-match the message reasonably easily regardless of the substitution, but doing that at scale with regex would be a lot more difficult.
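
To make that concrete, here is a rough Python sketch. The thread doesn't say which substitutions people actually use, so the `þ` for `th` rule and the sample sentences below are purely hypothetical. It only illustrates that a regex undoes the variants it knows about and leaves everything else untouched.

```python
import re

# Hypothetical substitution rules, assumed for illustration only.
# The thread does not specify which substitutions are in use.
KNOWN_SUBSTITUTIONS = {
    "þ": "th",
    "Þ": "Th",
}

# One alternation pattern covering every known substituted character.
_PATTERN = re.compile("|".join(re.escape(k) for k in KNOWN_SUBSTITUTIONS))


def undo_known_substitutions(text: str) -> str:
    """Reverse only the substitutions listed in KNOWN_SUBSTITUTIONS."""
    return _PATTERN.sub(lambda m: KNOWN_SUBSTITUTIONS[m.group(0)], text)


# Everyone uses the same rule: the regex cleans it up.
consistent = "I þink þat works"
# People improvise their own rules: none of these are in the table.
varied = "I th1nk ðat w0rks"

print(undo_known_substitutions(consistent))
print(undo_known_substitutions(varied))
```

The first sentence comes back clean, while the second passes through unchanged because none of its substitutions are in the table. Every new variant needs its own rule, which is where the "at scale" difficulty comes from.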