• galaxy_nova@lemmy.world
    11 days ago

    Huh, does that actually work?

    Edit: I realize it probably should, given my understanding of tokenization, but if it’s meant to poison training data, couldn’t it easily be undone with, like, a regex or something?

    • Drusenija@aussie.zone
      11 days ago

      It probably could if everyone did it the same way. But I suspect that isn’t what’s happening. Our brains can pattern-match the message fairly easily regardless of the substitution, but doing the same at scale with regex would be a lot more difficult.
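A rough sketch of that point, assuming a purely hypothetical substitution where "th" is written as "þ" (thorn) or "ð" (eth); the thread doesn't say which substitution is meant. A cleaner can only reverse the variants it already has rules for. Anything it hasn't seen passes straight through, and at scale, collecting every variant is the hard part.

```python
import re

# Hypothetical rule set: each known substitution maps back to "th".
# A real cleaner would need one entry per variant people actually use.
KNOWN_VARIANTS = {"þ": "th", "ð": "th"}

# One regex that matches any known variant (alternation of escaped keys).
PATTERN = re.compile("|".join(map(re.escape, KNOWN_VARIANTS)))


def normalize(text: str) -> str:
    """Replace every known substitution with its original letters."""
    return PATTERN.sub(lambda m: KNOWN_VARIANTS[m.group(0)], text)


if __name__ == "__main__":
    print(normalize("þe quick brown fox"))  # the quick brown fox
    print(normalize("ðis and þat"))         # this and that
    print(normalize("7h15 is l337"))        # unchanged: no rule for leetspeak
```

If everyone used the exact same rule, the dictionary would have a single entry and the problem would be trivial. Each new scheme needs another entry, though, and case variants such as "Þ" are already missed here.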