• panda_abyss@lemmy.ca
    8 hours ago

    Batch processing that turns unstructured, free-form text into structured output.

    As a crappy example, imagine you wanted to download metadata about your albums, but they're all labelled "Various Artists". You can use an LLM call to read the album description and fix the artist for each track. Now you can properly organize your collection.
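A rough sketch of what that per-album call could look like, not the actual pipeline from this comment. The `llm` argument is a hypothetical callable that takes a prompt string and returns the model's text reply, so any client can be plugged in; the prompt wording and the `fix_track_artists` name are assumptions made for illustration.

```python
import json
from typing import Callable

# Hypothetical prompt; a real one would need tuning for the model in use.
PROMPT_TEMPLATE = (
    "The album description below says who performs each track.\n"
    "Return only a JSON object mapping each track title to its artist.\n\n"
    "Tracks: {tracks}\n\nDescription:\n{description}\n"
)


def fix_track_artists(
    description: str,
    tracks: list[str],
    llm: Callable[[str], str],  # assumed interface: prompt in, reply text out
) -> dict[str, str]:
    """Ask the model for a track -> artist mapping and validate the reply."""
    prompt = PROMPT_TEMPLATE.format(
        tracks=json.dumps(tracks), description=description
    )
    reply = llm(prompt)

    # Models do not always return valid JSON, so treat a bad reply as "no fix".
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return {}
    if not isinstance(parsed, dict):
        return {}

    # Keep only tracks we asked about, with a non-empty string artist.
    wanted = set(tracks)
    return {
        title: artist.strip()
        for title, artist in parsed.items()
        if title in wanted and isinstance(artist, str) and artist.strip()
    }
```

Validating the reply matters in a batch job: one malformed answer should skip that album, not abort the whole run or write junk tags.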

    I'm using the same idea, in a different domain and with a more complex set of inputs.

    It can be much more cost-effective than spending days manually tagging data and writing custom importers.

    You can definitely go lighter than an LLM. You can use gensim for category matching, or sentence transformers with nearest neighbours (this is basically what Semantle does), but the LLM performed best on more complex document input.
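To illustrate the lighter nearest-neighbour approach, here is a minimal sketch that assigns a text to the category whose description is closest by cosine similarity. It deliberately uses a toy bag-of-words embedding so it runs with the standard library alone; that is an assumption for the example, not what gensim or sentence transformers do internally. With a real embedding model you would swap out `bag_of_words` and use a cosine over dense vectors. All names here are hypothetical.

```python
import math
from collections import Counter
from typing import Callable

Vector = dict[str, float]


def bag_of_words(text: str) -> Vector:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nearest_category(
    text: str,
    categories: dict[str, str],  # category name -> short description
    embed: Callable[[str], Vector] = bag_of_words,
) -> str:
    """Return the category whose description is most similar to the text."""
    query = embed(text)
    return max(
        categories,
        key=lambda name: cosine(query, embed(categories[name])),
    )


categories = {
    "jazz": "jazz saxophone swing trio",
    "metal": "metal distorted guitar heavy drums",
}
label = nearest_category("a heavy guitar riff over pounding drums", categories)
```

Word overlap breaks down quickly on longer or messier documents, which fits the observation above that the LLM did better on complex input.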

    • vxx@lemmy.world
      8 hours ago

      That's pretty much what Google says they use AI for: structuring data.

      Thanks for your insight.