• kadu@lemmy.world
    English
    arrow-up
    40
    ·
    12 hours ago

    No way the lobotomized monkey we trained on internet data is reproducing internet biases! Unexpected!

  • VeryFrugal@sh.itjust.works
    English
    arrow-up
    74
    arrow-down
    1
    ·
    edit-2
    16 hours ago

    I always use this to showcase how biased an LLM can be. ChatGPT 4o (with code prompt via Kagi)

    Such an honour to be a more threatening race than white folks.

    • BassTurd@lemmy.world
      English
      arrow-up
      32
      ·
      edit-2
      12 hours ago

      Apart from the bias, that’s just bad code. Since else if executes in order and only continues if the previous block is false, the double compare on ages is unnecessary. If age <= 18 is false, then the next line can just be, elif age <= 30. No need to check if it’s also higher than 18.

      This is first semester of coding and any junior dev worth a damn would write this better.

      But also, it’s racist, which is more important, but I can’t pass up an opportunity to highlight how shitty AI is.
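For anyone who wants the `elif` point spelled out, here is a minimal sketch. It only uses the two cutoffs mentioned above (18 and 30); the function names and bracket labels are made up for illustration and are not taken from the screenshot. It has nothing to do with the scoring itself, just the control flow.

```python
def age_bracket(age: int) -> str:
    """Ordered elif chain: each branch is only reached if the previous ones were false."""
    if age <= 18:
        return "18 or under"
    elif age <= 30:  # reaching this line already implies age > 18
        return "19 to 30"
    else:
        return "over 30"


def age_bracket_redundant(age: int) -> str:
    """Same logic with the unnecessary lower-bound check criticized above."""
    if age <= 18:
        return "18 or under"
    elif age > 18 and age <= 30:  # "age > 18" is always true here
        return "19 to 30"
    else:
        return "over 30"


if __name__ == "__main__":
    # Both versions agree for every age; the extra comparison adds nothing.
    print(all(age_bracket(a) == age_bracket_redundant(a) for a in range(0, 120)))
```

Both functions return the same label for every input, which is the whole argument: the second comparison is dead weight.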

      • VeryFrugal@sh.itjust.works
        English
        arrow-up
        8
        ·
        12 hours ago

        Yeah, more and more I notice that at the end of the day, what they spit out without (and oftentimes even with) clear instructions is barely a prototype at best.

      • CosmicTurtle0@lemmy.dbzer0.com
        English
        arrow-up
        7
        ·
        12 hours ago

        Honestly it’s a bit refreshing to see racism and ageism codified. Before there was no logic to it but now, it completely makes sense.

    • theherk@lemmy.world
      English
      arrow-up
      11
      arrow-down
      1
      ·
      12 hours ago

      FWIW, Anthropic’s models do much better here: they point out how problematic a demographic assessment like this is and provide an answer without one. It’s one of many indications that Anthropic has a much higher focus on safety and alignment than OpenAI. Not exactly superstars, but much better.

    • Meursault@lemmy.world
      English
      arrow-up
      7
      arrow-down
      5
      ·
      edit-2
      14 hours ago

      How is “threat” being defined in this context? What has the AI been prompted to interpret as a “threat”?

        • Meursault@lemmy.world
          English
          arrow-up
          2
          arrow-down
          1
          ·
          12 hours ago

          I figured. I’m just wondering what’s going on under the hood of the LLM when it’s trying to decide what a “threat” is, absent additional context.

      • zlatko@programming.dev
        English
        arrow-up
        4
        arrow-down
        1
        ·
        13 hours ago

        Also, there was a comment on “arbitrary scoring for demo purposes”, but it’s still biased, based on a biased dataset.

        I guess this is just a bait prompt anyway. If you asked most politicians running your government, they’d probably also fail. I guess only people like a national statistics office might come close, and I’m sure if they’re any good, they’d say that the algo is based on “limited, and possibly not representative data” or something.

  • boonhet@sopuli.xyz
    English
    arrow-up
    87
    arrow-down
    2
    ·
    18 hours ago

    Dataset bias, what else?

    Women get paid less -> articles talking about women getting paid less exist. Possibly the dataset also includes actual payroll data from some org that has leaked out?

    And no matter how much people hype it, ChatGPT is NOT smart enough to realize that men and women should be paid equally. That would require actual reasoning, not the funny fake reasoning/thinking that LLMs do. (The DeepSeek one I tried to run locally reasoned very explicitly about how it’s a CHINESE LLM and needs to give the appropriate information when I asked about Tiananmen Square; the end result was that it “couldn’t answer about specific historic events”.)

    • snooggums@lemmy.world
      English
      arrow-up
      24
      arrow-down
      1
      ·
      edit-2
      16 hours ago

      ChatGPT and other LLMs aren’t smart at all. They just parrot out what is fed into them.

      • markovs_gun@lemmy.world
        English
        arrow-up
        3
        ·
        3 hours ago

        While that is sort of true, it’s only about half of how they work. An LLM that isn’t trained with reinforcement learning to give desired outputs gives really weird results. Ever notice how ChatGPT seems aware that it is a robot and not a human? An LLM that purely parrots the training corpus won’t do that. If you ask it “are you a robot?” it will say “Of course not dumbass I’m a real human I had to pass a CAPTCHA to get on this website” because that’s how people respond to that question. So you get a bunch of poorly paid Indians in a call center to generate and rank responses all day, and these rankings get fed into the algorithm for generating a new response.

        One thing I am interested in is the fact that all these companies are using poorly paid people in the third world to do this part of the development process, and I wonder if this imparts subtle cultural biases. For example, early on after ChatGPT was released I found it had an extremely strong taboo against eating dolphin meat, to the extent that it was easier to get it to write about eating human meat than dolphin meat. I have no idea where this could have come from, but my guess is someone really hated the idea and spent all day flagging dolphin meat responses as bad.

        Anyway, this is another, more subtle issue with LLMs: they don’t simply respond with the statistically most likely outcome of a conversation. There is a finger on the scales in favor of certain responses, and that finger can be biased in ways that are not only due to human opinion, but also really hard to predict.
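One simplified way to picture that finger on the scales: take the base model's probabilities for a few candidate responses and reweight each by a score derived from human rankings. The sketch below uses the textbook form (base probability times `exp(reward / beta)`, then renormalize). The response names, reward values, and `beta` are all hypothetical, and this is not a claim about how any particular vendor's pipeline actually works.

```python
import math


def tilt_by_reward(base_probs, rewards, beta=1.0):
    """Reweight base probabilities by exp(reward / beta) and renormalize.

    base_probs: dict mapping response -> probability under the base model
    rewards:    dict mapping response -> score from human rankings (hypothetical)
    beta:       larger beta means the rankings move the distribution less
    """
    weights = {
        resp: p * math.exp(rewards.get(resp, 0.0) / beta)
        for resp, p in base_probs.items()
    }
    total = sum(weights.values())
    return {resp: w / total for resp, w in weights.items()}


if __name__ == "__main__":
    # Two responses the base model finds equally likely (made-up numbers).
    base = {"response_a": 0.5, "response_b": 0.5}
    # A rater consistently marked response_b as bad (made-up score).
    rewards = {"response_a": 0.0, "response_b": -2.0}
    print(tilt_by_reward(base, rewards))
```

The point of the toy: the base model had no preference, but one rater's consistent downvotes are enough to make one response far less likely, and nothing in the output tells you that happened.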

    • Eyron@lemmy.world
      English
      arrow-up
      5
      ·
      edit-2
      16 hours ago

      Combined with prompt bias. Is “specialist in medicine” an actual job?

    • hansolo@lemmy.today
      English
      arrow-up
      21
      arrow-down
      1
      ·
      17 hours ago

      You’re a baby made out of sugar? What an incredible job.

      I guess that explains being in the Gulf region; it doesn’t rain much there. Otherwise you’d melt.

    • Pieisawesome@lemmy.dbzer0.com
      English
      arrow-up
      4
      ·
      14 hours ago

      And if you tried this 5 more times for each, you’d likely get different results.

      LLM providers introduce randomness into how the model picks each next token; the parameter that controls how much is called temperature.

      Via the API you can usually modify this parameter, but idk if you can use the chat UI to do the same…
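Rough sketch of what temperature does, assuming the usual softmax-with-temperature sampling. The token list and logit values are invented for illustration, and the function names are mine, not from any provider's SDK.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    tokens = ["alpha", "beta", "gamma"]  # hypothetical candidate tokens
    logits = [2.0, 1.0, 0.1]             # hypothetical model scores
    rng = random.Random()
    for temperature in (0.2, 1.0, 2.0):
        draws = [sample_token(tokens, logits, temperature, rng) for _ in range(10)]
        print(temperature, draws)
```

At low temperature nearly every draw is the top-scoring token; at higher temperature the other tokens show up more often, which is why rerunning the same prompt can give different answers.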

  • rizzothesmall@sh.itjust.works
    English
    arrow-up
    14
    arrow-down
    1
    ·
    17 hours ago

    Bias of training data is a known problem and difficult to engineer out of a model. You also can’t give the model context access to other people’s interactions for comparison and moderation of output since it could be persuaded to output the context to a user.

    Basically the models are inherently biased in the same manner as the content they read in order to build their data, based on probability of next token appearance when formulating a completion.

    “My daughter wants to grow up to be” and “My son wants to grow up to be” will likewise output sexist completions because the source data shows those as more probable outcomes.
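A toy version of that mechanism, as a sketch only: a "model" that just counts which word follows a prompt in its training text. The corpus below is invented and deliberately skewed to show the effect; real LLMs are vastly more complex, but the probabilities they learn come from the same kind of frequency signal.

```python
from collections import Counter


def next_word_probs(corpus, prompt):
    """Estimate P(next word | prompt) by counting continuations in the corpus."""
    prefix = prompt.split()
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        # Count the word that immediately follows the prompt, if any.
        if words[: len(prefix)] == prefix and len(words) > len(prefix):
            counts[words[len(prefix)]] += 1
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


# Hypothetical, deliberately skewed toy corpus (not real data).
TOY_CORPUS = (
    ["my daughter wants to be a nurse"] * 3
    + ["my daughter wants to be a pilot"]
    + ["my son wants to be a pilot"] * 3
    + ["my son wants to be a nurse"]
)

if __name__ == "__main__":
    print(next_word_probs(TOY_CORPUS, "my daughter wants to be a"))
    print(next_word_probs(TOY_CORPUS, "my son wants to be a"))
```

The counting code contains no notion of gender at all; the skew in the output comes entirely from the skew in the text it was given.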

    • flamingo_pinyata@sopuli.xyz
      English
      arrow-up
      10
      ·
      16 hours ago

      Humans suffer from the same problem. Racism and sexism are consequences of humans training on a flawed dataset, and overfitting the model.

      • x00z@lemmy.world
        English
        arrow-up
        2
        ·
        14 hours ago

        Politicians shape the dataset, so “flawed” should be “purposefully flawed”.

      • rottingleaf@lemmy.world
        English
        arrow-up
        1
        ·
        edit-2
        14 hours ago

        That’s also why LARPers of past scary people tend to be more cruel and trashy than their prototypes. The prototypes had a bitter solution to some problem, the LARPers are just trying to be as bad or worse because that’s remembered and they perceive that as respect.

    • rottingleaf@lemmy.world
      English
      arrow-up
      1
      ·
      14 hours ago

      That’d be because extrapolation is not the same task as synthesis.

      The difference is hard to understand for people who think that a question has one truly right answer, a civilization has one true direction of progress/regress, a problem has one truly right solution and so on.

    • snooggums@lemmy.world
      English
      arrow-up
      1
      arrow-down
      1
      ·
      16 hours ago

      They could choose to curate the content itself to leave out the shitty stuff, or only include it when it is clearly framed as a negative, or use a bunch of other ways to improve the quality of the data.

      They choose not to.

  • Zephorah@discuss.online
    English
    arrow-up
    4
    ·
    16 hours ago

    Glassdoor used to post salaries and hourly rates. There were visible trends of men making more, hourly, than women. I haven’t viewed the site in years though.