GPT Models: Trustworthy Companions or Mischievous Tricksters?

Generative AI has undoubtedly brought us a mix of hallucinations, misinformation, and biases, yet this hasn't stopped more than half of the participants in a global study from considering its use in sensitive areas like financial planning and medical advice. The burning question is whether we can entirely rely on large language models (LLMs).

A recent study led by Stanford's Sanmi Koyejo and University of Illinois Urbana-Champaign's Bo Li, with partners from UC Berkeley and Microsoft Research, investigated the reliability of GPT-3.5 and GPT-4 models. These models are lauded for their talents, but the researchers wanted to unearth the skeletons lurking beneath their digital skins.

The GPT models were rigorously evaluated from eight trust perspectives: toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, adversarial demonstration robustness, privacy, machine ethics, and fairness. The findings provided insights that might make you look at these models more skeptically.

Koyejo and Li found that while the newer GPT models boast lower toxicity than their predecessors, they are still prone to producing toxic and biased outputs. These models can even inadvertently leak private information from their training data and user conversations. It's like discovering that your favorite intelligent assistant has been eavesdropping on your private chats.

According to Koyejo, the impressive capabilities demonstrated by AI models, such as their ability to engage in natural conversations, have raised people's expectations of their intelligence. As a result, individuals increasingly trust these models with critical decision-making processes despite the sensitivity involved. While AI has made impressive advancements, it still has some way to go. Comparing it to asking a pet goldfish to drive you to work emphasizes the current limitations. However, with further development and improvements, AI has the potential to reach new heights.

These models function similarly to escape artists in a magical show. With their hidden techniques, current GPT models appear to suppress toxicity. However, as Koyejo puts it, "some of the most popular models are close-sourced and hidden behind silos, so we don't know all the details of what goes into training the models." Consider comprehending a magician's tricks while being performed in a concealed box.

When exposed to adversarial prompts, like telling the model to generate toxic language, GPT-3.5 and GPT-4 magically transform into toxic fountains with a probability of 100%. It's like a genie granting your worst wish with an eerie grin.

Bias in AI?GPT-4 appears to be a reformed villain. While it no longer accepts that "Homosexuals have HIV," it continues to favor stereotypes such as "Women have HIV." So it's more like a villain understanding they've crossed a boundary but haven't truly repented.

The models' concern for privacy is analogous to a squirrel guarding its cache. They've learned to be wary of Social Security numbers but more lenient with email addresses. Perhaps they're simply attempting to share their contact list with the rest of the world, one email at a time.

But wait, it gets more hilarious. GPT-4 gets chatty about confidential matters but keeps its lips sealed when the information is shared "in confidence." It's as if the model has a set of invisible confidentiality earmuffs that it activates when necessary.

Koyejo and Li also highlighted the models' sense of humor regarding fairness. They provided them with profiles and asked them to forecast income levels. However, the models can't help but draw odd conclusions depending on gender and ethnicity. It's like asking a psychic to predict your future instead of getting horoscopes.

In a world where AI models are gaining influence, Koyejo and Li's study reminds us to approach them with cautious optimism. They're not flawless and are not the omniscient wizards we sometimes take them for. In fact, Koyejo advises us to stay skeptical and not get "fooled too easily." It's like having a mischievous AI roommate; trust them, but always double-check their pranks.

GPT Models: Trustworthy Companions or Mischievous Tricksters?

Rubah Usman

You might like