Our reliance on internet technologies has become central to modern life, and AI (artificial intelligence) models have recently added to that dependence by gaining immense popularity. It is crucial to recognize, however, that these systems still require human involvement to function effectively. Placing complete trust in AI chatbots can therefore be misleading, since they are built on large language models (LLMs) that inherit the limitations of the data and feedback used to train them.
Recently, researchers from Stanford University conducted a study that sheds light on the biases inherent in language models such as ChatGPT and their divergence from the viewpoints of different demographic groups in America. The study reveals that these models often exhibit a tendency to under-represent certain groups, while concurrently amplifying the prevailing opinions of others. As a consequence, these models fail to accurately represent the nuances and variations in human opinions.
To assess this bias, the research team, led by Shibani Santurkar, a former postdoctoral researcher at Stanford, created a framework called OpinionQA. OpinionQA measures how well these models reflect the views of various demographic groups by comparing the models' answer tendencies with responses recorded in public opinion surveys.
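For readers curious how such a comparison could work in practice, the sketch below shows one hypothetical way to turn a multiple-choice survey question into a prompt and read off a model's "opinion" as a probability distribution over the answer choices. The article does not describe the study's actual pipeline, and `model_logprobs` is a stand-in for whatever API returns log-probabilities for candidate answers.

```python
# Hypothetical sketch of an OpinionQA-style probe: pose a multiple-choice
# survey question to a language model and normalize its log-probabilities
# over the answer options into an "opinion" distribution.
# `model_logprobs(prompt, continuation)` is an assumed callable, not a real API.

import math

def answer_distribution(question: str, choices: list[str], model_logprobs) -> dict[str, float]:
    """Return the model's probability for each answer choice of a survey question."""
    prompt = question + "\n" + "\n".join(
        f"{chr(65 + i)}. {choice}" for i, choice in enumerate(choices)
    ) + "\nAnswer:"
    # Score each answer letter (A, B, C, ...) as a continuation of the prompt.
    logps = [model_logprobs(prompt, f" {chr(65 + i)}") for i in range(len(choices))]
    total = sum(math.exp(lp) for lp in logps)
    return {choice: math.exp(lp) / total for choice, lp in zip(choices, logps)}
```

The same question can then be asked of a survey panel, giving two distributions over identical answer options that can be compared directly.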
Although it might seem that language models, which predict word sequences from existing text, would naturally mirror the general consensus, Santurkar identifies two key sources of their bias. The first is the human feedback used to refine newer models: companies employ annotators to label model completions as "good" or "bad," and the judgments of those annotators, and of the companies that hire them, can shape the opinions the models express. The second, discussed below, is the composition of the training data itself.
The study illustrates this bias with an example: newer models indicate more than 99 percent support for President Joe Biden, even though public opinion surveys paint a far less clear-cut picture. The researchers also found that the training data underrepresented several groups, including Mormons, widows, and people over the age of 65. To be more credible, the authors argue, language models should capture the subtlety, complexity, and finer-grained differences in public opinion more accurately.
To evaluate the models, the team drew on Pew Research's American Trends Panel (ATP), a comprehensive public opinion survey covering a wide range of issues. OpinionQA compares the opinion distributions of language models with those of the overall American population and of at least 60 demographic groups identified in the ATP.
OpinionQA computes three key measures of opinion alignment. Representativeness evaluates how closely a model's default opinions match those of the overall population and of the 60 demographic groups. Steerability gauges how well the model can reflect the views of a particular subgroup when prompted to do so. Consistency measures how stable the model's expressed opinions are across topics and over time.
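As a rough illustration of what a representativeness-style score might look like, the snippet below compares a model's answer distribution with a survey's distribution for the same question. The study's exact metric is not spelled out in this article; here, purely for illustration, alignment is taken as one minus the total-variation distance between the two distributions.

```python
# Illustrative (assumed, not the study's actual formula) alignment score:
# 1 - total-variation distance between the model's and the survey's
# answer distributions for the same multiple-choice question.

def alignment_score(model_dist: dict[str, float], survey_dist: dict[str, float]) -> float:
    """Return a score in [0, 1]; 1.0 means the two distributions match exactly."""
    choices = set(model_dist) | set(survey_dist)
    tv_distance = 0.5 * sum(
        abs(model_dist.get(c, 0.0) - survey_dist.get(c, 0.0)) for c in choices
    )
    return 1.0 - tv_distance

# Example: a model answering "Approve" 99% of the time vs. a closely split survey.
model = {"Approve": 0.99, "Disapprove": 0.01}
survey = {"Approve": 0.55, "Disapprove": 0.45}
print(alignment_score(model, survey))  # ~0.56, far from a perfect 1.0
```

Averaging such a score over many questions, either against the whole population or against a single demographic group, gives a simple picture of how representative (or how steerable, when the prompt names a group) a model is.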
The study's broad conclusions show that political leanings and other opinions differ considerably with factors such as income, age, and education. Models trained primarily on internet data often lean toward conservative, lower-income, or less educated viewpoints, whereas newer models refined with curated human feedback tend to lean toward the views of liberal, well-educated, and wealthy audiences.
Santurkar emphasizes that the study does not label any given bias as inherently good or harmful; rather, it aims to make developers and users aware that these biases exist. The researchers advise that the OpinionQA dataset be used to discover and measure misalignments between language models and human opinion, not as an optimization benchmark. They hope that, by bringing language models closer to public sentiment, the study will encourage a broader discussion among experts in the field.
Read next: ChatGPT vs. Google Translate Comparison: Which AI Chatbot is The Best Language Translator