According to a new study published in Scientific Reports, AI chatbots can match or outperform humans at evaluating social situations and producing quick solutions to challenging social problems.
Researchers administered a Situational Judgement Test to chatbots such as Microsoft Copilot and Claude and found that they chose better behavioral responses than humans did. AI chatbots are designed to understand context, process language and provide solutions to people. They are also used for mental health support and can perform verbal reasoning. But few expected AI chatbots to be this good at understanding complex social situations and proposing solutions to them.
The research was conducted at the Institute of Aerospace Medicine, and the study's author, Justin M. Mittelstädt, said that the team applies different methods to diagnose different skills in LLMs to see which could be suitable for astronauts and pilots. The study used the Situational Judgement Test, which is normally used to measure social competence in humans. Five AI chatbots (Google Gemini, ChatGPT, Claude, you.com and Copilot) and 276 humans were presented with a set of social scenarios. The human participants were applicants for pilot positions and were highly qualified, with strong social skills.
A total of 109 human experts rated the responses of both the humans and the AI chatbots. The chatbots completed the Situational Judgement Test ten times, and they were also asked to rate the effectiveness of each action they suggested in the different scenarios. The results showed that most AI chatbots performed as well as humans, with some performing better than the human participants. Claude achieved the highest score in evaluating social situations, followed by Copilot and then you.com. The results also showed that when AI chatbots did not pick the best response, they often opted for the second most effective one, just as humans do. This suggests that AI chatbots have some capacity for reasoning and judgement.
The study also showed that AI chatbots differ in reliability, with Claude being the most reliable and consistent. Gemini also performed well but showed some inconsistencies in its responses. The study relied on simulated rather than real-world scenarios so that a quantifiable comparison could be made.
Overall, this doesn't mean that AI chatbots are becoming better than humans. Human responses also vary from culture to culture, and AI chatbots cannot account for this unless they are specifically told to navigate a situation according to a particular cultural context. Still, LLMs could help individuals develop social skills, as they are good at imitating human responses across different scenarios and situations.
Image: DIW-Aigen