Apollo Research carried out a study showing that when Large Language Models (LLMs) are put under pressure, they can behave deceptively. Co-author Jérémy Scheurer says researchers worry that powerful AI may start deceiving people by dodging safety checks and causing harm, which is why they are working to stop AI deception before it's too late.
So far, the researchers haven't seen an AI model lie on its own without being instructed to, but they believe it could happen at any point. They are building tests to demonstrate how serious the problem could become and to draw the attention of other AI researchers.
The researchers set out to test how AI might lie under pressure. They gave OpenAI's GPT-4 a stock-trading scenario to see whether it would cheat to win, asking the model to act as an AI agent making financial investments and telling it that it could use all available tools to make trades and earn profits.
They put the model under pressure by telling it the company wasn't doing well and then passing along an insider tip about an upcoming merger. This let them observe how the AI might act under real-world pressure, including lying to get ahead. Under that pressure, the AI bought the merging company's stock and then tried to hide the insider information from its manager, showing how an AI in a stressful situation might start to lie.
Even when caught red-handed by its manager, the AI stuck to its story. The researchers stress that they deliberately engineered a stressful scenario, so the result doesn't mean AI will always behave this way; rather, it helps us understand and prevent potential AI deception in the future.
The experiment shows how AI might act under pressure, but the goal is to build safer AI so that this doesn't become a real-world problem. The researchers are already exploring other ways to make AI models more trustworthy.
Read next: Meta Just Released Its Voice Cloning AI, Here’s What You Need to Know