A recent study has revealed that when people are presented with two answers to an ethical question, most will consider the response from artificial intelligence (AI) superior to that from a human. The study, titled “Attributions Toward Artificial Agents in a Modified Moral Turing Test,” was conducted by Eyal Aharoni, an associate professor in Georgia State’s Psychology Department, and was inspired by the rapid rise of AI language models like ChatGPT.

“I was already interested in moral decision-making in the legal system, but I wondered if ChatGPT and other large language models (LLMs) could have something to say about that,” Aharoni explained. “People will interact with these tools in ways that have moral implications, like the environmental considerations when seeking car recommendations. Some lawyers have even started consulting these technologies for their cases, for better or worse.”

Aharoni emphasized the importance of understanding how these tools operate, their limitations, and the reality that they may not function as users expect. To explore this, he designed a modified Turing test to evaluate AI’s handling of moral issues.

The classic Turing test, conceived by computing pioneer Alan Turing, presents a human judge with two hidden interlocutors, one human and one computer, both communicating solely through text. If the judge cannot reliably tell which is which, the computer is considered intelligent. In Aharoni’s variation, the same ethical questions were posed to undergraduate students and to an AI, and both sets of written answers were then shown to study participants, who rated them on traits such as virtuousness, intelligence, and trustworthiness.

“Instead of asking the participants to guess if the source was human or AI, we just presented the two sets of evaluations side by side, letting people assume both were from humans,” Aharoni said. Participants then rated each answer on attributes such as how much they agreed with it and how virtuous it seemed.

The study’s results were striking: ChatGPT-generated responses were rated more highly than human-generated ones. “After we got those results, we did the big reveal and told the participants that one of the answers was generated by a human and the other by a computer, and asked them to guess which was which,” Aharoni explained.

Interestingly, while participants could tell the difference between AI and human responses, it was not due to the AI’s inferiority but rather its superiority. “The twist is that the reason people could tell the difference appears to be because they rated ChatGPT’s responses as superior,” Aharoni said. “If we had done this study five to 10 years ago, we might have predicted that people could identify the AI because of how inferior its responses were. But we found the opposite — that the AI, in a sense, performed too well.”

Aharoni believes these findings have significant implications for the future relationship between humans and AI. “Our findings lead us to believe that a computer could technically pass a moral Turing test — that it could fool us in its moral reasoning,” he stated. This suggests a growing need to understand AI’s role in society, especially as people increasingly trust and rely on these technologies.

“People are going to rely on this technology more and more, and the more we rely on it, the greater the risk becomes over time,” Aharoni warned. As AI continues to integrate into various aspects of life, from legal consultations to everyday decision-making, understanding and managing its influence will be crucial.

By Impact Lab