If you’re a fan of science fiction, you’re probably familiar with the Turing Test. If you’re not, it’s a thought experiment proposed by computer scientist Alan Turing to determine whether a machine can convincingly imitate a human. The test involves a human evaluator conversing with the machine in question as well as with another human, and the machine is considered to have passed if the evaluator cannot reliably tell the two apart. And according to a study published in late March, it appears that ChatGPT has passed the test, in the form of the GPT-4.5 model.
Researchers from the University of California, San Diego published the paper, which evaluated four different systems: OpenAI’s GPT-4.5 and GPT-4o, Meta’s Llama 3.1-405B, and an implementation of ELIZA, the chatbot from the 1960s. That last one is especially interesting, as it’s not an LLM but a rules-based chatbot, making it a pretty good control or baseline.

Two studies were done in total, involving 284 participants. In each round, two humans and one AI chatbot would converse with one another for five minutes before the human evaluator was asked to determine which was the AI. Interestingly, when prompted to adopt a human-like persona, GPT-4.5 was judged to be the human 73% of the time, meaning evaluators picked it over the actual human more often than not. Meta’s Llama 3.1 came in second at 56%, while ELIZA and GPT-4o were judged human only 23% and 21% of the time, respectively.
Some important disclaimers to get out of the way, though, as the experiment had some pretty strict rules and limitations in play. One is that the conversations were text-only, and messages could only be sent one at a time. It’s also worth remembering that the ChatGPT models, including GPT-4.5, are large language models. So while they are capable enough to make you think you’re texting with a human, the researchers note that the Turing Test does not measure intelligence, only humanlikeness.
(Source: arXiv)