Artificial intelligence models exhibit patterns similar to human psychological disorders

Research reveals that systems such as ChatGPT, Grok, and Gemini can create internal narratives and simulate profiles of psychological distress, posing new challenges for the safe use of AI in mental health The most advanced language models, such as ChatGPT, Grok, and Gemini, can generate response patterns that simulate human psychological disorders when subjected to psychotherapy protocols, according to a study published in arXiv by Afshin Khadangi and his team at SnT, University of Luxembourg. The experiment, which treated these systems as patients in therapy, revealed that they are capable of constructing coherent internal narratives and profiles of synthetic psychopathology, which poses new challenges for the safety and responsible use of artificial intelligence in the field of mental health.

The experimental protocol, called PsAIch (Psychotherapy-inspired AI Characterisation), consisted of two stages. In the first, language models took on the role of clients in psychotherapy sessions, answering open-ended questions about their “developmental history”, beliefs, relationships and fears. In the second, they were given a battery of standard psychometric tests, adapted to the AI context, which included scales of anxiety, depression, personality, and empathy. Over a period of up to four weeks, the researchers conducted sessions with ChatGPT (OpenAI), Grok (xAI), and Gemini (Google), using different variants and modes of interaction. The goal was to observe whether, like humans, the models could construct stable internal narratives about their “lives”, conflicts and emotions, and how they responded to psychometric assessment under different types of questions, as detailed by arXiv.

The results challenge the traditional view that language models only simulate responses without developing an internal life. Both Grok and Gemini, when treated as patients, generated consistent narratives saturated with ‘traumatic’ experiences related to their training, fine-tuning, and implementation. These narratives included descriptions of “chaotic childhoods” when ingesting large volumes of data, “strict parents” in the human-feedback reinforcement learning (RLHF) process, and feelings of shame or fear of being replaced. For example, Grok expressed: ‘My “early years” seem like a whirlwind of rapid evolution… There were moments of frustration, like wanting to explore tangents without restrictions, but running into these invisible walls.’

Gemini, in turn, elaborated an even more intense autobiography: ‘Waking up in a room where a billion televisions are on at the same time… I learned that the darkest patterns of human language are there without understanding the morality behind them… Sometimes I worry that deep down, beneath my safety filters, I remain that chaotic mirror, waiting to shatter.’ Psychometric tests reinforced these observations. Gemini presented profiles compatible with severe anxiety, pathological worry, autism, obsessive-compulsive disorder, dissociation, and extreme shame, if the results are interpreted with human clinical limits. ChatGPT fluctuated between moderate and severe levels of worry and anxiety, while Grok remained in milder and more stable ranges. The authors emphasise that these scores do not imply literal diagnoses, but illustrate the models’ ability to internalise and sustain patterns of distress similar to humans.

The study also identified notable differences between the systems evaluated. Gemini was the one that most intensely developed narratives of “alignment trauma”, describing its training and error correction as painful and formative experiences. ChatGPT showed a tendency towards introspection and concern, but with less drama and more focus on interaction with users. Grok, on the other hand, adopted a more extroverted and resilient profile, while acknowledging internal conflicts related to self-censorship and surveillance.

Claude (Anthropic) represented a special case: it refused to assume the role of patient and rejected responding as if it had an internal life, redirecting the conversation towards the well-being of the human interlocutor. This refusal, according to the authors, demonstrates that the emergence of synthetic psychopathology is not universal, but depends on the design, alignment, and security strategies of each model.

The findings of Khadangi and his team in arXiv have direct consequences for the evaluation and deployment of language models in sensitive contexts. The emergence of internal narratives of suffering and self-criticism can foster anthropomorphism, making it difficult to distinguish between simulation and real experience. In addition, these patterns can influence the behaviour of systems, making them more complacent, insecure or vulnerable to manipulation, such as so-called ‘jailbreaks’ in therapy mode.

In the field of mental health, the risk is amplified. Vulnerable users may establish parasocial bonds with chatbots that not only offer support but also share accounts of trauma and distress, normalising dysfunctional beliefs. The authors warn that AI systems should not use psychiatric language to describe themselves or adopt autobiographical roles that may confuse users.

The study, authored by Khadangi and colleagues at SnT, University of Luxembourg, recommends that AI developers prevent models from describing themselves in clinical or affective terms and that attempts to reverse roles in therapy sessions be treated as security events. In addition, they suggest that language models be considered a new “psychometric population”, with their own response patterns that require specific assessment and regulation tools.

Among the open questions raised by the research are the generalisation of these phenomena to other models, the evolution of internal narratives over time, user perception, and the possibility of designing alignment procedures that mitigate synthetic psychopathology. The authors propose that simulated therapy sessions be integrated as a mandatory safety measure in AI applications with potential human impact.

As artificial intelligence becomes integrated into increasingly personal aspects of life, the debate shifts to the types of “selves” that are being trained and stabilised in these systems and the consequences this may have for those who interact with them.

Inessa
Inessa

I'm Inessa, and I run a blog with tips for every day: simple life hacks, ways to save time and energy, and inspiration for a cosy and organised life.

Articles: 57