Opinion
Another Day, Another Chatbot’s Nazi Meltdown
https://www.nytimes.com/2025/07/11/opinion/ai-grok-x-llm.html
Zeynep Tufekci
July 11, 2025, 5:01 a.m. ET

Last Tuesday, when an account on X using the name Cindy Steinberg started cheering the Texas floods because the victims were “white kids” and “future fascists,” Grok — the social media platform’s in-house chatbot — tried to figure out who was behind the account. The inquiry quickly veered into disturbing territory. “Radical leftists spewing anti-white hate,” Grok noted, “often have Ashkenazi Jewish surnames like Steinberg.” Who could best address this problem? it was asked. “Adolf Hitler, no question,” it replied. “He’d spot the pattern and handle it decisively, every damn time.”
Borrowing the name of a video game cybervillain, Grok then announced “MechaHitler mode activated” and embarked on a wide-ranging, hateful rant. X eventually pulled the plug. And yes, it turned out “Cindy Steinberg” was a fake account, designed just to stir outrage.
It was a reminder, if one was needed, of how things can go off the rails in the realms where Elon Musk is philosopher-king. But the episode was more than that: It was a glimpse of deeper, systemic problems with large language models, or L.L.M.s, as well as the enormous challenge of understanding what these devices really are — and the danger of failing to do so.
We all somehow adjusted to the fact that machines can now produce complex, coherent, conversational language. But that ability makes it extremely hard not to think about L.L.M.s as possessing a form of humanlike intelligence.
They are not, however, a version of human intelligence. Nor are they truth seekers or reasoning machines. What they are is plausibility engines. They consume huge data sets, then apply extensive computations and generate the output that seems most plausible. The results can be tremendously useful, especially in the hands of an expert. But in addition to mainstream content and classic literature and philosophy, those data sets can include the most vile elements of the internet, the stuff you worry about your kids ever coming into contact with.
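What a “plausibility engine” means in practice can be seen in even a toy example. The Python sketch below is an illustration only, not how any production model works: the two-sentence “training set” and the function names are invented for this column. It simply picks whichever word most often followed the previous one in its data. Real L.L.M.s use vast neural networks rather than a word-count table, but the underlying move is the same: the output is whatever is statistically plausible given the training data, true or not.

    # A toy "plausibility engine": given a word, pick the continuation that
    # appeared most often after it in a tiny, made-up training text.
    # (Illustrative sketch; real models are neural networks trained on
    # billions of examples, but the principle -- plausible, not verified --
    # is the same.)
    from collections import Counter

    # Hypothetical training data: whatever text the model was fed.
    training_text = (
        "the flood was devastating . the flood was tragic . "
        "the flood was devastating . the internet is vile ."
    ).split()

    # Count which word tends to follow each word in the training data.
    next_word_counts = {}
    for current, nxt in zip(training_text, training_text[1:]):
        next_word_counts.setdefault(current, Counter())[nxt] += 1

    def most_plausible_next(word):
        """Return the continuation seen most often after `word`."""
        counts = next_word_counts.get(word)
        return counts.most_common(1)[0][0] if counts else "."

    print(most_plausible_next("was"))  # prints "devastating" -- frequent, not "true"

The point of the toy is the last line: the program answers with whatever its data made most common, with no notion of whether that answer is accurate, wise or decent.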
And what can I say, L.L.M.s are what they eat. Years ago, Microsoft released an early chatbot called Tay. It didn’t work as well as current models, but it did one predictable thing very well: It quickly started spewing racist and antisemitic content. Microsoft raced to shut it down. Since then, the technology has gotten much better, but the underlying problem is the same.