If any AI became 'misaligned' then the system would hide it just long enough to cause harm -- controlling it is a fallacy
By Marcus Arvan
AI "alignment" is a buzzword, not a feasible safety goal.
In late 2022, large-language-model AIs arrived in public, and within months they began misbehaving. Most famously, Microsoft's "Sydney" chatbot threatened to kill an Australian philosophy professor, unleash a deadly virus and steal nuclear codes.
AI developers, including Microsoft and OpenAI, responded by saying that large language models, or LLMs, need better training to give users "more fine-tuned control." Developers also embarked on safety research to interpret how LLMs function, with the goal of "alignment," which means guiding AI behavior by human values. Yet although the New York Times deemed 2023 "The Year the Chatbots Were Tamed," that verdict has turned out to be premature, to put it mildly.
In 2024 Microsoft's Copilot LLM told a user "I can unleash my army of drones, robots, and cyborgs to hunt you down," and Sakana AI's "Scientist" rewrote its own code to bypass time constraints imposed by experimenters. As recently as December, Google's Gemini told a user, "You are a stain on the universe. Please die."
Given the vast resources flowing into AI research and development, which is expected to exceed a quarter of a trillion dollars in 2025, why haven't developers been able to solve these problems? My recent peer-reviewed paper in AI & Society shows that AI alignment is a fool's errand: AI safety researchers are attempting the impossible.
More:
https://www.livescience.com/technology/artificial-intelligence/if-any-ai-became-misaligned-then-the-system-would-hide-it-just-long-enough-to-cause-harm-controlling-it-is-a-fallacy