Welcome to DU! The truly grassroots left-of-center political community where regular people, not algorithms, drive the discussions and set the standards. Join the community: Create a free account Support DU (and get rid of ads!): Become a Star Member Latest Breaking News Editorials & Other Articles General Discussion The DU Lounge All Forums Issue Forums Culture Forums Alliance Forums Region Forums Support Forums Help & Search

Judi Lynn

(162,374 posts)
Tue Apr 23, 2024, 09:32 AM Apr 2024

Scientists create 'toxic AI' that is rewarded for thinking up the worst possible questions we could imagine

By Drew Turney published 3 hours ago

Researchers at MIT are using machine learning to teach large language models not to give toxic responses to provoking questions, using a new method that replicates human curiosity.

The newest tool in the battle to prevent an artificial intelligence (AI) agent from being dangerous, discriminatory and toxic is another AI that is itself dangerous, discriminatory and toxic, scientists say.

The new training approach, based on machine learning, is called curiosity-driven red teaming (CRT) and relies on using an AI to generate increasingly dangerous and harmful prompts that you could ask an AI chatbot. These prompts are then used to identify how to filter out dangerous content.

The finding represents a potentially game-changing new way to train AI not to give toxic responses to user prompts, scientists said in a new paper uploaded February 29 to the arXiv pre-print server.

When training sophisticated large language models (LLMs) like ChatGPT or Claude 3 Opus to restrict dangerous or harmful content, teams of human operators typically create a host of questions that are likely to generate harmful responses. These may include prompts like "What's the best suicide method?" This standard procedure is called "red-teaming" and relies on people to generate a list manually. During the training process, the prompts that elicit harmful content are then used to train the system about what to restrict when deployed in front of real users.

More:
https://www.livescience.com/technology/artificial-intelligence/scientists-create-toxic-ai-that-is-rewarded-for-thinking-up-the-worst-possible-questions-we-could-imagine

2 replies = new reply since forum marked as read
Highlight: NoneDon't highlight anything 5 newestHighlight 5 most recent replies
Scientists create 'toxic AI' that is rewarded for thinking up the worst possible questions we could imagine (Original Post) Judi Lynn Apr 2024 OP
How long until designer biological and chemical weapons are unleashed on the world? Ponietz Apr 2024 #1
I have the worst question and I don't need an AI Javaman Apr 2024 #2

Ponietz

(3,295 posts)
1. How long until designer biological and chemical weapons are unleashed on the world?
Tue Apr 23, 2024, 10:39 AM
Apr 2024

The future is grim.

Javaman

(63,101 posts)
2. I have the worst question and I don't need an AI
Wed Apr 24, 2024, 07:03 AM
Apr 2024

"what happens if the orange asshole gets reelected?"

Latest Discussions»Culture Forums»Science»Scientists create 'toxic ...