
General Discussion

In reply to the discussion: Bernie vs. Claude [View all]

scipan

(3,102 posts)
7. I got more specific answers for some reason, and I think it's interesting.
Tue Mar 24, 2026, 05:31 PM

(bold is mine)
Me: I want to know just how much of the information AI collects is being used, and what would surprise most Americans about how that information is being used

This is a genuinely important topic, and the gap between what's actually happening and what most people assume is significant. Let me break it down.
What's being collected
The scope is wider than most people realize. Web scrapers vacuum up data from blogs, personal web pages, LinkedIn profiles, voter-registration databases, government pages, Wikipedia, Reddit, and academic institutions — plus pirated-content compilations and web archives that often contain data removed from its original source. (Scientific American) One striking example: an artist discovered that a private diagnostic medical image of hers had been included in the LAION training database, and the same dataset contained medical-record photographs of thousands of other people. (Scientific American)
Some crawlers can even get past paywalls by disguising themselves behind paid accounts. (Scientific American)
Your conversations and prompts
Many platforms harvest user inputs to refine their models. Users, for the most part, don't understand what data is being used, how it's stored, or who might have access to it. (Captain Compliance)
Most platforms share prompts with third parties — including service providers, legal authorities, and affiliated companies. Microsoft and Meta allow sharing with advertisers or affiliates under broader terms. No platform offers a way for users to remove their personal data from existing training sets. (Dataconomy)
On opt-outs: platforms like Gemini, DeepSeek, Pi AI, and Meta AI do not appear to provide a way to opt out of training use. (Dataconomy)
The "big tech" platforms are the worst offenders
Meta AI landed at the bottom of privacy rankings, with Gemini (Google) and Copilot (Microsoft) not far behind. Key concerns included that these platforms collect precise location, contact details, and usage data — particularly through their mobile apps. (Captain Compliance)
Meta's AI tool provides no means for its users to opt out, while Google and LinkedIn at least offer an opt-out mechanism. (Al Jazeera)
What would surprise most Americans
A few things stand out:
1. It extends far beyond chatbots. AI-powered predictive systems are being used to screen job candidates, with employers using them to decide who gets interviewed — with documented bias problems, including Amazon's own hiring tool that turned out to discriminate against women. (Stanford HAI)
2. Facial recognition is now in your doorbell. Amazon rolled out AI-powered facial recognition for Ring doorbells, allowing users to identify frequent visitors through stored facial profiles — though it's unavailable in states with strict biometric laws like Illinois and Texas. (Crescendo)
3. Most people already sense something is wrong. 70% of Americans have little to no trust in companies to make responsible decisions about how they use AI in their products, and 80% feel uneasy about AI being used to give medical or health advice. (Digital Silk) Yet the regulatory framework hasn't caught up.
4. Deletion isn't really deletion. Scraped databases do not go away (Stanford HAI) — even if you delete a post or a photo, if it was already crawled into a training dataset, it persists in the model's weights in ways that can't be undone.
The honest bottom line: it is basically impossible for people using online products or services to escape systematic digital surveillance across most facets of life — and AI may make matters even worse. (Stanford HAI) The data collection problem predates AI, but AI massively amplifies it by making that data operationally useful in ways it never was before.
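A side note from me, not from Claude's answer: site owners who don't want their pages swept into training sets can at least ask the major AI crawlers to stay away via robots.txt, using the user-agent tokens the crawler operators publish (GPTBot for OpenAI, CCBot for Common Crawl, Google-Extended for Google's AI training, ClaudeBot for Anthropic). A minimal sketch — keeping in mind that honoring these rules is voluntary, and as the answer above notes, some crawlers get around such controls anyway:

```
# robots.txt — ask common AI-training crawlers to skip the whole site.
# These user-agent tokens are the ones the operators publish;
# compliance is voluntary, not enforced.

User-agent: GPTBot            # OpenAI
Disallow: /

User-agent: CCBot             # Common Crawl
Disallow: /

User-agent: Google-Extended   # Google AI training
Disallow: /

User-agent: ClaudeBot         # Anthropic
Disallow: /
```

And of course this only affects future crawls — it does nothing about data already baked into existing models, which is exactly the "deletion isn't really deletion" problem above.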


I asked it about the seemingly contradictory statements regarding which AIs offer an opt-out, and it replied that they came from two different sources, probably referring to different things. You can't opt out if it's already in their training data. (A good example of why you should be careful about what it tells you!)

I just thought it was interesting how I got such a different answer. But Claude doesn't seem to sugarcoat the problem when answering Bernie or me.


Bernie vs. Claude [View all] Quixote1818 Mar 24 OP