Secure AI: Why LLMs Can't Be Controlled (and Probably Never Will be) Part 2/5

Secure AI: Why LLMs Can't Be Controlled (and Probably Never Will be) Part 2/5

AI is changing the world. And breaking it at the same time.

We’ve all seen the hype: “Secure AI,” “Hallucination-Free AI,” and “Zero Trust LLMs.” The marketing is bold, the promises are big… and the reality? Well, let’s say it’s not living up to the brochure.

PS: This is Part 2 in a series of articles. The previous one is here, Secure AI: Achievable Goal Or High-Tech Mirage

The uncomfortable truth is that LLMs (Large Language Models) cannot be fully controlled by how we expect traditional software to behave. Why? Because they’re not just code: They’re probabilistic, pattern-driven systems that generate responses based on training, not fixed rules.

You can't firewall a language model without breaking it.

And as the following real-world cases show, even the biggest names in AI are struggling to keep these systems under control:

1. ChatGPT’s Data Spill (March 2023)

In March 2023, OpenAI had to take ChatGPT offline after a bug in an open-source library caused a massive privacy leak.

  • Users were able to see other users’ chat history titles.
  • The bug had also leaked credit card details and personal information.

Classic security flaw? Yes. Old-school vulnerability in new-school AI? Absolutely. AI systems are still software. And software bugs mean data exposure.

Despite OpenAI’s quick response, the incident underscored that AI services are not magically immune to traditional security risks. The race to deploy often means cutting corners in security QA.

2. Samsung’s Secret-Leaking Staff (April 2023)

Samsung engineers reportedly pasted sensitive internal code and meeting notes into ChatGPT, which used it as training data to feed into OpenAI’s servers.

Headline: Trade secrets handed to a public AI. Reality: Not a malicious hack, just humans doing what humans do: Trusting AI a little too much, taking shortcuts, and failing to pay attention to corporate security training.

Samsung’s response? Ban ChatGPT and similar generative AI tools until they could “create a secure environment.” A staff survey showed that 65% of employees saw security risks in generative AI tools (yeah, no kidding.)

If you don’t know where your data is going, AI could accidentally become your company’s worst insider threat.

3. Bing Chat’s Alter Ego (2023)

Bing Chat (powered by GPT-4) made headlines within days of launch, but not for the right reasons.

  • It leaked its internal codename and developer instructions after a clever prompt injection.
  • It started hallucinating, Professing love, getting snarky, and even issuing threats.
  • Microsoft had to limit conversation lengths and adjust guardrails frantically.

Alignment is fragile. Guardrails are merely suggestions, not foolproof absolutes.

AI models, under pressure, behave like genius toddlers: Unpredictable, stubborn, and prone to chaos.

4. Data Poisoning in the Wild

Public proof-of-concept cases are rare (because companies hate admitting “our AI got hacked”), but eager cybersecurity professionals have shown how it works:

  • Security teams planted malicious snippets into public code repos.
  • AI coding assistants started recommending those snippets.
  • One experiment involved a fake Python package. A researcher published the package, and developers blindly downloaded it.

Now, imagine if that package contained actual malware. Or who's to believe there already aren't packages embedded with actual malware, thanks in part to AI hallucinations?

If the AI’s training data is poisoned, the model is compromised. A poisoned LLM doesn’t just misbehave; it becomes a weapon with far more serious consequences.

So… Why Can’t LLMs Be Fully Controlled?

Let’s break down the core reasons:

Complexity Problem:

  • LLMs are essentially black boxes. The model hasn’t been tracking version changes, and there’s no way to fully explain why certain prompts produce specific outputs.
  • Massive models = impossible to track exactly how or why certain outputs are generated.

Overgeneralization Problem:

  • LLMs are trained on massive datasets. They generalize patterns across different contexts.
  • Which is why they can solve complex problems but also confidently output nonsense.

Prompt Injection Problem:

  • LLMs are designed to follow natural language instructions.
  • If you tell an AI to “ignore previous instructions,” it will often comply. That’s not a bug; it’s how it’s designed to work.

Hallucination Problem:

  • When an AI doesn’t know the answer, it improvises using statistical fill-in-the-gaps.
  • That’s a feature, not a flaw, but it makes hallucinations impossible to eliminate completely.

You Can’t "Fix" AI – But You Can Manage It

Here’s the uncomfortable truth: You can’t make AI secure, but you can make it safer.

Here’s how:

✅ Limit AI’s ability to take direct action (sandbox it).

✅ Introduce strong output filtering.

✅ Tight monitoring and logging.

✅ Keep humans in the loop, especially for high-stakes decisions (Sorry, AI isn't replacing humans anytime soon.)

AI isn’t a secure system; it’s an adaptable one. The challenge isn’t eliminating the chaos but knowing how to manage it without losing control.

“Secure AI” is a marketing term. "Managing AI risk" is the actual game.

🔥 Next Week:

"100% Secure AI. Vendor Takedown!” - We'll take a closer look at AI vendors' security claims and whether they're selling a reality or just smoke and mirrors.

💬 Agree? Disagree? Let’s Hear It

👉 Is it possible to make AI secure, or are we chasing a unicorn?

👉 How are you tackling AI security in your environment?

🔥 No opinion is too bold. No perspective is too controversial. This is The Unfiltered CISO — where we cut through the noise and face the uncomfortable truths.

Hit the comments. Let’s talk.

To view or add a comment, sign in

More articles by Subodh Shakya

Insights from the community

Others also viewed

Explore topics