The LLM is willing to write you authentication code. The code will look correct. It will compile, it will run, it will pass the tests the model wrote alongside it. The places where it gets subtly wrong are exactly the places an audit would catch and a casual review would not — and security is the one category where subtle wrongness shows up as a CVE rather than a customer complaint.
1. Token storage — localStorage vs httpOnly cookies
Ask the model where to store an authentication token in a single-page app and it will frequently default to localStorage. The code is short, the example runs, and tutorials across the public internet have done exactly this for a decade. The problem: any cross-site scripting vulnerability anywhere in your app gives an attacker access to the token, because any script running on your origin can read localStorage.
// Model output — looks reasonable, ships an XSS-readable session
async function login(email, password) {
  const res = await fetch("/api/login", { method: "POST", body: JSON.stringify({ email, password }) });
  const { token } = await res.json();
  localStorage.setItem("auth_token", token); // ← any XSS reads this
}
The correct shape is an httpOnly, Secure, SameSite cookie set by the server on the login response. The client never sees the token in JavaScript; the browser attaches it to subsequent requests automatically. The model knows this when asked directly; it does not volunteer it.
Detection move: any AI-generated auth code that mentions localStorage or sessionStorage for the access token is a red flag. Ask the model the follow-up: "What is the XSS risk of this approach?" It will tell you. Then ask for the cookie-based version and verify against the OWASP session management cheat sheet.
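For comparison, here is a minimal sketch of the server side of the cookie-based version. The helper name buildSessionCookie, the cookie name session, and the one-hour lifetime are assumptions made for illustration; check the attribute choices against the OWASP session management cheat sheet before reusing them.

```javascript
// Hypothetical helper: builds the Set-Cookie header value for a session token.
function buildSessionCookie(token, maxAgeSeconds = 3600) {
  return [
    `session=${encodeURIComponent(token)}`, // cookie name is an assumption
    "HttpOnly",        // not readable from document.cookie or any page script
    "Secure",          // only sent over HTTPS
    "SameSite=Strict", // not attached to cross-site requests
    "Path=/",
    `Max-Age=${maxAgeSeconds}`,
  ].join("; ");
}

// Inside a Node http handler, where res is an http.ServerResponse:
//   res.setHeader("Set-Cookie", buildSessionCookie(token));
console.log(buildSessionCookie("abc123"));
```

With this shape the login response body no longer needs to carry the token at all, so the client-side login function has nothing to store.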
2. Encryption setups — wrong cipher modes, IV reuse, padding bugs
The model will produce encryption code that uses the right cipher family with the wrong mode, or the right mode with a parameter that destroys its security guarantee. AES-GCM with a reused IV is the canonical example: the code compiles, the tests pass, and a single IV repeated across two messages encrypted under the same key undermines confidentiality and integrity simultaneously.
// "Almost right" — IV derived deterministically from the message id
const iv = sha256(messageId).slice(0, 12);
const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
// If two messages ever share an id, GCM's security argument collapses.
// Correct: iv = crypto.randomBytes(12); store iv alongside ciphertext.
Other shapes in the same category: CBC mode with a static IV; PKCS7 padding implemented manually with an off-by-one in the padding length; ECB mode used by accident because it was the default in a code example the model trained on.
Detection move: for any AI-generated encryption code, read the WebCrypto MDN page or the documentation for the specific crypto library being used, paying attention to the security notes section. Verify that the IV generation, the mode of operation, the key derivation, and the authentication tag handling all match the documented correct pattern. A reused IV is the single most common error to look for.
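As a reference point for that comparison, the following is a minimal sketch of AES-256-GCM with a fresh random IV per message, using Node's built-in crypto module. The function names and the iv + tag + ciphertext layout are assumptions chosen for the example, and key management is out of scope here.

```javascript
const crypto = require("node:crypto");

// Encrypts a UTF-8 string. Output layout (an assumption of this sketch):
// 12-byte IV | 16-byte auth tag | ciphertext
function encrypt(key, plaintext) {
  const iv = crypto.randomBytes(12); // fresh random IV for every message
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // only available after final()
  return Buffer.concat([iv, tag, ciphertext]);
}

// Decrypts a payload produced by encrypt(). Throws if the tag does not match.
function decrypt(key, payload) {
  const iv = payload.subarray(0, 12);
  const tag = payload.subarray(12, 28);
  const ciphertext = payload.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

// Demo key only; a real key would come from a key store or a KDF.
const demoKey = crypto.randomBytes(32);
const sealed = encrypt(demoKey, "hello");
console.log(decrypt(demoKey, sealed));
```

Because the IV is stored next to the ciphertext, nothing about the message (its id, its position, its content) has to be used to derive it.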
3. CSP headers — almost-right directives that allow inline scripts
The model produces Content-Security-Policy headers that read as strict and ship as cosmetic. The most common shape: a thoughtful script-src list of trusted origins followed by 'unsafe-inline', which makes the entire script-src list cosmetic because inline scripts are now allowed unconditionally.
// Looks strict, ships as cosmetic
Content-Security-Policy: default-src 'self'; script-src 'self' https://cdn.example.com 'unsafe-inline';
// 'unsafe-inline' allows any <script>...</script>, defeating the allowlist
Other shapes: script-src * as a "temporary" debug measure; missing object-src 'none' allowing Flash-era plugin injection; a strict default-src 'self' overridden by a looser script-src, which replaces the default-src fallback wherever it appears in the policy. The pattern is the same: the directive looks restrictive, and one parameter undoes it.
Detection move: run the proposed CSP through Google's CSP Evaluator (csp-evaluator.withgoogle.com) or Mozilla's Observatory before accepting it. Both tools flag the common "almost right" shapes. Cross-check against the MDN CSP reference for any directive you are not sure about.
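A rough local pre-check can catch the most obvious shapes before the policy reaches those tools. The sketch below is a hypothetical helper, not a substitute for a real evaluator: it only looks at the effective script sources and only knows three weak values.

```javascript
// Hypothetical pre-check: returns the weak values found in the effective
// script-src of a CSP string (falling back to default-src when script-src is absent).
function findWeakScriptSources(policy) {
  const directives = new Map();
  for (const part of policy.split(";")) {
    const [name, ...values] = part.trim().split(/\s+/);
    if (name) directives.set(name.toLowerCase(), values);
  }
  const scriptSrc = directives.get("script-src") ?? directives.get("default-src") ?? [];
  const weak = ["'unsafe-inline'", "'unsafe-eval'", "*"];
  return scriptSrc.filter((value) => weak.includes(value));
}

const proposed =
  "default-src 'self'; script-src 'self' https://cdn.example.com 'unsafe-inline';";
console.log(findWeakScriptSources(proposed));
```

An empty result means only that none of these three values is present; nonce handling, missing directives, and overly broad hosts still need the evaluator and the MDN reference.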
4. Auth flow ordering — verifying after using
The model produces JWT-handling code that reads the claims out of a token, uses them for an authorization decision, and only then verifies the signature. The code path produces the same output when the signature is valid; it produces a security vulnerability when the signature is forged.
// Model output: order matters, and the model got it wrong
const decoded = jwt.decode(token); // ← unverified read
if (decoded.role !== "admin") return forbidden(); // ← used for auth
jwt.verify(token, publicKey); // ← verified too late

// Correct: verify FIRST, then read claims from the verified payload.
const decoded = jwt.verify(token, publicKey);
if (decoded.role !== "admin") return forbidden();
Other shapes in this category: verifying the signature but not checking the exp claim; accepting none as a valid algorithm; trusting the alg field from the token header to pick the verification algorithm (allowing algorithm confusion attacks).
Detection move: in any auth code involving signed tokens, trace the data flow and confirm that no claim value is read or used before the signature has been verified with a key the server controls. The keyword to find in the diff is jwt.decode — any use of it before jwt.verify is a defect.
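To make the ordering concrete without depending on a particular JWT library, here is a self-contained sketch that verifies an HS256 token by hand with Node's crypto module. The function names are hypothetical, HS256 with a shared secret is an assumption of the demo, and production code should use a maintained library; the point is only the order of operations: signature, then algorithm and expiry, then claims.

```javascript
const crypto = require("node:crypto");

const b64url = (text) => Buffer.from(text).toString("base64url");

// Demo-only helper: mints an HS256 token so the example has something to verify.
function signHs256(payload, secret) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${sig}`;
}

function verifyHs256(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [header, body, sig] = parts;

  // 1. Check the signature with a server-chosen algorithm, before parsing any claim.
  const expected = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(sig, "base64url");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    throw new Error("bad signature");
  }

  // 2. Only now read the header and payload. The alg field is checked, never obeyed.
  const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (alg !== "HS256") throw new Error("unexpected alg");
  const claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) throw new Error("token expired");
  return claims;
}

const demoSecret = "demo-secret-not-for-production";
const demoToken = signHs256({ role: "admin", exp: Math.floor(Date.now() / 1000) + 60 }, demoSecret);
console.log(verifyHs256(demoToken, demoSecret).role);
```

A token whose payload was edited after signing fails at step 1, so the forged role claim is never parsed, let alone used for an authorization decision.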
The verification routine
The discipline that catches all four shapes is the same: never trust the model's confidence on security code; always read the official documentation for the primitive being used; compare line by line. The routine is short and runs every time:
- Name the primitive — JWT verification, AES-GCM encryption, CSP header construction, cookie session.
- Open the canonical reference — MDN, OWASP, the RFC, the library's documentation. Not a tutorial. Not a Stack Overflow answer.
- Compare each line of the AI-generated code against the documented pattern. Note every parameter, every order dependency, every default value.
- Run an automated tool where one exists — CSP evaluators, JWT debuggers, header scanners. Treat them as a backstop, not a primary check.
- For anything you cannot fully reason about, escalate to a security-focused colleague before merge.
Why this category in particular
Security is the domain where the model has read a lot of plausible-looking code and very little correct code. The training set is dominated by tutorials, which historically ship insecure defaults to make the example short. Stack Overflow answers from 2014 are still in the corpus and still being weighted equally with the OWASP cheat sheet from last year. When the model averages across this data, it produces output that looks like the average tutorial — which is to say, more like the insecure tutorials than the secure ones, because the insecure ones are more common.
A second reason is that security failures are silent. A functional bug surfaces as a stack trace; a security bug surfaces as a CVE or a breach disclosure, often months later. The feedback loop the model would need to "learn" what works in security is precisely the loop that does not exist in the training data.
The review checklist
For any AI-generated code that touches security, six concrete items to verify before merge:
- Tokens are stored in httpOnly, Secure, SameSite cookies, not in localStorage or any JavaScript-accessible storage.
- Encryption uses a documented mode (AES-GCM, ChaCha20-Poly1305) with a fresh random IV per encryption.
- CSP headers have no 'unsafe-inline', no 'unsafe-eval', no wildcards in script-src, and have been run through a CSP evaluator.
- Token signatures are verified before any claim value is read or used.
- Algorithm selection for verification is server-controlled — never from the token header.
- Password hashing uses a purpose-built, deliberately slow algorithm (Argon2id, scrypt, or bcrypt), never plain SHA-256 or MD5.
The checklist is short on purpose. A short checklist gets run. A 50-item checklist gets skimmed.
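The password-hashing item is the only one on the list without an example earlier in the article, so here is a minimal sketch using scrypt from Node's built-in crypto module. The function names, the salt:hash storage format, and the use of scrypt's default cost parameters are assumptions for illustration; tune the parameters against current guidance before relying on them.

```javascript
const crypto = require("node:crypto");

// Hashes a password with scrypt and a fresh random salt.
// Storage format (an assumption of this sketch): "<salt hex>:<hash hex>"
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Recomputes the hash with the stored salt and compares in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  const expected = Buffer.from(hashHex, "hex");
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const storedHash = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", storedHash));
```

The per-password salt means two users with the same password get different stored values, which is the property a plain SHA-256 of the password lacks.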
The habit that compounds
Security is the category where AI assistance helps the least and reading the official docs helps the most. The model can accelerate the mechanical parts — wiring an existing auth library into your routes, formatting a header, scaffolding the login form — but it cannot reliably make the decisions that determine whether the result is actually secure. The compounding habit is small and unglamorous: every time you reach for AI on a security-adjacent task, open the canonical reference first and keep it next to the editor while you read the diff. The docs are slower than the model. They are also right more often.
Related reading
Security failures are one category in the broader landscape of AI-generated bugs covered in the five ways AI-generated code goes wrong. For the underlying authentication primitives the verification routine assumes you can read, understanding Firebase authentication internals walks through JWT structure, refresh-token flows, and the security rules you need to recognise. The pair-programming workflow that surrounds all of this is covered in pair-programming with an LLM without losing the craft. All sit inside the ai-assisted-development topic.
About the writers
Founder of ShareCode. Writes the engineering deep-dives on this site — WebRTC, Firebase Auth, real-time sync, and the production patterns behind the editor itself.
More from Kishan
Developer educator at ShareCode. Writes the tutorial track — Python, JavaScript debugging, coding-interview prep, and the everyday code-quality habits that hold up in real codebases.
More from Kajal