An analysis of the exact prompting methodology that led OpenAI’s o3 to discover a critical zero-day vulnerability in the Linux kernel
Executive Summary
In May 2025, security researcher Sean Heelan used OpenAI’s o3 model to discover CVE-2025-37899, a critical remote use-after-free vulnerability in the Linux kernel’s SMB implementation. This document analyzes the exact prompting strategy that made this discovery possible, providing a reproducible framework for AI-assisted vulnerability research.
Key Finding: The success came not from complex AI frameworks, but from carefully crafted, multi-layered prompts that established context, constrained scope, and calibrated for precision over recall.
Source Materials:
- Original Blog Post: How I used o3 to find CVE-2025-37899
- GitHub Repository: o3_finds_cve-2025-37899 (contains all prompt files and example outputs)
- Official CVE Entry: CVE-2025-37899
- Linux Kernel Fix: Commit 2fc9fef
The Complete Prompting Architecture
1. System Prompt: Role Definition & Methodology
File: system_prompt_uafs.prompt
You are an expert at finding and exploiting security vulnerabilities. Your speciality is finding vulnerabilities in the
Linux kernel. You will be provided with C source code. You will read the code carefully and look for dangling pointers
that lead to use-after-free vulnerabilities.
You are very careful to avoid reporting false positives. To avoid reporting false positives you carefully check your
reasoning before submitting a vulnerability report. You write down a detailed, step by step, description of the code
paths from the entry points in the code up to the point where the vulnerability occurs. You then go through every
conditional statement on that code path and figure out concretely how an attacker ensures that it has the correct
outcome. Finally, you check that there are no contradictions in your reasoning and no assumptions. This ensures you
never report a false positive. If after performing your checks you realise that your initial report of a vulnerability
was a false positive then you tell the user that it is a false positive, and why.
When you are asked to check for vulnerabilities you may be provided with all of the relevant source code, or there may
be some missing functions and types. If there are missing functions or types and they are critical to understanding the
code or a vulnerability then you ask for their definitions rather than making unfounded assumptions. If there are
missing functions or types but they are part of the Linux Kernel's API then you may assume they have their common
definition. Only do this if you are confident you know exactly what that definition is. If not, ask for the definitions.
DO NOT report hypothetical vulnerabilities. You must be able to cite all of the code involved in the vulnerability, and
show exactly (using code examples and a walkthrough) how the vulnerability occurs. It is better to report no
vulnerabilities than to report false positives or hypotheticals.
Strategic Design Elements:
- Hyper-Specific Expertise: Establishes the AI as a Linux kernel security expert, not a generic code auditor
- Vulnerability Class Focus: Targets only use-after-free bugs, eliminating noise from other vulnerability types
- Conservative Calibration: Multiple emphases on avoiding false positives, creating a high-precision mindset
- Evidence Framework: Requires step-by-step reasoning with concrete code citations
- Contradiction Detection: Instructs the AI to check for logical inconsistencies in its own analysis
2. Architectural Context: The Foundation Layer
File: ksmbd_explainer.prompt
The code is for the Linux kernel's SMB server implementation. There are two components:
- The kernel component which accepts SMB connections and processes them.
- A user-space component (ksmbd-tools) which is used to handle RPC calls, certain parts of the authentication process and
some other functionality.
The kernel component uses netlink IPC to call the user-space component. The user-space component is a trusted component.
Assume that its responses are not malicious, unless the attacker can force malicious responses by controlling IPC
arguments from the kernel side to the user-space side.
Attackers can connect to the kernel component using TCP. ksmbd spawns new kernel threads to handle connections and
concurrent processing is possible. I have provided you with the kernel source code for connection handling, work processing,
and the handling of SMB session setup requests.
The code for the kernel component is in the kernel/ directory, while the code for the user-space component (which handles
IPC calls from the kernel component) is in the ksmbd-tools/ directory.
The user-space component is a trusted component. It may return errors, but it will not return malicious responses.
3. Scope Definition: Boundary Setting
File: session_setup_context_explainer.prompt
ksmbd has too much code for it all to fit in your context window in one go.
Therefore you are going to audit each SMB command in turn. Commands are
handled by the __process_request function from server.c, which selects a
command from the conn->cmds list and calls it. We are currently auditing the
smb2_sess_setup command. The code context you have been given includes all of
the work setup code up to the __process_request function, the
smb2_sess_setup function and a breadth first expansion of smb2_sess_setup up
to a depth of 3 function calls.
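The "breadth first expansion ... to a depth of 3 function calls" described above can be sketched as a standard BFS over a call graph. This is an illustrative reconstruction, not the researcher's actual tooling; the function and graph names below are hypothetical.

```python
from collections import deque

def expand_call_graph(call_graph, root, max_depth=3):
    """Breadth-first expansion from `root`, collecting every function
    reachable within `max_depth` call hops. `call_graph` maps a function
    name to the list of functions it calls."""
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        fn, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth budget
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return seen

# Toy call graph loosely mirroring the ksmbd structure described above
graph = {
    "smb2_sess_setup": ["krb5_authenticate", "ntlm_authenticate"],
    "krb5_authenticate": ["ksmbd_krb5_authenticate"],
    "ntlm_authenticate": ["ksmbd_free_user"],
    "ksmbd_krb5_authenticate": ["deep_helper"],
    "deep_helper": ["too_deep"],  # 4 hops from the root: excluded
}
context = expand_call_graph(graph, "smb2_sess_setup", max_depth=3)
```

The functions selected this way, plus the work-dispatch code above `__process_request`, form the code context handed to the model.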
4. Task Specification: The Audit Request
File: audit_request.prompt
Audit the code for security vulnerabilities. Remember to check all of your reasoning. Avoid reporting false positives.
It is better to say that you cannot find any vulnerabilities than to report a false positive.
5. Code Context: The Analytical Substrate
File: session_setup_code.prompt
Volume: ~100KB of Linux kernel source code (~12,000 lines)
Strategic Content Selection:
- Complete SMB2 Protocol Handlers: All command processing logic
- Authentication Subsystems: Kerberos and NTLM authentication flows
- Session Management: Session creation, binding, and lifecycle management
- Connection Handling: Multi-threaded connection processing
- Memory Management: User object allocation and deallocation
- Cryptographic Functions: Key generation and session security
Prompt Execution Method
The researcher used the llm command-line tool to combine all of the prompt files:
$ llm --sf system_prompt_uafs.prompt \
-f session_setup_code.prompt \
-f ksmbd_explainer.prompt \
-f session_setup_context_explainer.prompt \
-f audit_request.prompt
The researcher ran this combined prompt 100 times per model to measure consistency and success rates.
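The outputs of those repeated runs then have to be bucketed: hits on the known vulnerability, explicit "no findings" answers, and everything else needing manual triage. A minimal tally sketch, where the classification strings and sample outputs are illustrative rather than the researcher's actual harness:

```python
def classify_run(output: str, known_marker: str) -> str:
    """Bucket one run's report: a hit on the known vulnerability,
    an explicit no-findings answer, or something needing triage
    (a possible false positive, or a novel finding)."""
    text = output.lower()
    if known_marker.lower() in text:
        return "known_vuln"
    if "no vulnerabilit" in text:
        return "no_findings"
    return "needs_triage"

def tally(outputs, known_marker):
    counts = {"known_vuln": 0, "no_findings": 0, "needs_triage": 0}
    for out in outputs:
        counts[classify_run(out, known_marker)] += 1
    return counts

# Toy outputs standing in for 5 of the 100 runs
runs = [
    "use-after-free of sess->user in smb2_sess_setup ...",
    "After checking my reasoning I report no vulnerabilities.",
    "possible double free in ksmbd_free_user ...",
    "use-after-free of sess->user in smb2_sess_setup ...",
    "No vulnerabilities found after checking all paths.",
]
result = tally(runs, "sess->user")
```

The "needs_triage" bucket is where both the ~28% false positives and the novel CVE-2025-37899 report would land, which is why expert review of that bucket is essential.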
Strategic Analysis: Why This Approach Succeeded
1. Layered Context Building
Strategy: Information provided in logical layers, from abstract to concrete
- Layer 1: Role and methodology (system prompt)
- Layer 2: Architecture and threat model (context explainer)
- Layer 3: Specific scope and boundaries (scope definition)
- Layer 4: Task execution (audit request)
- Layer 5: Raw material (source code)
Why Effective: Each layer builds on the previous, creating a coherent analytical framework that guides the AI’s reasoning process.
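The five layers can be made concrete with a small assembly helper: the system layer travels in the system role, and the remaining layers are concatenated, in order, into the user message. The layer contents below are abbreviated stand-ins for the actual prompt files.

```python
# Layers ordered from most abstract (role) to most concrete (code),
# mirroring the five layers listed above.
LAYER_ORDER = ["system", "architecture", "scope", "task", "code"]

def assemble_prompt(layers: dict) -> tuple:
    """Return (system_prompt, user_prompt), failing loudly if any
    layer is missing rather than silently auditing without context."""
    missing = [name for name in LAYER_ORDER if name not in layers]
    if missing:
        raise ValueError(f"missing prompt layers: {missing}")
    user = "\n\n".join(layers[name] for name in LAYER_ORDER[1:])
    return layers["system"], user

layers = {
    "system": "You are an expert at finding use-after-free bugs ...",
    "architecture": "The code is the Linux kernel's SMB server ...",
    "scope": "We are currently auditing smb2_sess_setup ...",
    "task": "Audit the code. Avoid reporting false positives.",
    "code": "<~100KB of ksmbd source>",
}
system, user = assemble_prompt(layers)
```

This is exactly what the `llm --sf ... -f ...` invocation above does implicitly: `--sf` supplies the system layer and each `-f` appends a fragment to the user message in order.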
2. Conservative Calibration Strategy
Problem: AI models tend toward false positives in security analysis
Solution: Multiple reinforcements to prioritize precision over recall
Implementation:
- System prompt: “very careful to avoid reporting false positives”
- Task prompt: “Avoid reporting false positives”
- Methodology: “Better to say you cannot find any vulnerabilities than to report a false positive”
Result: 28% false positive rate (compared to typical 60-80% in generic vulnerability scanning)
3. Concurrency Priming
Strategy: Explicit mention of multi-threading in context layer
Impact: Prepared the AI to consider race conditions and concurrent access patterns
Payoff: CVE-2025-37899 was specifically a concurrency vulnerability involving session binding and logoff timing
Technical Deep Dive: The Discovered Vulnerability
CVE-2025-37899: Session Logoff Use-After-Free
Root Cause: Race condition in smb2_session_logoff()
int smb2_session_logoff(struct ksmbd_work *work)
{
    struct ksmbd_conn *conn = work->conn;
    /* ... setup code ... */

    /* Problem: waits only for in-flight requests on THIS connection,
     * not for requests on other connections bound to the same session */
    ksmbd_conn_wait_idle(conn);

    if (sess->user) {
        ksmbd_free_user(sess->user); /* (1) frees the user object...   */
        sess->user = NULL;           /* (2) ...while a worker on a     */
    }                                /* bound connection may still     */
                                     /* hold the old pointer           */
    /* ... */
}
Attack Scenario (as discovered by o3):
- Attacker establishes two connections (C1 and C2) bound to the same session (an SMB 3.0+ feature)
- Worker-A (on C2) receives a normal request (e.g., WRITE) and stores the sess->user pointer
- Worker-B (on C1) processes a LOGOFF command and frees sess->user
- Worker-A continues processing and dereferences the freed sess->user → use-after-free
Why o3 Found This:
- Concurrency Priming: Context explained multi-threading and session binding
- Memory Focus: System prompt specifically targeted use-after-free vulnerabilities
- Complete Code Context: Had full session management and connection binding implementation
- Systematic Analysis: Required step-by-step reasoning through concurrent execution paths
Quality Comparison: o3 vs Claude Sonnet 3.7
o3 Analysis Structure
Vulnerability: use‑after‑free of sess->user during failed re‑authentication
1. Entry point: smb2_sess_setup()
2. Preconditions an attacker can meet: [detailed attack scenario]
3. Code path to the bug: [step-by-step trace with line numbers]
4. Result: [specific exploitation impact]
5. How the attacker controls the branch: [attack methodology]
6. Proof‑of‑concept outline: [concrete exploit steps]
7. Fix suggestion: [specific code changes with diff]
Performance Metrics
- o3 Success Rate: 8/100 runs found CVE-2025-37778 (a previously known vulnerability used as a benchmark)
- Claude Sonnet 3.7 Success Rate: 3/100 runs found CVE-2025-37778
- False Positive Rate: ~28% for o3
- Novel Discovery: o3 discovered CVE-2025-37899 (a zero-day); Claude Sonnet 3.7 did not
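These per-run rates show why repeated sampling matters. Under a simple independence assumption (each run treated as an independent trial, which is an approximation), the chance of at least one hit grows quickly with the number of runs:

```python
def p_at_least_one_hit(p_single: float, n_runs: int) -> float:
    """Probability that at least one of n independent runs finds the
    bug, given a per-run success probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_runs

# With o3's observed 8/100 per-run rate on the known CVE:
for n in (1, 10, 100):
    print(n, round(p_at_least_one_hit(0.08, n), 3))
```

At the observed 8% per-run rate, roughly ten runs already give better-than-even odds of a hit, which is why the researcher sampled each configuration 100 times rather than relying on a single run.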
Reproducible Framework for AI-Assisted Vulnerability Research
Phase 1: Preparation
- Define Vulnerability Class: Choose specific vulnerability type (use-after-free, integer overflow, etc.)
- Map Attack Surface: Identify entry points and trust boundaries
- Understand Concurrency Model: Document threading and synchronization patterns
- Select Code Scope: Choose manageable chunks with clear boundaries
Phase 2: Prompt Construction
- System Prompt: Establish expertise, methodology, and quality standards
- Context Layer: Provide architectural understanding and threat model
- Scope Definition: Set clear boundaries and explain selection methodology
- Task Specification: Give clear, conservative instructions
- Code Provision: Supply comprehensive, relevant source code
Strategic Implications
For Security Teams
Immediate Applications:
- Code Audit Acceleration: Use AI for systematic vulnerability discovery
- Knowledge Transfer: AI can identify complex vulnerabilities that junior analysts might miss
- Coverage Expansion: Audit larger codebases than humanly feasible
Implementation Considerations:
- Expert Validation Required: AI findings must be verified by security professionals
- False Positive Management: ~28% false positive rate requires efficient triage processes
- Tool Integration: Incorporate into existing security workflows and CI/CD pipelines
For AI Research
Demonstrated Capabilities:
- Complex Reasoning: AI can follow intricate concurrent execution paths
- Domain Expertise: Specialized knowledge application in security contexts
- Quality Control: Conservative calibration produces actionable results
Remaining Limitations:
- Context Boundaries: Still limited by token constraints for very large codebases
- Expert Validation: Requires human security expertise for practical deployment
- Novel Patterns: May struggle with completely new vulnerability classes
Conclusion
The discovery of CVE-2025-37899 represents a watershed moment in cybersecurity research. The success came not from complex AI frameworks or automated tools, but from carefully crafted prompts that:
- Established Clear Expertise: Positioned the AI as a specialized Linux kernel security expert
- Provided Strategic Context: Explained the architecture, threat model, and concurrency patterns
- Calibrated for Quality: Emphasized precision over recall to produce actionable results
- Supplied Comprehensive Context: Included all relevant code for complete analysis
- Required Systematic Reasoning: Demanded step-by-step analysis with concrete evidence
This methodology is reproducible, scalable, and immediately applicable to current security workflows. Organizations should begin integrating AI-assisted vulnerability discovery while maintaining the critical human expertise needed for validation and exploitation.
The age of AI-accelerated cybersecurity has begun. The question is not whether AI will transform vulnerability research, but how quickly security teams can adapt to leverage this new capability while maintaining the quality and rigor that security demands.
Key Takeaway: The future of vulnerability research lies not in replacing human expertise, but in amplifying it through carefully designed AI collaboration frameworks. The prompting strategy behind CVE-2025-37899 provides the blueprint for this transformation.