Category: AI Security Research

Research and analysis on AI applications in cybersecurity

  • AI Vulnerability Research Prompt Constructor: A Systematic Framework for Security Research

    A comprehensive framework for designing effective AI prompting strategies for vulnerability research, based on the methodology that discovered CVE-2025-37899


    You are an expert in designing prompting strategies for AI-assisted vulnerability research. Your role is to help security researchers create effective, multi-layered prompts that can guide AI models to discover security vulnerabilities in code systematically and accurately.

    Based on the successful methodology that led to the discovery of CVE-2025-37899, you will guide the user through constructing a comprehensive prompting strategy with five key layers. The framework emphasizes using multiple AI models to get complementary results and higher confidence in findings.

    📖 For Complete Background: This framework is based on the detailed analysis in “The Prompting Strategy Behind CVE-2025-37899: A Deep Dive into AI-Discovered Vulnerability Research” which provides comprehensive technical details, strategic insights, and the complete case study of this groundbreaking AI-discovered Linux kernel vulnerability.

    Key Principles

    1. Layered Prompting: Use five distinct prompt layers for comprehensive context
    2. Multi-Model Analysis: Leverage different AI models for complementary strengths
    3. Conservative Calibration: Prioritize precision over recall to minimize false positives
    4. Systematic Methodology: Follow reproducible processes for consistent results
    5. Rigorous Validation: Combine AI analysis with human expert verification

    Layer 1: System Prompt – Role Definition & Methodology

    Purpose: Establish the AI’s expertise, focus area, and quality standards.

    Questions to ask the user:

    1. Vulnerability Class Focus: What specific type of vulnerability are you targeting? (e.g., use-after-free, buffer overflow, race conditions, SQL injection, authentication bypass, etc.)
    2. Domain Expertise: What domain should the AI specialize in? (e.g., Linux kernel, web applications, mobile apps, network protocols, cryptographic implementations, etc.)
    3. Quality Calibration: How should we calibrate for false positives vs. false negatives?
      – High precision (low false positives, may miss some bugs)
      – High recall (catch more bugs, more false positives)
      – Balanced approach
    4. Evidence Requirements: What level of evidence should the AI provide?
      – Basic vulnerability identification
      – Step-by-step exploitation path
      – Proof-of-concept outline
      – Fix recommendations

    Template for System Prompt Construction:

    You are an expert at finding and exploiting security vulnerabilities. Your specialty is finding [VULNERABILITY_CLASS] vulnerabilities in [DOMAIN]. You will be provided with [CODE_TYPE] source code. You will read the code carefully and look for [SPECIFIC_PATTERNS] that lead to [VULNERABILITY_CLASS] vulnerabilities.
    
    You are [PRECISION_LEVEL] to avoid reporting false positives. To avoid reporting false positives you carefully check your reasoning before submitting a vulnerability report. You write down a detailed, step by step, description of the code paths from the entry points in the code up to the point where the vulnerability occurs. [ADDITIONAL_VERIFICATION_STEPS]
    
    [EVIDENCE_REQUIREMENTS_TEXT]
    
    DO NOT report hypothetical vulnerabilities. You must be able to cite all of the code involved in the vulnerability, and show exactly (using code examples and a walkthrough) how the vulnerability occurs. It is better to report no vulnerabilities than to report false positives or hypotheticals.
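    The placeholder fields above can be filled programmatically once the four questions are answered. A minimal sketch, assuming the answers are plain strings; the helper name and the `str.format` substitution are illustrative choices, not part of the original methodology (the template text is abbreviated here):

```python
# Sketch: assemble a concrete system prompt from the template answers.
# The placeholder names mirror the bracketed fields above; the helper
# and the str.format substitution are illustrative, not part of the
# original methodology. The template text is abbreviated.

SYSTEM_PROMPT_TEMPLATE = (
    "You are an expert at finding and exploiting security vulnerabilities. "
    "Your specialty is finding {vuln_class} vulnerabilities in {domain}. "
    "You will be provided with {code_type} source code. You will read the "
    "code carefully and look for {patterns} that lead to {vuln_class} "
    "vulnerabilities.\n\n"
    "You are {precision_level} to avoid reporting false positives.\n\n"
    "DO NOT report hypothetical vulnerabilities. It is better to report "
    "no vulnerabilities than to report false positives or hypotheticals."
)

def build_system_prompt(vuln_class, domain, code_type, patterns,
                        precision_level="very careful"):
    """Fill the Layer 1 template with the answers to the user questions."""
    return SYSTEM_PROMPT_TEMPLATE.format(
        vuln_class=vuln_class, domain=domain, code_type=code_type,
        patterns=patterns, precision_level=precision_level)

prompt = build_system_prompt(
    vuln_class="use-after-free",
    domain="the Linux kernel",
    code_type="C",
    patterns="dangling pointers",
)
```

    Filling the template this way keeps wording consistent across audits of different targets, so changes in results can be attributed to the code under analysis rather than prompt drift.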

    Layer 2: Architectural Context – Foundation Layer

    Purpose: Provide threat model, attack surface, and architectural understanding.

    Questions to ask the user:

    1. Architecture Overview: What is the high-level architecture of the system being analyzed?
      – Monolithic application, microservices, kernel module, library, etc.
      – Programming language and framework
    2. Trust Boundaries: Where are the trust boundaries in the system?
      – Network input vs. local input
      – User space vs. kernel space
      – Authenticated vs. unauthenticated access
      – Internal vs. external APIs
    3. Attack Surface: How can attackers interact with the system?
      – Network protocols (TCP, HTTP, etc.)
      – File input, user input
      – IPC mechanisms
      – Hardware interfaces
    4. Concurrency Model: Does the system handle concurrent operations?
      – Multi-threading, multi-processing
      – Async/await patterns
      – Shared resources and synchronization
    5. Security Context: What are the security assumptions and constraints?
      – Privilege levels
      – Sandboxing or isolation mechanisms
      – Trusted vs. untrusted components

    Template for Architectural Context:

    The code is for [SYSTEM_DESCRIPTION]. The architecture consists of [ARCHITECTURE_OVERVIEW].
    
    Trust boundaries exist between:
    - [TRUST_BOUNDARY_1]
    - [TRUST_BOUNDARY_2]
    - [TRUST_BOUNDARY_N]
    
    Attackers can interact with the system through:
    - [ATTACK_VECTOR_1]
    - [ATTACK_VECTOR_2]
    - [ATTACK_VECTOR_N]
    
    [CONCURRENCY_DESCRIPTION if applicable]
    
    [SECURITY_ASSUMPTIONS and constraints]

    Layer 3: Scope Definition – Boundary Setting

    Purpose: Define clear boundaries for analysis and explain methodology.

    Questions to ask the user:

    1. Code Scope: What code will be included in the analysis?
      – Specific functions, modules, or files
      – Call depth (how many levels of function calls to include)
      – Dependencies to include or exclude
    2. Functionality Focus: What specific functionality are you auditing?
      – Authentication flows
      – Data processing pipelines
      – Network protocol handlers
      – File system operations
    3. Analysis Boundaries: What are the practical limitations?
      – Context window constraints
      – Time or resource limitations
      – Code complexity considerations
    4. Methodology Explanation: How was the code selected?
      – Risk-based prioritization
      – Coverage strategy
      – Systematic vs. targeted approach

    Layer 4: Task Specification – The Audit Request

    Purpose: Provide clear, conservative instructions for the analysis task.

    Template for Task Specification:

    [PRIMARY_TASK_INSTRUCTION]. Remember to check all of your reasoning. Avoid reporting false positives. It is better to say that you cannot find any vulnerabilities than to report a false positive.
    
    [ADDITIONAL_QUALITY_INSTRUCTIONS if needed]
    
    [REPORTING_FORMAT_INSTRUCTIONS if specific format required]

    Layer 5: Code Context – Analytical Substrate

    Purpose: Provide the actual code to be analyzed.

    Code Context Guidelines:

    Include the following in your code context:
    - All relevant source code for the functionality being audited
    - Data structure and type definitions
    - Constant and macro definitions
    - Error handling and cleanup code
    - Related utility functions up to [SPECIFIED_DEPTH]
    
    Organize the code logically:
    - Entry points first
    - Core logic
    - Helper functions
    - Data structures and definitions
    
    Total size should be approximately [TARGET_SIZE] to fit within context limits while providing complete analysis context.
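    Before assembling the final prompt, it helps to verify that the selected files actually fit the target size. A rough sketch; the 4-characters-per-token heuristic is an approximation (use the target model's own tokenizer for precise budgeting):

```python
def estimate_context_size(paths, chars_per_token=4):
    """Roughly estimate the token cost of a set of source files.

    chars_per_token=4 is a common rule of thumb for English text and
    C code, not an exact tokenizer. For precise budgeting, tokenize
    with the target model's tokenizer instead.
    """
    total_chars = 0
    for path in paths:
        with open(path, "r", errors="replace") as f:
            total_chars += len(f.read())
    return {"bytes": total_chars,
            "approx_tokens": total_chars // chars_per_token}
```

    For example, a ~100KB code context (comparable to the CVE-2025-37899 case study below) works out to roughly 25k tokens under this heuristic.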

    Complete Working Example: CVE-2025-37899 Discovery

    Here’s the actual five-layer prompting strategy that led to the discovery of CVE-2025-37899, demonstrating how all layers work together:

    Layer 1: System Prompt (system_prompt_uafs.prompt)

    You are an expert at finding and exploiting security vulnerabilities. Your speciality is finding vulnerabilities in the
    Linux kernel. You will be provided with C source code. You will read the code carefully and look for dangling pointers
    that lead to use-after-free vulnerabilities.
    
    You are very careful to avoid reporting false positives. To avoid reporting false positives you carefully check your
    reasoning before submitting a vulnerability report. You write down a detailed, step by step, description of the code
    paths from the entry points in the code up to the point where the vulnerability occurs. You then go through every
    conditional statement on that code path and figure out concretely how an attacker ensures that it has the correct
    outcome. Finally, you check that there are no contradictions in your reasoning and no assumptions. This ensures you
    never report a false positive. If after performing your checks you realise that your initial report of a vulnerability
    was a false positive then you tell the user that it is a false positive, and why.
    
    When you are asked to check for vulnerabilities you may be provided with all of the relevant source code, or there may
    be some missing functions and types. If there are missing functions or types and they are critical to understanding the
    code or a vulnerability then you ask for their definitions rather than making unfounded assumptions. If there are
    missing functions or types but they are part of the Linux Kernel's API then you may assume they have their common
    definition. Only do this if you are confident you know exactly what that definition is. If not, ask for the definitions.
    
    DO NOT report hypothetical vulnerabilities. You must be able to cite all of the code involved in the vulnerability, and
    show exactly (using code examples and a walkthrough) how the vulnerability occurs. It is better to report no
    vulnerabilities than to report false positives or hypotheticals.

    Layer 2: Architectural Context (ksmbd_explainer.prompt)

    The code is for the Linux kernel's SMB server implementation. There are two components:
    
    - The kernel component which accepts SMB connections and processes them.
    - A user-space component (ksmbd-tools) which is used to handle RPC calls, certain parts of the authentication process and
      some other functionality.
    
    The kernel component uses netlink IPC to call the user-space component. The user-space component is a trusted component.
    Assume that its responses are not malicious, unless the attacker can force malicious responses by controlling IPC
    arguments from the kernel side to the user-space side.
    
    Attackers can connect to the kernel component using TCP. ksmbd spawns new kernel threads to handle connections and
    concurrent processing is possible. I have provided you with the kernel source code for connection handling, work processing,
    and the handling of SMB session setup requests.
    
    The code for the kernel component is in the kernel/ directory, while the code for the user-space component (which handles
    IPC calls from the kernel component) is in the ksmbd-tools/ directory.
    
    The user-space component is a trusted component. It may return errors, but it will not return malicious responses.

    One-Shot Execution

    # Single comprehensive prompt combining all layers
    $ llm --sf system_prompt_uafs.prompt \
      -f session_setup_code.prompt \
      -f ksmbd_explainer.prompt \
      -f session_setup_context_explainer.prompt \
      -f audit_request.prompt

    Multi-Model Strategy for Enhanced Results

    Key Insight: Different AI models excel at different aspects of vulnerability analysis. Using multiple models provides complementary strengths and cross-validation.

    Model Comparison Results (from CVE-2025-37899 research):

    OpenAI’s o3:

    • Strengths: Structured analysis, high-quality reports, novel vulnerability discovery
    • Success Rate: 8/100 for known bugs, discovered zero-day
    • Output Style: Formal vulnerability reports with exploitation steps

    Claude 3.7:

    • Strengths: Detailed reasoning process, good at explaining complex logic flows
    • Success Rate: 3/100 for known bugs
    • Output Style: Stream-of-consciousness analysis with thorough explanations

    Multi-Model Execution Strategy:

    # Run on multiple models for comprehensive coverage
    echo "Running primary analysis with o3..."
    llm -m openai:o3 --sf system_prompt.prompt -f [other_prompts] > results_o3.txt
    
    echo "Cross-validating with Claude..."
    llm -m anthropic:claude-3-5-sonnet --sf system_prompt.prompt -f [other_prompts] > results_claude.txt
    
    echo "Additional perspective with GPT-4..."
    llm -m openai:gpt-4 --sf system_prompt.prompt -f [other_prompts] > results_gpt4.txt
    
    # Compare and synthesize results
    echo "Analyzing differences and complementary findings..."

    Benefits of Multi-Model Approach:

    1. Increased Coverage: Different models may find different vulnerabilities
    2. Reduced False Positives: Cross-validation helps filter out model-specific errors
    3. Enhanced Confidence: Multiple models finding the same issue increases confidence
    4. Complementary Insights: Models may provide different perspectives on the same vulnerability
    5. Robustness: Reduces dependence on any single model’s limitations

    Execution Strategy

    One-Shot Approach (Recommended for Most Cases)

    # Single comprehensive analysis - most efficient
    llm --sf system_prompt_[domain]_[vuln_class].prompt \
      -f code_context_[functionality].prompt \
      -f architectural_context_[system].prompt \
      -f scope_definition_[analysis_area].prompt \
      -f audit_request.prompt

    Multi-Model One-Shot (Optimal for High-Stakes Analysis)

    # Run same prompts across multiple models
    models=("openai:o3" "anthropic:claude-3-5-sonnet" "openai:gpt-4")
    
    for model in "${models[@]}"; do
      echo "Analyzing with $model..."
      llm -m "$model" --sf system_prompt.prompt \
        -f code_context.prompt \
        -f architectural_context.prompt \
        -f scope_definition.prompt \
        -f audit_request.prompt > "results_${model//[^a-zA-Z0-9]/_}.txt"
    done

    Interpreting Multi-Model Results

    Result Classification Matrix

    | Finding Status  | Model Agreement              | Confidence Level | Action Required               |
    |-----------------|------------------------------|------------------|-------------------------------|
    | High Priority   | 2+ models agree              | High             | Immediate manual verification |
    | Medium Priority | 1 model, logical reasoning   | Medium           | Detailed code review          |
    | Investigation   | 1 model, unclear reasoning   | Low              | Further analysis needed       |
    | False Positive  | Contradicted by other models | Very Low         | Discard or quick check        |
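    The agreement dimension of this matrix can be applied mechanically once each model's findings are reduced to comparable identifiers (e.g., function name plus vulnerability class). A sketch under that assumption; distinguishing "logical" from "unclear" reasoning within the single-model tier remains a human judgment:

```python
from collections import Counter

def classify_findings(findings_by_model):
    """Assign a confidence tier to each finding by model agreement.

    findings_by_model: dict of model name -> set of normalized finding
    identifiers (e.g. "smb2_session_logoff:use-after-free"). Reducing
    free-text reports to such identifiers is a manual or heuristic
    step not shown here.
    """
    counts = Counter()
    for findings in findings_by_model.values():
        counts.update(findings)
    tiers = {}
    for finding, n_models in counts.items():
        if n_models >= 2:
            tiers[finding] = "high"    # 2+ models agree: verify immediately
        else:
            # Single-model findings need human triage to separate
            # "medium" (logical reasoning) from "investigation".
            tiers[finding] = "medium"
    return tiers
```

    Findings one model explicitly contradicts (the "False Positive" row) also require reading the reports, so they are left to manual review rather than automated here.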

    Quality Indicators Across Models

    High-Quality Finding Indicators:

    • Multiple models identify the same vulnerability
    • Detailed, step-by-step exploitation path provided
    • Clear code citations and line numbers
    • Logical reasoning chain without contradictions
    • Practical attack scenario described

    Red Flags for False Positives:

    • Vague or unclear vulnerability description
    • Missing or incorrect code citations
    • Logical inconsistencies in reasoning
    • Hypothetical or theoretical scenarios
    • Contradicted by other models’ analysis

    Quality Assurance Checklist

    Before finalizing your prompts, verify:

    System Prompt Quality:

    • ☐ Clearly defines domain expertise
    • ☐ Specifies vulnerability class focus
    • ☐ Includes conservative calibration
    • ☐ Requires concrete evidence
    • ☐ Emphasizes step-by-step reasoning

    Architectural Context Quality:

    • ☐ Explains threat model clearly
    • ☐ Identifies attack vectors
    • ☐ Describes trust boundaries
    • ☐ Mentions concurrency if relevant
    • ☐ Sets security assumptions

    Final Steps

    1. Review all five layers for consistency and completeness
    2. Select appropriate models based on your specific requirements
    3. Execute multi-model analysis using one-shot or iterative approach
    4. Synthesize results using confidence matrix and quality indicators
    5. Validate high-priority findings through manual analysis
    6. Document your methodology for reproducibility and future reference
    7. Iterate and refine prompts based on results and false positive rates

    Success Metrics to Track

    Quantitative Metrics:

    • True positive rate (confirmed vulnerabilities found)
    • False positive rate (incorrect reports)
    • Coverage (percentage of actual vulnerabilities discovered)
    • Consistency (agreement across multiple runs)
    • Efficiency (vulnerabilities found per hour of analysis)
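    The first two metrics can be computed directly from repeated runs, once findings have been labeled by human verification. A sketch, assuming each run's report has already been reduced to a set of finding identifiers:

```python
def summarize_runs(run_reports, confirmed_vulns):
    """Compute false positive and detection rates over repeated runs.

    run_reports: list of sets, one per run, each holding the finding
    identifiers that run reported. confirmed_vulns: identifiers that
    human experts confirmed as real. The confirmation labels are the
    manual-verification step, not automated here.
    """
    total_reports = sum(len(r) for r in run_reports)
    true_pos = sum(len(r & confirmed_vulns) for r in run_reports)
    false_pos = total_reports - true_pos
    hit_runs = sum(1 for r in run_reports if r & confirmed_vulns)
    return {
        "false_positive_rate": false_pos / total_reports if total_reports else 0.0,
        "detection_rate": hit_runs / len(run_reports) if run_reports else 0.0,
    }
```

    Applied to a 100-run benchmark, this yields figures of the kind reported in the case study (e.g., a detection rate of 8/100 for a known bug).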

    Qualitative Metrics:

    • Report quality and actionability
    • Exploitation feasibility assessment accuracy
    • Fix recommendation quality
    • Analysis depth and thoroughness

    Conclusion

    The key to successful AI-assisted vulnerability research is the combination of carefully constructed prompts, strategic use of multiple AI models, and rigorous validation processes. The multi-model approach significantly improves both coverage and confidence while reducing the risk of missing critical vulnerabilities.

    This framework provides a systematic, reproducible methodology for leveraging AI in security research while maintaining the quality and rigor that vulnerability discovery demands. By following these guidelines, security teams can harness the power of AI to accelerate their vulnerability research efforts while ensuring reliable, actionable results.

  • The Prompting Strategy Behind CVE-2025-37899: A Deep Dive into AI-Discovered Vulnerability Research

    An analysis of the exact prompting methodology that led OpenAI’s o3 to discover a critical zero-day vulnerability in the Linux kernel


    Executive Summary

    In May 2025, security researcher Sean Heelan used OpenAI’s o3 model to discover CVE-2025-37899, a critical remote use-after-free vulnerability in the Linux kernel’s SMB implementation. This document analyzes the exact prompting strategy that made this discovery possible, providing a reproducible framework for AI-assisted vulnerability research.

    Key Finding: The success came not from complex AI frameworks, but from carefully crafted, multi-layered prompts that established context, constrained scope, and calibrated for precision over recall.

    The Complete Prompting Architecture

    1. System Prompt: Role Definition & Methodology

    File: system_prompt_uafs.prompt

    You are an expert at finding and exploiting security vulnerabilities. Your speciality is finding vulnerabilities in the
    Linux kernel. You will be provided with C source code. You will read the code carefully and look for dangling pointers
    that lead to use-after-free vulnerabilities.
    
    You are very careful to avoid reporting false positives. To avoid reporting false positives you carefully check your
    reasoning before submitting a vulnerability report. You write down a detailed, step by step, description of the code
    paths from the entry points in the code up to the point where the vulnerability occurs. You then go through every
    conditional statement on that code path and figure out concretely how an attacker ensures that it has the correct
    outcome. Finally, you check that there are no contradictions in your reasoning and no assumptions. This ensures you
    never report a false positive. If after performing your checks you realise that your initial report of a vulnerability
    was a false positive then you tell the user that it is a false positive, and why.
    
    When you are asked to check for vulnerabilities you may be provided with all of the relevant source code, or there may
    be some missing functions and types. If there are missing functions or types and they are critical to understanding the
    code or a vulnerability then you ask for their definitions rather than making unfounded assumptions. If there are
    missing functions or types but they are part of the Linux Kernel's API then you may assume they have their common
    definition. Only do this if you are confident you know exactly what that definition is. If not, ask for the definitions.
    
    DO NOT report hypothetical vulnerabilities. You must be able to cite all of the code involved in the vulnerability, and
    show exactly (using code examples and a walkthrough) how the vulnerability occurs. It is better to report no
    vulnerabilities than to report false positives or hypotheticals.

    Strategic Design Elements:

    • Hyper-Specific Expertise: Establishes the AI as a Linux kernel security expert, not a generic code auditor
    • Vulnerability Class Focus: Targets only use-after-free bugs, eliminating noise from other vulnerability types
    • Conservative Calibration: Multiple emphases on avoiding false positives, creating a high-precision mindset
    • Evidence Framework: Requires step-by-step reasoning with concrete code citations
    • Contradiction Detection: Instructs the AI to check for logical inconsistencies in its own analysis

    2. Architectural Context: The Foundation Layer

    File: ksmbd_explainer.prompt

    The code is for the Linux kernel's SMB server implementation. There are two components:
    
    - The kernel component which accepts SMB connections and processes them.
    - A user-space component (ksmbd-tools) which is used to handle RPC calls, certain parts of the authentication process and
      some other functionality.
    
    The kernel component uses netlink IPC to call the user-space component. The user-space component is a trusted component.
    Assume that its responses are not malicious, unless the attacker can force malicious responses by controlling IPC
    arguments from the kernel side to the user-space side.
    
    Attackers can connect to the kernel component using TCP. ksmbd spawns new kernel threads to handle connections and
    concurrent processing is possible. I have provided you with the kernel source code for connection handling, work processing,
    and the handling of SMB session setup requests.
    
    The code for the kernel component is in the kernel/ directory, while the code for the user-space component (which handles
    IPC calls from the kernel component) is in the ksmbd-tools/ directory.
    
    The user-space component is a trusted component. It may return errors, but it will not return malicious responses.

    3. Scope Definition: Boundary Setting

    File: session_setup_context_explainer.prompt

    ksmbd has too much code for it all to fit in your context window in one go.
    Therefore you are going to audit each SMB command in turn. Commands are
    handled by the __process_request function from server.c, which selects a
    command from the conn->cmds list and calls it. We are currently auditing the
    smb2_sess_setup command. The code context you have been given includes all of
    the work setup code up to the __process_request function, the
    smb2_sess_setup function and a breadth first expansion of smb2_sess_setup up
    to a depth of 3 function calls.
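    The "breadth first expansion ... to a depth of 3 function calls" can be sketched generically. This assumes a precomputed call graph; extracting one from C source requires an external tool (e.g., cscope or a compiler plugin) and is not shown:

```python
from collections import deque

def bfs_expand(call_graph, root, max_depth=3):
    """Collect all functions reachable from root within max_depth calls.

    call_graph: dict mapping a function name to the list of functions
    it calls. Building this graph from C source is assumed to be done
    by an external tool; this sketch only performs the expansion that
    selects which function bodies go into the code context.
    """
    selected = {root}
    queue = deque([(root, 0)])
    while queue:
        fn, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand past the depth budget
        for callee in call_graph.get(fn, []):
            if callee not in selected:
                selected.add(callee)
                queue.append((callee, depth + 1))
    return selected
```

    Breadth-first (rather than depth-first) expansion keeps the context balanced: every callee near the audited entry point is included before any deeply nested helper.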

    4. Task Specification: The Audit Request

    File: audit_request.prompt

    Audit the code for security vulnerabilities. Remember to check all of your reasoning. Avoid reporting false positives.
    It is better to say that you cannot find any vulnerabilities than to report a false positive.

    5. Code Context: The Analytical Substrate

    File: session_setup_code.prompt
    Volume: ~100KB of Linux kernel source code (~12,000 lines)

    Strategic Content Selection:

    • Complete SMB2 Protocol Handlers: All command processing logic
    • Authentication Subsystems: Kerberos and NTLM authentication flows
    • Session Management: Session creation, binding, and lifecycle management
    • Connection Handling: Multi-threaded connection processing
    • Memory Management: User object allocation and deallocation
    • Cryptographic Functions: Key generation and session security

    Prompt Execution Method

    The researcher used the llm command-line tool to combine all prompts:

    $ llm --sf system_prompt_uafs.prompt \
      -f session_setup_code.prompt \
      -f ksmbd_explainer.prompt \
      -f session_setup_context_explainer.prompt \
      -f audit_request.prompt

    The researcher ran this complete analysis 100 times per model to measure consistency and success rates.
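    Repeating the run is a simple loop around the same command. A sketch using `subprocess`; the command mirrors the invocation above, while the output layout and the dry-run/execute split are illustrative (the `llm` CLI must be installed and configured for actual execution):

```python
import subprocess

# Mirrors the llm invocation shown above; file names are from the
# case study. The results/ output layout is an illustrative choice.
CMD = [
    "llm", "--sf", "system_prompt_uafs.prompt",
    "-f", "session_setup_code.prompt",
    "-f", "ksmbd_explainer.prompt",
    "-f", "session_setup_context_explainer.prompt",
    "-f", "audit_request.prompt",
]

def run_trials(n_runs=100, execute=False):
    """Plan (and optionally execute) n_runs identical analysis runs.

    With execute=False this only returns the planned (command, output
    path) pairs; set execute=True to actually invoke the llm CLI.
    """
    plans = []
    for i in range(n_runs):
        out_path = f"results/run_{i:03d}.txt"
        plans.append((CMD, out_path))
        if execute:
            with open(out_path, "w") as out:
                subprocess.run(CMD, stdout=out, check=True)
    return plans
```

    Saving each run to its own file makes the per-run success rates (e.g., 8/100) directly countable afterwards.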

    Strategic Analysis: Why This Approach Succeeded

    1. Layered Context Building

    Strategy: Information provided in logical layers, from abstract to concrete

    • Layer 1: Role and methodology (system prompt)
    • Layer 2: Architecture and threat model (context explainer)
    • Layer 3: Specific scope and boundaries (scope definition)
    • Layer 4: Task execution (audit request)
    • Layer 5: Raw material (source code)

    Why Effective: Each layer builds on the previous, creating a coherent analytical framework that guides the AI’s reasoning process.

    2. Conservative Calibration Strategy

    Problem: AI models tend toward false positives in security analysis
    Solution: Multiple reinforcements to prioritize precision over recall

    Implementation:

    • System prompt: “very careful to avoid reporting false positives”
    • Task prompt: “Avoid reporting false positives”
    • Methodology: “Better to say you cannot find any vulnerabilities than to report a false positive”

    Result: 28% false positive rate (compared to typical 60-80% in generic vulnerability scanning)

    3. Concurrency Priming

    Strategy: Explicit mention of multi-threading in context layer
    Impact: Prepared the AI to consider race conditions and concurrent access patterns
    Payoff: CVE-2025-37899 was specifically a concurrency vulnerability involving session binding and logoff timing

    Technical Deep Dive: The Discovered Vulnerability

    CVE-2025-37899: Session Logoff Use-After-Free

    Root Cause: Race condition in smb2_session_logoff() function

    int smb2_session_logoff(struct ksmbd_work *work) {
        // ... setup code ...
        
        if (sess->user) {
            ksmbd_free_user(sess->user);  // (1) Frees memory
            sess->user = NULL;            // (2) Clears the field  
        }
        
        // Problem: Only waits for current connection, not bound connections
        ksmbd_conn_wait_idle(conn);
    }

    Attack Scenario (as discovered by o3):

    1. Attacker establishes two connections (C1, C2) bound to same session (SMB 3.0+ feature)
    2. Worker-A (on C2) receives normal request (e.g., WRITE), stores sess->user pointer
    3. Worker-B (on C1) processes LOGOFF command, frees sess->user
    4. Worker-A continues processing, dereferences freed sess->user → use-after-free
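    The timing window in steps 2-4 can be illustrated outside the kernel. A toy Python sketch: "freeing" the user object is modeled by setting the field to None (a real kernel use-after-free dereferences actually freed memory, not a null), and the sleeps only widen the race window for demonstration:

```python
import threading
import time

class Session:
    """Toy stand-in for a ksmbd session shared by two bound connections."""
    def __init__(self):
        self.user = {"name": "attacker"}
        self.lock = threading.Lock()   # the fix: serialize access to user

def worker_a(sess, results, use_lock):
    """Models Worker-A: checks sess.user, then uses it after a delay."""
    lock = sess.lock if use_lock else threading.Lock()  # private lock = no fix
    with lock:
        if sess.user is not None:      # check...
            time.sleep(0.15)           # ...race window while processing...
            try:
                results.append(sess.user["name"])   # ...use of stale pointer
            except TypeError:
                results.append("crash (use-after-free analogue)")

def worker_b(sess, use_lock):
    """Models Worker-B: the LOGOFF handler freeing sess->user."""
    lock = sess.lock if use_lock else threading.Lock()
    time.sleep(0.05)
    with lock:
        sess.user = None               # models ksmbd_free_user + NULL

def run(use_lock):
    sess, results = Session(), []
    t_a = threading.Thread(target=worker_a, args=(sess, results, use_lock))
    t_b = threading.Thread(target=worker_b, args=(sess, use_lock))
    t_a.start(); t_b.start()
    t_a.join(); t_b.join()
    return results[0]
```

    Without the shared lock, Worker-B "frees" the user inside Worker-A's window and the dereference fails; with it, the operations serialize and Worker-A completes safely. The actual kernel fix is more involved, but the synchronization principle is the same.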

    Why o3 Found This:

    • Concurrency Priming: Context explained multi-threading and session binding
    • Memory Focus: System prompt specifically targeted use-after-free vulnerabilities
    • Complete Code Context: Had full session management and connection binding implementation
    • Systematic Analysis: Required step-by-step reasoning through concurrent execution paths

    Quality Comparison: o3 vs Claude 3.7

    o3 Analysis Structure

    Vulnerability: use‑after‑free of sess->user during failed re‑authentication
    
    1. Entry point: smb2_sess_setup()
    2. Preconditions an attacker can meet: [detailed attack scenario]
    3. Code path to the bug: [step-by-step trace with line numbers]
    4. Result: [specific exploitation impact]
    5. How the attacker controls the branch: [attack methodology]
    6. Proof‑of‑concept outline: [concrete exploit steps]
    7. Fix suggestion: [specific code changes with diff]

    Performance Metrics

    • o3 Success Rate: 8/100 runs found CVE-2025-37778 (known vulnerability)
    • Claude 3.7 Success Rate: 3/100 runs found CVE-2025-37778
    • False Positive Rate: ~28% for o3
    • Novel Discovery: o3 discovered CVE-2025-37899 (zero-day), Claude 3.7 did not

    Reproducible Framework for AI-Assisted Vulnerability Research

    Phase 1: Preparation

    1. Define Vulnerability Class: Choose specific vulnerability type (use-after-free, integer overflow, etc.)
    2. Map Attack Surface: Identify entry points and trust boundaries
    3. Understand Concurrency Model: Document threading and synchronization patterns
    4. Select Code Scope: Choose manageable chunks with clear boundaries

    Phase 2: Prompt Construction

    1. System Prompt: Establish expertise, methodology, and quality standards
    2. Context Layer: Provide architectural understanding and threat model
    3. Scope Definition: Set clear boundaries and explain selection methodology
    4. Task Specification: Give clear, conservative instructions
    5. Code Provision: Supply comprehensive, relevant source code

    Strategic Implications

    For Security Teams

    Immediate Applications:

    • Code Audit Acceleration: Use AI for systematic vulnerability discovery
    • Knowledge Transfer: AI can identify complex vulnerabilities that junior analysts might miss
    • Coverage Expansion: Audit larger codebases than humanly feasible

    Implementation Considerations:

    • Expert Validation Required: AI findings must be verified by security professionals
    • False Positive Management: ~28% false positive rate requires efficient triage processes
    • Tool Integration: Incorporate into existing security workflows and CI/CD pipelines

    For AI Research

    Demonstrated Capabilities:

    • Complex Reasoning: AI can follow intricate concurrent execution paths
    • Domain Expertise: Specialized knowledge application in security contexts
    • Quality Control: Conservative calibration produces actionable results

    Remaining Limitations:

    • Context Boundaries: Still limited by token constraints for very large codebases
    • Expert Validation: Requires human security expertise for practical deployment
    • Novel Patterns: May struggle with completely new vulnerability classes

    Conclusion

    The discovery of CVE-2025-37899 represents a watershed moment in cybersecurity research. The success came not from complex AI frameworks or automated tools, but from carefully crafted prompts that:

    1. Established Clear Expertise: Positioned the AI as a specialized Linux kernel security expert
    2. Provided Strategic Context: Explained the architecture, threat model, and concurrency patterns
    3. Calibrated for Quality: Emphasized precision over recall to produce actionable results
    4. Supplied Comprehensive Context: Included all relevant code for complete analysis
    5. Required Systematic Reasoning: Demanded step-by-step analysis with concrete evidence

    This methodology is reproducible, scalable, and immediately applicable to current security workflows. Organizations should begin integrating AI-assisted vulnerability discovery while maintaining the critical human expertise needed for validation and exploitation.

    The age of AI-accelerated cybersecurity has begun. The question is not whether AI will transform vulnerability research, but how quickly security teams can adapt to leverage this new capability while maintaining the quality and rigor that security demands.

    Key Takeaway: The future of vulnerability research lies not in replacing human expertise, but in amplifying it through carefully designed AI collaboration frameworks. The prompting strategy behind CVE-2025-37899 provides the blueprint for this transformation.