SQL Query Cookbook for DuckDB CPG

A practical collection of SQL queries for common code analysis tasks on DuckDB CPG.

Table of Contents

  1. Method Queries
  2. Call Analysis
  3. Control Flow
  4. Data Flow
  5. File Analysis
  6. Security Patterns
  7. Code Quality
  8. Statistics
  9. Pre-computed Metrics (v6.0)
  10. Advanced Patterns
  11. OOP Analysis
  12. Reference and Containment
  13. Static Analysis Results
  14. Import Analysis
  15. ID-Based Joins
  16. PGQ Prerequisites
  17. SQL/PGQ Graph Queries
  18. Usage Tips
  19. Performance Notes
  20. Next Steps
  21. See Also

Method Queries

Find Method by Exact Name

SELECT id, name, full_name, filename, line_number, signature
FROM nodes_method
WHERE name = 'authenticate'
LIMIT 10;

Find Methods by Pattern

SELECT id, name, full_name, filename, line_number
FROM nodes_method
WHERE name LIKE '%process%'
ORDER BY name
LIMIT 50;

Methods by Signature Pattern

SELECT name, full_name, signature, filename
FROM nodes_method
WHERE signature LIKE '%char*%'  -- Methods taking char* parameter
ORDER BY name
LIMIT 100;

External vs Internal Methods

-- External methods (from libraries)
SELECT name, full_name, COUNT(*) as count
FROM nodes_method
WHERE is_external = true
GROUP BY name, full_name
ORDER BY count DESC
LIMIT 20;

-- Internal methods (your code)
SELECT filename, COUNT(*) as method_count
FROM nodes_method
WHERE is_external = false
GROUP BY filename
ORDER BY method_count DESC
LIMIT 20;

Methods by Line Count

SELECT
    name,
    filename,
    line_number,
    line_number_end,
    (line_number_end - line_number) as lines_of_code
FROM nodes_method
WHERE line_number_end IS NOT NULL
  AND (line_number_end - line_number) > 50  -- Long methods
ORDER BY lines_of_code DESC
LIMIT 20;

Call Analysis

Direct Callees (What does X call?)

SELECT DISTINCT
    callee.name,
    callee.full_name,
    callee.filename,
    callee.line_number
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON ec.dst = callee.id
WHERE caller.name = 'main'
ORDER BY callee.name
LIMIT 100;

Direct Callers (Who calls X?)

SELECT DISTINCT
    caller.name,
    caller.full_name,
    caller.filename,
    caller.line_number
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method callee ON ec.dst = callee.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
WHERE callee.name = 'malloc'
ORDER BY caller.name
LIMIT 100;

Call Chain (Transitive Callees)

WITH RECURSIVE call_chain AS (
    -- Base: direct calls from starting method
    SELECT
        ec.dst as method_id,
        1 as depth,
        m_start.name as start_method
    FROM edges_call ec
    JOIN nodes_call c ON ec.src = c.id
    JOIN nodes_method m_start ON c.containing_method_id = m_start.id
    WHERE m_start.name = 'main'

    UNION ALL

    -- Recursive: follow the chain
    SELECT
        ec2.dst,
        cc.depth + 1,
        cc.start_method
    FROM edges_call ec2
    JOIN nodes_call c2 ON ec2.src = c2.id
    JOIN nodes_method m2 ON c2.containing_method_id = m2.id
    JOIN call_chain cc ON m2.id = cc.method_id
    WHERE cc.depth < 5
)
SELECT DISTINCT
    m.name,
    m.full_name,
    MIN(cc.depth) as min_depth
FROM call_chain cc
JOIN nodes_method m ON cc.method_id = m.id
GROUP BY m.id, m.name, m.full_name
ORDER BY min_depth, m.name
LIMIT 100;

Top Callers (Methods Making Most Calls)

SELECT
    m.name,
    m.full_name,
    m.filename,
    COUNT(DISTINCT c.id) as outgoing_calls
FROM nodes_method m
LEFT JOIN nodes_call c ON c.containing_method_id = m.id
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY outgoing_calls DESC
LIMIT 20;

Top Callees (Most Called Methods)

SELECT
    m.name,
    m.full_name,
    m.filename,
    COUNT(ec.src) as incoming_calls
FROM nodes_method m
LEFT JOIN edges_call ec ON m.id = ec.dst
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY incoming_calls DESC
LIMIT 20;

Call Pairs (Frequent Caller-Callee Patterns)

SELECT
    caller.name as caller,
    callee.name as callee,
    COUNT(*) as frequency
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON ec.dst = callee.id
GROUP BY caller.name, callee.name
ORDER BY frequency DESC
LIMIT 50;

Leaf Methods (No Outgoing Calls)

SELECT
    m.name,
    m.full_name,
    m.filename
FROM nodes_method m
WHERE NOT EXISTS (
    SELECT 1
    FROM nodes_call c
    WHERE c.containing_method_id = m.id
)
AND m.is_external = false
ORDER BY m.name
LIMIT 100;

Root Methods (Not Called by Anyone)

SELECT
    m.name,
    m.full_name,
    m.filename,
    m.line_number
FROM nodes_method m
WHERE NOT EXISTS (
    SELECT 1
    FROM edges_call ec
    WHERE ec.dst = m.id
)
AND m.is_external = false
ORDER BY m.filename, m.line_number
LIMIT 100;

Control Flow

Methods with Branches

-- Find methods containing IF/SWITCH structures via AST
SELECT DISTINCT
    m.name,
    m.full_name,
    m.filename,
    COUNT(DISTINCT cs.id) as branch_count
FROM nodes_method m
JOIN edges_ast ast ON m.id = ast.src
JOIN nodes_control_structure cs ON ast.dst = cs.id
WHERE cs.control_structure_type IN ('IF', 'SWITCH')
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY branch_count DESC
LIMIT 20;

Methods with Loops

-- Find methods containing loop structures via AST
SELECT DISTINCT
    m.name,
    m.full_name,
    m.filename,
    COUNT(DISTINCT cs.id) as loop_count
FROM nodes_method m
JOIN edges_ast ast ON m.id = ast.src
JOIN nodes_control_structure cs ON ast.dst = cs.id
WHERE cs.control_structure_type IN ('WHILE', 'FOR', 'DO')
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY loop_count DESC
LIMIT 20;

Cyclomatic Complexity Approximation

-- Approximate complexity by counting control structures
-- NOTE: For GoCPG databases, prefer the pre-computed
-- cyclomatic_complexity column on nodes_method (see "Pre-computed Metrics")
SELECT
    m.name,
    m.filename,
    1 + COUNT(DISTINCT cs.id) as approx_complexity
FROM nodes_method m
LEFT JOIN edges_ast ast ON m.id = ast.src
LEFT JOIN nodes_control_structure cs ON ast.dst = cs.id
WHERE m.is_external = false
GROUP BY m.id, m.name, m.filename
ORDER BY approx_complexity DESC
LIMIT 30;

Control Flow Graph Traversal (SQL/PGQ)

-- Follow CFG edges using property graph (requires PGQ prerequisites)
SELECT *
FROM GRAPH_TABLE(cpg
    MATCH (start:CPG_NODE)-[:CFG*1..3]->(end_node:CPG_NODE)
    COLUMNS (start.id AS start_node, end_node.id AS end_node)
)
LIMIT 100;

Data Flow

Variable Flow Paths

WITH RECURSIVE var_flow AS (
    -- Base: initial definitions
    SELECT src, dst, variable, 1 as depth
    FROM edges_reaching_def
    WHERE variable = 'password'

    UNION ALL

    -- Recursive: follow the flow
    SELECT erd.src, erd.dst, erd.variable, vf.depth + 1
    FROM edges_reaching_def erd
    JOIN var_flow vf ON erd.src = vf.dst
    WHERE vf.depth < 10
)
SELECT
    variable,
    COUNT(DISTINCT src) as definition_points,
    COUNT(DISTINCT dst) as use_points,
    MAX(depth) as max_depth
FROM var_flow
GROUP BY variable;

Tainted Parameters

-- Methods receiving tainted input
SELECT DISTINCT
    m.name,
    m.full_name,
    p.name as parameter_name,
    p.type_full_name as parameter_type
FROM nodes_method m
JOIN nodes_param p ON p.method_id = m.id
WHERE p.name IN ('userInput', 'request', 'input', 'data')
ORDER BY m.name
LIMIT 50;

File Analysis

Methods Per File

SELECT
    filename,
    COUNT(*) as method_count,
    SUM(CASE WHEN is_external THEN 1 ELSE 0 END) as external_methods,
    SUM(CASE WHEN is_external THEN 0 ELSE 1 END) as internal_methods
FROM nodes_method
GROUP BY filename
ORDER BY method_count DESC
LIMIT 30;

Files by Call Activity

SELECT
    m.filename,
    COUNT(DISTINCT c.id) as total_calls,
    COUNT(DISTINCT c.name) as unique_calls
FROM nodes_method m
JOIN nodes_call c ON c.containing_method_id = m.id
GROUP BY m.filename
ORDER BY total_calls DESC
LIMIT 20;

Cross-File Calls

SELECT
    caller.filename as from_file,
    callee.filename as to_file,
    COUNT(*) as call_count
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON ec.dst = callee.id
WHERE caller.filename != callee.filename
GROUP BY caller.filename, callee.filename
ORDER BY call_count DESC
LIMIT 50;

Security Patterns

Memory Management Issues

-- Methods that malloc but don't free
SELECT DISTINCT m.name, m.full_name
FROM nodes_method m
WHERE EXISTS (
    SELECT 1 FROM nodes_call c
    WHERE c.containing_method_id = m.id
      AND c.name = 'malloc'
)
AND NOT EXISTS (
    SELECT 1 FROM nodes_call c
    WHERE c.containing_method_id = m.id
      AND c.name = 'free'
)
LIMIT 50;

Dangerous Function Calls

SELECT
    m.name as caller,
    m.filename,
    c.name as dangerous_function,
    m.line_number
FROM nodes_call c
JOIN nodes_method m ON c.containing_method_id = m.id
WHERE c.name IN ('strcpy', 'sprintf', 'gets', 'system', 'eval')
ORDER BY m.filename, m.line_number
LIMIT 100;

SQL Injection Candidates

SELECT DISTINCT
    m.name,
    m.full_name,
    m.filename
FROM nodes_method m
WHERE EXISTS (
    -- Has SQL-related calls
    SELECT 1 FROM nodes_call c
    WHERE c.containing_method_id = m.id
      AND (c.name LIKE '%query%' OR c.name LIKE '%execute%')
)
AND EXISTS (
    -- Receives user input
    SELECT 1 FROM nodes_param p
    WHERE p.method_id = m.id
      AND (p.name LIKE '%input%' OR p.name LIKE '%request%')
)
LIMIT 50;

Authentication Bypass Candidates

SELECT
    m.name,
    m.full_name,
    m.filename
FROM nodes_method m
WHERE m.name LIKE '%auth%'
  OR m.name LIKE '%login%'
  OR m.name LIKE '%verify%'
ORDER BY m.name
LIMIT 100;

Code Quality

God Methods (Too Long)

SELECT
    name,
    filename,
    line_number,
    line_number_end,
    (line_number_end - line_number) as loc
FROM nodes_method
WHERE line_number_end IS NOT NULL
  AND (line_number_end - line_number) > 100
ORDER BY loc DESC
LIMIT 20;

High Fan-Out (Too Many Calls)

SELECT
    m.name,
    m.filename,
    COUNT(DISTINCT c.id) as fan_out
FROM nodes_method m
JOIN nodes_call c ON c.containing_method_id = m.id
GROUP BY m.id, m.name, m.filename
HAVING COUNT(DISTINCT c.id) > 10
ORDER BY fan_out DESC
LIMIT 20;

High Fan-In (Too Many Callers)

SELECT
    m.name,
    m.filename,
    COUNT(DISTINCT ec.src) as fan_in
FROM nodes_method m
JOIN edges_call ec ON m.id = ec.dst
GROUP BY m.id, m.name, m.filename
HAVING COUNT(DISTINCT ec.src) > 10
ORDER BY fan_in DESC
LIMIT 20;

Unused Methods (Potential Dead Code)

SELECT
    m.name,
    m.full_name,
    m.filename
FROM nodes_method m
WHERE m.is_external = false
  AND NOT EXISTS (
      SELECT 1 FROM edges_call ec WHERE ec.dst = m.id
  )
  AND m.name != 'main'
  AND m.name NOT LIKE 'test%'  -- Exclude entry points and tests
ORDER BY m.filename, m.name
LIMIT 100;

Statistics

CPG Overview

SELECT
    (SELECT COUNT(*) FROM nodes_method) as total_methods,
    (SELECT COUNT(*) FROM nodes_method WHERE is_external = false) as internal_methods,
    (SELECT COUNT(*) FROM nodes_call) as total_calls,
    (SELECT COUNT(*) FROM edges_call) as call_edges,
    (SELECT COUNT(DISTINCT filename) FROM nodes_method) as file_count;

Call Distribution

SELECT
    call_count_bucket,
    COUNT(*) as methods_in_bucket
FROM (
    SELECT
        m.id,
        CASE
            WHEN call_count = 0 THEN '0 calls'
            WHEN call_count <= 5 THEN '1-5 calls'
            WHEN call_count <= 10 THEN '6-10 calls'
            WHEN call_count <= 20 THEN '11-20 calls'
            ELSE '20+ calls'
        END as call_count_bucket
    FROM nodes_method m
    LEFT JOIN (
        SELECT
            c.containing_method_id as id,
            COUNT(*) as call_count
        FROM nodes_call c
        WHERE c.containing_method_id IS NOT NULL
        GROUP BY c.containing_method_id
    ) counts ON m.id = counts.id
)
GROUP BY call_count_bucket
ORDER BY
    CASE call_count_bucket
        WHEN '0 calls' THEN 1
        WHEN '1-5 calls' THEN 2
        WHEN '6-10 calls' THEN 3
        WHEN '11-20 calls' THEN 4
        ELSE 5
    END;

Language Distribution (by filename extension)

SELECT
    CASE
        WHEN filename LIKE '%.c' THEN 'C'
        WHEN filename LIKE '%.cpp' OR filename LIKE '%.cc' THEN 'C++'
        WHEN filename LIKE '%.java' THEN 'Java'
        WHEN filename LIKE '%.py' THEN 'Python'
        WHEN filename LIKE '%.js' THEN 'JavaScript'
        ELSE 'Other'
    END as language,
    COUNT(*) as method_count
FROM nodes_method
WHERE is_external = false
GROUP BY language
ORDER BY method_count DESC;

Method Signature Complexity

SELECT
    m.name,
    m.signature,
    LENGTH(m.signature) as sig_length,
    (LENGTH(m.signature) - LENGTH(REPLACE(m.signature, ',', ''))) + 1 as param_count
FROM nodes_method m
WHERE m.signature IS NOT NULL
  AND m.is_external = false
ORDER BY param_count DESC, sig_length DESC
LIMIT 30;

Pre-computed Metrics (v6.0)

GoCPG v6.0+ pre-computes metrics and pattern flags directly on nodes_method, eliminating the need for expensive JOIN-based approximations. These columns are populated during CPG generation.

Complexity Hotspots

-- Methods with highest cyclomatic complexity
SELECT
    name,
    full_name,
    filename,
    cyclomatic_complexity,
    fan_in,
    fan_out,
    (line_number_end - line_number) as loc
FROM nodes_method
WHERE is_external = false
  AND cyclomatic_complexity > 10
ORDER BY cyclomatic_complexity DESC
LIMIT 30;

High Fan-In / Fan-Out (Pre-computed)

-- Hub methods: high fan-in AND fan-out (architectural risk)
SELECT
    name,
    full_name,
    filename,
    fan_in,
    fan_out,
    cyclomatic_complexity
FROM nodes_method
WHERE is_external = false
  AND fan_in > 5
  AND fan_out > 5
ORDER BY (fan_in + fan_out) DESC
LIMIT 20;

Test vs Production Code

-- Test coverage distribution
SELECT
    is_test,
    COUNT(*) as method_count,
    AVG(cyclomatic_complexity) as avg_complexity,
    AVG(fan_out) as avg_fan_out
FROM nodes_method
WHERE is_external = false
GROUP BY is_test;

-- Find public API entry points with high complexity
SELECT full_name, cyclomatic_complexity, fan_in
FROM nodes_method
WHERE is_entry_point = TRUE AND cyclomatic_complexity > 10
ORDER BY cyclomatic_complexity DESC;

Code Hygiene Flags

-- Methods with TODO/FIXME comments
SELECT name, full_name, filename
FROM nodes_method
WHERE has_todo_fixme = true AND is_external = false
ORDER BY filename, name;

-- Deprecated methods still being called
SELECT
    m.name,
    m.full_name,
    m.filename,
    COUNT(ec.src) as call_count
FROM nodes_method m
JOIN edges_call ec ON m.id = ec.dst
WHERE m.has_deprecated = true
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY call_count DESC;

-- Methods with debug code left in
SELECT name, full_name, filename
FROM nodes_method
WHERE has_debug_code = true AND is_external = false
ORDER BY filename;

-- Methods with disabled code blocks (#if 0, if false, etc.)
SELECT name, full_name, filename
FROM nodes_method
WHERE has_disabled_code = true AND is_external = false
ORDER BY filename;

Nested Method Detection

-- Find nested/inner functions (closures, lambdas, inner classes)
SELECT
    name,
    full_name,
    filename,
    cyclomatic_complexity
FROM nodes_method
WHERE is_nested = true AND is_external = false
ORDER BY cyclomatic_complexity DESC
LIMIT 50;

Advanced Patterns

Find Method Chains (A -> B -> C)

WITH chain AS (
    SELECT
        caller.name as method_a,
        intermediate.name as method_b,
        callee.name as method_c
    FROM edges_call ec1
    JOIN nodes_call c1 ON ec1.src = c1.id
    JOIN nodes_method caller ON c1.containing_method_id = caller.id
    JOIN nodes_method intermediate ON ec1.dst = intermediate.id
    JOIN nodes_call c2 ON c2.containing_method_id = intermediate.id
    JOIN edges_call ec2 ON c2.id = ec2.src
    JOIN nodes_method callee ON ec2.dst = callee.id
    WHERE caller.name = 'main'
)
SELECT DISTINCT method_a, method_b, method_c
FROM chain
LIMIT 100;

Connected Components (Method Clusters)

-- Find methods in same call graph
WITH RECURSIVE component AS (
    SELECT id, name, id as root
    FROM nodes_method
    WHERE name = 'main'

    UNION

    SELECT m.id, m.name, c.root
    FROM nodes_method m
    JOIN edges_call ec ON m.id = ec.dst
    JOIN nodes_call nc ON nc.id = ec.src
    JOIN component c ON nc.containing_method_id = c.id
)
SELECT
    root,
    COUNT(DISTINCT id) as component_size,
    STRING_AGG(DISTINCT name, ', ') as methods
FROM component
GROUP BY root
ORDER BY component_size DESC
LIMIT 10;

OOP Analysis

Queries using nodes_type_decl, nodes_member, and edges_inherits_from for object-oriented code analysis.

Find All Classes/Types

SELECT
    td.name,
    td.full_name,
    td.filename,
    td.line_number,
    td.is_external,
    td.inherits_from_type_full_name,
    td.alias_type_full_name
FROM nodes_type_decl td
WHERE td.is_external = false
ORDER BY td.filename, td.line_number
LIMIT 100;

Class Members (Fields)

-- All fields of a specific class
SELECT
    td.name as class_name,
    mb.name as field_name,
    mb.type_full_name as field_type,
    mb.code,
    mb.line_number
FROM nodes_type_decl td
JOIN edges_ast ea ON td.id = ea.src
JOIN nodes_member mb ON ea.dst = mb.id
WHERE td.name = 'Point'
ORDER BY mb.order_index;

Classes with Most Members

SELECT
    td.name as class_name,
    td.full_name,
    td.filename,
    COUNT(mb.id) as member_count
FROM nodes_type_decl td
JOIN edges_ast ea ON td.id = ea.src
JOIN nodes_member mb ON ea.dst = mb.id
WHERE td.is_external = false
GROUP BY td.id, td.name, td.full_name, td.filename
ORDER BY member_count DESC
LIMIT 20;

Inheritance Hierarchy (SQL version)

-- Direct inheritance relationships
SELECT
    td.name as derived_class,
    td.full_name as derived_full_name,
    td.inherits_from_type_full_name as base_types,
    td.filename
FROM nodes_type_decl td
WHERE td.inherits_from_type_full_name IS NOT NULL
  AND LEN(td.inherits_from_type_full_name) > 0
ORDER BY td.name;

Inheritance via Edge Table

-- Using edges_inherits_from for resolved hierarchy
SELECT
    derived.name as derived_class,
    derived.full_name as derived_full_name,
    t.name as base_type_name,
    t.full_name as base_type_full_name
FROM edges_inherits_from eif
JOIN nodes_type_decl derived ON eif.src = derived.id
JOIN nodes_type t ON eif.dst = t.id
ORDER BY derived.name;

Type Aliases

-- Find type aliases (typedef, using, type alias)
SELECT
    td.name as alias_name,
    td.full_name as alias_full_name,
    td.alias_type_full_name as actual_type,
    td.filename
FROM nodes_type_decl td
WHERE td.alias_type_full_name IS NOT NULL
ORDER BY td.name;

Classes Without Methods (Data-Only Types)

-- Structs/classes that have members but no methods defined on them
SELECT
    td.name,
    td.full_name,
    td.filename,
    COUNT(DISTINCT mb.id) as field_count
FROM nodes_type_decl td
JOIN edges_ast ea ON td.id = ea.src
JOIN nodes_member mb ON ea.dst = mb.id
WHERE td.is_external = false
  AND NOT EXISTS (
      SELECT 1 FROM nodes_method m
      WHERE m.ast_parent_full_name = td.full_name
  )
GROUP BY td.id, td.name, td.full_name, td.filename
ORDER BY field_count DESC
LIMIT 20;

Reference and Containment

Queries using edges_ref, edges_contains, and related edge tables for structural analysis.

Variable References

-- Find all references to a specific variable/identifier
SELECT
    i.name as identifier,
    i.filename,
    i.line_number,
    i.type_full_name
FROM nodes_identifier i
JOIN edges_ref er ON i.id = er.src
WHERE i.name = 'config'
ORDER BY i.filename, i.line_number
LIMIT 100;

Identifier to Declaration

-- Resolve identifiers to their declarations (locals, params, methods)
SELECT
    i.name as identifier,
    i.filename as use_file,
    i.line_number as use_line,
    l.name as local_name,
    l.type_full_name as local_type
FROM nodes_identifier i
JOIN edges_ref er ON i.id = er.src
JOIN nodes_local l ON er.dst = l.id
WHERE i.name = 'result'
LIMIT 50;

Method Containment

-- All nodes contained within a specific method
SELECT
    ec.dst as node_id,
    cn.node_type
FROM edges_contains ec
JOIN cpg_nodes cn ON ec.dst = cn.id
WHERE ec.src = (
    SELECT id FROM nodes_method WHERE name = 'main' LIMIT 1
)
LIMIT 100;

Call Sites in a Method (via edges_contains)

-- Find all call sites within a method using containment edges
SELECT
    m.name as method_name,
    c.name as called_function,
    c.line_number,
    c.code
FROM nodes_method m
JOIN edges_contains ec ON m.id = ec.src
JOIN nodes_call c ON ec.dst = c.id
WHERE m.name = 'process_request'
ORDER BY c.line_number
LIMIT 50;

Methods Referencing a Type

-- Find methods that reference a given type declaration
SELECT DISTINCT
    m.name as method_name,
    m.full_name,
    m.filename
FROM nodes_identifier i
JOIN edges_ref er ON i.id = er.src
JOIN nodes_type_decl td ON er.dst = td.id
JOIN nodes_method m ON i.containing_method_id = m.id
WHERE td.name = 'Connection'
ORDER BY m.name
LIMIT 50;

Static Analysis Results

Queries using nodes_finding and nodes_tag_v2 for working with static analysis findings and semantic tags. These tables are populated by GoCPG.

All Findings by Severity

SELECT
    severity,
    category,
    COUNT(*) as finding_count
FROM nodes_finding
WHERE status = 'open'
GROUP BY severity, category
ORDER BY
    CASE severity
        WHEN 'error' THEN 1
        WHEN 'warning' THEN 2
        WHEN 'info' THEN 3
        WHEN 'hint' THEN 4
        ELSE 5
    END,
    category;

Findings by CWE / Rule ID

-- Findings grouped by rule ID (often maps to CWE)
SELECT
    rule_id,
    severity,
    COUNT(*) as count,
    STRING_AGG(DISTINCT title, '; ') as titles
FROM nodes_finding
WHERE status = 'open'
  AND rule_id IS NOT NULL
GROUP BY rule_id, severity
ORDER BY count DESC
LIMIT 30;

Finding Details with Evidence

-- Detailed findings with their key-value evidence
SELECT
    f.title,
    f.severity,
    f.category,
    f.description,
    f.source,
    kv.key as evidence_key,
    kv.value as evidence_value
FROM nodes_finding f
LEFT JOIN edges_ast ea ON f.id = ea.src
LEFT JOIN nodes_key_value_pair kv ON ea.dst = kv.id
WHERE f.severity = 'error'
ORDER BY f.id
LIMIT 50;

Security Findings

SELECT
    title,
    description,
    rule_id,
    confidence,
    source
FROM nodes_finding
WHERE category = 'security'
  AND status = 'open'
ORDER BY
    CASE severity WHEN 'error' THEN 1 WHEN 'warning' THEN 2 ELSE 3 END,
    confidence DESC
LIMIT 50;

Tagged Methods

-- Methods tagged by the semantic tag system
SELECT
    m.name as method_name,
    m.full_name,
    t.name as tag_name,
    t.value as tag_value,
    t.external_source,
    t.confidence
FROM edges_tagged_by etb
JOIN nodes_method m ON etb.src = m.id
JOIN nodes_tag_v2 t ON etb.dst = t.id
ORDER BY m.name, t.name
LIMIT 100;

Tag Statistics

SELECT
    t.name as tag_name,
    t.external_source,
    COUNT(*) as tagged_count,
    AVG(t.confidence) as avg_confidence
FROM nodes_tag_v2 t
GROUP BY t.name, t.external_source
ORDER BY tagged_count DESC;

Import Analysis

Queries using nodes_import for analyzing dependencies and module structure. This table is populated by GoCPG.

All Imports

SELECT
    filename,
    imported_entity,
    imported_as,
    is_wildcard,
    code
FROM nodes_import
ORDER BY filename, line_number
LIMIT 200;

Wildcard Imports

-- Wildcard imports are a common code quality issue
SELECT
    filename,
    imported_entity,
    code,
    line_number
FROM nodes_import
WHERE is_wildcard = true
ORDER BY filename;

Import Frequency (Most Used Libraries)

SELECT
    imported_entity,
    COUNT(*) as import_count,
    COUNT(DISTINCT filename) as file_count
FROM nodes_import
GROUP BY imported_entity
ORDER BY import_count DESC
LIMIT 30;

Import Dependencies Between Files

-- Which files import entities from which modules
SELECT
    i.filename as importing_file,
    i.imported_entity,
    COUNT(*) as import_count
FROM nodes_import i
GROUP BY i.filename, i.imported_entity
ORDER BY i.filename, import_count DESC
LIMIT 100;

Aliased Imports

SELECT
    filename,
    imported_entity,
    imported_as as alias,
    code
FROM nodes_import
WHERE imported_as IS NOT NULL
ORDER BY filename;

Unused Import Candidates

-- Imports where the imported name is never referenced in identifiers
-- (heuristic; may have false positives for re-exports or side-effect imports)
SELECT
    i.filename,
    i.imported_entity,
    i.imported_as
FROM nodes_import i
WHERE NOT EXISTS (
    SELECT 1
    FROM nodes_identifier id
    WHERE id.filename = i.filename
      AND (id.name = i.imported_as OR id.name = i.imported_entity)
)
AND i.is_wildcard = false
ORDER BY i.filename
LIMIT 50;

ID-Based Joins

The CPG schema provides ID-based foreign keys for efficient, accurate joins. These are faster and more precise than the legacy LIKE '%' || name || '%' string-matching pattern.

Key ID Columns

Table Column Points To
nodes_call containing_method_id nodes_method.id (method containing the call site)
nodes_call callee_method_id nodes_method.id (resolved callee method)
nodes_param method_id nodes_method.id (method owning the parameter)
nodes_identifier containing_method_id nodes_method.id
nodes_literal containing_method_id nodes_method.id
nodes_local containing_method_id nodes_method.id
nodes_return containing_method_id nodes_method.id
nodes_block containing_method_id nodes_method.id
nodes_control_structure containing_method_id nodes_method.id
nodes_comment containing_method_id nodes_method.id

Comparison: ID-Based vs String-Based Joins

ID-based (preferred) – uses indexed BIGINT join:

-- Find all calls made by a method (fast, exact)
SELECT c.name as called_function, c.line_number
FROM nodes_call c
JOIN nodes_method m ON c.containing_method_id = m.id
WHERE m.name = 'process_request';

String-based (legacy) – uses LIKE with string matching:

-- Same query using string matching (slow, may produce false matches)
SELECT c.name as called_function, c.line_number
FROM nodes_call c
JOIN nodes_method m ON c.method_full_name LIKE '%' || m.name || '%'
WHERE m.name = 'process_request';

The ID-based join is indexed (idx_call_containing_method), returns exact results, and runs in constant time. The string-based join requires a full table scan and can produce false positives when method names are substrings of unrelated fully-qualified names.

Parameters by Method (ID-based)

-- All parameters of a method, ordered by position
SELECT
    m.name as method_name,
    p.name as param_name,
    p.type_full_name as param_type,
    p.index as position,
    p.is_variadic
FROM nodes_param p
JOIN nodes_method m ON p.method_id = m.id
WHERE m.name = 'authenticate'
ORDER BY p.index;

Callee Resolution (ID-based)

-- Resolve call sites to callee methods using callee_method_id
SELECT
    caller.name as caller_method,
    c.name as call_site,
    c.line_number,
    callee.name as callee_method,
    callee.filename as callee_file
FROM nodes_call c
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON c.callee_method_id = callee.id
WHERE caller.name = 'main'
ORDER BY c.line_number;

PGQ Prerequisites

DuckDB PGQ (Property Graph Queries) enable intuitive graph traversal using MATCH patterns. Before running PGQ queries, the database must have:

  1. A materialized cpg_nodes table (UNION ALL of all node tables)
  2. A CREATE PROPERTY GRAPH cpg definition

GoCPG databases do not generate cpg_nodes by default. The CodeGraph compatibility layer creates it on demand when PGQ queries are needed. If you are running PGQ queries manually, execute these steps first.

Step 1: Create cpg_nodes

DROP TABLE IF EXISTS cpg_nodes;

CREATE TABLE cpg_nodes AS
SELECT id, 'FILE' as node_type FROM nodes_file
UNION ALL SELECT id, 'NAMESPACE_BLOCK' FROM nodes_namespace_block
UNION ALL SELECT id, 'METHOD' FROM nodes_method
UNION ALL SELECT id, 'METHOD_REF' FROM nodes_method_ref
UNION ALL SELECT id, 'CALL' FROM nodes_call
UNION ALL SELECT id, 'IDENTIFIER' FROM nodes_identifier
UNION ALL SELECT id, 'FIELD_IDENTIFIER' FROM nodes_field_identifier
UNION ALL SELECT id, 'LITERAL' FROM nodes_literal
UNION ALL SELECT id, 'LOCAL' FROM nodes_local
UNION ALL SELECT id, 'PARAM' FROM nodes_param
UNION ALL SELECT id, 'PARAM_OUT' FROM nodes_method_parameter_out
UNION ALL SELECT id, 'METHOD_RETURN' FROM nodes_method_return
UNION ALL SELECT id, 'RETURN' FROM nodes_return
UNION ALL SELECT id, 'BLOCK' FROM nodes_block
UNION ALL SELECT id, 'CONTROL_STRUCTURE' FROM nodes_control_structure
UNION ALL SELECT id, 'MEMBER' FROM nodes_member
UNION ALL SELECT id, 'TYPE_DECL' FROM nodes_type_decl
UNION ALL SELECT id, 'TYPE_REF' FROM nodes_type_ref
UNION ALL SELECT id, 'TYPE_PARAM' FROM nodes_type_param
UNION ALL SELECT id, 'TYPE_ARGUMENT' FROM nodes_type_argument
UNION ALL SELECT id, 'UNKNOWN' FROM nodes_unknown
UNION ALL SELECT id, 'JUMP_TARGET' FROM nodes_jump_target
UNION ALL SELECT id, 'BINDING' FROM nodes_binding
UNION ALL SELECT id, 'CLOSURE_BINDING' FROM nodes_closure_binding
UNION ALL SELECT id, 'COMMENT' FROM nodes_comment
UNION ALL SELECT id, 'COLLECTION_DECL' FROM nodes_collection_decl
UNION ALL SELECT id, 'TYPE' FROM nodes_type
UNION ALL SELECT id, 'MODIFIER' FROM nodes_modifier
UNION ALL SELECT id, 'ANNOTATION' FROM nodes_annotation
UNION ALL SELECT id, 'JUMP_LABEL' FROM nodes_jump_label
UNION ALL SELECT id, 'TAG' FROM nodes_tag_v2
UNION ALL SELECT id, 'FINDING' FROM nodes_finding
UNION ALL SELECT id, 'MACRO' FROM nodes_macro
UNION ALL SELECT id, 'MACRO_PARAM' FROM nodes_macro_param
UNION ALL SELECT id, 'NAMESPACE' FROM nodes_namespace
UNION ALL SELECT id, 'ANNOTATION_LITERAL' FROM nodes_annotation_literal
UNION ALL SELECT id, 'ANNOTATION_PARAMETER' FROM nodes_annotation_parameter
UNION ALL SELECT id, 'ANNOTATION_PARAMETER_ASSIGN' FROM nodes_annotation_parameter_assign
UNION ALL SELECT id, 'KEY_VALUE_PAIR' FROM nodes_key_value_pair
UNION ALL SELECT id, 'LOCATION' FROM nodes_location
UNION ALL SELECT id, 'CONFIG_FILE' FROM nodes_config_file
UNION ALL SELECT id, 'IMPORT' FROM nodes_import
UNION ALL SELECT id, 'METADATA' FROM nodes_metadata;

ALTER TABLE cpg_nodes ADD PRIMARY KEY (id);
CREATE INDEX idx_cpg_nodes_type ON cpg_nodes(node_type);

Step 2: Create Property Graph

CREATE PROPERTY GRAPH cpg
VERTEX TABLES (
    cpg_nodes LABEL CPG_NODE,
    nodes_method LABEL METHOD,
    nodes_call LABEL CALL_NODE,
    nodes_identifier LABEL IDENTIFIER,
    nodes_type_decl LABEL TYPE_DECL
    -- ... (see SCHEMA.md for the full vertex table list)
)
EDGE TABLES (
    edges_ast
        SOURCE KEY (src) REFERENCES cpg_nodes (id)
        DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
        LABEL AST,
    edges_cfg
        SOURCE KEY (src) REFERENCES cpg_nodes (id)
        DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
        LABEL CFG,
    edges_call
        SOURCE KEY (src) REFERENCES nodes_call (id)
        DESTINATION KEY (dst) REFERENCES nodes_method (id)
        LABEL CALLS,
    edges_ref
        SOURCE KEY (src) REFERENCES cpg_nodes (id)
        DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
        LABEL REF,
    edges_reaching_def
        SOURCE KEY (src) REFERENCES cpg_nodes (id)
        DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
        LABEL REACHING_DEF,
    edges_inherits_from
        SOURCE KEY (src) REFERENCES nodes_type_decl (id)
        DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
        LABEL INHERITS_FROM
    -- ... (see SCHEMA.md for the full edge table list)
);

For the complete property graph definition with all vertex and edge tables, see SCHEMA.md section “Property Graph Definition”.

SQL/PGQ Graph Queries

DuckDB’s SQL/PGQ extension enables intuitive graph traversal queries. All queries in this section require the PGQ prerequisites to be set up first.

Call Graph Traversal

-- Find all methods called by main (direct calls)
-- CALLS edges go from CALL_NODE to METHOD, so we traverse through call sites
SELECT *
FROM GRAPH_TABLE(cpg
    MATCH (caller:METHOD)-[:CALLS]-(call:CALL_NODE)-[:CALLS]->(callee:METHOD)
    WHERE caller.name = 'main'
    COLUMNS (caller.full_name AS caller_name, callee.full_name AS callee_name)
);

Data Flow Analysis

-- Track data flow through reaching definitions
-- REACHING_DEF edges are polymorphic (between any CPG node types)
SELECT *
FROM GRAPH_TABLE(cpg
    MATCH (source:CPG_NODE)-[:REACHING_DEF*1..5]->(sink:CPG_NODE)
    COLUMNS (source.id, sink.id)
)
LIMIT 100;

AST Traversal

-- Find all nodes under a method in AST
FROM GRAPH_TABLE(cpg
    MATCH (method:METHOD)-[:AST*1..5]->(child:CPG_NODE)
    WHERE method.name = 'authenticate'
    COLUMNS (
        method.name AS method_name,
        child.id AS child_id,
        child.node_type
    )
)
LIMIT 100;

CFG Traversal

-- Follow control flow graph edges
-- CFG edges are polymorphic (between any CPG node types)
SELECT *
FROM GRAPH_TABLE(cpg
    MATCH (start:CPG_NODE)-[:CFG*1..3]->(end_node:CPG_NODE)
    COLUMNS (start.id AS start_node, end_node.id AS end_node)
)
LIMIT 100;

Type Hierarchy

-- Find inheritance relationships
-- INHERITS_FROM goes from TYPE_DECL to cpg_nodes (not nodes_type directly)
SELECT *
FROM GRAPH_TABLE(cpg
    MATCH (derived:TYPE_DECL)-[:INHERITS_FROM]->(base:CPG_NODE)
    COLUMNS (derived.full_name AS derived_type, base.id AS base_type)
);

Identifier References

-- Find all identifiers and what they reference
SELECT *
FROM GRAPH_TABLE(cpg
    MATCH (id:IDENTIFIER)-[:REF]->(decl:CPG_NODE)
    COLUMNS (id.name AS identifier_name, decl.id AS declaration_id)
);

Methods with Most Incoming Calls

SELECT m.full_name, COUNT(*) as call_count
FROM GRAPH_TABLE(cpg
    MATCH (c:CALL_NODE)-[:CALLS]->(m:METHOD)
    COLUMNS (m.full_name, m.id)
)
GROUP BY m.full_name
ORDER BY call_count DESC
LIMIT 10;

Usage Tips

  1. Always add LIMIT to prevent returning millions of rows
  2. Use specific columns instead of SELECT *
  3. Test on small datasets first
  4. Add WHERE clauses to filter irrelevant results
  5. Use EXPLAIN to check query plans
  6. Monitor execution time (should be < 100ms for most queries)
  7. Prefer ID-based joins (containing_method_id, method_id, callee_method_id) over string matching
  8. Use pre-computed metrics (cyclomatic_complexity, fan_in, fan_out) instead of computing them at query time

Performance Notes

  • Simple lookups: 1-3 ms
  • JOINs (1-2 tables): 2-5 ms
  • Aggregations: 4-8 ms
  • Recursive CTEs: 10-50 ms (depends on depth)
  • PGQ queries: 5-50 ms (depends on path length and graph size)

All queries tested on sample database with 5 methods, 4 calls. Performance scales sub-linearly with data size due to indexes.

ID-based joins (containing_method_id = m.id) are significantly faster than string-based joins (method_full_name LIKE '%' || m.name || '%') because they use indexed BIGINT comparisons instead of full-text scans.


Next Steps

  • Try these queries on your CPG database
  • Modify parameters to match your needs
  • Combine patterns for complex analysis
  • Monitor performance and optimize as needed
  • Add your own patterns to this cookbook!

See Also