A practical collection of SQL queries for common code analysis tasks on DuckDB CPG.
Table of Contents¶
- Method Queries
- Call Analysis
- Control Flow
- Data Flow
- File Analysis
- Security Patterns
- Code Quality
- Statistics
- Pre-computed Metrics (v6.0)
- Advanced Patterns
- OOP Analysis
- Reference and Containment
- Static Analysis Results
- Import Analysis
- ID-Based Joins
- PGQ Prerequisites
- SQL/PGQ Graph Queries
- Usage Tips
- Performance Notes
- Next Steps
- See Also
Method Queries¶
Find Method by Exact Name¶
SELECT id, name, full_name, filename, line_number, signature
FROM nodes_method
WHERE name = 'authenticate'
LIMIT 10;
Find Methods by Pattern¶
SELECT id, name, full_name, filename, line_number
FROM nodes_method
WHERE name LIKE '%process%'
ORDER BY name
LIMIT 50;
Methods by Signature Pattern¶
SELECT name, full_name, signature, filename
FROM nodes_method
WHERE signature LIKE '%char*%' -- Methods taking char* parameter
ORDER BY name
LIMIT 100;
External vs Internal Methods¶
-- External methods (from libraries)
SELECT name, full_name, COUNT(*) as count
FROM nodes_method
WHERE is_external = true
GROUP BY name, full_name
ORDER BY count DESC
LIMIT 20;
-- Internal methods (your code)
SELECT filename, COUNT(*) as method_count
FROM nodes_method
WHERE is_external = false
GROUP BY filename
ORDER BY method_count DESC
LIMIT 20;
Methods by Line Count¶
SELECT
name,
filename,
line_number,
line_number_end,
(line_number_end - line_number) as lines_of_code
FROM nodes_method
WHERE line_number_end IS NOT NULL
AND (line_number_end - line_number) > 50 -- Long methods
ORDER BY lines_of_code DESC
LIMIT 20;
Call Analysis¶
Direct Callees (What does X call?)¶
SELECT DISTINCT
callee.name,
callee.full_name,
callee.filename,
callee.line_number
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON ec.dst = callee.id
WHERE caller.name = 'main'
ORDER BY callee.name
LIMIT 100;
Direct Callers (Who calls X?)¶
SELECT DISTINCT
caller.name,
caller.full_name,
caller.filename,
caller.line_number
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method callee ON ec.dst = callee.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
WHERE callee.name = 'malloc'
ORDER BY caller.name
LIMIT 100;
Call Chain (Transitive Callees)¶
WITH RECURSIVE call_chain AS (
-- Base: direct calls from starting method
SELECT
ec.dst as method_id,
1 as depth,
m_start.name as start_method
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method m_start ON c.containing_method_id = m_start.id
WHERE m_start.name = 'main'
UNION ALL
-- Recursive: follow the chain
SELECT
ec2.dst,
cc.depth + 1,
cc.start_method
FROM edges_call ec2
JOIN nodes_call c2 ON ec2.src = c2.id
JOIN nodes_method m2 ON c2.containing_method_id = m2.id
JOIN call_chain cc ON m2.id = cc.method_id
WHERE cc.depth < 5
)
SELECT DISTINCT
m.name,
m.full_name,
MIN(cc.depth) as min_depth
FROM call_chain cc
JOIN nodes_method m ON cc.method_id = m.id
GROUP BY m.id, m.name, m.full_name
ORDER BY min_depth, m.name
LIMIT 100;
Top Callers (Methods Making Most Calls)¶
SELECT
m.name,
m.full_name,
m.filename,
COUNT(DISTINCT c.id) as outgoing_calls
FROM nodes_method m
LEFT JOIN nodes_call c ON c.containing_method_id = m.id
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY outgoing_calls DESC
LIMIT 20;
Top Callees (Most Called Methods)¶
SELECT
m.name,
m.full_name,
m.filename,
COUNT(ec.src) as incoming_calls
FROM nodes_method m
LEFT JOIN edges_call ec ON m.id = ec.dst
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY incoming_calls DESC
LIMIT 20;
Call Pairs (Frequent Caller-Callee Patterns)¶
SELECT
caller.name as caller,
callee.name as callee,
COUNT(*) as frequency
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON ec.dst = callee.id
GROUP BY caller.name, callee.name
ORDER BY frequency DESC
LIMIT 50;
Leaf Methods (No Outgoing Calls)¶
SELECT
m.name,
m.full_name,
m.filename
FROM nodes_method m
WHERE NOT EXISTS (
SELECT 1
FROM nodes_call c
WHERE c.containing_method_id = m.id
)
AND m.is_external = false
ORDER BY m.name
LIMIT 100;
Root Methods (Not Called by Anyone)¶
SELECT
m.name,
m.full_name,
m.filename,
m.line_number
FROM nodes_method m
WHERE NOT EXISTS (
SELECT 1
FROM edges_call ec
WHERE ec.dst = m.id
)
AND m.is_external = false
ORDER BY m.filename, m.line_number
LIMIT 100;
Control Flow¶
Methods with Branches¶
-- Find methods containing IF/SWITCH structures via AST
SELECT DISTINCT
m.name,
m.full_name,
m.filename,
COUNT(DISTINCT cs.id) as branch_count
FROM nodes_method m
JOIN edges_ast ast ON m.id = ast.src
JOIN nodes_control_structure cs ON ast.dst = cs.id
WHERE cs.control_structure_type IN ('IF', 'SWITCH')
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY branch_count DESC
LIMIT 20;
Methods with Loops¶
-- Find methods containing loop structures via AST
SELECT DISTINCT
m.name,
m.full_name,
m.filename,
COUNT(DISTINCT cs.id) as loop_count
FROM nodes_method m
JOIN edges_ast ast ON m.id = ast.src
JOIN nodes_control_structure cs ON ast.dst = cs.id
WHERE cs.control_structure_type IN ('WHILE', 'FOR', 'DO')
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY loop_count DESC
LIMIT 20;
Cyclomatic Complexity Approximation¶
-- Approximate complexity by counting control structures
-- NOTE: For GoCPG databases, prefer the pre-computed
-- cyclomatic_complexity column on nodes_method (see "Pre-computed Metrics")
SELECT
m.name,
m.filename,
1 + COUNT(DISTINCT cs.id) as approx_complexity
FROM nodes_method m
LEFT JOIN edges_ast ast ON m.id = ast.src
LEFT JOIN nodes_control_structure cs ON ast.dst = cs.id
WHERE m.is_external = false
GROUP BY m.id, m.name, m.filename
ORDER BY approx_complexity DESC
LIMIT 30;
Control Flow Graph Traversal (SQL/PGQ)¶
-- Follow CFG edges using property graph (requires PGQ prerequisites)
SELECT *
FROM GRAPH_TABLE(cpg
MATCH (start:CPG_NODE)-[:CFG*1..3]->(end_node:CPG_NODE)
COLUMNS (start.id AS start_node, end_node.id AS end_node)
)
LIMIT 100;
Data Flow¶
Variable Flow Paths¶
WITH RECURSIVE var_flow AS (
-- Base: initial definitions
SELECT src, dst, variable, 1 as depth
FROM edges_reaching_def
WHERE variable = 'password'
UNION ALL
-- Recursive: follow the flow
SELECT erd.src, erd.dst, erd.variable, vf.depth + 1
FROM edges_reaching_def erd
JOIN var_flow vf ON erd.src = vf.dst
WHERE vf.depth < 10
)
SELECT
variable,
COUNT(DISTINCT src) as definition_points,
COUNT(DISTINCT dst) as use_points,
MAX(depth) as max_depth
FROM var_flow
GROUP BY variable;
Tainted Parameters¶
-- Methods receiving tainted input
SELECT DISTINCT
m.name,
m.full_name,
p.name as parameter_name,
p.type_full_name as parameter_type
FROM nodes_method m
JOIN nodes_param p ON p.method_id = m.id
WHERE p.name IN ('userInput', 'request', 'input', 'data')
ORDER BY m.name
LIMIT 50;
File Analysis¶
Methods Per File¶
SELECT
filename,
COUNT(*) as method_count,
SUM(CASE WHEN is_external THEN 1 ELSE 0 END) as external_methods,
SUM(CASE WHEN is_external THEN 0 ELSE 1 END) as internal_methods
FROM nodes_method
GROUP BY filename
ORDER BY method_count DESC
LIMIT 30;
Files by Call Activity¶
SELECT
m.filename,
COUNT(DISTINCT c.id) as total_calls,
COUNT(DISTINCT c.name) as unique_calls
FROM nodes_method m
JOIN nodes_call c ON c.containing_method_id = m.id
GROUP BY m.filename
ORDER BY total_calls DESC
LIMIT 20;
Cross-File Calls¶
SELECT
caller.filename as from_file,
callee.filename as to_file,
COUNT(*) as call_count
FROM edges_call ec
JOIN nodes_call c ON ec.src = c.id
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON ec.dst = callee.id
WHERE caller.filename != callee.filename
GROUP BY caller.filename, callee.filename
ORDER BY call_count DESC
LIMIT 50;
Security Patterns¶
Memory Management Issues¶
-- Methods that malloc but don't free
SELECT DISTINCT m.name, m.full_name
FROM nodes_method m
WHERE EXISTS (
SELECT 1 FROM nodes_call c
WHERE c.containing_method_id = m.id
AND c.name = 'malloc'
)
AND NOT EXISTS (
SELECT 1 FROM nodes_call c
WHERE c.containing_method_id = m.id
AND c.name = 'free'
)
LIMIT 50;
Dangerous Function Calls¶
SELECT
m.name as caller,
m.filename,
c.name as dangerous_function,
m.line_number
FROM nodes_call c
JOIN nodes_method m ON c.containing_method_id = m.id
WHERE c.name IN ('strcpy', 'sprintf', 'gets', 'system', 'eval')
ORDER BY m.filename, m.line_number
LIMIT 100;
SQL Injection Candidates¶
SELECT DISTINCT
m.name,
m.full_name,
m.filename
FROM nodes_method m
WHERE EXISTS (
-- Has SQL-related calls
SELECT 1 FROM nodes_call c
WHERE c.containing_method_id = m.id
AND (c.name LIKE '%query%' OR c.name LIKE '%execute%')
)
AND EXISTS (
-- Receives user input
SELECT 1 FROM nodes_param p
WHERE p.method_id = m.id
AND (p.name LIKE '%input%' OR p.name LIKE '%request%')
)
LIMIT 50;
Authentication Bypass Candidates¶
SELECT
m.name,
m.full_name,
m.filename
FROM nodes_method m
WHERE m.name LIKE '%auth%'
OR m.name LIKE '%login%'
OR m.name LIKE '%verify%'
ORDER BY m.name
LIMIT 100;
Code Quality¶
God Methods (Too Long)¶
SELECT
name,
filename,
line_number,
line_number_end,
(line_number_end - line_number) as loc
FROM nodes_method
WHERE line_number_end IS NOT NULL
AND (line_number_end - line_number) > 100
ORDER BY loc DESC
LIMIT 20;
High Fan-Out (Too Many Calls)¶
SELECT
m.name,
m.filename,
COUNT(DISTINCT c.id) as fan_out
FROM nodes_method m
JOIN nodes_call c ON c.containing_method_id = m.id
GROUP BY m.id, m.name, m.filename
HAVING COUNT(DISTINCT c.id) > 10
ORDER BY fan_out DESC
LIMIT 20;
High Fan-In (Too Many Callers)¶
SELECT
m.name,
m.filename,
COUNT(DISTINCT ec.src) as fan_in
FROM nodes_method m
JOIN edges_call ec ON m.id = ec.dst
GROUP BY m.id, m.name, m.filename
HAVING COUNT(DISTINCT ec.src) > 10
ORDER BY fan_in DESC
LIMIT 20;
Unused Methods (Potential Dead Code)¶
SELECT
m.name,
m.full_name,
m.filename
FROM nodes_method m
WHERE m.is_external = false
AND NOT EXISTS (
SELECT 1 FROM edges_call ec WHERE ec.dst = m.id
)
AND m.name != 'main'
AND m.name NOT LIKE 'test%' -- Exclude entry points and tests
ORDER BY m.filename, m.name
LIMIT 100;
Statistics¶
CPG Overview¶
SELECT
(SELECT COUNT(*) FROM nodes_method) as total_methods,
(SELECT COUNT(*) FROM nodes_method WHERE is_external = false) as internal_methods,
(SELECT COUNT(*) FROM nodes_call) as total_calls,
(SELECT COUNT(*) FROM edges_call) as call_edges,
(SELECT COUNT(DISTINCT filename) FROM nodes_method) as file_count;
Call Distribution¶
SELECT
call_count_bucket,
COUNT(*) as methods_in_bucket
FROM (
SELECT
m.id,
CASE
WHEN call_count = 0 THEN '0 calls'
WHEN call_count <= 5 THEN '1-5 calls'
WHEN call_count <= 10 THEN '6-10 calls'
WHEN call_count <= 20 THEN '11-20 calls'
ELSE '20+ calls'
END as call_count_bucket
FROM nodes_method m
LEFT JOIN (
SELECT
c.containing_method_id as id,
COUNT(*) as call_count
FROM nodes_call c
WHERE c.containing_method_id IS NOT NULL
GROUP BY c.containing_method_id
) counts ON m.id = counts.id
)
GROUP BY call_count_bucket
ORDER BY
CASE call_count_bucket
WHEN '0 calls' THEN 1
WHEN '1-5 calls' THEN 2
WHEN '6-10 calls' THEN 3
WHEN '11-20 calls' THEN 4
ELSE 5
END;
Language Distribution (by filename extension)¶
SELECT
CASE
WHEN filename LIKE '%.c' THEN 'C'
WHEN filename LIKE '%.cpp' OR filename LIKE '%.cc' THEN 'C++'
WHEN filename LIKE '%.java' THEN 'Java'
WHEN filename LIKE '%.py' THEN 'Python'
WHEN filename LIKE '%.js' THEN 'JavaScript'
ELSE 'Other'
END as language,
COUNT(*) as method_count
FROM nodes_method
WHERE is_external = false
GROUP BY language
ORDER BY method_count DESC;
Method Signature Complexity¶
SELECT
m.name,
m.signature,
LENGTH(m.signature) as sig_length,
(LENGTH(m.signature) - LENGTH(REPLACE(m.signature, ',', ''))) + 1 as param_count
FROM nodes_method m
WHERE m.signature IS NOT NULL
AND m.is_external = false
ORDER BY param_count DESC, sig_length DESC
LIMIT 30;
Pre-computed Metrics (v6.0)¶
GoCPG v6.0+ pre-computes metrics and pattern flags directly on nodes_method,
eliminating the need for expensive JOIN-based approximations. These columns are
populated during CPG generation.
Complexity Hotspots¶
-- Methods with highest cyclomatic complexity
SELECT
name,
full_name,
filename,
cyclomatic_complexity,
fan_in,
fan_out,
(line_number_end - line_number) as loc
FROM nodes_method
WHERE is_external = false
AND cyclomatic_complexity > 10
ORDER BY cyclomatic_complexity DESC
LIMIT 30;
High Fan-In / Fan-Out (Pre-computed)¶
-- Hub methods: high fan-in AND fan-out (architectural risk)
SELECT
name,
full_name,
filename,
fan_in,
fan_out,
cyclomatic_complexity
FROM nodes_method
WHERE is_external = false
AND fan_in > 5
AND fan_out > 5
ORDER BY (fan_in + fan_out) DESC
LIMIT 20;
Test vs Production Code¶
-- Test coverage distribution
SELECT
is_test,
COUNT(*) as method_count,
AVG(cyclomatic_complexity) as avg_complexity,
AVG(fan_out) as avg_fan_out
FROM nodes_method
WHERE is_external = false
GROUP BY is_test;
-- Find public API entry points with high complexity
SELECT full_name, cyclomatic_complexity, fan_in
FROM nodes_method
WHERE is_entry_point = TRUE AND cyclomatic_complexity > 10
ORDER BY cyclomatic_complexity DESC;
Code Hygiene Flags¶
-- Methods with TODO/FIXME comments
SELECT name, full_name, filename
FROM nodes_method
WHERE has_todo_fixme = true AND is_external = false
ORDER BY filename, name;
-- Deprecated methods still being called
SELECT
m.name,
m.full_name,
m.filename,
COUNT(ec.src) as call_count
FROM nodes_method m
JOIN edges_call ec ON m.id = ec.dst
WHERE m.has_deprecated = true
GROUP BY m.id, m.name, m.full_name, m.filename
ORDER BY call_count DESC;
-- Methods with debug code left in
SELECT name, full_name, filename
FROM nodes_method
WHERE has_debug_code = true AND is_external = false
ORDER BY filename;
-- Methods with disabled code blocks (#if 0, if false, etc.)
SELECT name, full_name, filename
FROM nodes_method
WHERE has_disabled_code = true AND is_external = false
ORDER BY filename;
Nested Method Detection¶
-- Find nested/inner functions (closures, lambdas, inner classes)
SELECT
name,
full_name,
filename,
cyclomatic_complexity
FROM nodes_method
WHERE is_nested = true AND is_external = false
ORDER BY cyclomatic_complexity DESC
LIMIT 50;
Advanced Patterns¶
Find Method Chains (A -> B -> C)¶
WITH chain AS (
SELECT
caller.name as method_a,
intermediate.name as method_b,
callee.name as method_c
FROM edges_call ec1
JOIN nodes_call c1 ON ec1.src = c1.id
JOIN nodes_method caller ON c1.containing_method_id = caller.id
JOIN nodes_method intermediate ON ec1.dst = intermediate.id
JOIN nodes_call c2 ON c2.containing_method_id = intermediate.id
JOIN edges_call ec2 ON c2.id = ec2.src
JOIN nodes_method callee ON ec2.dst = callee.id
WHERE caller.name = 'main'
)
SELECT DISTINCT method_a, method_b, method_c
FROM chain
LIMIT 100;
Connected Components (Method Clusters)¶
-- Find methods in same call graph
WITH RECURSIVE component AS (
SELECT id, name, id as root
FROM nodes_method
WHERE name = 'main'
UNION
SELECT m.id, m.name, c.root
FROM nodes_method m
JOIN edges_call ec ON m.id = ec.dst
JOIN nodes_call nc ON nc.id = ec.src
JOIN component c ON nc.containing_method_id = c.id
)
SELECT
root,
COUNT(DISTINCT id) as component_size,
STRING_AGG(DISTINCT name, ', ') as methods
FROM component
GROUP BY root
ORDER BY component_size DESC
LIMIT 10;
OOP Analysis¶
Queries using nodes_type_decl, nodes_member, and edges_inherits_from for
object-oriented code analysis.
Find All Classes/Types¶
SELECT
td.name,
td.full_name,
td.filename,
td.line_number,
td.is_external,
td.inherits_from_type_full_name,
td.alias_type_full_name
FROM nodes_type_decl td
WHERE td.is_external = false
ORDER BY td.filename, td.line_number
LIMIT 100;
Class Members (Fields)¶
-- All fields of a specific class
SELECT
td.name as class_name,
mb.name as field_name,
mb.type_full_name as field_type,
mb.code,
mb.line_number
FROM nodes_type_decl td
JOIN edges_ast ea ON td.id = ea.src
JOIN nodes_member mb ON ea.dst = mb.id
WHERE td.name = 'Point'
ORDER BY mb.order_index;
Classes with Most Members¶
SELECT
td.name as class_name,
td.full_name,
td.filename,
COUNT(mb.id) as member_count
FROM nodes_type_decl td
JOIN edges_ast ea ON td.id = ea.src
JOIN nodes_member mb ON ea.dst = mb.id
WHERE td.is_external = false
GROUP BY td.id, td.name, td.full_name, td.filename
ORDER BY member_count DESC
LIMIT 20;
Inheritance Hierarchy (SQL version)¶
-- Direct inheritance relationships
SELECT
td.name as derived_class,
td.full_name as derived_full_name,
td.inherits_from_type_full_name as base_types,
td.filename
FROM nodes_type_decl td
WHERE td.inherits_from_type_full_name IS NOT NULL
AND LEN(td.inherits_from_type_full_name) > 0
ORDER BY td.name;
Inheritance via Edge Table¶
-- Using edges_inherits_from for resolved hierarchy
SELECT
derived.name as derived_class,
derived.full_name as derived_full_name,
t.name as base_type_name,
t.full_name as base_type_full_name
FROM edges_inherits_from eif
JOIN nodes_type_decl derived ON eif.src = derived.id
JOIN nodes_type t ON eif.dst = t.id
ORDER BY derived.name;
Type Aliases¶
-- Find type aliases (typedef, using, type alias)
SELECT
td.name as alias_name,
td.full_name as alias_full_name,
td.alias_type_full_name as actual_type,
td.filename
FROM nodes_type_decl td
WHERE td.alias_type_full_name IS NOT NULL
ORDER BY td.name;
Classes Without Methods (Data-Only Types)¶
-- Structs/classes that have members but no methods defined on them
SELECT
td.name,
td.full_name,
td.filename,
COUNT(DISTINCT mb.id) as field_count
FROM nodes_type_decl td
JOIN edges_ast ea ON td.id = ea.src
JOIN nodes_member mb ON ea.dst = mb.id
WHERE td.is_external = false
AND NOT EXISTS (
SELECT 1 FROM nodes_method m
WHERE m.ast_parent_full_name = td.full_name
)
GROUP BY td.id, td.name, td.full_name, td.filename
ORDER BY field_count DESC
LIMIT 20;
Reference and Containment¶
Queries using edges_ref, edges_contains, and related edge tables for
structural analysis.
Variable References¶
-- Find all references to a specific variable/identifier
SELECT
i.name as identifier,
i.filename,
i.line_number,
i.type_full_name
FROM nodes_identifier i
JOIN edges_ref er ON i.id = er.src
WHERE i.name = 'config'
ORDER BY i.filename, i.line_number
LIMIT 100;
Identifier to Declaration¶
-- Resolve identifiers to their declarations (locals, params, methods)
SELECT
i.name as identifier,
i.filename as use_file,
i.line_number as use_line,
l.name as local_name,
l.type_full_name as local_type
FROM nodes_identifier i
JOIN edges_ref er ON i.id = er.src
JOIN nodes_local l ON er.dst = l.id
WHERE i.name = 'result'
LIMIT 50;
Method Containment¶
-- All nodes contained within a specific method
SELECT
ec.dst as node_id,
cn.node_type
FROM edges_contains ec
JOIN cpg_nodes cn ON ec.dst = cn.id
WHERE ec.src = (
SELECT id FROM nodes_method WHERE name = 'main' LIMIT 1
)
LIMIT 100;
Call Sites in a Method (via edges_contains)¶
-- Find all call sites within a method using containment edges
SELECT
m.name as method_name,
c.name as called_function,
c.line_number,
c.code
FROM nodes_method m
JOIN edges_contains ec ON m.id = ec.src
JOIN nodes_call c ON ec.dst = c.id
WHERE m.name = 'process_request'
ORDER BY c.line_number
LIMIT 50;
Methods Referencing a Type¶
-- Find methods that reference a given type declaration
SELECT DISTINCT
m.name as method_name,
m.full_name,
m.filename
FROM nodes_identifier i
JOIN edges_ref er ON i.id = er.src
JOIN nodes_type_decl td ON er.dst = td.id
JOIN nodes_method m ON i.containing_method_id = m.id
WHERE td.name = 'Connection'
ORDER BY m.name
LIMIT 50;
Static Analysis Results¶
Queries using nodes_finding and nodes_tag_v2 for working with static
analysis findings and semantic tags. These tables are populated by GoCPG.
All Findings by Severity¶
SELECT
severity,
category,
COUNT(*) as finding_count
FROM nodes_finding
WHERE status = 'open'
GROUP BY severity, category
ORDER BY
CASE severity
WHEN 'error' THEN 1
WHEN 'warning' THEN 2
WHEN 'info' THEN 3
WHEN 'hint' THEN 4
ELSE 5
END,
category;
Findings by CWE / Rule ID¶
-- Findings grouped by rule ID (often maps to CWE)
SELECT
rule_id,
severity,
COUNT(*) as count,
STRING_AGG(DISTINCT title, '; ') as titles
FROM nodes_finding
WHERE status = 'open'
AND rule_id IS NOT NULL
GROUP BY rule_id, severity
ORDER BY count DESC
LIMIT 30;
Finding Details with Evidence¶
-- Detailed findings with their key-value evidence
SELECT
f.title,
f.severity,
f.category,
f.description,
f.source,
kv.key as evidence_key,
kv.value as evidence_value
FROM nodes_finding f
LEFT JOIN edges_ast ea ON f.id = ea.src
LEFT JOIN nodes_key_value_pair kv ON ea.dst = kv.id
WHERE f.severity = 'error'
ORDER BY f.id
LIMIT 50;
Security Findings¶
SELECT
title,
description,
rule_id,
confidence,
source
FROM nodes_finding
WHERE category = 'security'
AND status = 'open'
ORDER BY
CASE severity WHEN 'error' THEN 1 WHEN 'warning' THEN 2 ELSE 3 END,
confidence DESC
LIMIT 50;
Tagged Methods¶
-- Methods tagged by the semantic tag system
SELECT
m.name as method_name,
m.full_name,
t.name as tag_name,
t.value as tag_value,
t.external_source,
t.confidence
FROM edges_tagged_by etb
JOIN nodes_method m ON etb.src = m.id
JOIN nodes_tag_v2 t ON etb.dst = t.id
ORDER BY m.name, t.name
LIMIT 100;
Tag Statistics¶
SELECT
t.name as tag_name,
t.external_source,
COUNT(*) as tagged_count,
AVG(t.confidence) as avg_confidence
FROM nodes_tag_v2 t
GROUP BY t.name, t.external_source
ORDER BY tagged_count DESC;
Import Analysis¶
Queries using nodes_import for analyzing dependencies and module structure.
This table is populated by GoCPG.
All Imports¶
SELECT
filename,
imported_entity,
imported_as,
is_wildcard,
code
FROM nodes_import
ORDER BY filename, line_number
LIMIT 200;
Wildcard Imports¶
-- Wildcard imports are a common code quality issue
SELECT
filename,
imported_entity,
code,
line_number
FROM nodes_import
WHERE is_wildcard = true
ORDER BY filename;
Import Frequency (Most Used Libraries)¶
SELECT
imported_entity,
COUNT(*) as import_count,
COUNT(DISTINCT filename) as file_count
FROM nodes_import
GROUP BY imported_entity
ORDER BY import_count DESC
LIMIT 30;
Import Dependencies Between Files¶
-- Which files import entities from which modules
SELECT
i.filename as importing_file,
i.imported_entity,
COUNT(*) as import_count
FROM nodes_import i
GROUP BY i.filename, i.imported_entity
ORDER BY i.filename, import_count DESC
LIMIT 100;
Aliased Imports¶
SELECT
filename,
imported_entity,
imported_as as alias,
code
FROM nodes_import
WHERE imported_as IS NOT NULL
ORDER BY filename;
Unused Import Candidates¶
-- Imports where the imported name is never referenced in identifiers
-- (heuristic; may have false positives for re-exports or side-effect imports)
SELECT
i.filename,
i.imported_entity,
i.imported_as
FROM nodes_import i
WHERE NOT EXISTS (
SELECT 1
FROM nodes_identifier id
WHERE id.filename = i.filename
AND (id.name = i.imported_as OR id.name = i.imported_entity)
)
AND i.is_wildcard = false
ORDER BY i.filename
LIMIT 50;
ID-Based Joins¶
The CPG schema provides ID-based foreign keys for efficient, accurate joins.
These are faster and more precise than the legacy LIKE '%' || name || '%'
string-matching pattern.
Key ID Columns¶
| Table | Column | Points To |
|---|---|---|
nodes_call |
containing_method_id |
nodes_method.id (method containing the call site) |
nodes_call |
callee_method_id |
nodes_method.id (resolved callee method) |
nodes_param |
method_id |
nodes_method.id (method owning the parameter) |
nodes_identifier |
containing_method_id |
nodes_method.id |
nodes_literal |
containing_method_id |
nodes_method.id |
nodes_local |
containing_method_id |
nodes_method.id |
nodes_return |
containing_method_id |
nodes_method.id |
nodes_block |
containing_method_id |
nodes_method.id |
nodes_control_structure |
containing_method_id |
nodes_method.id |
nodes_comment |
containing_method_id |
nodes_method.id |
Comparison: ID-Based vs String-Based Joins¶
ID-based (preferred) – uses indexed BIGINT join:
-- Find all calls made by a method (fast, exact)
SELECT c.name as called_function, c.line_number
FROM nodes_call c
JOIN nodes_method m ON c.containing_method_id = m.id
WHERE m.name = 'process_request';
String-based (legacy) – uses LIKE with string matching:
-- Same query using string matching (slow, may produce false matches)
SELECT c.name as called_function, c.line_number
FROM nodes_call c
JOIN nodes_method m ON c.method_full_name LIKE '%' || m.name || '%'
WHERE m.name = 'process_request';
The ID-based join is indexed (idx_call_containing_method), returns exact
results, and runs in constant time. The string-based join requires a full
table scan and can produce false positives when method names are substrings
of unrelated fully-qualified names.
Parameters by Method (ID-based)¶
-- All parameters of a method, ordered by position
SELECT
m.name as method_name,
p.name as param_name,
p.type_full_name as param_type,
p.index as position,
p.is_variadic
FROM nodes_param p
JOIN nodes_method m ON p.method_id = m.id
WHERE m.name = 'authenticate'
ORDER BY p.index;
Callee Resolution (ID-based)¶
-- Resolve call sites to callee methods using callee_method_id
SELECT
caller.name as caller_method,
c.name as call_site,
c.line_number,
callee.name as callee_method,
callee.filename as callee_file
FROM nodes_call c
JOIN nodes_method caller ON c.containing_method_id = caller.id
JOIN nodes_method callee ON c.callee_method_id = callee.id
WHERE caller.name = 'main'
ORDER BY c.line_number;
PGQ Prerequisites¶
DuckDB PGQ (Property Graph Queries) enable intuitive graph traversal using
MATCH patterns. Before running PGQ queries, the database must have:
- A materialized
cpg_nodestable (UNION ALL of all node tables) - A
CREATE PROPERTY GRAPH cpgdefinition
GoCPG databases do not generate cpg_nodes by default. The CodeGraph
compatibility layer creates it on demand when PGQ queries are needed. If you
are running PGQ queries manually, execute these steps first.
Step 1: Create cpg_nodes¶
DROP TABLE IF EXISTS cpg_nodes;
CREATE TABLE cpg_nodes AS
SELECT id, 'FILE' as node_type FROM nodes_file
UNION ALL SELECT id, 'NAMESPACE_BLOCK' FROM nodes_namespace_block
UNION ALL SELECT id, 'METHOD' FROM nodes_method
UNION ALL SELECT id, 'METHOD_REF' FROM nodes_method_ref
UNION ALL SELECT id, 'CALL' FROM nodes_call
UNION ALL SELECT id, 'IDENTIFIER' FROM nodes_identifier
UNION ALL SELECT id, 'FIELD_IDENTIFIER' FROM nodes_field_identifier
UNION ALL SELECT id, 'LITERAL' FROM nodes_literal
UNION ALL SELECT id, 'LOCAL' FROM nodes_local
UNION ALL SELECT id, 'PARAM' FROM nodes_param
UNION ALL SELECT id, 'PARAM_OUT' FROM nodes_method_parameter_out
UNION ALL SELECT id, 'METHOD_RETURN' FROM nodes_method_return
UNION ALL SELECT id, 'RETURN' FROM nodes_return
UNION ALL SELECT id, 'BLOCK' FROM nodes_block
UNION ALL SELECT id, 'CONTROL_STRUCTURE' FROM nodes_control_structure
UNION ALL SELECT id, 'MEMBER' FROM nodes_member
UNION ALL SELECT id, 'TYPE_DECL' FROM nodes_type_decl
UNION ALL SELECT id, 'TYPE_REF' FROM nodes_type_ref
UNION ALL SELECT id, 'TYPE_PARAM' FROM nodes_type_param
UNION ALL SELECT id, 'TYPE_ARGUMENT' FROM nodes_type_argument
UNION ALL SELECT id, 'UNKNOWN' FROM nodes_unknown
UNION ALL SELECT id, 'JUMP_TARGET' FROM nodes_jump_target
UNION ALL SELECT id, 'BINDING' FROM nodes_binding
UNION ALL SELECT id, 'CLOSURE_BINDING' FROM nodes_closure_binding
UNION ALL SELECT id, 'COMMENT' FROM nodes_comment
UNION ALL SELECT id, 'COLLECTION_DECL' FROM nodes_collection_decl
UNION ALL SELECT id, 'TYPE' FROM nodes_type
UNION ALL SELECT id, 'MODIFIER' FROM nodes_modifier
UNION ALL SELECT id, 'ANNOTATION' FROM nodes_annotation
UNION ALL SELECT id, 'JUMP_LABEL' FROM nodes_jump_label
UNION ALL SELECT id, 'TAG' FROM nodes_tag_v2
UNION ALL SELECT id, 'FINDING' FROM nodes_finding
UNION ALL SELECT id, 'MACRO' FROM nodes_macro
UNION ALL SELECT id, 'MACRO_PARAM' FROM nodes_macro_param
UNION ALL SELECT id, 'NAMESPACE' FROM nodes_namespace
UNION ALL SELECT id, 'ANNOTATION_LITERAL' FROM nodes_annotation_literal
UNION ALL SELECT id, 'ANNOTATION_PARAMETER' FROM nodes_annotation_parameter
UNION ALL SELECT id, 'ANNOTATION_PARAMETER_ASSIGN' FROM nodes_annotation_parameter_assign
UNION ALL SELECT id, 'KEY_VALUE_PAIR' FROM nodes_key_value_pair
UNION ALL SELECT id, 'LOCATION' FROM nodes_location
UNION ALL SELECT id, 'CONFIG_FILE' FROM nodes_config_file
UNION ALL SELECT id, 'IMPORT' FROM nodes_import
UNION ALL SELECT id, 'METADATA' FROM nodes_metadata;
ALTER TABLE cpg_nodes ADD PRIMARY KEY (id);
CREATE INDEX idx_cpg_nodes_type ON cpg_nodes(node_type);
Step 2: Create Property Graph¶
CREATE PROPERTY GRAPH cpg
VERTEX TABLES (
cpg_nodes LABEL CPG_NODE,
nodes_method LABEL METHOD,
nodes_call LABEL CALL_NODE,
nodes_identifier LABEL IDENTIFIER,
nodes_type_decl LABEL TYPE_DECL
-- ... (see SCHEMA.md for the full vertex table list)
)
EDGE TABLES (
edges_ast
SOURCE KEY (src) REFERENCES cpg_nodes (id)
DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
LABEL AST,
edges_cfg
SOURCE KEY (src) REFERENCES cpg_nodes (id)
DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
LABEL CFG,
edges_call
SOURCE KEY (src) REFERENCES nodes_call (id)
DESTINATION KEY (dst) REFERENCES nodes_method (id)
LABEL CALLS,
edges_ref
SOURCE KEY (src) REFERENCES cpg_nodes (id)
DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
LABEL REF,
edges_reaching_def
SOURCE KEY (src) REFERENCES cpg_nodes (id)
DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
LABEL REACHING_DEF,
edges_inherits_from
SOURCE KEY (src) REFERENCES nodes_type_decl (id)
DESTINATION KEY (dst) REFERENCES cpg_nodes (id)
LABEL INHERITS_FROM
-- ... (see SCHEMA.md for the full edge table list)
);
For the complete property graph definition with all vertex and edge tables, see SCHEMA.md section “Property Graph Definition”.
SQL/PGQ Graph Queries¶
DuckDB’s SQL/PGQ extension enables intuitive graph traversal queries. All queries in this section require the PGQ prerequisites to be set up first.
Call Graph Traversal¶
-- Find all methods called by main (direct calls)
-- CALLS edges go from CALL_NODE to METHOD, so we traverse through call sites
SELECT *
FROM GRAPH_TABLE(cpg
MATCH (caller:METHOD)-[:CALLS]-(call:CALL_NODE)-[:CALLS]->(callee:METHOD)
WHERE caller.name = 'main'
COLUMNS (caller.full_name AS caller_name, callee.full_name AS callee_name)
);
Data Flow Analysis¶
-- Track data flow through reaching definitions
-- REACHING_DEF edges are polymorphic (between any CPG node types)
SELECT *
FROM GRAPH_TABLE(cpg
MATCH (source:CPG_NODE)-[:REACHING_DEF*1..5]->(sink:CPG_NODE)
COLUMNS (source.id, sink.id)
)
LIMIT 100;
AST Traversal¶
-- Find all nodes under a method in AST
FROM GRAPH_TABLE(cpg
MATCH (method:METHOD)-[:AST*1..5]->(child:CPG_NODE)
WHERE method.name = 'authenticate'
COLUMNS (
method.name AS method_name,
child.id AS child_id,
child.node_type
)
)
LIMIT 100;
CFG Traversal¶
-- Follow control flow graph edges
-- CFG edges are polymorphic (between any CPG node types)
SELECT *
FROM GRAPH_TABLE(cpg
MATCH (start:CPG_NODE)-[:CFG*1..3]->(end_node:CPG_NODE)
COLUMNS (start.id AS start_node, end_node.id AS end_node)
)
LIMIT 100;
Type Hierarchy¶
-- Find inheritance relationships
-- INHERITS_FROM goes from TYPE_DECL to cpg_nodes (not nodes_type directly)
SELECT *
FROM GRAPH_TABLE(cpg
MATCH (derived:TYPE_DECL)-[:INHERITS_FROM]->(base:CPG_NODE)
COLUMNS (derived.full_name AS derived_type, base.id AS base_type)
);
Identifier References¶
-- Find all identifiers and what they reference
SELECT *
FROM GRAPH_TABLE(cpg
MATCH (id:IDENTIFIER)-[:REF]->(decl:CPG_NODE)
COLUMNS (id.name AS identifier_name, decl.id AS declaration_id)
);
Methods with Most Incoming Calls¶
SELECT m.full_name, COUNT(*) as call_count
FROM GRAPH_TABLE(cpg
MATCH (c:CALL_NODE)-[:CALLS]->(m:METHOD)
COLUMNS (m.full_name, m.id)
)
GROUP BY m.full_name
ORDER BY call_count DESC
LIMIT 10;
Usage Tips¶
- Always add LIMIT to prevent returning millions of rows
- Use specific columns instead of SELECT *
- Test on small datasets first
- Add WHERE clauses to filter irrelevant results
- Use EXPLAIN to check query plans
- Monitor execution time (should be < 100ms for most queries)
- Prefer ID-based joins (
containing_method_id,method_id,callee_method_id) over string matching - Use pre-computed metrics (
cyclomatic_complexity,fan_in,fan_out) instead of computing them at query time
Performance Notes¶
- Simple lookups: 1-3 ms
- JOINs (1-2 tables): 2-5 ms
- Aggregations: 4-8 ms
- Recursive CTEs: 10-50 ms (depends on depth)
- PGQ queries: 5-50 ms (depends on path length and graph size)
All queries tested on sample database with 5 methods, 4 calls. Performance scales sub-linearly with data size due to indexes.
ID-based joins (containing_method_id = m.id) are significantly faster
than string-based joins (method_full_name LIKE '%' || m.name || '%')
because they use indexed BIGINT comparisons instead of full-text scans.
Next Steps¶
- Try these queries on your CPG database
- Modify parameters to match your needs
- Combine patterns for complex analysis
- Monitor performance and optimize as needed
- Add your own patterns to this cookbook!
See Also¶
- Analysis Modules - CFGAnalyzer, FieldSensitiveTracer documentation
- Hypothesis System - Security hypothesis generation
- DuckDB SQL/PGQ Documentation
- CPG Specification