GoCPG vs Joern: Comparative Analysis
Date: 2026-02-25
Joern Version: 4.0.426 (flatgraph)
GoCPG Version: 0.1.0 (DuckDB)
Codebases: GoCPG (Go, 215 files), CodeGraph (Python, 1183 files)
Executive Summary
GoCPG generates the CPG 3.9x faster than Joern on the Go codebase (1.1x on the Python codebase) while producing richer graphs with more edge types (DDG and PDG on both codebases; eval_type and binds_to are GoCPG-only on the Go codebase). Joern generates 2.6x more CDG edges on the Go codebase because it models function calls as branch points (exception paths). Both use identical core algorithms (Cooper-Harvey-Kennedy dominators, Ferrante-Ottenstein-Warren CDG). The key architectural difference: GoCPG is a native Go binary writing directly to DuckDB, while Joern is a JVM/Scala system using in-memory flatgraph.
Parse + Full Analysis Pipeline
| Codebase | GoCPG | Joern | Speedup |
| --- | --- | --- | --- |
| GoCPG (Go, 215 files) | 13.1s | 51.8s | 3.9x |
| CodeGraph (Python, 1183 files) | 85.5s | 97.7s | 1.1x |
Pass-Level Timing (Go Codebase)
| Pass | GoCPG (DAG parallel) | Joern (sequential) |
| --- | --- | --- |
| CFG | included in 4.57s total | 638ms |
| Dominator + PostDom | included in 4.57s total | 615ms |
| CDG | included in 4.57s total | 129ms |
| Reaching Definitions | included in 4.57s total | 2,827ms |
| Call Graph | included in 4.57s total | 74ms |
| All passes | 4.57s (30 passes, parallel DAG) | ~6.9s (sequential) |
GoCPG advantage: DAG-based parallel pipeline runs all 30 passes with dependency resolution. Joern runs passes sequentially.
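As a rough illustration of DAG-based scheduling, the sketch below runs each pass as soon as its dependencies have finished. The pass names and the dependency map are assumptions made up for the example (the report does not list GoCPG's actual dependency graph), and the scheduler is a generic pattern rather than GoCPG's implementation.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical dependency map: pass name -> passes that must finish first.
PASS_DEPS = {
    "cfg": set(),
    "call_graph": set(),
    "dominator": {"cfg"},
    "post_dominator": {"cfg"},
    "cdg": {"post_dominator"},
    "reaching_def": {"cfg"},
    "pdg": {"cdg", "reaching_def"},
}


def run_passes(deps, run_one, workers=4):
    """Run every pass once all of its dependencies have completed."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the passes do not form a DAG
    finished = []
    pending = {}  # future -> pass name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit everything whose dependencies are already done.
            for name in sorter.get_ready():
                pending[pool.submit(run_one, name)] = name
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise a failure from the pass, if any
                name = pending.pop(future)
                finished.append(name)
                sorter.done(name)  # unblocks dependent passes
    return finished


order = run_passes(PASS_DEPS, lambda name: None)
print(order)
```

Independent passes (here `dominator`, `post_dominator`, and `reaching_def`) can overlap, which is where a parallel DAG can beat a sequential pass list.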
Memory & Storage
| Metric | GoCPG | Joern |
| --- | --- | --- |
| Output format | DuckDB (SQL-queryable) | flatgraph (Joern-specific binary format) |
| Runtime | Native binary (no JVM) | JVM startup + Scala |
| Incremental updates | Git-based diff, branch tracking | Not supported |
2. Node Count Comparison
Go Codebase (GoCPG source, 215 files)
| Node Type | GoCPG | Joern | Delta | Notes |
| --- | --- | --- | --- | --- |
| method | 10,549 | 5,762 | +83% | GoCPG creates external method stubs for all callees |
| call | 83,908 | 95,061 | -12% | Joern inlines operator calls |
| file | 430 | 375 | +15% | GoCPG includes testdata |
| type_decl | 2,060 | 879 | +134% | GoCPG resolves interface types |
| identifier | 79,217 | 87,603 | -10% | |
| literal | 24,537 | 31,874 | -23% | Joern splits composite literals |
| local | 11,200 | 12,897 | -13% | |
| param | 14,730 | 10,333 | +43% | GoCPG includes method_parameter_out |
| return | 5,262 | 4,620 | +14% | |
| block | 37,449 | 28,449 | +32% | |
| control_structure | 11,368 | 13,332 | -15% | |
| member | 3,169 | 3,108 | +2% | |
| comment | 12,752 | 0 | — | Joern drops all comments |
| method_return | 9,303 | 5,762 | +61% | |
| type | 2,732 | 0 | — | Joern doesn’t persist TYPE nodes for Go |
| namespace | 216 | 49 | +341% | GoCPG: one per package path |
| method_ref | 191 | 180 | +6% | |
| annotation | 237 | 0 | — | Joern ignores Go build tags |
| TOTAL | 349,710 | 332,652 | +5% | |
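The Delta column is consistent with GoCPG's count measured relative to Joern's. A small sketch of that arithmetic, using three rows from the table above (the helper name `delta_percent` is an assumption for illustration):

```python
def delta_percent(gocpg, joern):
    """GoCPG's count relative to Joern's, as a rounded percentage."""
    if joern == 0:
        return None  # no baseline; the tables show a dash in this case
    return round((gocpg - joern) / joern * 100)


rows = {
    "method": (10_549, 5_762),
    "call": (83_908, 95_061),
    "comment": (12_752, 0),
}
deltas = {name: delta_percent(g, j) for name, (g, j) in rows.items()}
print(deltas)
```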
GoCPG-exclusive node types (not in Joern):
- finding (1,199) — pattern/quality findings
- binding (6,253) — type-method bindings
- field_identifier (19,876) — struct field access
- modifier (670) — visibility modifiers
- import (1,097) — import declarations
- method_parameter_out (8,900) — output parameters
- type_ref (2,137) — type references
Python Codebase (CodeGraph source, 1183 files)
| Node Type | GoCPG | Joern | Delta | Notes |
| --- | --- | --- | --- | --- |
| method | 200,790 | 61,700 | +225% | GoCPG creates stubs for all referenced external methods |
| call | 1,431,271 | 661,422 | +116% | GoCPG models decorators/comprehensions as calls |
| file | 4,515 | 1,540 | +193% | GoCPG includes __init__.py, config, test files |
| type_decl | 92,998 | 56,569 | +64% | |
| identifier | 1,286,815 | 612,328 | +110% | |
| literal | 615,354 | 255,001 | +141% | GoCPG extracts f-string parts |
| local | 428,845 | 218,693 | +96% | |
| param | 292,805 | 133,839 | +119% | |
| block | 390,614 | 143,495 | +172% | |
| comment | 105,730 | 0 | — | Joern drops all comments |
| TOTAL | 6,474,877 | 2,894,558 | +124% | |
3. Edge Count Comparison
Go Codebase
| Edge Type | GoCPG | Joern | Delta | Notes |
| --- | --- | --- | --- | --- |
| cfg | 270,587 | 268,470 | +0.8% | Nearly identical CFG semantics |
| cdg | 59,123 | 153,152 | -61% | See Section 4 (Root Cause Analysis) |
| dominate | 192,710 | 253,562 | -24% | |
| post_dominate | 259,891 | 251,657 | +3% | |
| reaching_def | 821,197 | 671,111 | +22% | GoCPG has interprocedural reaching defs |
| call | 90,844 | 95,426 | -5% | |
| contains | 305,724 | 285,865 | +7% | |
| argument | 170,722 | 188,583 | -9% | |
| ast | 195,808 | 332,152 | -41% | Joern materializes full AST edges |
| ref | 92,320 | 66,165 | +40% | GoCPG resolves more references |
| condition | 11,039 | 10,345 | +7% | |
| receiver | 13,857 | 16,705 | -17% | |
| source_file | 335,713 | 5,429 | +6,084% | GoCPG links all nodes to source file |
| parameter_link | 8,731 | 10,333 | -16% | |
| ddg | 243,723 | — | — | GoCPG-only: data dependency graph |
| pdg | 722,704 | — | — | GoCPG-only: program dependence graph |
| eval_type | 277,592 | — | — | GoCPG-only: type evaluation |
| binds_to | 50,739 | — | — | GoCPG-only: type parameter bindings |
| inherits_from | 2,156 | — | — | GoCPG-only for Go |
| capture | 383 | — | — | GoCPG-only: closure captures |
| TOTAL | 4,132,053 | 2,608,955 | +58% | |
Python Codebase
| Edge Type | GoCPG | Joern | Delta |
| --- | --- | --- | --- |
| cfg | 4,170,935 | 1,889,935 | +121% |
| cdg | 1,786,888 | 668,084 | +167% |
| reaching_def | 8,200,984 | 4,205,949 | +95% |
| call | 1,733,711 | 15,906,040 | -89% |
| ast | 5,851,652 | 2,678,327 | +118% |
| ddg | 1,927,830 | — | GoCPG-only |
| pdg | 7,716,042 | — | GoCPG-only |
| eval_type | 4,191,411 | 2,160,473 | +94% |
| TOTAL | 65,874,890 | 35,928,055 | +83% |
Note: Joern’s 15.9M call edges for Python come from NaiveCallLinker, which aggressively links all calls by name; this is a known over-approximation.
4. CDG Root Cause Analysis: 59K vs 153K
The Key Finding
Despite nearly identical CFG edge counts (270K vs 268K), Joern produces 14.2x more branching nodes:
| CFG Metric | GoCPG | Joern | Ratio |
| --- | --- | --- | --- |
| Linear nodes (1 CFG successor) | 268,401 | 239,514 | |
| Branching nodes (2+ CFG successors) | 970 | 13,734 | 14.2x |
| — of which CALL nodes | ~841 | 13,031 | 15.5x |
| — of which IDENTIFIER nodes | ~129 | 703 | 5.4x |
Why Joern Has More Branches
Joern models every function call as a potential branch point — calls can either return normally or throw an exception (two paths). With Go’s explicit error returns (no exceptions), this creates 13,031 spurious branches that don’t exist in the actual control flow.
Impact Chain
More branching nodes → Different post-dominator trees →
Larger post-dominator frontiers → More CDG edges
GoCPG: 970 branching → 59K CDG
Joern: 13,734 branching → 153K CDG
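To make the first link of this chain concrete, the sketch below counts branching nodes (2+ CFG successors) in a tiny hand-written CFG for a Go-style `x, err := f(); if err != nil { return err }; g(x)` body, then recounts after adding an extra edge from every call node to the exit. The node names and the way the exceptional edge is added are illustrative assumptions; the report does not show how Joern represents these edges internally.

```python
def branching_nodes(cfg):
    """Nodes with two or more distinct CFG successors."""
    return sorted(n for n, succs in cfg.items() if len(set(succs)) >= 2)


def with_exception_edges(cfg, call_nodes, exit_node):
    """Copy of cfg where each call may also jump straight to the exit."""
    out = {n: list(succs) for n, succs in cfg.items()}
    for call in call_nodes:
        if exit_node not in out[call]:
            out[call].append(exit_node)
    return out


# Hypothetical CFG: only the error check branches.
cfg = {
    "call_f": ["if_err"],
    "if_err": ["return_err", "call_g"],
    "return_err": ["exit"],
    "call_g": ["return_ok"],
    "return_ok": ["exit"],
    "exit": [],
}

plain = branching_nodes(cfg)
exceptional = branching_nodes(with_exception_edges(cfg, ["call_f", "call_g"], "exit"))
print(plain)        # only the explicit error check
print(exceptional)  # the error check plus both calls
```

Every call becomes a branching node under the second model, which matches the pattern in the table where CALL nodes account for most of Joern's branching nodes.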
Algorithm Comparison
Both use identical algorithms:
| Step | Joern | GoCPG |
| --- | --- | --- |
| Dominators | Cooper-Harvey-Kennedy | Cooper-Harvey-Kennedy |
| Post-Dominators | Same on reversed CFG | Same on reversed CFG |
| CDG | Post-dominator frontier (Ferrante-Ottenstein-Warren) | IPDOM chain walk (Ferrante-Ottenstein-Warren) |
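The three steps in the table can be sketched compactly. The code below is a textbook-style rendering, not code from either tool: iterative Cooper-Harvey-Kennedy dominators, applied to the reversed CFG to get immediate post-dominators, followed by the IPDOM chain walk for control dependence. It assumes a single exit node that every node can reach; the example CFG and its node names are hypothetical.

```python
from collections import defaultdict


def reverse_postorder(succ, entry):
    """Nodes reachable from entry, in reverse postorder of an iterative DFS."""
    seen, order = {entry}, []
    stack = [(entry, iter(succ.get(entry, ())))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in seen:
                seen.add(child)
                stack.append((child, iter(succ.get(child, ()))))
                break
        else:  # all children visited: node is finished
            order.append(node)
            stack.pop()
    return order[::-1]


def immediate_dominators(succ, entry):
    """Cooper-Harvey-Kennedy iteration; the entry maps to itself."""
    rpo = reverse_postorder(succ, entry)
    index = {node: i for i, node in enumerate(rpo)}
    preds = defaultdict(list)
    for node in rpo:
        for child in succ.get(node, ()):
            preds[child].append(node)
    idom = {entry: entry}

    def intersect(a, b):
        # Walk both fingers up the partial tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for node in rpo[1:]:
            known = [p for p in preds[node] if p in idom]
            new = known[0]
            for p in known[1:]:
                new = intersect(new, p)
            if idom.get(node) != new:
                idom[node] = new
                changed = True
    return idom


def control_dependence(cfg, exit_node):
    """CDG via IPDOM chain walk; assumes every node reaches exit_node."""
    reverse = defaultdict(list)
    for node, succs in cfg.items():
        for s in succs:
            reverse[s].append(node)
    ipdom = immediate_dominators(reverse, exit_node)
    cdg = defaultdict(set)
    for node, succs in cfg.items():
        for s in succs:
            runner = s
            # Everything between s and ipdom(node) depends on node's decision.
            while runner != ipdom[node]:
                cdg[node].add(runner)
                runner = ipdom[runner]
    return dict(cdg)


cfg = {
    "call_f": ["if_err"],
    "if_err": ["return_err", "call_g"],
    "return_err": ["exit"],
    "call_g": ["return_ok"],
    "return_ok": ["exit"],
    "exit": [],
}
cdg = control_dependence(cfg, "exit")
print(cdg)
```

In this example only the error check carries control dependences; adding a second successor to the call nodes would give them dependents as well.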
Correctness for Go
For Go specifically, GoCPG’s CDG is more precise because Go uses explicit error returns (if err != nil), not try/catch exceptions. Joern’s exception-path modeling creates spurious control dependencies that don’t exist in Go.
5. Architectural Comparison
Pipeline Design
| Aspect | GoCPG | Joern |
| --- | --- | --- |
| Language | Go (native binary) | Scala (JVM) |
| Pass scheduling | DAG with topological sort, parallel execution | Sequential pass list |
| Storage | DuckDB (SQL-queryable, Appender API) | flatgraph (Joern-specific format, requires Joern console) |
| Incremental | Git diff-based with branch tracking | Not supported |
Pass Pipeline
| Phase | GoCPG (30 passes) | Joern (~25 passes) |
| --- | --- | --- |
| Parsing | Frontend → DiffGraph → CPGGraph | Frontend → flatgraph |
| Types | TypeNode, TypeDecl, Inheritance, TypeRecovery | TypeDeclStub, MethodStub, TypeEval |
| Control flow | CFG, Dominator, PostDominator, CDG | CfgCreation, CfgDominator, CdgPass |
| Call graph | CallGraph, CallResolution, ImportResolver | StaticCallLinker, DynamicCallLinker |
| Data flow | Ref, AliasAnalysis, ReachingDef, InterproceduralReachingDef | ReachingDefPass |
| Combined | PDG (CDG + data), DDG | (not computed) |
| Enrichment | Metrics, Comments, Findings, PatternMatch | ContainsEdge |
| Domain | Annotation, Tags, UseCasePropagation, VCS | (not available) |
Unique GoCPG Features (Not in Joern)
- DDG — explicit data dependency edges
- PDG — combined CDG + data dependency
- AliasAnalysisPass — intraprocedural points-to analysis (p = &x → pts[p]={x}, pointer copy, indirect field resolution p->field → obj.field)
- InterproceduralReachingDef — cross-function data flow with stdlib semantics + global side-effect propagation
- Ternary indirect calls — (cond ? f1 : f2)(args) emits separate CallNodes per branch (C/C++)
- PatternMatchPass — structural pattern search integrated into pipeline
- FindingGenerationPass — quality/security findings as CPG nodes
- DomainAnnotationPass — domain-specific tagging (PostgreSQL, Django, etc.)
- VCSTagPass — git author, change-frequency, last-modified tags
- Incremental updates — git diff-based with branch tracking and merge support
- Watch mode — file system watcher with live CPG updates
- DuckDB output — SQL-queryable without JVM
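The points-to rules named in the AliasAnalysisPass item (p = &x → pts[p]={x}, pointer copy, p->field → obj.field) can be sketched as a small fixpoint computation. This is a flow-insensitive simplification written for illustration; the statement encoding and function names are assumptions, and GoCPG's actual pass may differ.

```python
from collections import defaultdict


def points_to(statements):
    """Fixpoint over ("addr", p, x) for p = &x and ("copy", q, p) for q = p."""
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, dst, src in statements:
            # Address-of adds the object itself; a copy inherits the source's set.
            new = {src} if kind == "addr" else set(pts[src])
            if not new <= pts[dst]:
                pts[dst] |= new
                changed = True
    return {var: targets for var, targets in pts.items() if targets}


def resolve_field(pts, pointer, field):
    """Rewrite pointer->field into obj.field for each object pointed to."""
    return sorted(f"{obj}.{field}" for obj in pts.get(pointer, ()))


# Statement order does not matter because the loop runs to a fixpoint.
statements = [("copy", "q", "p"), ("addr", "p", "x"), ("addr", "p", "y")]
pts = points_to(statements)
print(pts)
print(resolve_field(pts, "q", "name"))
```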
Unique Joern Features (Not in GoCPG)
- Interactive REPL — Scala console for ad-hoc graph queries
- 13 language frontends (vs GoCPG’s 11) — including Ruby and Swift, which GoCPG lacks
- GHIDRA frontend — binary analysis support
- NaiveCallLinker — aggressive name-based call resolution
- PythonTypeRecovery — iterative type inference (2-pass)
6. Data Flow Comparison
Reaching Definitions
| Aspect | GoCPG | Joern |
| --- | --- | --- |
| Granularity | Intraprocedural + interprocedural | Intraprocedural only |
| Alias analysis | Points-to analysis for pointer dereference chains | None (Joern #5580, #5668) |
| Global side effects | Interprocedural global write propagation | None (Joern #5581) |
| Stdlib semantics | YAML configs for 11 languages | Hard-coded Scala |
| Timeout | 15s per method | Configurable max-num-def |
| Edges (Go) | 821,197 (+22% vs Joern) | 671,111 |
| Edges (Python) | 8,200,984 (+95% vs Joern) | 4,205,949 |
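For readers unfamiliar with the intraprocedural baseline that both tools share, here is a minimal worklist sketch of reaching definitions. It is the textbook formulation with one definition per node, not code from GoCPG or Joern, and the example CFG is hypothetical.

```python
def reaching_definitions(cfg, defs):
    """cfg: node -> successors; defs: node -> variable defined at that node.

    Returns, per node, the set of (variable, defining node) pairs that reach it.
    """
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    reach_in = {n: set() for n in cfg}
    reach_out = {n: set() for n in cfg}
    work = list(cfg)
    while work:
        n = work.pop()
        reach_in[n] = set().union(*(reach_out[p] for p in preds[n]))
        var = defs.get(n)
        if var is None:
            new_out = set(reach_in[n])
        else:
            # Kill earlier definitions of var, then generate this one.
            new_out = {(v, d) for v, d in reach_in[n] if v != var} | {(var, n)}
        if new_out != reach_out[n]:
            reach_out[n] = new_out
            work.extend(cfg[n])  # successors must be re-evaluated
    return reach_in


# x is defined at a and redefined at c; y is defined at d.
cfg = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
defs = {"a": "x", "c": "x", "d": "y"}
reach = reaching_definitions(cfg, defs)
print(reach["e"])
```

An interprocedural variant additionally propagates definitions across call and return edges, which is one plausible source of the extra edges reported for GoCPG.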
Call Graph
| Aspect | GoCPG | Joern |
| --- | --- | --- |
| Strategy | FQN-based 4-level fallback | Static + Dynamic + Naive |
| Go (edges) | 90,844 | 95,426 |
| Python (edges) | 1,733,711 | 15,906,040 (~10x FPs) |
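The gap in Python call edges follows from the linking strategy. The toy example below contrasts linking by bare method name with linking by fully qualified name (FQN). Both functions are simplified stand-ins: they do not reproduce NaiveCallLinker or GoCPG's 4-level fallback, and the method and call records are invented for the illustration.

```python
from collections import defaultdict


def link_by_name(calls, methods):
    """Link each call to every method that shares its simple name."""
    by_name = defaultdict(list)
    for fqn in methods:
        by_name[fqn.rsplit(".", 1)[-1]].append(fqn)
    return [(c["id"], target) for c in calls for target in by_name[c["name"]]]


def link_by_fqn(calls, methods):
    """Link a call only when its resolved FQN matches a known method."""
    known = set(methods)
    return [(c["id"], c["fqn"]) for c in calls if c["fqn"] in known]


methods = ["pkg.a.Reader.close", "pkg.b.Writer.close", "pkg.c.Socket.close"]
calls = [
    {"id": 1, "name": "close", "fqn": "pkg.a.Reader.close"},
    {"id": 2, "name": "close", "fqn": "pkg.b.Writer.close"},
]

naive_edges = link_by_name(calls, methods)
fqn_edges = link_by_fqn(calls, methods)
print(len(naive_edges), len(fqn_edges))
```

With common method names the name-based edge count grows with calls times same-named methods, while FQN-based linking stays at most one edge per call.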
7. Python Parity Analysis (Deduplication-Corrected)
File Path Duplication Bug
GoCPG has a file path normalization bug that inflates Python codebase numbers: each source file is stored with both absolute (D:\work\codegraph\src\...) and relative (..\src\...) paths, creating ~2x FILE nodes and duplicating all contained nodes/edges.
| Metric | Value |
| --- | --- |
| Absolute-path files | 2,345 |
| Relative-path files | 2,170 |
| Overlapping (same file, two paths) | 2,336 |
| Unique files after normalization | 2,345 |
| Joern files | 1,540 |
Root cause: orchestrator.go:280 appends absolute paths from filepath.WalkDir(). A separate import/type resolution mechanism in the Python frontend creates entries with relative paths. The CPGGraph.filesByName uses raw paths as keys, so both coexist. Method full_names embed the file path (filepath:ClassName.method), preventing deduplication.
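One way to avoid the double entries is to canonicalize every path before using it as a map key. The sketch below shows the idea with Python's `ntpath` module so that Windows-style paths behave the same on any platform. The base directory and the file name `app.py` are hypothetical, and this is only an illustration of the normalization step, not the fix applied in GoCPG.

```python
import ntpath


def file_key(path, base_dir):
    """Canonical key: absolute, normalized, case-folded Windows path."""
    if not ntpath.isabs(path):
        # Relative paths are resolved against the directory they were relative to.
        path = ntpath.join(base_dir, path)
    return ntpath.normcase(ntpath.normpath(path))


base_dir = r"D:\work\codegraph\gocpg"        # hypothetical working directory
absolute = r"D:\work\codegraph\src\app.py"   # hypothetical file
relative = r"..\src\app.py"                  # same file, relative spelling

files_by_name = {}
for raw in (absolute, relative):
    files_by_name.setdefault(file_key(raw, base_dir), raw)
print(len(files_by_name))
```

Because method full_names embed the file path, the same canonical form would need to be used wherever those names are built.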
Deduped Node Comparison
Using absolute-path-only counts as the true deduplicated values:
| Node Type | GoCPG (deduped) | Joern | Delta | Notes |
| --- | --- | --- | --- | --- |
| Internal methods | 42,923 | 47,749 | -10% | Near parity |
| External stubs | 117,118 | 13,951 | +740% | GoCPG creates stubs for all referenced externals |
| call | 740,272 | 661,422 | +12% | GoCPG: decorators, comprehensions as calls |
| identifier | 664,721 | 612,328 | +9% | Near parity |
| literal | 317,917 | 255,001 | +25% | GoCPG extracts f-string parts |
| type_decl | 47,789 | 56,569 | -16% | Joern: PythonTypeRecovery adds more |
| comment | ~52,865 | 0 | — | Joern drops all comments |
Deduped Edge Comparison (estimated /2)
| Edge Type | GoCPG Raw | GoCPG /2 (est.) | Joern | Delta |
| --- | --- | --- | --- | --- |
| cfg | 4,170,935 | ~2,085K | 1,890K | +10% |
| cdg | 1,786,888 | ~893K | 668K | +34% |
| reaching_def | 8,200,984 | ~4,100K | 4,206K | -3% |
| call | 1,733,711 | ~867K | 15,906K | -95% |
| ast | 5,851,652 | ~2,926K | 2,678K | +9% |
| ddg | 1,927,830 | ~964K | — | GoCPG only |
| pdg | 7,716,042 | ~3,858K | — | GoCPG only |
Note: Joern’s 15.9M call edges come from NaiveCallLinker — ~18x over-approximation by aggressively linking all calls by name.
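The ~18x figure follows from the halving estimate, whereas the ~10x figure in Section 6 corresponds to GoCPG's raw, non-deduplicated count. A short check of that arithmetic using the call-edge numbers from the tables (variable names are illustrative):

```python
gocpg_call_edges_raw = 1_733_711
joern_call_edges = 15_906_040

# The /2 estimate assumes each file, and so each edge, is stored twice.
gocpg_call_edges_deduped = gocpg_call_edges_raw / 2

ratio_vs_raw = joern_call_edges / gocpg_call_edges_raw
ratio_vs_deduped = joern_call_edges / gocpg_call_edges_deduped
print(round(ratio_vs_raw, 1), round(ratio_vs_deduped, 1))
```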
Python Parity Verdict
Core parity achieved after correcting for the file duplication bug:
- Internal methods: -10% — minor gap from synthetic method generation differences
- CFG: +10%, Reaching defs: -3%, AST: +9% — all near parity
- CDG: +34% — expected: GoCPG properly models Python exception branching
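- Call graph: GoCPG emits ~18x fewer call edges than Joern’s NaiveCallLinker, which over-approximates by linking calls by name
- Exclusive: DDG and PDG are GoCPG-only; eval_type is emitted by both tools for Python (Section 3), while binds_to, capture, and inherits_from are listed as GoCPG-only in the Go comparison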
8. Language Coverage
| Language | GoCPG | Joern |
| --- | --- | --- |
| C/C++ | Yes | Yes |
| Go | Yes (native go/parser) | Yes |
| Python | Yes | Yes |
| JavaScript/TypeScript | Yes | Yes |
| Java | Yes | Yes |
| Kotlin | Yes | Yes |
| C# | Yes | Yes |
| PHP | Yes | Yes |
| 1C:Enterprise | Yes | No |
| Ruby | No | Yes |
| Swift | No | Yes |
| Binary (GHIDRA) | No | Yes |
9. Summary Scorecard
| Dimension | GoCPG | Joern | Winner |
| --- | --- | --- | --- |
| Parse speed | 13s | 52s | GoCPG (3.9x) |
| No JVM dependency | Yes | No | GoCPG |
| SQL-queryable output | Yes | No | GoCPG |
| Node richness | 22 types | 16 types | GoCPG |
| Edge types | 22 types (DDG, PDG) | 14 types | GoCPG |
| CDG for Go | 59K (precise) | 153K (exception paths) | GoCPG |
| Python parity | -10% methods, +10% CFG | Baseline | Tie (near parity) |
| Python call graph | 867K (precise) | 15.9M (~18x FPs) | GoCPG |
| Data flow depth | Interprocedural | Intraprocedural | GoCPG |
| Incremental updates | Git-based | Not supported | GoCPG |
| Pattern search | YAML + tree-sitter | Scala DSL | Tie |
| Interactive REPL | No | Yes | Joern |
| Language count | 11 | 13 + GHIDRA | Joern |
| Community/ecosystem | Proprietary | Open source | Joern |
Generated: 2026-02-25 | GoCPG v0.1.0 vs Joern v4.0.426