Benchmark Results

Unsupervised Fidelity-Based Anomaly Detection

Per-class, per-dimension, operator-tunable results on CICIDS-2017 and CSE-CIC-IDS2018. No attack labels used for calibration or detection.

The system calibrates from benign traffic only: under 30 seconds in Precision mode, several minutes in Detection mode. No GPU. No internet. No signatures. The operator picks a mode and controls one tuning parameter, sensitivity (α); everything else emerges from calibration on the deployment environment's own traffic. Thresholds are derived from held-out benign rows only.
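
As an illustration of what deriving a threshold from held-out benign rows can look like, the sketch below places the threshold at the empirical (1 − α) quantile of benign deviation scores, so that roughly a fraction α of benign traffic would be flagged. This is an assumption made for illustration, not the engine's documented method; the `calibrate_threshold` and `flag` helpers, the nearest-rank quantile rule, and the score values are all hypothetical.

```python
import math

def calibrate_threshold(benign_scores, alpha):
    """Return the empirical (1 - alpha) quantile of held-out benign scores.

    Hypothetical helper: assumes a higher score means a larger deviation
    from benign behavior. Uses the nearest-rank rule, so the threshold is
    always one of the observed scores.
    """
    if not benign_scores:
        raise ValueError("need at least one benign score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    ordered = sorted(benign_scores)
    # Nearest-rank position of the (1 - alpha) quantile, clamped to [1, n].
    rank = math.ceil((1.0 - alpha) * len(ordered))
    rank = min(max(rank, 1), len(ordered))
    return ordered[rank - 1]

def flag(score, threshold):
    """A flow is flagged when its score strictly exceeds the threshold."""
    return score > threshold

# Illustrative held-out benign scores (made up, not benchmark data).
held_out = list(range(1, 101))
threshold = calibrate_threshold(held_out, alpha=0.05)
benign_flag_rate = sum(flag(s, threshold) for s in held_out) / len(held_out)
print(threshold, benign_flag_rate)
```

Under this reading, raising α lowers the threshold, which trades a higher false-positive rate for higher recall.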

Every benchmark-pipeline number below is reproducible from our deployment package. We report every class, including the ones where the system is weak.


What Is Different

The system does not optimize a single aggregate score. It measures behavioral fidelity across multiple independent dimensions simultaneously and preserves the decomposition for the operator. The operator sees which dimensions deviate, by how much, and whether independent measurements agree.

This changes what the system reports: not "this flow is malicious" but "this flow deviates on these dimensions, with this level of agreement, at a sensitivity setting you chose."
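
A report of that shape might be represented as follows. This is a minimal sketch under stated assumptions: the dimension names, the `DimensionReading` and `FlowReport` types, and the rule that agreement is the fraction of dimensions exceeding their own thresholds are all hypothetical, not the engine's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DimensionReading:
    name: str          # hypothetical dimension label
    score: float       # deviation score measured on this dimension
    threshold: float   # threshold calibrated for this dimension at the chosen alpha

    @property
    def excess(self) -> float:
        """How far the score sits above its threshold (negative if below)."""
        return self.score - self.threshold

    @property
    def deviates(self) -> bool:
        return self.score > self.threshold

@dataclass(frozen=True)
class FlowReport:
    alpha: float
    readings: tuple

    def deviating(self):
        """Dimensions that exceed their thresholds, largest excess first."""
        hits = [r for r in self.readings if r.deviates]
        return sorted(hits, key=lambda r: r.excess, reverse=True)

    def agreement(self) -> float:
        """Fraction of dimensions that independently flag the flow."""
        return len(self.deviating()) / len(self.readings)

    def summary(self) -> str:
        names = ", ".join(f"{r.name} (+{r.excess:.2f})" for r in self.deviating())
        return (f"deviates on: {names or 'none'}; "
                f"agreement {self.agreement():.0%} at alpha={self.alpha}")

# Made-up readings for a single flow.
report = FlowReport(
    alpha=0.05,
    readings=(
        DimensionReading("timing", score=0.91, threshold=0.60),
        DimensionReading("volume", score=0.72, threshold=0.65),
        DimensionReading("flags", score=0.10, threshold=0.55),
    ),
)
print(report.summary())
```

Keeping the per-dimension readings in the report, rather than collapsing them into one score, is what lets an operator see both the size of each deviation and how many measurements agree.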


CICIDS-2017 Corrected

Engelen 2021 · 15 classes · 5-fold CV · Two operating modes from a single codebase

Per-Class Recall (α=0.05)

Attack Class            n         Precision Mode   Detection Mode
DoS Hulk                158,468   1.000            1.000
DDoS                    95,144    1.000            1.000
Botnet                  736       1.000            1.000
Web Brute Force         73        1.000            1.000
Web XSS                 18        1.000            1.000
Heartbleed              11        1.000            1.000
Infiltration            36        0.980            1.000
DoS Slowhttptest        1,740     0.998            0.998
Infiltration-Portscan   71,767    0.993            0.998
FTP-Patator             3,972     0.603            0.995
SSH-Patator             2,961     0.988            0.995
DoS Slowloris           3,859     0.912            0.984
Portscan                159,066   0.947            0.952
DoS GoldenEye           7,567     0.936            0.939
SQL Injection           13        0.067            0.077

Precision mode: 12 of 15 classes at or above 0.93 (FTP-Patator, DoS Slowloris, and SQL Injection fall below). Detection mode: 14 of 15. SQL Injection (n=13) is a payload-level attack, below the resolution of flow metadata.

Tuesday (Patator) is the hardest day: roughly 7K attacks among 315K benign flows. In Precision mode, FTP-Patator recall is 0.603 at α=0.05; Detection mode reaches 0.995.

Operating Curves

Precision Mode — Friday

α       Precision   Recall   FPR    F1
0.001   0.996       0.283    0.1%   0.440
0.005   0.990       0.478    0.4%   0.644
0.01    0.984       0.650    0.8%   0.784
0.02    0.973       0.836    1.5%   0.902
0.03    0.964       0.944    2.3%   0.954
0.05    0.959       0.968    3.8%   0.963
0.07    0.945       0.976    5.1%   0.960
0.10    0.927       0.993    7.1%   0.959
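
To make the operator's trade-off concrete, the sketch below encodes the Friday Precision-mode rows above and picks the α with the highest recall that fits a false-positive budget. The `pick_alpha` helper and the 2.5% budget are hypothetical choices for illustration; `f1_score` simply recomputes the harmonic mean of a reported precision and recall.

```python
# (alpha, precision, recall, fpr, f1) rows copied from the Friday Precision-mode table.
FRIDAY_PRECISION = [
    (0.001, 0.996, 0.283, 0.001, 0.440),
    (0.005, 0.990, 0.478, 0.004, 0.644),
    (0.01, 0.984, 0.650, 0.008, 0.784),
    (0.02, 0.973, 0.836, 0.015, 0.902),
    (0.03, 0.964, 0.944, 0.023, 0.954),
    (0.05, 0.959, 0.968, 0.038, 0.963),
    (0.07, 0.945, 0.976, 0.051, 0.960),
    (0.10, 0.927, 0.993, 0.071, 0.959),
]

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def pick_alpha(rows, max_fpr):
    """Return the row with the highest recall whose FPR fits the budget, or None."""
    eligible = [row for row in rows if row[3] <= max_fpr]
    return max(eligible, key=lambda row: row[2]) if eligible else None

# Hypothetical operator constraint: at most 2.5% false positives.
best = pick_alpha(FRIDAY_PRECISION, max_fpr=0.025)
print(best[0], best[2])                    # chosen alpha and its recall
print(round(f1_score(0.959, 0.968), 3))    # recomputed F1 for the alpha=0.05 row
```

Recomputing F1 from the rounded precision and recall will not match every row to the third decimal, which would be expected if the published figures are averaged across folds (an assumption, not something the tables state).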

Detection Mode — Friday

α       Precision   Recall   FPR    F1
0.001   0.904       0.552    5.3%   0.685
0.005   0.908       0.589    5.4%   0.715
0.01    0.922       0.713    5.4%   0.804
0.02    0.933       0.869    5.6%   0.900
0.03    0.937       0.944    5.7%   0.941
0.05    0.933       0.970    6.3%   0.951
0.07    0.927       0.978    7.0%   0.952
0.10    0.915       0.994    8.3%   0.953

Precision Mode — Wednesday (DoS)

α       Precision   Recall   FPR    F1
0.001   0.997       0.626    0.1%   0.765
0.005   0.991       0.905    0.5%   0.946
0.01    0.984       0.947    0.9%   0.965
0.02    0.971       0.970    1.6%   0.971
0.03    0.958       0.976    2.4%   0.967
0.05    0.936       0.991    3.8%   0.963
0.10    0.888       1.000    7.0%   0.940

Precision Mode — Thursday (Infiltration, Web)

α       Precision   Recall   FPR    F1
0.01    0.940       0.499    0.8%   0.651
0.02    0.931       0.825    1.6%   0.874
0.03    0.912       0.926    2.3%   0.918
0.05    0.867       0.967    3.8%   0.915
0.10    0.778       0.973    7.1%   0.864

CSE-CIC-IDS2018 (No Retuning)

Same engine. Same parameters. No modification for CIC-2018.

Attack Class        File       n         Precision (α=0.05)   Detection (α=0.05)
SSH-BruteForce      Wed-14     94,197    1.000                1.000
DDoS-HOIC           Wed-21     248,069   1.000                1.000
DDoS-LOIC-HTTP      Tue-20     246,049   0.980                1.000
DDoS-LOIC-UDP       multiple   815       1.000                1.000
DoS GoldenEye       Thu-15     22,560    0.997                1.000
DoS Slowloris       Thu-15     8,490     1.000                1.000
DoS Hulk            Fri-16     247,177   0.912                1.000
Web Brute Force     multiple   131       1.000                1.000
Web XSS             multiple   113       1.000                1.000
Infiltration-NMAP   Thu-01     17,407    0.990                0.999
Infiltration-NMAP   Wed-28     21,591    0.885                0.988
Web SQL             multiple   39        0.650–0.836          0.650–0.836

Detection mode improves every class that is not already at 1.000 in Precision mode, except Web SQL, which is unchanged.


Adversarial Validation

Test                                               Result
Temporal split (train 60%, test 40%, no shuffle)   Detection holds across time-shifted traffic
Calibration contamination                          Guard-enabled end-to-end run refused calibration sets poisoned at 1%, 3%, and 5%
Behavioral head ablation                           Per-dimension importance quantified
Cross-dataset transfer                             2017 → 2018 without retuning
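
The temporal split in the first row can be sketched as a chronological cut with no shuffling: the earliest 60% of rows go to training and calibration, the latest 40% to testing. The `temporal_split` helper and the `(timestamp, flow_id)` record layout are hypothetical, chosen only to illustrate the idea.

```python
def temporal_split(rows, train_fraction=0.6, key=lambda row: row[0]):
    """Split rows chronologically: earliest part for training, the rest for testing.

    Rows are ordered by `key` (assumed to be a timestamp) and never shuffled,
    so every test row is at least as late as every training row.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be in (0, 1)")
    ordered = sorted(rows, key=key)
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Made-up records arriving out of order: (timestamp, flow_id).
flows = [(t, f"flow-{t}") for t in (7, 3, 9, 1, 5, 2, 8, 4, 10, 6)]
train, test = temporal_split(flows)
print(len(train), len(test))
print(max(t for t, _ in train) < min(t for t, _ in test))
```

Because the test rows all come later than the training rows, any drift in traffic over time shows up in the test metrics instead of being averaged away by shuffling.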

Specifications

Where two values are given, they are for Precision mode / Detection mode.

Labels for calibration   None
Calibration time         < 30 s / several minutes
Engine size              560 KB
Deployment footprint     < 70 MB
Runtime memory           ~200 MB / ~250 MB
Reproducibility          MD5-tracked
Hardware                 CPU only
Air-gapped               Yes
Per-flow latency         ~1 ms / ~4 ms
Throughput               ~1M / ~250K flows/s
Operator parameters      Sensitivity (α), mode
Internet required        No
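
The MD5 tracking listed above can be checked with a few lines of standard-library code. The sketch below hashes a file in chunks and compares the result to an expected digest; the `md5_of_file` and `verify_md5` helpers are hypothetical, and in practice the expected digest would come from whatever manifest accompanies the datasets or the deployment package.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_md5(path, expected_hex):
    """True when the file's digest matches the expected hex string."""
    return md5_of_file(path) == expected_hex.strip().lower()

# Self-contained demo on a temporary file standing in for a dataset file.
payload = b"example payload"
expected = hashlib.md5(payload).hexdigest()  # stands in for a manifest entry
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name
try:
    matches = verify_md5(demo_path, expected)
    mismatch = verify_md5(demo_path, "0" * 32)
finally:
    os.remove(demo_path)
print(matches, mismatch)
```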

Methodology

Cross-validation: 5-fold StratifiedKFold (shuffle=True, random_state=42).
CICIDS-2017: Engelen, Rimmer & Joosen (IEEE SPW 2021), Distrinet/KU Leuven, MD5-verified.
CSE-CIC-IDS2018: CSE & CIC with Distrinet corrections.
Thresholds: from held-out benign rows only.
Hardware: Apple M2 Mac Mini, 16 GB, CPU only.
Engine: v2.1.
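
The evaluation protocol can be pictured as follows: rows are split into five stratified folds, and within each fold only benign training rows are available for calibration. The sketch is a simplified pure-Python stand-in for scikit-learn's `StratifiedKFold` and does not reproduce its exact fold assignment; the `stratified_folds` helper and the toy labels are hypothetical. Labels appear here only to build folds and count test-set attacks, which is how we read the statement that no attack labels enter calibration.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits=5, seed=42):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    fold_of = {}
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                      # shuffle within each class
        for position, index in enumerate(indices):
            fold_of[index] = position % n_splits  # deal round-robin into folds
    for fold in range(n_splits):
        test = sorted(i for i, f in fold_of.items() if f == fold)
        train = sorted(i for i, f in fold_of.items() if f != fold)
        yield train, test

# Toy labels (made up): 20 benign rows and 10 attack rows.
labels = ["benign"] * 20 + ["attack"] * 10
fold_summaries = []
for train_idx, test_idx in stratified_folds(labels):
    # Calibration sees benign training rows only; attack rows are never used to fit.
    calibration_idx = [i for i in train_idx if labels[i] == "benign"]
    attacks_in_test = sum(labels[i] == "attack" for i in test_idx)
    fold_summaries.append((len(calibration_idx), len(test_idx), attacks_in_test))
print(fold_summaries)
```

Stratifying keeps rare classes represented in every test fold where their counts allow it, which matters for classes with only a handful of rows.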

Independent Evaluation

We welcome independent reproduction of these results. The full deployment package, benchmark scripts, and reproduction instructions are available on request.