v0.6.0 • Progress Report

RojuSec

Threat Engine

ML-powered phishing detection with explainable scoring
PHASE 3 COMPLETE • CORE ENGINE LIVE
6 Core Modules • 73% Baseline F1 • 8.4K Rows Processed • 2 ML Models
Architecture

Core Detection Stack

Production-ready modules wired into a single scoring pipeline
🤖 core.nlp.model
Rule-based text engine with 50+ phishing & spam patterns.
✅ 4 / 4 tests • urgency, credential theft, payment fraud
🌐 core.url_analysis
URL extraction and risk scoring for embedded links.
✅ 13 / 13 tests • shorteners, raw IPs, high-risk TLDs
🔐 core.auth
API-key auth and rate-limiting for all sensitive endpoints.
✅ 13 / 13 tests • 60/min + 1,000/hr per key
📊 core.telemetry
Privacy-first telemetry with hashed identifiers and subscores.
✅ 6 / 6 tests • 8,390+ records stored
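The stack above feeds rule-based text hits and URL risk into one score. The sketch below shows one way that could look, assuming simple additive weights. The regexes, weights, shortener list and TLD list are illustrative assumptions only, not the engine's actual 50+ patterns.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Illustrative rules with hypothetical weights; the real engine ships 50+ patterns.
TEXT_RULES = {
    "urgency": (re.compile(r"\b(urgent|immediately|within 24 hours)\b", re.I), 3),
    "credential_theft": (re.compile(r"\b(verify your (account|password)|login details)\b", re.I), 4),
    "payment_fraud": (re.compile(r"\b(wire transfer|gift cards?|update (your )?billing)\b", re.I), 4),
}
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}  # assumed sample list
RISKY_TLDS = {"zip", "xyz", "top"}  # assumed sample list
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.I)


def score_text(body: str) -> dict[str, int]:
    """Return one subscore per rule category that matches the body."""
    return {name: w for name, (rx, w) in TEXT_RULES.items() if rx.search(body)}


def score_url(url: str) -> int:
    """Heuristic risk for one URL: shortener host, raw IP host, or risky TLD."""
    host = (urlparse(url).hostname or "").lower()
    risk = 0
    if host in SHORTENERS:
        risk += 2
    try:
        ipaddress.ip_address(host)
        risk += 3  # raw IP address instead of a domain name
    except ValueError:
        if host.rsplit(".", 1)[-1] in RISKY_TLDS:
            risk += 2
    return risk


def analyze(body: str) -> dict:
    """Combine text subscores and URL risk into a single additive total."""
    text = score_text(body)
    urls = URL_RE.findall(body)
    url_risk = sum(score_url(u) for u in urls)
    return {"text": text, "urls": urls, "url_risk": url_risk,
            "total": sum(text.values()) + url_risk}
```

For example, `analyze("URGENT: verify your account at http://192.168.0.1/login")` matches the urgency and credential-theft rules and adds raw-IP risk for the link. Keeping each subscore separate is what makes the final number explainable.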
Phase 3B & 3C

ML Models Online

DeBERTa + MiniLM integrated as "boosters" on top of rules
🧠 core.models.text
✅ Phase 3B
DeBERTa-based classifier providing soft evidence on email content.
DeBERTa-base model
Boost range: −5 to +10
Lazy loading
Thread-safe caching
Multi-heuristic scoring
Explainable subscores
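Lazy loading, thread-safe caching and a −5 to +10 boost range can be combined roughly as follows. This is a sketch under stated assumptions: `loader` is a hypothetical factory (for example, wrapping a DeBERTa pipeline) that returns a function mapping email text to a phishing probability, and the linear probability-to-boost mapping is assumed, not taken from the module.

```python
import threading
from typing import Callable

BOOST_MIN, BOOST_MAX = -5.0, 10.0  # boost range stated for core.models.text


class LazyTextModel:
    """Loads the classifier on first use; safe to call from many threads.

    `loader` is hypothetical: it returns a callable mapping text -> P(phishing).
    """

    def __init__(self, loader: Callable[[], Callable[[str], float]]):
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def _get(self) -> Callable[[str], float]:
        if self._model is None:  # fast path: skip the lock once loaded
            with self._lock:
                if self._model is None:  # re-check: another thread may have loaded it
                    self._model = self._loader()
        return self._model

    def boost(self, text: str) -> float:
        p = min(max(self._get()(text), 0.0), 1.0)  # clamp to a valid probability
        # Assumed linear mapping: p=0 -> -5 (evidence of legitimacy), p=1 -> +10.
        return BOOST_MIN + p * (BOOST_MAX - BOOST_MIN)
```

Double-checked locking means the expensive model load happens once even if several requests arrive concurrently right after startup, and the negative end of the range lets the model pull a rule-flagged email back down.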
🔍 core.models.behavior
✅ Phase 3C
MiniLM-based behavioral anomaly model for sender behavior patterns.
MiniLM embeddings
Cosine similarity
Online baseline updates
Thread-safe store
Boost range: 0 to +10
Telemetry-backed tuning
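The behavioral model compares each new message embedding with a per-sender baseline. A minimal sketch, assuming the baseline is a running mean of embeddings and that the boost scales with cosine distance; the embeddings would come from MiniLM in practice, but plain float lists stand in here, and the mapping to 0 to +10 is an assumption.

```python
import math
import threading


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SenderBaselines:
    """Per-sender running-mean embedding, guarded by a single lock."""

    def __init__(self) -> None:
        self._means: dict[str, list[float]] = {}
        self._counts: dict[str, int] = {}
        self._lock = threading.Lock()

    def score_and_update(self, sender: str, emb: list[float]) -> float:
        with self._lock:
            mean = self._means.get(sender)
            if mean is None:
                # No history yet: record the first embedding, give no boost.
                self._means[sender] = list(emb)
                self._counts[sender] = 1
                return 0.0
            sim = cosine(mean, emb)
            # Assumed mapping: same direction -> 0, orthogonal or opposite -> +10.
            boost = min(max(10.0 * (1.0 - sim), 0.0), 10.0)
            # Online update: incremental mean over all messages seen so far.
            n = self._counts[sender] + 1
            self._means[sender] = [m + (e - m) / n for m, e in zip(mean, emb)]
            self._counts[sender] = n
            return boost
```

Scoring before updating keeps the current message from diluting its own anomaly signal, and the incremental mean avoids storing every past embedding.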
Security & Data

Hardening & Telemetry

Built as a security product, not just a toy model
🔐 core.auth
✅ Live
API-key authentication and multi-tier rate limiting.
Key generation & validation
60 requests / minute
1,000 requests / hour
Per-key isolation
Rate-limit status endpoint
Defensive defaults
✅ 13 / 13 tests • happy-path + abuse scenarios
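The two tiers (60 per minute, 1,000 per hour) with per-key isolation can be enforced with a sliding window. This is an in-memory sketch, assuming a single process; the class name and the injectable clock are illustrative choices, not the module's API.

```python
import threading
import time
from collections import defaultdict, deque

# Tiers from the slide: (window in seconds, max requests per key)
TIERS = ((60, 60), (3600, 1000))


class MultiTierRateLimiter:
    """Sliding-window limiter keyed per API key (in-memory sketch)."""

    def __init__(self, tiers=TIERS, clock=time.monotonic):
        self._tiers = tiers
        self._clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)
        self._lock = threading.Lock()

    def allow(self, api_key: str) -> bool:
        now = self._clock()
        longest = max(w for w, _ in self._tiers)
        with self._lock:
            hits = self._hits[api_key]
            while hits and now - hits[0] >= longest:
                hits.popleft()  # older than every window: safe to forget
            for window, limit in self._tiers:
                recent = sum(1 for t in hits if now - t < window)
                if recent >= limit:
                    return False  # rejected requests are not recorded
            hits.append(now)
            return True

    def remaining(self, api_key: str) -> dict[int, int]:
        """Quota left per window, e.g. for a rate-limit status endpoint."""
        now = self._clock()
        with self._lock:
            hits = self._hits.get(api_key, ())
            return {w: max(0, lim - sum(1 for t in hits if now - t < w))
                    for w, lim in self._tiers}
```

Not counting rejected calls means a client that backs off regains quota on schedule; a multi-instance deployment would need the hit log in a shared store instead of a dict.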
📊 core.telemetry
✅ Live
Privacy-first telemetry with hashed email IDs and full subscore storage.
SQLite storage
SHA-256 email hashing
All subscores persisted
User feedback capture
Non-blocking logging
8,390+ events
✅ 6 / 6 tests • schema + privacy guarantees
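A minimal sketch of privacy-first event storage with Python's built-in `sqlite3`: the sender address is normalised and SHA-256 hashed before insert, and all subscores are kept as JSON. The table layout, column names and the choice of summing subscores into a total are assumptions for illustration.

```python
import hashlib
import json
import sqlite3


def hash_email(address: str) -> str:
    """SHA-256 of the normalised address, so the raw email is never stored."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def init_db(conn: sqlite3.Connection) -> None:
    # subscores holds every subscore as JSON; feedback is an optional user label.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " id INTEGER PRIMARY KEY,"
        " sender_hash TEXT NOT NULL,"
        " total_score REAL NOT NULL,"
        " subscores TEXT NOT NULL,"
        " feedback TEXT)"
    )


def log_event(conn: sqlite3.Connection, sender: str, subscores: dict[str, float]) -> int:
    """Persist one analysis result and return its row id."""
    cur = conn.execute(
        "INSERT INTO events (sender_hash, total_score, subscores) VALUES (?, ?, ?)",
        (hash_email(sender), sum(subscores.values()), json.dumps(subscores)),
    )
    conn.commit()
    return cur.lastrowid


def record_feedback(conn: sqlite3.Connection, event_id: int, label: str) -> None:
    """Attach a user verdict (e.g. 'phish' or 'ham') to a stored event."""
    conn.execute("UPDATE events SET feedback = ? WHERE id = ?", (label, event_id))
    conn.commit()
```

Because email addresses are guessable, an unsalted hash can be reversed by hashing candidate addresses; a keyed hash such as `hmac.new(secret, ..., hashlib.sha256)` is a common alternative when that matters.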
API

API Surface

Designed to plug into mail gateways, SOC tooling and scripts
GET /health
Liveness probe for containers / orchestrators.
Public
GET /health/ml
Verifies ML models are loaded and ready.
Public
POST /analyze
Main phishing analysis endpoint (NLP + URL + ML + behavior).
Auth Required
GET /rate-limit
Returns remaining quota for the calling API key.
Auth Optional
GET /telemetry/stats
Aggregated telemetry metrics for dashboards.
Auth Required
GET /progress
Progress view for screenshots (this page).
Public
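To show how a script or gateway might call `POST /analyze`, here is a standard-library sketch that builds the request without sending it. The base URL, the `X-API-Key` header name and the JSON field names are all hypothetical; the real schema should be checked against the running service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical deployment address


def build_analyze_request(api_key: str, subject: str, body: str,
                          sender: str) -> urllib.request.Request:
    """Build (but do not send) a POST /analyze call.

    The header name and payload fields are assumptions, not the documented API.
    """
    payload = json.dumps({"subject": subject, "body": body, "sender": sender}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/analyze",
        data=payload,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# Sending it would look like:
# with urllib.request.urlopen(build_analyze_request(...), timeout=10) as resp:
#     result = json.load(resp)
```

Orchestrators would poll the public `GET /health` and `GET /health/ml` probes separately, with no key.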
Performance

Current Metrics

Conservative tuning prioritising precision and stability
73%
Baseline F1
Before aggressive telemetry-based tuning
80%
Precision
False positives kept under control
67%
Recall
Balanced against precision guardrails
14
Risk Threshold
Current cut-off for "high risk" emails
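The three headline metrics are mutually consistent: F1 is the harmonic mean of precision P and recall R, so with the rounded values above

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.80 \times 0.67}{0.80 + 0.67} = \frac{1.072}{1.47} \approx 0.729 \approx 73\%
```

Since P and R are themselves rounded, the third significant digit of F1 is approximate.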
⚙️ Conservative Tuning
✅ Active
Automated weight updates driven by datasets + telemetry, with strict guardrails.
Min precision: 70%
Min F1 improvement: +2%
8,424 rows processed
3 datasets in rotation
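The two guardrails can be read as an acceptance test for each candidate weight update. This sketch assumes "+2%" means two percentage points of absolute F1; the class and function names are hypothetical.

```python
from dataclasses import dataclass

MIN_PRECISION = 0.70
MIN_F1_GAIN = 0.02  # assumed: +2 percentage points of absolute F1


@dataclass
class EvalResult:
    precision: float
    recall: float

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if p + r else 0.0


def accept_candidate(baseline: EvalResult, candidate: EvalResult) -> bool:
    """Adopt new weights only if both guardrails hold."""
    return (candidate.precision >= MIN_PRECISION
            and candidate.f1 >= baseline.f1 + MIN_F1_GAIN)
```

Against the current baseline (P = 0.80, R = 0.67), a candidate at P = 0.65 is rejected however high its recall, which is how the tuning stays conservative.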
Testing

Dataset Coverage

Iterative runs across three public phishing corpora
📧 CEAS
10.2% complete
39,154 total rows • 4,000 processed • 35,154 remaining
📧 Ling
100% complete
2,859 rows • full pass with current pipeline
📧 Nazario
100% complete
1,565 rows • full pass with current pipeline
8.4K
Total Rows
Processed across CEAS, Ling & Nazario so far
8
Tuning Runs
Iterative passes with conservative constraints
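The coverage figures follow directly from the row counts: 4,000 + 2,859 + 1,565 = 8,424 rows processed, and 4,000 / 39,154 ≈ 10.2% of CEAS. The sketch below reproduces that arithmetic; the `next_batch` helper is a hypothetical illustration of how further CEAS passes could resume from the processed offset.

```python
# Row counts copied from the Dataset Coverage slide.
DATASETS = {
    "CEAS": {"total": 39_154, "processed": 4_000},
    "Ling": {"total": 2_859, "processed": 2_859},
    "Nazario": {"total": 1_565, "processed": 1_565},
}


def coverage(datasets: dict) -> dict:
    """Per-dataset completion (%) plus overall processed and remaining rows."""
    per = {name: round(100 * d["processed"] / d["total"], 1) for name, d in datasets.items()}
    processed = sum(d["processed"] for d in datasets.values())
    remaining = sum(d["total"] - d["processed"] for d in datasets.values())
    return {"per_dataset_pct": per, "processed": processed, "remaining": remaining}


def next_batch(d: dict, size: int) -> tuple[int, int]:
    """Row range [start, end) for the next pass (hypothetical fixed-size batching)."""
    start = d["processed"]
    return start, min(start + size, d["total"])
```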
Roadmap

What's Next

From prototype engine to plug-and-play security product
🔗 Phase 4 — URL Intelligence
⏳ Planned
Augment URL scoring with external reputation sources.
VirusTotal API integration
URLhaus lookups
Google Safe Browsing
Redirect-chain following
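Redirect-chain following is mainly about loop and depth protection. A minimal sketch, assuming a hypothetical `next_hop` resolver that returns the redirect target of a URL (or `None` when it does not redirect); in production it would issue HTTP requests with automatic redirects disabled and resolve relative `Location` values with `urllib.parse.urljoin`.

```python
from typing import Callable, Optional


def follow_redirects(url: str, next_hop: Callable[[str], Optional[str]],
                     max_hops: int = 10) -> list[str]:
    """Return the chain [url, hop1, ...], stopping at a final URL, a loop, or max_hops."""
    chain = [url]
    seen = {url}
    while len(chain) <= max_hops:
        nxt = next_hop(chain[-1])
        if nxt is None:
            break  # reached a URL that does not redirect
        chain.append(nxt)
        if nxt in seen:
            break  # redirect loop: keep the repeated URL as evidence and stop
        seen.add(nxt)
    return chain
```

Each URL in the returned chain, not only the first, could then be passed to reputation lookups, since shorteners often hide the final landing page.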
🎯 Phase 5 — Model Fine-Tuning
⏳ Planned
Close the loop using telemetry and labeled feedback.
DeBERTa fine-tuning pipeline
Model versioning & rollback
A/B testing strategies
"Safe rollout" guardrails
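One common building block for A/B tests and safe rollouts is deterministic traffic splitting. The sketch below hashes a request key into a bucket so the same key always reaches the same model variant; the salt, variant names and the idea of keying on a string are hypothetical.

```python
import hashlib


def assign_variant(request_key: str, candidate_share: float, salt: str = "model-v2") -> str:
    """Route a fixed fraction of keys to a candidate model, deterministically.

    Salt and variant names are hypothetical; changing the salt reshuffles buckets.
    """
    digest = hashlib.sha256(f"{salt}:{request_key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # exactly representable, in [0, 1)
    return "candidate" if bucket < candidate_share else "stable"
```

Under this scheme, a rollback amounts to setting `candidate_share` back to `0.0`, and widening the rollout keeps every previously assigned key on the candidate.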