Case Study 08
2025
CASE STUDY

PhishGuard

ML-Powered Real-Time Phishing URL Detection & Threat Analysis

JavaScript · Node.js · Machine Learning · Cybersecurity

01 / Project Overview

A full-stack cybersecurity application that uses a multi-layer detection pipeline to analyze URLs for phishing indicators in real time. PhishGuard extracts 30+ features describing a URL's structure and its domain (domain age, SSL validity, redirect chains, suspicious keywords, typosquatting similarity), feeds them through a trained gradient boosting classifier, and returns risk scores with detailed explanations. A companion browser extension provides in-line protection during web navigation.

Quick Facts
Released: 2025
Role: Lead Engineer
Core Focus: Scale & Speed

02 / The Challenge & Problem

Real-World Problem Statement

Phishing attacks are the leading vector for credential theft and malware distribution, yet most users rely solely on browser safe-browsing lists that have an average 12-hour delay between site creation and blacklisting. Novel phishing domains exploit this window to harvest thousands of credentials before detection.

03 / The Engineering Solution

Implementation & Architectural Approach

Built a zero-day phishing detection system that relies on structural URL analysis rather than blacklists, so it can assess previously unseen URLs in real time. The combined heuristic and ML ensemble achieves 96.2% detection accuracy with a false positive rate under 0.8%, low enough for production use without disrupting legitimate browsing.
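The case study does not say how the heuristic and ML parts of the ensemble are combined. The sketch below assumes a simple weighted blend of a rule-based score and the classifier's probability. The function names, the rule flags, the `ml_weight=0.7` weight, and the `0.5` threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float       # blended risk score in [0, 1]
    is_phishing: bool  # score compared against the decision threshold


def heuristic_score(flags: dict[str, bool]) -> float:
    """Fraction of hand-written red-flag rules that fired (rules are hypothetical)."""
    if not flags:
        return 0.0
    return sum(flags.values()) / len(flags)


def blend(ml_probability: float, flags: dict[str, bool],
          ml_weight: float = 0.7, threshold: float = 0.5) -> Verdict:
    """Weighted average of the model probability and the heuristic score."""
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be in [0, 1]")
    score = ml_weight * ml_probability + (1 - ml_weight) * heuristic_score(flags)
    return Verdict(score=score, is_phishing=score >= threshold)
```

For example, `blend(0.9, {"ip_host": True, "no_https": True, "at_symbol": False})` combines a 0.9 model probability with two of three rules firing, which gives a score of about 0.83. In practice, the weight and threshold would be tuned against the false-positive budget rather than fixed by hand.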

04 / Technical Architecture Flow

01 Feature Extraction Engine

URL Parser & WHOIS Resolver

Decomposes each URL into 30+ features, including domain age, HTTPS presence, redirect depth, special-character density, and Levenshtein distance to the top 500 domains.
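The offline, purely structural part of that feature list can be sketched with the Python standard library. This is a minimal illustration, not the project's extractor. `TOP_DOMAINS` is a four-entry stand-in for the top-500 list, `SUSPICIOUS_KEYWORDS` is an assumed list, and the feature names are hypothetical. Domain age and redirect depth are omitted because they require WHOIS and HTTP calls.

```python
import re
from urllib.parse import urlparse

# Stand-in for the top-500 reference list (hypothetical subset).
TOP_DOMAINS = ("google.com", "paypal.com", "microsoft.com", "apple.com")
# Assumed keyword list; the project's actual list is not published.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def extract_features(url: str) -> dict:
    """Offline structural features only (no WHOIS, no redirect following)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Naive registrable domain: last two labels (does not handle e.g. .co.uk).
    base_domain = ".".join(host.split(".")[-2:])
    brand_distance = min(levenshtein(base_domain, d) for d in TOP_DOMAINS)
    return {
        "uses_https": parsed.scheme == "https",
        "url_length": len(url),
        "subdomain_count": max(host.count(".") - 1, 0),
        "ip_host": IPV4.fullmatch(host) is not None,
        "at_symbol": "@" in url,
        "special_char_density": sum(not c.isalnum() for c in url) / max(len(url), 1),
        "keyword_hits": sum(kw in url.lower() for kw in SUSPICIOUS_KEYWORDS),
        "min_brand_distance": brand_distance,
        # Close to, but not exactly, a known brand domain suggests typosquatting.
        "typosquat_suspect": 0 < brand_distance <= 2,
    }
```

For instance, `extract_features("http://paypa1.com/login/verify")` reports no HTTPS, two keyword hits, and a brand distance of 1 from `paypal.com`, which sets the typosquatting flag.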

02 ML Classification Layer

Gradient Boosting Classifier

XGBoost-based model trained on 50,000 labeled URLs, outputting phishing probability scores with SHAP-based feature attribution for explainability.
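SHAP attributions are additive: the explainer's base value plus the per-feature contributions reproduces the model's output for a given URL. For an XGBoost binary classifier explained in log-odds (margin) space, which is how `shap.TreeExplainer` explains XGBoost models by default, the phishing probability is the logistic sigmoid of that sum. The sketch below shows only that arithmetic. The feature names and numbers are hypothetical, not values from the trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probability_from_shap(base_value: float, contributions: dict[str, float]) -> float:
    """Base value + sum of SHAP values = model margin (log-odds); map it to a probability."""
    margin = base_value + sum(contributions.values())
    return sigmoid(margin)


# Hypothetical attribution for one URL, in log-odds units.
example_base = -1.2
example_phi = {"domain_age_days": 1.8, "min_brand_distance": 1.1, "uses_https": -0.3}
example_probability = probability_from_shap(example_base, example_phi)
```

With these example values, the margin is -1.2 + 2.6 = 1.4, so the reconstructed probability is roughly 0.80.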

03 API & Browser Integration

Node.js REST API + Extension

An Express API serves real-time predictions, and a companion browser extension intercepts navigation events and overlays risk indicators on flagged pages.

05 / Key Project Features

Zero-Day URL Analysis

Detects phishing on URLs not in any blacklist by analyzing structural patterns characteristic of malicious domains.

SHAP Explainability Reports

Provides per-URL explanations highlighting the specific features (e.g., domain age, typosquatting) driving the risk score.
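A per-URL report can be produced by ranking SHAP contributions by magnitude and turning the top few into short sentences. The sketch below assumes the contributions arrive as a name-to-value mapping in log-odds units. The feature names, the `top_k=3` default, and the wording are hypothetical.

```python
def explain(contributions: dict[str, float], top_k: int = 3) -> list[str]:
    """Turn SHAP values into short sentences, largest absolute effect first."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    report = []
    for name, value in ranked[:top_k]:
        if value > 0:
            direction = "raises"
        elif value < 0:
            direction = "lowers"
        else:
            direction = "does not change"
        report.append(f"{name} {direction} risk ({value:+.2f} log-odds)")
    return report


# Hypothetical contributions for one flagged URL.
example = {"domain_age_days": 1.8, "min_brand_distance": 1.1,
           "uses_https": -0.3, "url_length": 0.05}
for line in explain(example):
    print(line)
```

Ranking by absolute value puts risk-lowering features in the report too, so a reviewer can see when HTTPS, for example, partly offsets a young domain.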

Browser Extension Integration

Real-time visual overlays on the navigation bar warn users before the page finishes loading, preventing credential entry on phishing sites.

06 / Engineering Challenges & Mitigations

Blocker Difficulty

Legitimate URLs from new domain registrations were incorrectly flagged as phishing due to low domain age.

Resolution Strategy

Added domain reputation signals from the VirusTotal API and built a whitelist of verified new registrars to reduce false positives on legitimate new sites.
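One way to combine these two signals is to dampen, rather than override, the model's score when a young domain also has a clean reputation and a registrar on the allowlist. This is a hypothetical sketch. The allowlist entry, the 30-day cut-off, the `damping` factor, and the `DomainSignals` fields are all assumptions. The VirusTotal detection count is passed in as an input rather than fetched, so no VirusTotal client code is implied.

```python
from dataclasses import dataclass

# Hypothetical allowlist entry; the project's actual registrar list is not published.
VERIFIED_REGISTRARS = {"registrar-a.example"}
NEW_DOMAIN_DAYS = 30  # assumed cut-off for a "new" registration


@dataclass
class DomainSignals:
    age_days: int
    registrar: str
    vt_malicious: int  # engines flagging the domain, e.g. from a VirusTotal report


def adjust_score(score: float, signals: DomainSignals, damping: float = 0.5) -> float:
    """Soften the risk score for new domains that look legitimate on other signals."""
    is_new = signals.age_days < NEW_DOMAIN_DAYS
    trusted_registrar = signals.registrar.strip().lower() in VERIFIED_REGISTRARS
    clean_reputation = signals.vt_malicious == 0
    if is_new and trusted_registrar and clean_reputation:
        return score * damping
    # Old domains, unknown registrars, or any reputation hit keep the original score.
    return score
```

Scaling the score instead of zeroing it keeps the other features in play. The damping factor controls how much benefit of the doubt a new but otherwise clean domain receives.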

Blocker Difficulty

WHOIS lookup latency caused prediction delays exceeding 3 seconds for some queries.

Resolution Strategy

Implemented an LRU cache for recent WHOIS lookups and a timeout fallback that uses only local structural features when WHOIS is slow.
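Both mitigations fit in a few lines of standard-library Python. The sketch assumes a blocking `lookup(domain)` callable that returns WHOIS fields as a dict, a hypothetical stand-in for whatever WHOIS client the project uses. The cache capacity, the one-second timeout, and the function names are also assumptions. On a timeout, the function returns `None`, which tells the caller to score the URL on local structural features only.

```python
import concurrent.futures
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """Minimal least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key: str) -> Optional[dict]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
_cache = LRUCache(capacity=1024)


def whois_features(domain: str, lookup: Callable[[str], dict],
                   timeout_s: float = 1.0) -> Optional[dict]:
    """Return cached or fresh WHOIS data, or None if the lookup exceeds timeout_s."""
    cached = _cache.get(domain)
    if cached is not None:
        return cached
    future = _executor.submit(lookup, domain)
    try:
        result = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None  # caller falls back to local structural features only
    _cache.put(domain, result)
    return result
```

A lookup that times out is not cached, so the next request for that domain retries WHOIS. The timed-out call keeps running in its worker thread until it finishes, because a running future cannot be cancelled.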

07 / Technical & Personal Learnings

01

Gained deep knowledge of phishing attack patterns, URL obfuscation techniques, and the adversarial machine learning arms race in cybersecurity.

02

Mastered SHAP model explainability, enabling non-technical stakeholders to understand and trust automated security decisions.

08 / Categorized Tech Stack

Detection Engine

XGBoost
SHAP
scikit-learn
Python Feature Extraction

API & Backend

Node.js
Express.js
VirusTotal API
WHOIS Lookup

Browser Integration

JavaScript
Chrome Extension APIs
Manifest V3
Fetch API