Machine Learning Engineer
Building production-ready ML systems — not just models. Focused on NLP pipelines, predictive automation, and cloud-ready deployment across enterprise environments.
Most ML work stops at the notebook. Mine starts there and ends in production — with monitored pipelines, versioned models, and measurable business outcomes.
With a background spanning insurance, logistics, and pharmaceuticals, I've built ML systems where correctness and reliability are non-negotiable constraints, not afterthoughts.
Every model is a component of a larger system. I design for data ingestion, transformation, training, evaluation, and monitoring from the outset.
Azure Databricks, Data Factory, and SQL are my production environment. I build ETL workflows and ML pipelines that scale without manual intervention.
Accuracy metrics matter. So do cost reductions, turnaround time, and stakeholder adoption. I connect model performance to operational impact.
I build fewer systems, built well — with clean code, reproducible experiments, and deployment documentation that survives handover.
Framed as engineering projects — with architecture, data flow, and deployment strategy.
Clinical gout diagnosis relies heavily on subjective symptom descriptions in free-text patient records. This project applies NLP to extract structured clinical signals from unstructured text and trains a prediction model to flag gout risk — reducing dependence on manual clinical interpretation.
Clinical text preprocessing — tokenisation, stopword removal, medical entity normalisation.
TF-IDF and n-gram features. Symptom keyword extraction for domain-specific signal engineering.
Classification pipeline (Logistic Regression / Random Forest) trained on labelled clinical records.
Precision, recall, F1 and AUC-ROC scoring. Cross-validated on held-out clinical split.
FastAPI inference endpoint → Docker image → Azure Container Apps. CI/CD via GitHub Actions.
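The feature and classification steps above can be sketched as a minimal scikit-learn pipeline. The notes and labels here are toy stand-ins, not real clinical data; the actual project trains on labelled clinical records with the preprocessing described above.

```python
# Minimal sketch of the described pipeline: TF-IDF n-gram features feeding a
# Logistic Regression classifier. Data below is illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-ins for free-text clinical notes and gout labels (1 = gout risk).
notes = [
    "acute pain and swelling in the first metatarsophalangeal joint",
    "patient reports mild seasonal allergies, no joint involvement",
    "sudden onset big toe pain, redness, elevated uric acid",
    "routine check-up, no musculoskeletal complaints",
]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    # Unigrams + bigrams with stopword removal, as described above.
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipeline.fit(notes, labels)
pred = pipeline.predict(["swollen red big toe with severe pain overnight"])
```

In the real system the same `Pipeline` object is what gets versioned and shipped, so training-time and inference-time preprocessing cannot drift apart.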
Separate data, features, training, and evaluation into clean modules.
POST /predict accepts raw text, returns risk score + confidence.
Docker image → Azure Container Apps. GitHub Actions CI/CD on push.
Bioinformatics researchers working with raw sequence data lacked a fast, accessible tool for computing and visualising key metrics — sequence length distributions, GC content, and compositional profiles — without writing custom scripts for each dataset.
Users upload FASTA/sequence files. Python backend parses records and validates format.
Computes sequence length, GC content, nucleotide frequency distributions, and compositional stats.
Dynamic charts rendered in-browser. Histogram, bar, and line views per metric.
Clean Python-powered web app. No external dependencies required from the user.
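The core metrics above reduce to a small amount of pure Python. This is a simplified sketch: the parser assumes well-formed FASTA input, whereas the production backend also validates characters and malformed records.

```python
# Minimal sketch of the sequence metrics described above: length, GC content,
# and nucleotide frequencies for records parsed from FASTA-formatted text.
from collections import Counter

def parse_fasta(text: str) -> dict:
    """Return {header: sequence} from FASTA-formatted text (no validation)."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records

def sequence_stats(seq: str) -> dict:
    """Length, GC fraction, and per-base counts for one sequence."""
    counts = Counter(seq)
    gc = counts["G"] + counts["C"]
    return {
        "length": len(seq),
        "gc_content": gc / len(seq) if seq else 0.0,
        "frequencies": dict(counts),
    }

records = parse_fasta(">seq1\nATGCGC\n>seq2\nAATT\n")
stats = {name: sequence_stats(seq) for name, seq in records.items()}
```

These per-record dicts are what the charting layer consumes to render histogram and bar views.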
Containerise with Docker → deploy to Azure App Service with CI/CD via GitHub Actions.
App + dependencies packaged into a reproducible image.
Publicly accessible endpoint with auto-scaling and health monitoring.
Automated test and deploy pipeline on every push to main.
Recovery operations at Call Assist relied on manual demand estimation — planners were reacting to shortfalls rather than anticipating them. No predictive layer existed, leading to inefficient vendor allocation and avoidable transportation costs at scale.
Python + Azure Data Factory pipeline ingesting 3 years of operational records into Databricks.
Lag features, rolling averages, seasonality decomposition, and vendor capacity signals.
XGBoost + LightGBM ensemble with Prophet baseline. Cross-validated on 6-month holdout.
Forecast outputs feed a constraint-based vendor allocation model, eliminating idle capacity.
Forecast results surfaced to operations team with 7-day rolling prediction window.
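The lag and rolling-average features above can be sketched in pandas. Column names and window sizes here are illustrative assumptions; the real pipeline runs these transformations on Databricks over three years of operational records.

```python
# Hedged sketch of the feature engineering described above: lag features,
# leakage-safe rolling means, and a day-of-week seasonality signal over a
# daily demand series. Windows and column names are illustrative.
import pandas as pd

def add_demand_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lag and rolling-mean features to a daily 'demand' series."""
    out = df.sort_values("date").copy()
    for lag in (1, 7, 28):  # yesterday, last week, last monthly cycle
        out[f"demand_lag_{lag}"] = out["demand"].shift(lag)
    for window in (7, 28):  # short- and medium-term trend
        # shift(1) keeps the current day out of its own window (no leakage).
        out[f"demand_roll_{window}"] = (
            out["demand"].shift(1).rolling(window).mean()
        )
    out["dayofweek"] = pd.to_datetime(out["date"]).dt.dayofweek
    return out

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=60, freq="D"),
    "demand": range(60),
})
features = add_demand_features(df)
```

The `shift(1)` before each rolling window is the detail that keeps training features honest: a forecast for day *t* must only see demand up to day *t−1*.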
Measured against 3-month rolling average baseline.
Via demand-matched vendor allocation eliminating over-provisioning.
Through pipeline automation across operational and financial datasets.
Projects are in active development — moving from notebook-stage to Docker-containerised, CI/CD-deployed, monitored pipelines on Azure.
FastAPI + model weights packaged into reproducible images
Automated test → build → deploy pipeline on every push
Scalable serverless hosting for inference endpoints
Model versioning, parameter logging, and metric dashboards
Automated alerts on feature and prediction distribution shifts
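One common way to implement the distribution-shift alerts above is the Population Stability Index (PSI) between a training baseline and live scores. This is a sketch under that assumption, not the exact monitoring stack; the 0.2 alert threshold is a widely used rule of thumb rather than a fixed standard.

```python
# Illustrative drift check: PSI between baseline (training-time) and current
# (live) score distributions. PSI > 0.2 is a common "significant shift" cue.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between two samples of a score/feature."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor empty bins at a small epsilon to avoid log(0).
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.4, 0.1, 10_000)  # baseline from training
live_scores = rng.normal(0.4, 0.1, 10_000)   # stable: no alert expected
shifted_scores = rng.normal(0.6, 0.1, 10_000)  # simulated drift: alert
```

In a monitored pipeline this runs on a schedule, with the PSI value logged alongside the model version so an alert can be traced to a specific deployment.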
Looking for production-focused ML positions at companies where systems thinking and deployment quality matter. FAANG, growth-stage tech, and enterprise SaaS are all of interest.