30 Data Science Projects
Spanning ML, deep learning, NLP, computer vision, data engineering, and MLOps.
Satellite Image Analysis Platform
Deep learning system for land use classification and change detection from satellite imagery using PyTorch with ResNet/EfficientNet and interactive Folium maps.
→ Multi-temporal change detection
Time Series Demand Forecasting
Multi-model forecasting combining Prophet, LSTM, and XGBoost for retail demand prediction with seasonality detection, anomaly handling, and ensemble forecasts.
→ Ensemble forecast with backtesting
Sentiment Analysis Dashboard
Real-time sentiment monitoring using fine-tuned RoBERTa, BERTopic for topic modeling, entity-level sentiment with spaCy NER, and a Plotly Dash dashboard.
→ Real-time multi-model NLP pipeline
BI Dashboard Suite
Interactive business intelligence dashboard with Plotly Dash and DuckDB featuring KPI cards, cross-filtering, drill-down charts, role-based access, and PDF/CSV export.
→ Multi-page BI dashboard with RBAC
Medical Image Segmentation Tool
U-Net architecture in TensorFlow for medical image segmentation with data augmentation, MC Dropout uncertainty, DICOM handling, and Grad-CAM visualization.
→ ONNX-exported production model
Real-time Fraud Detection System
Production-grade ML pipeline for credit card fraud detection with real-time streaming inference, SHAP explainability, A/B testing, and a Streamlit monitoring dashboard.
→ Multi-page real-time dashboard
Credit Risk Scoring Model
Interpretable ML model for loan approval using LightGBM with SHAP/LIME explainability, Fairlearn bias testing, traditional WoE/IV scorecard, and regulatory compliance.
→ Fairness-audited production model
IoT Anomaly Detection System
Unsupervised anomaly detection for manufacturing IoT sensors using Isolation Forest, PyTorch Autoencoders, and DBSCAN with real-time scoring and alerting.
→ Real-time anomaly scoring pipeline
ETL Orchestration Pipeline
Airflow-based ETL pipeline with multi-source extraction, dbt transformations on DuckDB warehouse, SCD Type 2 snapshots, data quality checks, and lineage tracking.
→ Automated ETL with data quality
RAG Knowledge Assistant
Retrieval-augmented generation chatbot ingesting PDF/DOCX/Markdown, embedding into ChromaDB, answering with citations using LangChain and Claude/OpenAI APIs.
→ Multi-format document Q&A system
Customer Churn Prediction with AutoML
Automated ML pipeline for churn prediction using H2O.ai AutoML and Optuna-tuned LightGBM, with Boruta feature selection and business impact calculator.
→ AutoML + manual model comparison
Hybrid Recommendation Engine
Production-ready recommendation system combining collaborative filtering (Surprise) and content-based filtering with cold-start handling, Redis caching, and A/B testing.
→ Hybrid model with cold-start support
Text Classification API
FastAPI multi-class text classification service using fine-tuned DistilBERT and BERT on AG News, with model versioning and A/B testing infrastructure.
→ Production NLP API with versioning
Manufacturing Defect Detection
Computer vision pipeline using ResNet-50 with transfer learning for product defect detection, Grad-CAM localization, active learning, and ONNX Runtime export.
→ Edge-deployable ONNX model
Document Intelligence OCR System
End-to-end document processing combining Tesseract OCR, OpenCV preprocessing, and transformer-based models to extract structured data from invoices and receipts.
→ Structured data from unstructured docs
Face Recognition Attendance System
Real-time attendance tracking using FaceNet embeddings and MTCNN detection with anti-spoofing liveness detection, privacy controls, and report generation.
→ Privacy-compliant face recognition
Multilingual Support Classifier
Zero-shot classification for customer support ticket routing across 20+ languages using XLM-RoBERTa, with language detection, urgency scoring, and response templates.
→ 20+ language zero-shot classification
Predictive Maintenance System
Deep learning CNN-LSTM architecture in PyTorch to predict equipment Remaining Useful Life from NASA C-MAPSS sensor data, with MC Dropout uncertainty and maintenance scheduling.
→ RUL prediction with uncertainty
Customer 360 Analytics Platform
Unified customer view combining CRM, transactional, web, and support data with entity resolution, RFM analysis, K-Means segmentation, and predictive CLV.
→ Unified customer view with CLV
Retail Analytics with Object Detection
YOLOv8-based system for customer behavior analysis in retail: foot traffic tracking with ByteTrack, dwell time per zone, heatmap generation, and privacy-preserving face blurring.
→ Real-time foot traffic analytics
Data Lake Architecture
Serverless data lake simulation using MinIO (local S3), DuckDB, and medallion architecture with data cataloging, schema evolution, and Terraform templates.
→ Medallion architecture data lake
Real-Time Streaming Pipeline
Scalable streaming architecture using Kafka, PySpark Structured Streaming, and Delta Lake for e-commerce clickstream processing with exactly-once semantics.
→ Exactly-once streaming pipeline
CI/CD Pipeline for ML
GitHub Actions workflows automating the full ML lifecycle: data validation, model training with performance gates, artifact storage, and multi-environment deployment.
→ Automated ML deployment pipeline
ML Model Deployment Platform
MLOps pipeline using MLflow for experiment tracking and model registry, FastAPI serving with canary deployment, Kubernetes manifests, and Prometheus/Grafana monitoring.
→ Full MLOps platform with monitoring
ML Monitoring & Observability
Comprehensive ML monitoring with Evidently AI drift detection, CUSUM degradation detection, Prometheus metrics, Grafana dashboards, and automated drift reporting.
→ Full observability stack with alerting
Distributed Training Framework
Multi-GPU training using PyTorch DDP and Horovod, with Ray Tune hyperparameter optimization, mixed precision training, and Weights & Biases experiment tracking.
→ Multi-GPU scaling benchmarks
Insurance Analytics Pipeline
End-to-end data science pipeline for insurance claim risk classification with Pandera validation, sklearn-compatible preprocessing, FastAPI inference, and Typer CLI.
→ Validated insurance risk classifier
Prompt Evaluation Framework
Testing suite for LLM prompts with multi-model evaluation (GPT-4, Claude), A/B testing with statistical significance, cost optimization, and version control for prompts.
→ Multi-model prompt testing suite
AI Contract Review System
Legal document analysis with PDF parsing, LLM-powered clause identification, per-clause risk scoring, GDPR compliance checking, and side-by-side contract comparison.
→ Automated clause risk analysis
Automated Code Documentation
LLM-powered tool that parses Python and JS/TS codebases via AST/tree-sitter and generates documentation (docstrings, READMEs, module docs) using the Claude API, with complexity analysis.
→ Auto-generated project documentation