● DATA ENGINEER / AI SYSTEMS BUILDER
Engineering Data →
Driving Intelligence
From raw data to real-time insights — I design scalable pipelines, streaming systems, and AI-powered solutions for modern data platforms.
Architecting data for
scale and intelligence.
I design scalable data pipelines, real-time streaming systems, and AI-powered solutions that transform raw data into actionable insights.
My work focuses on building end-to-end platforms using Azure (ADF, ADLS, Event Hubs, Databricks) and PySpark, supporting both batch and real-time processing.
I specialize in Medallion architectures (Bronze, Silver, Gold), optimizing ETL pipelines, and creating analytics-ready data models.
I also build Generative AI solutions, including RAG systems and intelligent automation that bridge data engineering with AI.
I enjoy solving complex data challenges and turning data into meaningful business impact.
Data Engineering & ETL
Building scalable data pipelines using Azure, PySpark, and distributed systems for reliable data processing.
Real-Time Streaming
Designing event-driven architectures using Azure Event Hubs and Kafka-based streaming pipelines.
Business Intelligence
Creating analytics-ready datasets and dashboards for data-driven decision making.
AI & GenAI Solutions
Developing RAG-based systems and intelligent automation powered by modern AI technologies.
Building scalable data platforms, real-time pipelines, and AI-powered systems across cloud environments.
Data/AI Engineer
- Designed and maintained ETL pipelines in Azure Data Factory (ADF) to integrate property listings and construction data.
- Improved data quality and reduced ingestion errors by 30%.
- Built Tableau dashboards for leadership, improving KPI tracking efficiency.
- Optimized SQL queries using CTEs and indexing, reducing execution time by 50%.
- Developed a Generative AI tool using LLM APIs and Python, reducing manual documentation effort by 40%.
Data Analyst
- Wrote complex SQL queries to process 10,000+ product records, improving data accuracy.
- Reduced data reconciliation time by 30%.
- Built Power BI dashboards for inventory and sales analysis.
- Improved stock allocation decisions by 25%.
Data Analyst
- Built PySpark pipelines on Databricks to process large-scale sales data.
- Improved reporting speed by 20%.
- Developed demand forecasting models using Python.
- Automated reporting workflows, reducing manual effort by 70%.
- Designed Tableau dashboards to support business decision-making.
Data Science & Analytics Intern
- Built K-Means clustering models for customer segmentation.
- Developed Power BI dashboards that improved sales targeting by 25%.
- Performed EDA and statistical analysis that improved operational efficiency by 20%.
Engineering Data.
Delivering Intelligence.
Real-world projects across data engineering, analytics, cloud platforms, and Generative AI — designed for scale, insight, and automation.
Uber Real-Time Data Engineering Pipeline
Built an end-to-end real-time data pipeline that simulates Uber ride data, using Azure Event Hubs, Databricks, and a Medallion architecture to enable scalable analytics.
Ingested streaming ride data through Azure Event Hubs and processed it with Apache Spark on Databricks.
Designed and implemented a Medallion architecture (Bronze, Silver, Gold layers) for scalable data transformation.
Built ETL pipelines using Azure Data Factory and stored structured data in Azure Data Lake.
Modeled analytical datasets using a star schema for efficient querying and reporting.
Delivered a production-ready real-time data platform enabling scalable analytics, optimized data processing, and actionable business insights.
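To make the Bronze, Silver, Gold flow above concrete, here is a minimal sketch in plain Python. It is not the project's Spark code: the ride events, field names, and the `to_silver` / `to_gold` helpers are hypothetical and exist only to illustrate the layering idea (raw events, then cleaned rows, then an analytics-ready aggregate).

```python
from collections import defaultdict

# Bronze: raw events kept as received (hypothetical sample data).
bronze = [
    {"ride_id": "r1", "city": "Boston", "fare": "12.50"},
    {"ride_id": "r2", "city": "boston ", "fare": "8.00"},
    {"ride_id": "r2", "city": "boston ", "fare": "8.00"},  # duplicate event
    {"ride_id": "r3", "city": "Chicago", "fare": None},     # missing fare
]


def to_silver(events):
    """Silver: drop incomplete rows, deduplicate by ride_id, normalize types."""
    seen, cleaned = set(), []
    for event in events:
        if event["fare"] is None or event["ride_id"] in seen:
            continue
        seen.add(event["ride_id"])
        cleaned.append({
            "ride_id": event["ride_id"],
            "city": event["city"].strip().title(),
            "fare": float(event["fare"]),
        })
    return cleaned


def to_gold(rows):
    """Gold: aggregate cleaned rides into a per-city summary."""
    totals = defaultdict(lambda: {"rides": 0, "revenue": 0.0})
    for row in rows:
        totals[row["city"]]["rides"] += 1
        totals[row["city"]]["revenue"] += row["fare"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In a Spark setting each function would typically become a DataFrame transformation writing to its own table, but the separation of responsibilities between layers is the same.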
GenAI Supply Chain Intelligence System (RAG-based)
Built an AI-powered system to analyze supply chain data and documents using a RAG pipeline, enabling intelligent querying and real-time operational insights.
Designed a RAG-based system to retrieve and generate context-aware responses from supply chain data and documents.
Implemented a document ingestion pipeline with chunking, embeddings, and vector storage for efficient retrieval.
Enabled semantic search to identify delays, inefficiencies, and trends in supply chain operations.
Built a FastAPI-based backend to serve real-time AI responses.
Improved response accuracy using prompt engineering and optimized retrieval strategies.
Transformed supply chain data into an intelligent decision-support system, enabling faster issue detection, improved visibility, and data-driven operational insights.
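The retrieval half of a RAG pipeline like the one described above can be sketched in a few lines. This is an illustration under stated assumptions, not the system's implementation: the document text is made up, `chunk`, `embed`, `cosine`, and `retrieve` are hypothetical helpers, and a toy bag-of-words vector stands in for a real embedding model and vector store.

```python
import math
from collections import Counter


def chunk(text, size=12):
    """Split text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(word.strip(".,?").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, store, k=1):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical supply chain notes, used only for illustration.
doc = ("Shipment 42 from the Newark warehouse was delayed by a customs hold. "
       "Carrier invoices for March were reconciled without any open disputes.")

# "Vector store": a list of (chunk text, embedding) pairs.
store = [(text, embed(text)) for text in chunk(doc)]
top = retrieve("Which shipment was delayed?", store)
print(top)
```

The retrieved chunks would then be placed into the prompt sent to the language model, which is the generation step that this sketch leaves out.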
AI-Powered Football Match Outcome Predictor
Built an interactive football analytics and match prediction platform using Python and Streamlit to analyze English Premier League data, visualize team performance, and forecast match outcomes.
Developed a multi-tab Streamlit application for league overview, team performance, head-to-head analysis, and match prediction.
Performed data preprocessing and feature preparation using Pandas and NumPy on historical English Premier League datasets.
Integrated a machine learning prediction pipeline to estimate match outcomes, win probabilities, goals, and clean sheets.
Designed interactive visualizations using Matplotlib and Seaborn to surface trends, team insights, and comparative performance.
Delivered a user-friendly sports intelligence dashboard that combines analytics, visualization, and AI-based forecasting to support data-driven match insights and outcome prediction.
Amazon Last-Mile Delivery Optimization (AWS Pipeline)
Built an AWS-based analytics pipeline to analyze 100,000+ delivery records and optimize last-mile logistics performance using data-driven insights.
Designed an end-to-end AWS pipeline using S3, Glue, and Redshift to process and analyze large-scale delivery datasets.
Analyzed 100,000+ delivery records to identify key inefficiencies in last-mile logistics operations.
Found that weather conditions increased missed deliveries by 60% and that peak-hour deliveries had 2x higher failure rates.
Applied Lean Six Sigma DMAIC methodology to identify root causes and optimize delivery workflows.
Developed Tableau dashboards to visualize KPIs, delivery trends, and operational bottlenecks.
Recommended optimizations that reduced missed deliveries by 33%, improved fuel efficiency by 15%, and increased delivery productivity by 25%.
Urban Road Safety & Crash Analytics Platform
Analyzed 3.2M+ traffic crash records across Austin, Chicago, and New York to identify accident patterns, high-risk zones, and key safety insights using data profiling, transformation, and visualization tools.
Processed and analyzed 3.2M+ crash records across 131 variables from multiple city datasets.
Performed data cleaning and transformation using Talend and Alteryx to ensure data quality and consistency.
Conducted dataset profiling using Python (ydata-profiling) to identify missing values, anomalies, and data inconsistencies.
Designed dimensional data models to support structured analysis and reporting.
Built interactive dashboards using Power BI and Tableau to visualize accident trends, high-risk zones, and contributing factors.
Analyzed pedestrian involvement, time-based trends, and regional accident distribution to uncover safety risks.
Delivered data-driven insights to support urban traffic planning, improve road safety strategies, and enable policy-level decision-making for reducing accident risks.
Full-Stack Data Expertise
Spanning analytics, engineering, intelligence, and AI automation
Data Engineering
Cloud & Platforms
Analytics & BI
Databases & Warehousing
Programming
AI / Generative AI
Tools & Workflow
Focused on data engineering, analytics, and intelligent systems with hands-on experience in building scalable data pipelines and cloud-based solutions.
MS in Information Systems
B.Tech in Information Technology
© 2026 Nikita Gupta · Designing systems where data meets intelligence.
Built in Boston.