● DATA ENGINEER / AI SYSTEMS BUILDER

Engineering Data →
Driving Intelligence

From raw data to real-time insights — I design scalable pipelines, streaming systems, and AI-powered solutions for modern data platforms.

3+ Years Experience
10+ Projects Completed
1M+ Records Analyzed
3 Companies Worked
From Data to Intelligent Systems

Architecting data for scale and intelligence.

I design scalable data pipelines, real-time streaming systems, and AI-powered solutions that transform raw data into actionable insights.

My work focuses on building end-to-end platforms using Azure (ADF, ADLS, Event Hub, Databricks) and PySpark, supporting both batch and real-time processing.

I specialize in Medallion architectures (Bronze, Silver, Gold), optimizing ETL pipelines, and creating analytics-ready data models.

I also build Generative AI solutions, including RAG systems and intelligent automation that bridge data engineering with AI.

I enjoy solving complex data challenges and turning data into meaningful business impact.

Data Engineering & ETL

Building scalable data pipelines using Azure, PySpark, and distributed systems for reliable data processing.

Real-Time Streaming

Designing event-driven architectures using Event Hub and Kafka-based streaming pipelines.

Business Intelligence

Creating analytics-ready datasets and dashboards for data-driven decision making.

AI & GenAI Solutions

Developing RAG-based systems and intelligent automation powered by modern AI technologies.

Professional Journey

Engineering Data → Driving Intelligence

Building scalable data platforms, real-time pipelines, and AI-powered systems across cloud environments.

Data/AI Engineer

RTS Labs, USA
Aug 2025 – Mar 2026
  • Designed and maintained ETL pipelines in Azure Data Factory (ADF) to integrate property listings and construction data.
  • Improved data quality and reduced ingestion errors by 30%.
  • Built Tableau dashboards for leadership, improving KPI tracking efficiency.
  • Optimized SQL queries using CTEs and indexing, reducing execution time by 50%.
  • Developed a Generative AI tool using LLM APIs and Python, reducing manual documentation effort by 40%.

Data Analyst

Northeastern University Bookstore, USA
Apr 2024 – Apr 2025
  • Wrote complex SQL queries to process 10,000+ product records, improving data accuracy.
  • Reduced data reconciliation time by 30%.
  • Built Power BI dashboards for inventory and sales analysis.
  • Improved stock allocation decisions by 25%.

Data Analyst

Redington Group, UAE
Sep 2022 – Aug 2023
  • Built PySpark pipelines on Databricks for processing large-scale sales data.
  • Improved reporting speed by 20%.
  • Developed demand forecasting models using Python.
  • Automated reporting workflows, reducing manual effort by 70%.
  • Designed Tableau dashboards to support business decision-making.

Data Science & Analytics Intern

Techtastic Technologies, India
Jan 2022 – Jun 2022
  • Built K-Means clustering models for customer segmentation.
  • Developed Power BI dashboards, improving sales targeting by 25%.
  • Performed EDA and statistical analysis, improving operational efficiency by 20%.
Featured Work

Engineering Data. Delivering Intelligence.

Real-world projects across data engineering, analytics, cloud platforms, and Generative AI — designed for scale, insight, and automation.

Cloud Data Engineering

Uber Real-Time Data Engineering Pipeline

Built an end-to-end real-time data pipeline simulating Uber ride data using Azure Event Hub, Databricks, and Medallion architecture to enable scalable analytics.

Ingested streaming ride data using Azure Event Hub and processed it using Apache Spark (Databricks).

Designed and implemented Medallion architecture (Bronze, Silver, Gold layers) for scalable data transformation.

Built ETL pipelines using Azure Data Factory and stored structured data in Azure Data Lake.

Modeled analytical datasets using Star Schema for efficient querying and reporting.
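
As a rough illustration of the Bronze, Silver, Gold flow described above, here is a minimal plain-Python sketch. It is not the project's Databricks or Spark code. The field names (ride_id, city, fare), the sample rows, and the helper names to_silver and to_gold are all hypothetical, chosen only to show how each layer narrows and refines the one before it.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Silver layer: drop malformed rows, cast types, de-duplicate by ride_id."""
    seen = set()
    silver = []
    for row in bronze_rows:
        ride_id = row.get("ride_id")
        try:
            fare = float(row["fare"])
        except (KeyError, TypeError, ValueError):
            continue  # fare missing or not numeric
        if not ride_id or ride_id in seen or fare < 0:
            continue  # no key, duplicate event, or invalid amount
        seen.add(ride_id)
        silver.append({
            "ride_id": ride_id,
            "city": str(row.get("city", "")).strip().title(),
            "fare": fare,
        })
    return silver


def to_gold(silver_rows):
    """Gold layer: aggregate cleaned rides into per-city metrics."""
    totals = defaultdict(lambda: {"rides": 0, "revenue": 0.0})
    for row in silver_rows:
        agg = totals[row["city"]]
        agg["rides"] += 1
        agg["revenue"] += row["fare"]
    return dict(totals)


# Bronze layer: raw events kept exactly as ingested (hypothetical sample data).
bronze = [
    {"ride_id": "r1", "city": " boston ", "fare": "12.50"},
    {"ride_id": "r1", "city": "boston", "fare": "12.50"},  # duplicate event
    {"ride_id": "r2", "city": "BOSTON", "fare": "7.50"},
    {"ride_id": "r3", "city": "chicago", "fare": "oops"},  # malformed fare
    {"ride_id": "r4", "city": "chicago", "fare": "20"},
]

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In a Spark-based pipeline, the same three steps would typically be separate tables, with the raw layer left untouched so that the cleaning rules can be changed and replayed later.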

Key Impact

Delivered a production-ready real-time data platform enabling scalable analytics, optimized data processing, and actionable business insights.

Python · FastAPI · Azure Event Hub · Azure Data Factory · Azure Data Lake · Databricks · Apache Spark · Delta Lake · Streaming Pipelines · ETL Pipelines · Medallion Architecture · Star Schema · Data Warehousing · GitHub
Generative AI / LLM Engineering

GenAI Supply Chain Intelligence System (RAG-based)

Built an AI-powered system to analyze supply chain data and documents using a RAG pipeline, enabling intelligent querying and real-time operational insights.

Designed a RAG-based system to retrieve and generate context-aware responses from supply chain data and documents.

Implemented a document ingestion pipeline with chunking, embeddings, and vector storage for efficient retrieval.

Enabled semantic search to identify delays, inefficiencies, and trends in supply chain operations.

Built a FastAPI-based backend to serve real-time AI responses.

Improved response accuracy using prompt engineering and optimized retrieval strategies.
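
The chunk, embed, retrieve, and prompt steps listed above can be sketched in a few lines of standard-library Python. This is an illustration only, not the project's implementation. A simple word-count vector stands in for a real embedding model, a Python list stands in for the vector database, and the sample documents and function names (chunk_text, embed, cosine, retrieve, build_prompt) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=20, overlap=5):
    """Split text into word windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to an LLM, grounded in retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical supply chain notes used only for this example.
docs = [
    "Shipment 114 from Rotterdam was delayed three days by port congestion",
    "Warehouse B reported normal inventory levels for March",
    "Carrier invoices were reconciled without discrepancies",
]
chunks = [c for d in docs for c in chunk_text(d)]

question = "Which shipment was delayed?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Swapping embed for a learned embedding model and the list for a vector store changes the quality of retrieval, not the shape of the pipeline.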

Key Impact

Transformed supply chain data into an intelligent decision-support system, enabling faster issue detection, improved visibility, and data-driven operational insights.

Python · FastAPI · OpenAI / LLM · LangChain · RAG Pipeline · Vector Database · Embeddings · Semantic Search · Prompt Engineering · Supply Chain Analytics · Generative AI
Predictive Analytics / AI

AI-Powered Football Match Outcome Predictor

Built an interactive football analytics and match prediction platform using Python and Streamlit to analyze English Premier League data, visualize team performance, and forecast match outcomes.

Developed a multi-tab Streamlit application for league overview, team performance, head-to-head analysis, and match prediction.

Performed data preprocessing and feature preparation using Pandas and NumPy on historical English Premier League datasets.

Integrated a machine learning prediction pipeline to estimate match outcomes, win probabilities, goals, and clean sheets.

Designed interactive visualizations using Matplotlib and Seaborn to surface trends, team insights, and comparative performance.

Key Impact

Delivered a user-friendly sports intelligence dashboard that combines analytics, visualization, and AI-based forecasting to support data-driven match insights and outcome prediction.

Python · Streamlit · Pandas · NumPy · Matplotlib · Seaborn · Machine Learning · Predictive Analytics · Joblib · Data Visualization · Sports Analytics
Data Engineering / Analytics

Amazon Last-Mile Delivery Optimization (AWS Pipeline)

Built an AWS-based analytics pipeline to analyze 100,000+ delivery records and optimize last-mile logistics performance using data-driven insights.

Designed an end-to-end AWS pipeline using S3, Glue, and Redshift to process and analyze large-scale delivery datasets.

Analyzed 100,000+ delivery records to identify key inefficiencies in last-mile logistics operations.

Discovered that weather conditions increased missed deliveries by 60% and that peak-hour deliveries had 2x higher failure rates.

Applied Lean Six Sigma DMAIC methodology to identify root causes and optimize delivery workflows.

Developed Tableau dashboards to visualize KPIs, delivery trends, and operational bottlenecks.

Key Impact

Recommended optimizations that reduced missed deliveries by 33%, improved fuel efficiency by 15%, and increased delivery productivity by 25%.

Python · AWS S3 · AWS Glue · AWS Redshift · SQL · Tableau · Data Analysis · ETL Pipelines · Lean Six Sigma · Logistics Analytics
Data Analytics / Urban Intelligence

Urban Road Safety & Crash Analytics Platform

Analyzed 3.2M+ traffic crash records across Austin, Chicago, and New York to identify accident patterns, high-risk zones, and key safety insights using data profiling, transformation, and visualization tools.

Processed and analyzed 3.2M+ crash records across 131 variables from multiple city datasets.

Performed data cleaning and transformation using Talend and Alteryx to ensure data quality and consistency.

Conducted dataset profiling using Python (ydata-profiling) to identify missing values, anomalies, and data inconsistencies.

Designed dimensional data models to support structured analysis and reporting.

Built interactive dashboards using Power BI and Tableau to visualize accident trends, high-risk zones, and contributing factors.

Analyzed pedestrian involvement, time-based trends, and regional accident distribution to uncover safety risks.

Key Impact

Delivered data-driven insights to support urban traffic planning, improve road safety strategies, and enable policy-level decision-making for reducing accident risks.

Python · ydata-profiling · Alteryx · Talend · Power BI · Tableau · Data Cleaning · Data Profiling · Data Modeling · EDA · Urban Analytics · Data Visualization
Core Competencies

Full-Stack Data Expertise

Spanning analytics, engineering, intelligence, and AI automation

Data Engineering

ETL / ELT Pipelines · Data Modeling · Schema Design · Data Quality · Pipeline Orchestration · CDC · PySpark

Cloud & Platforms

Azure (Data Factory) · AWS (S3, EC2, Glue, Redshift) · GCP · Databricks

Analytics & BI

Tableau · Power BI · Streamlit · A/B Testing · Hypothesis Testing · Google Analytics · Excel

Databases & Warehousing

Snowflake · MySQL · PostgreSQL · NoSQL · Pinecone · MongoDB · Oracle SQL · Alteryx

Programming

Python · SQL · R · JavaScript · Java · HTML/CSS

AI / Generative AI

OpenAI GPT-4 · RAG · LangChain · LLM API · Prompt Engineering · TensorFlow · PyTorch · Hugging Face

Tools & Workflow

Docker · Git · Apache Airflow · Apache Spark · Hadoop · DBT · Looker · Agile/Scrum
Education

Focused on data engineering, analytics, and intelligent systems with hands-on experience in building scalable data pipelines and cloud-based solutions.

Coursework
ETL Pipelines · Data Warehousing · Cloud Computing · Big Data Systems · Machine Learning · Deep Learning · Natural Language Processing · Generative AI · RAG Systems · Data Structures & Algorithms

MS in Information Systems

Northeastern University
Boston, MA
Sep 2023 – May 2025
Graduated

B.Tech in Information Technology

Mumbai University
Mumbai, India
Jun 2020 – May 2023
Graduated
Connect

Let's Work Together

Ready to build scalable data systems, real-time pipelines, or AI-powered solutions? Let’s connect and turn ideas into measurable impact.

© 2026 Nikita Gupta · Designing systems where data meets intelligence.

Built in Boston.