// Data Engineer & ML Practitioner

Siddhesh Chavan

I build scalable data pipelines, ML models, and AI-powered systems that turn raw data into real decisions. I hold an MS in Data Science from Pace University and currently engineer data infrastructure at Community Dreams Foundation.

NYC skyline · 2024

Open to new opportunities
Python · SQL · Azure · AWS · GCP · FastAPI · PostgreSQL · ETL Pipelines · Machine Learning · LangChain · CrewAI · Docker · BigQuery · Kafka · Tableau · RAG

Data pipelines, ML models & AI systems.

I'm a Data Engineer with a background in Electronics & Telecom and a Master's in Data Science. I specialize in building robust ETL/ELT pipelines, designing data warehouses, and applying ML to solve real-world problems.

Currently at Community Dreams Foundation, I work on Azure-based pipelines processing millions of records. I'm passionate about the intersection of data engineering and AI — from classical ML to LLM-powered agents with CrewAI and LangChain.

Pace University, New York, NY
MS in Data Science | GPA: 3.52 / 4.0
May 2025

K.J. Somaiya Institute of Technology, Mumbai
B.Tech in Electronics & Telecommunication | GPA: 8.96 / 10
June 2023

Data & Cloud

  • Python / SQL: Expert
  • ETL Pipelines / Data Warehousing: Expert
  • Azure / AWS / GCP: Advanced
  • PostgreSQL / BigQuery / Redis: Advanced
  • Docker / GitHub Actions: Advanced

AI & Machine Learning

  • LangChain / LangGraph / RAG: Advanced
  • CrewAI / MCP Agents: Advanced
  • Random Forest / Regression / KNN: Advanced
  • FastAPI / Flask API / REST APIs: Advanced
  • Tableau / Looker Studio: Intermediate

001

Patient Health Management System

Healthcare backend system helping clinics, doctors, and patients manage medical records, appointments, and health data efficiently.

  • Reduced manual appointment handling by 50–60% through automated slot generation
  • Minimized scheduling errors by 90% and cut operational costs by 25–35%
  • Integrated AI agents using CrewAI and MCP for intelligent scheduling
FastAPI · CrewAI · MCP · PostgreSQL · Python
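
The automated slot generation mentioned above could look roughly like the sketch below. This is an illustration only, not the project's actual code: the function name `generate_slots`, its parameters, and the fixed-length slot model are all assumptions.

```python
from datetime import datetime, timedelta


def generate_slots(day_start, day_end, slot_minutes, booked=()):
    """Return start times of free, fixed-length slots (hypothetical helper).

    A slot is kept only if it fits entirely before day_end and its start
    time is not already in `booked`.
    """
    step = timedelta(minutes=slot_minutes)
    taken = set(booked)
    slots = []
    current = day_start
    while current + step <= day_end:
        if current not in taken:
            slots.append(current)
        current += step
    return slots


# Example: a two-hour clinic window with one slot already booked.
opening = datetime(2025, 1, 6, 9, 0)
closing = datetime(2025, 1, 6, 11, 0)
free = generate_slots(opening, closing, 30, booked=[datetime(2025, 1, 6, 9, 30)])
```

Generating slots from a clinic's opening hours, rather than entering them by hand, is the kind of step that removes manual appointment handling.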

002

NYC Trip Data & Business Insights

End-to-end data pipeline for NYC taxi data, transforming 100k+ raw records from GCS into BigQuery analytics and interactive dashboards.

  • Identified 15% increase in peak-hour rides using optimized BigQuery SQL
  • Improved stakeholder reporting speed by 60% via Looker Studio dashboards
  • Automated ingestion and transformation using Mage AI orchestration
Mage AI · BigQuery · GCP · Looker Studio · SQL

003

NVIDIA Stock Price Prediction

Random Forest model for stock price prediction with advanced feature engineering using lagged prices and moving averages.

  • Achieved an R² of 0.85, explaining 85% of the variance in price
  • Improved accuracy by 10% through lagged-close (Close_Lag1, Close_Lag2) and moving-average (MA5, MA10, MA20) features
Python · Random Forest · Feature Engineering · scikit-learn
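
For readers unfamiliar with lag and moving-average features, here is a minimal sketch of how they might be built. The feature names Close_Lag1/2 and MA5/10/20 come from the project description; everything else is an assumption: the `build_features` function, the use of plain Python instead of a dataframe library, and the choice to compute each moving average over prior days only so the target does not leak into its own inputs.

```python
from statistics import fmean


def build_features(closes, lags=(1, 2), windows=(5, 10, 20)):
    """Build one feature row per day that has enough history (hypothetical).

    For day t the target is closes[t]; every feature uses only closes
    from days before t.
    """
    start = max(max(lags), max(windows))
    rows = []
    for t in range(start, len(closes)):
        row = {"target": closes[t]}
        for k in lags:
            row[f"Close_Lag{k}"] = closes[t - k]
        for w in windows:
            # Mean of the w closes immediately preceding day t.
            row[f"MA{w}"] = fmean(closes[t - w:t])
        rows.append(row)
    return rows


# Synthetic closing prices 1.0 .. 25.0, used purely for illustration.
rows = build_features([float(p) for p in range(1, 26)])
```

Rows like these could then be fed to a regressor such as a Random Forest. The first 20 days are dropped because MA20 needs 20 days of history.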

004

Big Data Airline Delay Analysis

Large-scale airline performance analysis using AWS infrastructure, predicting flight delays with ensemble ML models.

  • 93% accuracy & 90% precision — Random Forest outperformed all other models
  • Used AWS S3 for storage and EC2 for compute with SQL-based CLI queries
AWS S3 · AWS EC2 · Random Forest · Logistic Regression · SQL

Current · Sep 2025 – Present
New York, NY

Data Engineer

Community Dreams Foundation

  • Implemented an Azure-based data pipeline that processes millions of records into data warehouses for analytics and reporting.
  • Improved pipeline performance by 20–30% through optimization and scalability enhancements, and documented the changes comprehensively.
  • Developed well-structured data models that reduced query latency and improved analytical efficiency.
  • Executed end-to-end testing and deployment workflows to ensure high data quality and production readiness.

Internship · Jun 2021 – Aug 2021
Mumbai, India

IoT Intern

Verzeo

  • Developed 2 IoT projects integrating sensor data collection and real-time transmission, improving data accuracy by 20%.
  • Processed and analyzed 1,500+ sensor data points using Python for automation tasks with 87% execution accuracy.
  • Built 2 Google Assistant voice-controlled applications to trigger IoT devices based on 10+ predefined conditions.
  • Created IFTTT applets for automating IoT workflows with 5+ data-driven decision triggers.

Let's build something.

Open to full-time data engineering & ML roles, interesting projects, and conversations about hard data problems.