Resume

< Hello, World! />

Kranthi Chaithanya Thota

Data Engineer with 4 years of experience designing and scaling cloud-native data platforms on GCP and AWS. Proven expertise in ETL/ELT architecture, data modeling, and GenAI-driven applications, delivering production-ready solutions across government, research, and enterprise domains.

GCP Data Engineer · Terraform Certified · M.S. 4.0 GPA · GenAI Engineer

01. About Me

I'm a Data Engineer at UNC Chapel Hill's Environmental Finance Center. I build data pipelines, warehouses, and cloud infrastructure on GCP and AWS — the systems that turn raw, messy data into something analysts can actually rely on.

Previously at Clarkson CEM and Egen (a GCP partner agency), I've shipped production systems across Snowflake, dbt, Airflow, and Terraform — spanning geospatial pipelines for government clients, GenAI document extraction tools, and real-time streaming architectures.

I hold Google Cloud Data Engineer and Terraform certifications, and an M.S. in Applied Data Science (4.0 GPA) from Clarkson University.

Chapel Hill, NC · Open to full-time roles

02. Technical Skills

Programming & Scripting

Python · SQL · PySpark · JavaScript · PowerShell · Bash

Cloud & Infrastructure

GCP · AWS · BigQuery · Dataflow · Cloud Run · Vertex AI · Pub/Sub · AWS EMR · Redshift · Lambda · Terraform · Docker · Kubernetes

Data Engineering

ETL / ELT · dbt · Snowflake · Data Warehousing · Apache Kafka · Delta Lake · Apache Airflow · Data Modeling · API Integration · Data Masking · CDC

GenAI & LLMs

PyTorch · RAG · Transformers · BERT / LLMs · LangChain · NLP · Hugging Face

DevOps & Version Control

CI/CD Pipelines · GitHub Actions · Git / GitHub · Azure DevOps · Cloud Build · Jenkins

GIS & Databases

PostgreSQL · PostGIS · ArcGIS Pro · QGIS · MongoDB · MySQL · SQL Server · TimescaleDB

03. Experience

EFC
Current

Data Engineer

Environmental Finance Center · UNC Chapel Hill

Sep 2025 – Present · Chapel Hill, NC
PySpark · AWS EMR · Delta Lake · Star Schema · CI/CD · GitHub Actions · Python · GenAI

CEM

Data Engineer

Clarkson CEM Group

May – Aug 2025 · Potsdam, NY
Python · ETL · Power BI · ArcGIS Pro · Data Masking · PostgreSQL

EGN

Associate Data Engineer

Egen (formerly SpringML)

Jan 2022 – Dec 2023 · Hyderabad, India
GCP · BigQuery · Snowflake · dbt · Terraform · Airflow · QGIS · Document AI

04. Projects

Real-Time Reddit Stock Sentiment Tracker

Low-latency Python streaming pipeline processing 100+ comments/sec with tumbling window logic. Surfaces breakout tickers in under 5 seconds, a 95% latency reduction versus batch processing, with TimescaleDB for time-series storage.

  • Python
  • TimescaleDB
  • Reddit API
  • Streaming

NYPA EV Charger Geospatial Pipeline

End-to-end geospatial pipeline for NY Power Authority EV charger site planning. Processed 10+ public GIS datasets via QGIS, BigQuery, and Dataflow; delivered a scoring algorithm and map-based decision tool used by government stakeholders.

  • QGIS
  • BigQuery
  • Dataflow
  • Python
  • GIS

HAVK Mladost Sports Club Data Infrastructure

Normalized PostgreSQL schema with 15+ interrelated tables and automated Python ETL pipeline. Replaced fragmented Excel workflows — reduced manual data handling by 95% and improved membership renewal rates by 30% via Looker Studio KPI dashboards.

  • PostgreSQL
  • Python
  • Looker Studio
  • ETL

Serverless Data Job Deployment Framework

Reusable IaC framework cutting new data job deployment time by 80% (3+ hours → <10 min) across 3 teams. Standardized IAM roles, secret management, and compliance enforcement built directly into Terraform modules.

  • Terraform
  • GCP
  • GitHub Actions
  • IaC

AI Therapeutic Chatbot

Full-stack chatbot (Ollama/Gemma, Flask, Voice I/O) enhancing senior mental well-being via personalized conversational AI.

  • Python
  • LLM
  • Flask
  • Ollama

Healthcare RAG QA System

RAG pipeline (Transformers, Vector DBs) for accurate, context-aware healthcare question answering with source citation.

  • Hugging Face
  • RAG
  • Vector DB
  • NLP

NYSERDA Grant Pipeline & NER (90% F1)

Automated scraping (BeautifulSoup) and NER (fine-tuned Hugging Face Transformer) across 10,000+ grant records with 90% F1-score on organization, funding, and project entities.

  • BeautifulSoup
  • Hugging Face
  • PostgreSQL
  • NER

DeepFake Detection (85%+ Accuracy)

EfficientNet model for robust deepfake image classification, with a 15% enhancement from Adaptive Error Level Analysis (ELA) preprocessing.

  • EfficientNet
  • TensorFlow
  • PyTorch
  • CV

YOLOv8 Parking Detection (95% Acc)

Fine-tuned YOLOv8 + OpenCV pipeline for real-time parking occupancy detection from UAV drone footage; projected 20% efficiency improvement in campus resource allocation.

  • YOLOv8
  • OpenCV
  • Python
  • UAV

Patent Data Pipeline (Document AI)

Scalable Document AI pipeline covering 2,000+ patent PDFs, with 50+ new documents ingested daily; generated 25+ KPIs deployed via Cloud Run, reducing manual review time by 80%.

  • GCP
  • Document AI
  • BigQuery
  • Cloud Run

Medallion Data Warehouse (Snowflake + dbt)

Architected medallion DW in Snowflake using dbt (40+ models, GitHub Actions CI); implemented Streams/Tasks for CDC, reducing ELT latency from hourly batch to sub-5-minute incremental loads.

  • Snowflake
  • dbt
  • CDC
  • GitHub Actions

IPEDS Legacy Data Migration

Automated migration of 20+ years of legacy IPEDS data into a centralized warehouse; built 30+ KPI dashboards benchmarking peer institutions, reducing manual retrieval time by 60%.

  • Python
  • SQL Server
  • Power BI
  • ETL

Multi-Cloud Infrastructure (IaC)

Provisioned production infrastructure (VPC, compute, DBs) across GCP/AWS via Terraform; automated 50+ resources with CI/CD via Cloud Build, reducing setup time by 87%.

  • Terraform
  • GCP
  • AWS
  • Multi-Cloud

A2A Roadkill Hotspot Analysis

Statistical analysis on roadkill data using drone imagery and spatial statistical methods (ArcGIS Pro) for wildlife corridor hotspot identification.

  • ArcGIS Pro
  • R
  • Spatial Stats
  • GIS

Town of Colton Complete Streets Plan

Developed comprehensive transportation plan via survey analysis and GIS visualization using ArcGIS Pro and QGIS.

  • ArcGIS Pro
  • QGIS
  • Survey Analysis
  • GIS

BRFSS Health Risk Analysis (80% Acc)

Analyzed 400k+ health records (ANOVA, chi-square tests) to identify key cancer risk factors using TensorFlow classification models.

  • Python
  • TensorFlow
  • Statistics
  • Healthcare

Clarkson Traffic/Parking Dashboard

Interactive Tableau dashboards visualizing campus traffic and parking data from drone and MetroCount sensors.

  • Tableau
  • GIS
  • Python
  • Dashboard

Automated Instagram Bot (45% Engagement)

GPT-3.5 pipeline automating Instagram posts via API with NLP trend analysis, increasing engagement by 45%.

  • Python
  • GPT-3.5
  • API
  • NLP

OTT Web Platform (Netflix Clone)

Responsive streaming web application built with React, Firebase, and Node.js with real-time database and authentication.

  • React
  • Node.js
  • Firebase
  • Web Dev

Wine Quality Prediction

EDA and predictive modeling to determine physicochemical factors influencing wine quality using ensemble ML algorithms.

  • Python
  • Scikit-learn
  • Pandas
  • ML

05. Education & Certifications

M.S. Applied Data Science

Clarkson University · Potsdam, NY

Jan 2024 – Aug 2025

GPA: 4.0 / 4.0

Data Warehousing · Big Data Architecture · Cloud Computing · Data Mining · GIS & Spatial Analysis

B.Tech · Electronics & Communication Engineering

Kakatiya Institute of Technology & Science · Warangal, India

Aug 2018 – May 2022

GPA: 3.54 / 4.0

Certifications

Google Cloud Professional Data Engineer

Google Cloud

Google Cloud Associate Cloud Engineer

Google Cloud

HashiCorp Certified Terraform Associate

HashiCorp

06. Contact

Let's build something great together.

I'm actively seeking full-time Data Engineering and Cloud Engineering roles where I can contribute expertise in cloud-native pipelines, GenAI applications, and scalable data architecture. Whether you have a specific project in mind or just want to connect — my inbox is open.

Say Hello
contact.sh

$ whoami

Kranthi Chaithanya Thota

$ location

Chapel Hill, NC

$ email

$ status

open to opportunities