Data Engineer
Environmental Finance Center · UNC Chapel Hill
I'm a Data Engineer at UNC Chapel Hill's Environmental Finance Center. I build data pipelines, warehouses, and cloud infrastructure on GCP and AWS — the systems that turn raw, messy data into something analysts can actually rely on.
Previously at Clarkson CEM and Egen (a GCP partner agency), I've shipped production systems across Snowflake, dbt, Airflow, and Terraform — spanning geospatial pipelines for government clients, GenAI document extraction tools, and real-time streaming architectures.
I hold Google Cloud Data Engineer and Terraform certifications, and an M.S. in Applied Data Science (4.0 GPA) from Clarkson University.
Environmental Finance Center · UNC Chapel Hill
Clarkson CEM Group
Egen (formerly SpringML)
Low-latency Python streaming pipeline processing 100+ comments/sec with tumbling window logic. Surfaces breakout tickers in under 5 seconds — 95% latency reduction vs. batch via TimescaleDB time-series storage.
End-to-end geospatial pipeline for NY Power Authority EV charger site planning. Processed 10+ public GIS datasets via QGIS, BigQuery, and Dataflow; scoring algorithm and map-based decision tool used by government stakeholders.
Normalized PostgreSQL schema with 15+ interrelated tables and automated Python ETL pipeline. Replaced fragmented Excel workflows — reduced manual data handling by 95% and improved membership renewal rates by 30% via Looker Studio KPI dashboards.
Reusable IaC framework cutting new data job deployment time by 80% (3+ hours → <10 min) across 3 teams. Standardized IAM roles, secret management, and compliance enforcement built directly into Terraform modules.
Full-stack chatbot (Ollama/Gemma, Flask, Voice I/O) enhancing senior mental well-being via personalized conversational AI.
RAG pipeline (Transformers, Vector DBs) for accurate, context-aware healthcare question answering with source citation.
Automated scraping (BeautifulSoup) and NER (fine-tuned Hugging Face Transformer) across 10,000+ grant records with 90% F1-score on organization, funding, and project entities.
EfficientNet model for robust deepfake image classification, with a 15% improvement from Adaptive Error Level Analysis (ELA) preprocessing.
Fine-tuned YOLOv8 + OpenCV pipeline for real-time parking occupancy detection from UAV drone footage; projected 20% efficiency improvement in campus resource allocation.
Scalable Document AI pipeline ingesting 2,000+ patent PDFs with daily ingestion of 50+ new docs; generated 25+ KPIs deployed via Cloud Run, reducing manual review time by 80%.
Architected medallion DW in Snowflake using dbt (40+ models, GitHub Actions CI); implemented Streams/Tasks for CDC, reducing ELT latency from hourly batch to sub-5-minute incremental loads.
Automated migration of 20+ years of legacy IPEDS data into a centralized warehouse; built 30+ KPI dashboards benchmarking peer institutions, reducing manual retrieval time by 60%.
Provisioned production infrastructure (VPC, compute, DBs) across GCP/AWS via Terraform; automated 50+ resources with CI/CD via Cloud Build, reducing setup time by 87%.
Statistical analysis on roadkill data using drone imagery and spatial statistical methods (ArcGIS Pro) for wildlife corridor hotspot identification.
Developed comprehensive transportation plan via survey analysis and GIS visualization using ArcGIS Pro and QGIS.
Analyzed 400k+ health records (ANOVA, chi-square tests) to identify key cancer risk factors using TensorFlow classification models.
Interactive Tableau dashboards visualizing campus traffic and parking data from drone and MetroCount sensors.
GPT-3.5 pipeline automating Instagram posts via API with NLP trend analysis, increasing engagement by 45%.
Responsive streaming web application built with React, Firebase, and Node.js with real-time database and authentication.
Clarkson University · Potsdam, NY
Jan 2024 – Aug 2025
Data Warehousing · Big Data Architecture · Cloud Computing · Data Mining · GIS & Spatial Analysis
Kakatiya Institute of Technology & Science · Warangal, India
Aug 2018 – May 2022
Google Cloud Professional Data Engineer
Google Cloud
Google Cloud Associate Cloud Engineer
Google Cloud
HashiCorp Certified Terraform Associate
HashiCorp
I'm actively seeking full-time Data Engineering and Cloud Engineering roles where I can contribute expertise in cloud-native pipelines, GenAI applications, and scalable data architecture. Whether you have a specific project in mind or just want to connect — my inbox is open.
Say Hello
$ whoami
Kranthi Chaithanya Thota
$ location
Chapel Hill, NC
$ status
open to opportunities