Learn In-Demand, Cloud-Focused Data Engineering Tools

Become a Google Cloud Platform Master by using tools like Spark, Kafka, and Beam to process REAL Big Data datasets
Data Engineering Learning Simplified
This site is an online, self-paced platform where students, early-career engineers, and well-seasoned engineers can learn and strengthen their Data Engineering skills.
Feel secure knowing that you’re learning the most up-to-date cloud data technologies using a curriculum developed and validated by professional Data Engineers.
Why Choose Data Engineering as a career?
1. There are 4x more Data Engineering jobs than Data Science jobs posted on Indeed
2. Data Engineers earn $132,680 per year on average, according to Indeed
3. In 2019, Data Engineering positions grew by 50%, according to the Dice Tech Job Report

Data Engineering Bootcamp

    Are you a...
  • Recent college graduate
  • Early-career developer
  • Self-taught programmer

Typical Duration
6 months
5-10 hours/week

Prerequisites
- Intermediate Python
- Basic SQL
- Bash commands

Understanding the Fundamentals of ETL Pipelines by Ingesting Historical Flight and Passenger Data

This section starts it all! You will learn the fundamentals of simple data exploration and cleaning using Python and Pandas. From local data cleanup, you will move on to loading CSV reference flight data into GCP BigQuery for further exploration using SQL. You will then use GCP Dataflow (Apache Beam) to extract, transform, and load (ETL) four years of historical flight data in parallel. To advance your distributed computing knowledge, you will use GCP Cloud Dataproc (Apache Spark) to transform and load millions of rows of passenger data into GCP BigQuery.

  • Chapter 1: Loading reference Dataset into BigQuery

  • Chapter 2: Loading Flights data using Apache Beam (Google Dataflow)

  • Chapter 3: Processing Passengers using Apache Spark (Google Dataproc)

Tech: Pandas, SQL, Google Cloud Storage, Google Cloud BigQuery, Google Cloud Dataflow (Apache Beam), Google Cloud Dataproc (Apache Spark)
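
To give a feel for the extract, transform, and load steps described in this section, here is a minimal sketch using only the Python standard library. It is not the course's code and does not use Beam, Spark, or BigQuery: an in-memory SQLite database stands in for the warehouse, and the sample rows, column names, and cleaning rules are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows; the real course dataset and schema may differ.
RAW_CSV = """flight_date,airline,origin,dest,dep_delay
2019-01-01,AA,JFK,LAX,12
2019-01-01,DL,ATL,,5
2019-01-02,UA,SFO,ORD,
2019-01-02,aa,lax,jfk,-3
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows missing an airport, normalize case, cast the delay."""
    cleaned = []
    for row in rows:
        if not row["origin"] or not row["dest"]:
            continue  # incomplete record, skip it
        # A missing delay becomes None (NULL in SQL) rather than a guessed value.
        delay = int(row["dep_delay"]) if row["dep_delay"] else None
        cleaned.append((
            row["flight_date"],
            row["airline"].upper(),
            row["origin"].upper(),
            row["dest"].upper(),
            delay,
        ))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a table (SQLite as a stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flights ("
        "flight_date TEXT, airline TEXT, origin TEXT, dest TEXT, dep_delay INTEGER)"
    )
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()


records = transform(extract(RAW_CSV))
conn = sqlite3.connect(":memory:")
load(conn, records)

# Explore the loaded data with SQL, as you would in a warehouse.
avg_delay = conn.execute(
    "SELECT airline, AVG(dep_delay) FROM flights GROUP BY airline ORDER BY airline"
).fetchall()
print(avg_delay)
```

The three functions are kept separate on purpose: in a distributed framework each one would typically become its own pipeline stage, so the transform can run in parallel across many input files.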

Designing and Monitoring Real-time Ticket Purchase Data

Put on your Architect hat and learn the best practices behind developing a logical Data Architecture that will be used throughout the rest of the course. You will then use GCP Dataflow (Apache Beam) to stream process real-time flight queries from GCP Pub/Sub (Apache Kafka). Using GCP Dataproc and Bigtable, you will develop an Online Transaction Processing (OLTP) platform to monitor ticket sales.

  • Chapter 4: Putting on our Data Architect Hat!

  • Chapter 5: Real-time Stream Processing of Live Flight Queries with Cloud Pub/Sub

  • Chapter 6: Registering Ticket Sales with Google Bigtable

Tech: Google Cloud Pub/Sub (Apache Kafka), Google Cloud Dataflow (Apache Beam), Google Cloud Bigtable (Apache HBase)
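
Stream processing of flight queries usually comes down to grouping events into time windows and aggregating within each window. The sketch below illustrates that idea in plain Python. It is a stand-in, not the Beam or Pub/Sub API, and the 60-second window size, event format, and route names are assumptions made for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # assumed fixed window size

# Hypothetical flight-query events: (timestamp in seconds, route searched)
EVENTS = [
    (0, "JFK-LAX"),
    (15, "JFK-LAX"),
    (59, "SFO-ORD"),
    (60, "JFK-LAX"),
    (130, "SFO-ORD"),
    (125, "SFO-ORD"),  # arrives out of order but still lands in its window
]


def window_start(timestamp, size=WINDOW_SECONDS):
    """Return the start of the fixed window that contains the timestamp."""
    return timestamp - (timestamp % size)


def count_per_window(events):
    """Count queries per route inside each fixed window."""
    counts = defaultdict(Counter)
    for timestamp, route in events:
        counts[window_start(timestamp)][route] += 1
    # Return plain dicts, ordered by window start, for easy inspection.
    return {start: dict(routes) for start, routes in sorted(counts.items())}


windowed = count_per_window(EVENTS)
for start, routes in windowed.items():
    print(f"window [{start}, {start + WINDOW_SECONDS}): {routes}")
```

Assigning each event to a window by its own timestamp, rather than by arrival order, is what lets late or out-of-order events be counted in the right place.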

Automating Processes and Analytics to Determine Ticket Prices

Artificial Intelligence (AI) and Machine Learning (ML) are leveraged in this section to create advanced analytics on top of your existing data pipeline. You will further automate pipeline processes with GCP Cloud Composer (Apache Airflow). Finally, you will complete your data pipeline by creating a Data Hub that exposes all the AI data via a REST API.

  • Chapter 7: Advanced Analytics using BigQuery

  • Chapter 8: Building an AI with BigQuery ML (Machine Learning)

  • Chapter 9: Pipeline Automation with Cloud Composer (Apache Airflow)

  • Chapter 10: Creating a Data Hub, Exporting Data via Google App Engine (Python Flask)

Tech: SQL, Machine Learning, Google Cloud BigQuery, Google Cloud BigQuery ML, Google Cloud Composer (Apache Airflow), Flask (Python) REST API, Google App Engine
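
Pipeline automation tools such as Airflow are built around one core idea: tasks form a dependency graph, and each task runs only after everything it depends on has finished. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). It is not Airflow or Cloud Composer code, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "load_flights": set(),
    "load_passengers": set(),
    "build_features": {"load_flights", "load_passengers"},
    "train_price_model": {"build_features"},
    "publish_to_data_hub": {"train_price_model"},
}


def run_pipeline(dependencies, tasks):
    """Run every task after all of its dependencies; return the order used."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        executed.append(name)
    return executed


# Placeholder task bodies; a real pipeline would call out to its services here.
TASKS = {name: (lambda n=name: print(f"running {n}")) for name in DEPENDENCIES}

order = run_pipeline(DEPENDENCIES, TASKS)
```

The two load tasks have no dependency on each other, so their relative order is not fixed; a real scheduler could run them in parallel.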

Why choose Data Stack Academy?

Program Benefits

Tuition Includes

  • Learning
      • Real-world examples and datasets
      • End-to-end project based on a data pipeline
      • Curriculum developed by industry experts
      • Convenient online self-paced course
  • Community
      • Free-to-join Discord community
      • Weekly office hours with TuraLabs engineers