This section starts it all! You will learn the fundamentals of simple data exploration and cleaning using Python Pandas. From local data cleanup, you will move on to loading CSV reference flight data into GCP BigQuery for further exploration using SQL. You will then use GCP Dataflow (Apache Beam) to extract, transform, and load (ETL) four years of historical flight data in parallel. To advance your distributed computing knowledge, you will then use GCP Cloud Dataproc (Apache Spark) to transform and load millions of rows of passenger data into GCP BigQuery.
Chapter 1: Loading Reference Dataset into BigQuery
Chapter 2: Loading Flights Data using Apache Beam (Google Dataflow)
Chapter 3: Processing Passengers using Apache Spark (Google Dataproc)
Tech: Pandas, SQL, Google Cloud Storage, Google Cloud BigQuery, Google Cloud Dataflow (Apache Beam), Google Cloud Dataproc (Apache Spark)
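To give a feel for the local cleanup step described above, here is a minimal Pandas sketch. The sample rows, the column names (airport_code, city, elevation_ft), and the function name clean_reference_data are all invented for illustration and are not the course's actual reference dataset or code.

```python
import io

import pandas as pd

# Hypothetical sample of reference data; columns and values are invented.
RAW_CSV = """airport_code,city,elevation_ft
JFK,New York,13
LAX, Los Angeles ,125
JFK,New York,13
ORD,Chicago,
"""


def clean_reference_data(raw_csv: str) -> pd.DataFrame:
    """Apply a few typical cleanup steps to a small CSV of reference data."""
    df = pd.read_csv(io.StringIO(raw_csv))
    # Trim stray whitespace around text values.
    df["city"] = df["city"].str.strip()
    # Drop exact duplicate rows.
    df = df.drop_duplicates()
    # Drop rows that are missing a required value.
    df = df.dropna(subset=["elevation_ft"])
    # With the gaps removed, the column can be stored as integers.
    df["elevation_ft"] = df["elevation_ft"].astype(int)
    return df.reset_index(drop=True)


cleaned = clean_reference_data(RAW_CSV)
print(cleaned)
```

A DataFrame cleaned this way could then be written back to CSV or loaded into a BigQuery table, which is the step the first chapter builds toward.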
Put on your Architect hat and learn the best practices behind developing a logical data architecture that will be used throughout the rest of the course. You will then use GCP Dataflow (Apache Beam) to stream-process real-time flight queries from GCP Pub/Sub (similar to Apache Kafka). Using GCP Dataproc and BigTable, you will develop an Online Transaction Processing (OLTP) platform to monitor ticket sales.
Chapter 4: Putting on our Data Architect Hat!
Chapter 5: Real-time Stream Processing of Live Flight Queries with Cloud Pub/Sub
Chapter 6: Registering Ticket Sales with Google BigTable
Tech: Google Cloud Pub/Sub (similar to Apache Kafka), Google Cloud Dataflow (Apache Beam), Google Cloud BigTable (Apache HBase)
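The core idea behind stream processing of flight queries is grouping events into time windows and aggregating each window. The plain-Python sketch below illustrates that idea only; it does not use the Apache Beam or Pub/Sub APIs, and the FlightQuery type, the route strings, and the 60-second window are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class FlightQuery(NamedTuple):
    """Hypothetical event: one search for a route at a given event time."""
    timestamp: float  # seconds since some epoch
    route: str


def count_per_window(
    events: Iterable[FlightQuery], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count queries per (window start, route) using fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for event in events:
        # Round the event time down to the start of its window.
        window_start = int(event.timestamp // window_seconds) * window_seconds
        counts[(window_start, event.route)] += 1
    return dict(counts)


events = [
    FlightQuery(0.0, "JFK-LAX"),
    FlightQuery(30.5, "JFK-LAX"),
    FlightQuery(61.0, "JFK-LAX"),
    FlightQuery(75.0, "ORD-SFO"),
]
window_counts = count_per_window(events)
print(window_counts)
```

In a real streaming pipeline the events arrive continuously and the framework handles windowing, late data, and parallelism; the loop here only shows what a fixed-window count computes.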
Artificial Intelligence (AI) and Machine Learning (ML) will be leveraged in this section to create advanced analytics built on top of your existing data pipeline. Pipeline processes will be further automated using GCP Cloud Composer (Apache Airflow). Finally, you will complete your data pipeline by creating a Data Hub to expose all the AI data via a REST API.
Chapter 7: Advanced Analytics using BigQuery
Chapter 8: Building an AI with BigQuery ML (Machine Learning)
Chapter 9: Pipeline Automation with Cloud Composer (Apache Airflow)
Chapter 10: Creating a Data Hub, Exporting Data via Google App Engine (Python Flask)
Tech: SQL, Machine Learning, Google Cloud BigQuery, Google Cloud BigQuery ML, Google Cloud Composer (Apache Airflow), Flask (Python) REST API, Google App Engine