ncibtep@nih.gov

Bioinformatics Training and Education Program

GCP Fundamentals - Big Data & Machine Learning

When: Oct. 15, 2021, 9:00 am - 4:00 pm

This class has ended.
To Know
  • Where: Online Webinar
  • Organized By: NIH STRIDES

About this Class

This course introduces Google Cloud's big data and machine learning products. It begins with a quick overview of Google Cloud and then takes a deeper look at its data processing capabilities.

Prerequisites

Roughly one year of experience with one or more of the following:
  • A common query language such as SQL.
  • Extract, transform, and load (ETL) activities.
  • Data modeling.
  • Machine learning and/or statistics.
  • Programming in Python.

Objectives

  • Identify the purpose and value of the key big data and machine learning products in Google Cloud.
  • Use Cloud SQL and Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud.
  • Employ BigQuery and Cloud SQL to carry out interactive data analysis.
  • Choose between different data processing products in Google Cloud.
  • Create ML models with BigQuery ML, ML APIs, and AutoML.

Audience

  • Data analysts, data scientists, and business analysts who are getting started with Google Cloud.
  • Individuals responsible for designing pipelines and architectures for data processing, creating and maintaining machine learning and statistical models, querying datasets, visualizing query results, and creating reports.
  • Executives and IT decision makers evaluating Google Cloud for use by data scientists.

Course Outline

The course includes presentations, demonstrations, and hands-on labs.

Module 1: Introduction to Google Cloud

  • Identify the different aspects of Google Cloud’s infrastructure.
  • Identify the big data and ML products that are part of Google Cloud.

Module 2: Recommending Products Using Cloud SQL and Spark

  • Review how businesses use recommendation models.
  • Evaluate how and where you will compute and store your housing rental model results.
  • Analyze how running Hadoop in the cloud with Dataproc can help workloads scale.
  • Evaluate different approaches for storing recommendation data off-cluster.

Module 3: Predicting Visitor Purchases Using BigQuery ML

  • Analyze big data at scale with BigQuery.
  • Learn how BigQuery processes queries and stores data at scale.
  • Walk through key ML terms: features, labels, and training data.
  • Evaluate the different types of models for structured datasets.
  • Create custom ML models with BigQuery ML.

Module 4: Real-time Dashboards with Pub/Sub, Dataflow, and Google Data Studio

  • Identify modern data pipeline challenges and how to solve them at scale with Dataflow.
  • Design streaming pipelines with Apache Beam.
  • Build collaborative real-time dashboards with Data Studio.

Module 5: Deriving Insights from Unstructured Data Using Machine Learning

  • Evaluate how businesses use ML models on unstructured data and how those models work.
  • Choose the right approach between pre-built and custom machine learning models.
  • Create a high-performing custom image classification model with no code using AutoML.

Module 6: Summary

  • Recap of key learning points.
  • Resources.