Build predictions
The fourth data operation of this tutorial analyzes our data with a machine learning model.
🗺️ Overview
During this step, we will create a clustering machine learning model using BigQuery ML. Then, we will aggregate all our data into one BigQuery table and use our new model to analyze it.
🤖 Create your machine learning model
Go to the BigQuery web UI in the Cloud Console.
In the navigation panel, in the Resources section, select your project and your dataset.
Enter the following SQL query in the Query editor text area:
Click Run.
The query takes several minutes to complete. After the first iteration is complete, your model (sample_model) appears in the navigation panel of the BigQuery web UI.
You can observe the model as it's being trained by viewing the Training tab in the BigQuery web UI.
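For reference, a BigQuery ML query that creates a k-means clustering model generally has the shape sketched below. This is only an illustration under assumptions, not necessarily the exact query of this tutorial: the source table `iowa_liquor_sales`, its columns (`store_number`, `sale_dollars`, `bottles_sold`, `volume_sold_liters`) and the number of clusters are hypothetical. Only the model name `sample_model` and the `my-gbq-dataset` placeholder come from this page.

```sql
-- Sketch only: table name, column names and num_clusters are assumptions.
CREATE OR REPLACE MODEL `my-gbq-dataset.sample_model`
OPTIONS (
  model_type = 'kmeans',  -- unsupervised clustering
  num_clusters = 4        -- hypothetical value, tune for your data
) AS
SELECT
  -- One row per store; store_number is used for grouping only,
  -- so it is not passed to the model as a feature.
  SUM(sale_dollars)       AS total_sale_dollars,
  SUM(bottles_sold)       AS total_bottles_sold,
  SUM(volume_sold_liters) AS total_volume_liters
FROM `my-gbq-dataset.iowa_liquor_sales`  -- hypothetical source table
GROUP BY store_number;
```

Every column returned by the `SELECT` statement is treated as a feature, so identifiers that should not influence the clustering are best left out of the select list.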
📄 Create your configuration files
Create the JSON file that configures the data pipeline operation
Access your tailer-demo folder.
Inside, create a folder named 4-Build-predictions for this new step.
In this folder, create a JSON file named tailer-demo-build-predictions.json for your data operation.
Copy the following contents into your file:
Edit the following values:
◾ Replace my-gcp-project-id with the ID of the GCP project containing your BigQuery dataset.
◾ Replace my-gbq-dataset with the name of your working dataset.
Create the JSON file that triggers the workflow
Inside the 4-Build-predictions folder, create a file named workflow.json.
Copy the following contents into your file:
Create SQL files
Inside the 4-Build-predictions folder, create the following files:
◾ iowa_liquor_agg_store.sql
◾ store_clustering.sql
Copy the following contents into the iowa_liquor_agg_store.sql file:
Replace my-gbq-dataset with the name of your working dataset.
Copy the following contents into the store_clustering.sql file:
Replace my-gbq-dataset with the name of your working dataset.
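To illustrate how a trained BigQuery ML model can be applied to an aggregated table, here is a minimal sketch based on `ML.PREDICT`. It is not the actual content of store_clustering.sql: it assumes the aggregated data lives in a table named `iowa_liquor_agg_store` (a name inferred from the SQL file name) and that this table exposes the same feature columns the model was trained on.

```sql
-- Sketch only: the input table name is an assumption.
-- For a k-means model, ML.PREDICT returns the input columns plus
-- centroid_id (the assigned cluster) and nearest_centroids_distance.
SELECT
  * EXCEPT (nearest_centroids_distance)
FROM
  ML.PREDICT(
    MODEL `my-gbq-dataset.sample_model`,
    (
      SELECT *
      FROM `my-gbq-dataset.iowa_liquor_agg_store`  -- hypothetical table
    )
  );
```

Input columns that were not used as features during training, such as a store identifier, are passed through unchanged, which makes it easy to see which cluster each store belongs to.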
▶️ Deploy the data operation
Once your files are ready, you can deploy the data operation:
Access your working folder by running the following command:
To deploy the data operation, run the following command:
To trigger the workflow, run the following command:
Your data operation is now deployed, which means all your data will shortly be aggregated and analyzed by the machine learning model you created. The Evaluation tab of your model lets you view the data clustering and get initial insights from your data.
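The information shown in the Evaluation tab can also be queried directly. The sketch below uses the BigQuery ML functions `ML.EVALUATE` and `ML.CENTROIDS` and assumes the model is named `sample_model` in your working dataset, as earlier on this page. For a k-means model, `ML.EVALUATE` reports the Davies-Bouldin index and the mean squared distance.

```sql
-- Overall clustering quality metrics for the k-means model
SELECT *
FROM ML.EVALUATE(MODEL `my-gbq-dataset.sample_model`);

-- Centroid coordinates: one row per cluster and feature
SELECT
  centroid_id,
  feature,
  numerical_value
FROM ML.CENTROIDS(MODEL `my-gbq-dataset.sample_model`)
ORDER BY centroid_id, feature;
```

A lower Davies-Bouldin index generally indicates better separated clusters, so these queries can help compare models trained with different numbers of clusters.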
Your data operation status is now visible in Tailer Studio.
✅ Check the data operation status in Tailer Studio
Access Tailer Studio.
In the left navigation menu, select Table-to-table.
In the Configurations tab, search for your data operation, 000099-tailer-demo-build-predictions. You can see its status is Activated.
Click the data operation ID to display its parameters and full JSON file, or to leave comments about it.