BigQuery is a fully managed enterprise data warehouse for analytics.
It is inexpensive and highly scalable. In this article, I would like to share a basic tutorial for using BigQuery with Python.
Installation
pip install google-cloud-bigquery
Create credentials
Please see https://cloud.google.com/bigquery/docs/reference/libraries for how to create credentials.
Additionally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the downloaded key file:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
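Before creating a client, it can help to confirm that the variable is actually visible to Python. Here is a minimal sketch that uses only the standard library; the helper name credentials_path is made up for this example.

```python
import os


def credentials_path(env=os.environ):
    """Return the key file path from GOOGLE_APPLICATION_CREDENTIALS.

    Raises RuntimeError if the variable is unset or the file is missing.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```

Calling credentials_path() at the start of a script fails early with a readable message instead of a later authentication error.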
Create a dataset if it does not exist
Create the dataset only if it does not already exist:
from google.cloud import bigquery
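A minimal sketch of this step, assuming a google-cloud-bigquery release in which Client.create_dataset accepts exists_ok. The helper name ensure_dataset and the dataset ID my_dataset are placeholders for illustration.

```python
def ensure_dataset(client, dataset_id):
    """Create the dataset if it is missing and return it.

    `client` is expected to be a google.cloud.bigquery.Client.
    With exists_ok=True an existing dataset is returned instead of
    raising a Conflict error.
    """
    return client.create_dataset(dataset_id, exists_ok=True)


# Usage (needs google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   dataset = ensure_dataset(bigquery.Client(), "my_dataset")
```

The client is passed in as an argument so that the same helper works with any project the caller has already configured.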
Create a table if it does not exist
Create the table only if it does not already exist:
from google.cloud import bigquery
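The same idea can be sketched for tables, again assuming a client version where Client.create_table accepts exists_ok. The helper name ensure_table and the table ID in the usage comment are placeholders. Note that an existing table is returned as it is; its schema is not updated to match.

```python
def ensure_table(client, table):
    """Create `table` if it is missing and return the server-side table.

    `client` is expected to be a google.cloud.bigquery.Client and
    `table` a google.cloud.bigquery.Table carrying the desired schema.
    """
    return client.create_table(table, exists_ok=True)


# Usage (needs google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   schema = [bigquery.SchemaField("id", "INTEGER", mode="REQUIRED")]
#   table = bigquery.Table("my-project.my_dataset.my_table", schema=schema)
#   ensure_table(bigquery.Client(), table)
```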
Schema info
You can add a description or mark a field as required in the schema information.
BQ_TABLE_SCHEMA = [
For more details on the SchemaField class, please see bigquery.schema.SchemaField.
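As an alternative to building SchemaField objects by hand, the same information (name, type, mode, description) can be kept as plain dictionaries in the REST API field layout and saved to a JSON file, which Client.schema_from_json can read. This is only a sketch: the field names, the SCHEMA_FIELDS constant and the write_schema_file helper are made up for this example.

```python
import json

# Hypothetical fields. "mode": "REQUIRED" marks a column as required,
# and "description" is the free-text column description.
SCHEMA_FIELDS = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED",
     "description": "Primary identifier"},
    {"name": "name", "type": "STRING", "mode": "NULLABLE",
     "description": "Display name"},
]


def write_schema_file(schema, path):
    """Write the schema to `path` as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(schema, f, indent=2)
    return path


# Usage (needs google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   path = write_schema_file(SCHEMA_FIELDS, "schema.json")
#   schema = client.schema_from_json(path)  # list of SchemaField objects
# The first entry corresponds to:
#   bigquery.SchemaField("id", "INTEGER", mode="REQUIRED",
#                        description="Primary identifier")
```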
Insert rows
Upload tuple objects to BigQuery. This uses the streaming buffer, so I don't recommend it.
from google.cloud import bigquery
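A minimal sketch of a streaming insert, assuming the table already exists. The helper name stream_rows and the table ID in the usage comment are placeholders. The table is fetched first because Client.insert_rows needs the schema to map tuple positions to columns.

```python
def stream_rows(client, table_id, rows):
    """Stream `rows` (a list of tuples) into an existing table.

    `client` is expected to be a google.cloud.bigquery.Client.
    Returns the number of rows sent.
    """
    table = client.get_table(table_id)  # loads the schema
    errors = client.insert_rows(table, rows)
    if errors:
        # insert_rows reports per-row failures in its return value.
        raise RuntimeError(f"Failed to insert rows: {errors}")
    return len(rows)


# Usage (needs google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   stream_rows(bigquery.Client(), "my_dataset.my_table", [(1, "alice")])
```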
Check whether data exists
from google.cloud import bigquery
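One way to check this is to count the rows with a query. The sketch below assumes the table ID comes from a trusted source, since table names cannot be passed as query parameters; the helper name table_has_rows is made up for this example.

```python
def table_has_rows(client, table_id):
    """Return True if the table contains at least one row.

    `client` is expected to be a google.cloud.bigquery.Client and
    `table_id` a trusted ID such as "my_dataset.my_table".
    """
    query = f"SELECT COUNT(*) AS row_count FROM `{table_id}`"
    rows = client.query(query).result()  # waits for the query to finish
    # The query yields exactly one row with one column.
    return next(iter(rows))[0] > 0


# Usage (needs google-cloud-bigquery and valid credentials):
#   from google.cloud import bigquery
#   if not table_has_rows(bigquery.Client(), "my_dataset.my_table"):
#       print("table is empty")
```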
Upload a CSV to Google Cloud Storage and load it
This sample uploads a CSV file to Google Cloud Storage and loads it into BigQuery.
Before coding, please run the following:
pip install google-cloud-storage
After installing google-cloud-storage, add the following functions:
from google.cloud import bigquery
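The two steps can be sketched roughly as follows. The function names, bucket name and paths are placeholders, and the default load settings (one header row, schema auto-detection) are assumptions that may not fit every file.

```python
def upload_csv(storage_client, bucket_name, local_path, blob_name):
    """Upload a local CSV file to Cloud Storage and return its gs:// URI.

    `storage_client` is expected to be a google.cloud.storage.Client.
    """
    blob = storage_client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{blob_name}"


def load_csv(bq_client, uri, table_id, job_config=None):
    """Load the CSV at `uri` into `table_id` and wait for the job.

    `bq_client` is expected to be a google.cloud.bigquery.Client.
    """
    if job_config is None:
        # Imported here so the default is only built when it is needed.
        from google.cloud import bigquery

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,  # assumes the file has a header row
            autodetect=True,  # assumes the schema can be inferred
        )
    job = bq_client.load_table_from_uri(uri, table_id, job_config=job_config)
    return job.result()  # blocks until the load job finishes


# Usage (needs both client libraries and valid credentials):
#   from google.cloud import bigquery, storage
#   uri = upload_csv(storage.Client(), "my-bucket", "data.csv", "uploads/data.csv")
#   load_csv(bigquery.Client(), uri, "my_dataset.my_table")
```

Unlike streaming inserts, a load job writes directly to the table, so the rows do not pass through the streaming buffer.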
Appendix
Web Console / Enabling standard SQL
If you want to delete records in BigQuery, add #standardSQL as the first line of the query, like this:
#standardSQL
More details: Setting a query prefix
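To illustrate the shape of such a query, here is a small sketch that puts the prefix on its own line before a DELETE statement. The table name, the condition and the helper name with_standard_sql_prefix are all made up for this example.

```python
STANDARD_SQL_PREFIX = "#standardSQL"


def with_standard_sql_prefix(query):
    """Return `query` with the #standardSQL prefix on its first line."""
    if query.lstrip().lower().startswith(STANDARD_SQL_PREFIX.lower()):
        return query  # already prefixed
    return f"{STANDARD_SQL_PREFIX}\n{query}"


# Hypothetical table and condition; BigQuery DELETE requires a WHERE clause.
delete_query = with_standard_sql_prefix(
    "DELETE FROM `my_dataset.my_table` WHERE id = 1"
)
print(delete_query)
```

The printed text can be pasted into the web console as it is.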