Basic tutorial for BigQuery with Python

BigQuery is a fully managed enterprise data warehouse for analytics.
It is inexpensive and highly scalable. In this article, I would like to share a basic tutorial for BigQuery with Python.

🗽 Installation

pip install google-cloud-bigquery

😼 Create credentials

Please see the official documentation on creating a service account key.

Additionally, please set the path to the downloaded JSON key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
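A quick way to catch a misconfigured environment before the client raises an authentication error is to check the variable yourself. This is a minimal sketch; `credentials_path_ok` is a hypothetical helper name, not part of the library:

```python
import os

def credentials_path_ok(env=os.environ):
    """Return True when GOOGLE_APPLICATION_CREDENTIALS points at an existing file."""
    path = env.get('GOOGLE_APPLICATION_CREDENTIALS')
    return path is not None and os.path.isfile(path)
```

Calling it with no arguments checks the real environment; passing a dict is handy for testing.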

🤔 Create a dataset if it does not exist

Create the dataset only if it does not already exist:

from google.cloud import bigquery
from google.cloud.exceptions import NotFound

def bq_create_dataset():
    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset('my_dataset_id')

    try:
        bigquery_client.get_dataset(dataset_ref)  # API call: check whether the dataset exists
    except NotFound:
        dataset = bigquery.Dataset(dataset_ref)
        dataset = bigquery_client.create_dataset(dataset)
        print('Dataset {} created.'.format(dataset.dataset_id))

🐠 Create a table if it does not exist

Create the table only if it does not already exist:

from google.cloud import bigquery
from google.cloud.exceptions import NotFound

def bq_create_table():
    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset('my_dataset_id')

    # Prepares a reference to the table
    table_ref = dataset_ref.table('my_table_name')

    try:
        bigquery_client.get_table(table_ref)  # API call: check whether the table exists
    except NotFound:
        schema = [
            bigquery.SchemaField('name', 'STRING', mode='REQUIRED'),
            bigquery.SchemaField('age', 'INTEGER', mode='REQUIRED'),
        ]
        table = bigquery.Table(table_ref, schema=schema)
        table = bigquery_client.create_table(table)
        print('table {} created.'.format(table.table_id))

🐝 Schema info

You can add a description or a mode option (such as REQUIRED) to each schema field.

bigquery.SchemaField('age', 'INTEGER', mode='REQUIRED', description='Age'),
bigquery.SchemaField('name', 'STRING', description='Name'),
bigquery.SchemaField('created_at', 'TIMESTAMP', mode='REQUIRED', description='Date and time when the record was created')

If you want to know more detail about the SchemaField constructor, please see bigquery.schema.SchemaField.
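Before sending a schema to the API, it can be useful to sanity-check the type and mode names yourself. The helper below is my own sketch (not part of google-cloud-bigquery); the type list covers the common BigQuery field types:

```python
# Common BigQuery field types and the three allowed field modes.
VALID_TYPES = {'STRING', 'BYTES', 'INTEGER', 'FLOAT', 'BOOLEAN',
               'TIMESTAMP', 'DATE', 'TIME', 'DATETIME', 'RECORD', 'NUMERIC'}
VALID_MODES = {'NULLABLE', 'REQUIRED', 'REPEATED'}

def validate_schema(fields):
    """fields: list of (name, field_type, mode) tuples. Returns a list of problems."""
    problems = []
    for name, field_type, mode in fields:
        if field_type not in VALID_TYPES:
            problems.append('{}: unknown type {}'.format(name, field_type))
        if mode not in VALID_MODES:
            problems.append('{}: unknown mode {}'.format(name, mode))
    return problems
```

An empty result means the definition at least uses valid names; build the SchemaField objects from the same tuples afterwards.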

🗻 Insert rows

Upload tuple objects to BigQuery. This uses the streaming buffer, and rows sitting in the buffer cannot be modified or deleted right away, so I don't recommend it.

from google.cloud import bigquery

def export_items_to_bigquery():
    # Instantiates a client
    bigquery_client = bigquery.Client()

    # Prepares a reference to the dataset
    dataset_ref = bigquery_client.dataset('my_dataset_id')

    table_ref = dataset_ref.table('my_table_id')
    table = bigquery_client.get_table(table_ref)  # API call

    rows_to_insert = [
        (u'Phred Phlyntstone', 32),
        (u'Wylma Phlyntstone', 29),
    ]
    errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request
    assert errors == []
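insert_rows also accepts dictionaries keyed by column name, which is less error-prone than positional tuples when the schema changes. A small sketch of the conversion (the field names are assumed to match the table schema):

```python
def rows_as_dicts(field_names, rows):
    """Pair each positional tuple with the table's field names."""
    return [dict(zip(field_names, row)) for row in rows]

rows = rows_as_dicts(['name', 'age'],
                     [(u'Phred Phlyntstone', 32), (u'Wylma Phlyntstone', 29)])
# rows can then be passed to bigquery_client.insert_rows(table, rows)
```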

πŸ„ Check data exist

from google.cloud import bigquery

def exist_record():
    bigquery_client = bigquery.Client()

    query = ('SELECT id FROM `{}.{}.{}` WHERE id="{}" LIMIT 1'
             .format('my_project_id', 'my_dataset_id', 'my_table_id', 'my_selected_id'))

    try:
        query_job = bigquery_client.query(query)
        is_exist = len(list(query_job.result())) >= 1
        print('Exist id: {}'.format('my_selected_id') if is_exist
              else 'Not exist id: {}'.format('my_selected_id'))
        return is_exist
    except Exception as e:
        print(e)
        return False
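Interpolating the id straight into the SQL string is fragile (quoting, injection). BigQuery's standard SQL supports named query parameters instead; here is a sketch of building such a query (the @selected_id parameter name is my own placeholder):

```python
def build_exist_query(project, dataset, table):
    """Build the existence check with a named parameter instead of string interpolation."""
    return ('SELECT id FROM `{}.{}.{}` '
            'WHERE id = @selected_id LIMIT 1').format(project, dataset, table)

# The parameter value is then supplied via the job config, roughly:
#   job_config = bigquery.QueryJobConfig()
#   job_config.query_parameters = [
#       bigquery.ScalarQueryParameter('selected_id', 'STRING', 'my_selected_id')]
#   bigquery_client.query(build_exist_query(...), job_config=job_config)
```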

πŸ€ Upload a csv to google cloud storage and load the csv

This is a sample which is uploading a CSV file to google cloud storage and load the CSV file to BigQuery.

Before coding, please execute as follows:

pip install google-cloud-storage

After installing google-cloud-storage, add following functions:

from google.cloud import bigquery
from google.cloud import storage

def gcs_upload_blob(source_file_name):
    """Uploads a file to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket('my_bucket_name')
    blob = bucket.blob('path/to/my_file.csv')  # object name inside the bucket (no gs:// prefix)

    blob.upload_from_filename(source_file_name)

    print('File {} uploaded to {}.'.format(source_file_name, blob.name))

def bq_load_csv_in_gcs():
    bigquery_client = bigquery.Client()
    dataset_ref = bigquery_client.dataset('my_dataset_id')

    job_config = bigquery.LoadJobConfig()
    schema = [
        bigquery.SchemaField('name', 'STRING', mode='REQUIRED'),
        bigquery.SchemaField('age', 'INTEGER', mode='REQUIRED'),
    ]
    job_config.schema = schema
    job_config.skip_leading_rows = 1

    load_job = bigquery_client.load_table_from_uri(
        'gs://my_bucket_name/path/to/my_file.csv',
        dataset_ref.table('my_table_id'),
        job_config=job_config)

    assert load_job.job_type == 'load'

    load_job.result()  # Waits for table load to complete.

    assert load_job.state == 'DONE'
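Note the two naming conventions in play above: bucket.blob() takes the bare object name inside the bucket, while load_table_from_uri() expects a full gs:// URI. A tiny helper (my own, not part of either library) makes the relationship explicit:

```python
def gcs_uri(bucket_name, blob_name):
    """Build the gs:// URI that a BigQuery load job expects from a bucket + object name."""
    return 'gs://{}/{}'.format(bucket_name, blob_name.lstrip('/'))
```

For example, the blob uploaded above would be addressed as gcs_uri('my_bucket_name', 'path/to/my_file.csv') in the load job.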

😀 Appendix

Web Console / Enabling standard SQL

If you want to delete some records in BigQuery from the web console, please add the #standardSQL prefix, like this:

#standardSQL
DELETE FROM my_dataset_id.articles WHERE title = 'sample'

More Detail: Setting a query prefix
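When sending such statements from code to an environment that still defaults to legacy SQL, the same prefix applies. A small sketch of a helper (my own, not part of the client library) that adds it when missing:

```python
def with_standard_sql(query):
    """Prepend the #standardSQL prefix unless the query already carries one."""
    if not query.lstrip().startswith('#standardSQL'):
        query = '#standardSQL\n' + query
    return query
```

Note that the Python client itself already defaults to standard SQL; the prefix matters mainly for the web console and tools where legacy SQL is the default.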
