snsary.contrib.google.bigquery

Sends batches of Readings as rows to Google BigQuery to be stored in a dataset / table called snsary.readings, using the Google BigQuery Storage API.

Create an instance with .from_env(), which expects:

  • GOOGLE_APPLICATION_CREDENTIALS - the path to a JSON credentials file

  • GOOGLE_BIGQUERY_STREAM - of the form projects/<your_project_id>/datasets/snsary/tables/readings/streams/_default

Setting up BigQuery

You can use the BigQuery UI to do most of the setup.

  1. Create a dataset called snsary.

    • Do not enable table expiration (this is different to partition expiration).

  2. Create a table called readings.

    • Add columns timestamp, host, sensor, metric and value.

    • Use TIMESTAMP for timestamp, FLOAT for value and otherwise STRING.

    • Partition the table by day using values of the timestamp column.

  3. Set up partition expiration e.g.

    ALTER TABLE snsary.readings
    SET OPTIONS (
     partition_expiration_days=425
    )
    

You will also need to create a Google Cloud service account and corresponding API key. The service account should have the “BigQuery Data Editor” role or similar.

Querying the data

Example query for data in the table:

SELECT $__timeGroup(timestamp,$__interval), sensor, metric, avg(value)
FROM `snsary.readings`
where $__timeFilter(timestamp)
group by $__timeGroup(timestamp,$__interval), sensor, metric
order by 1 asc

Note that the $__ functions are defined by Grafana. A service account reading the data will need to have “BigQuery Data Viewer” and “BigQuery Job User” roles.

Module Contents

class snsary.contrib.google.bigquery.BigQueryOutput(stream)

Bases: snsary.outputs.BatchOutput

RETRY_DEADLINE = 10
classmethod from_env()
publish_batch(readings)