Google Drive

This article covers all the necessary steps to access and manage files on Google Drive, Google's cloud storage solution, from inside DataLab.

Setup

Before you can run Python code to programmatically access files in Google Drive, you need to complete the following steps, each of which is described in detail below:

  • Enable the Google Drive API.

  • Create a Google service account for programmatic access.

  • Share the files you want to access with the service account.

  • Store the service account credentials in DataLab.

Enable the Google Drive API

  • Make sure you’re signed in with your Google account.

  • Navigate to the Google API Library.

  • Create a new project by clicking the project dropdown in the navbar.

  • Search for the “Google Drive API” and enable it. This can take up to 10 seconds.

Configure a Google Service Account

  • In the “APIs and services” navbar on the left, go to the “Credentials” tab.

A Google service account is a special kind of account that can be used by programs to access Google resources like your Drive. You will use this service account to connect DataLab to Google Drive.

You only have to set up a service account once for each Google account whose resources you want to access; the next time, you can skip this step.

Follow the steps below to create the service account and generate the necessary credentials:

  • Click on “+ CREATE CREDENTIALS” and select “Service Account”

    • In the first step (service account details), provide a name for the service account, e.g., “google-operator”, and click on “Create and continue”

    • In the second step, select the “Owner” role and click “Continue”

    • In the third step, don’t change anything and click “Done”

  • Once back on the Credentials page, click on the service account you just created.

  • Go to the Keys tab, click “Add Key > Create new key”

  • Choose “JSON”, then click “Create.” The JSON file with your service account credentials will automatically download to your computer.

You now have a service account and a JSON credentials file! Head over to your Downloads folder or wherever the JSON file was downloaded, open it up, and have a look. It should look something like this:

{
  "type": "service_account",
  "project_id": "<your-project-name>",
  "private_key_id": "<something-private>",
  "private_key": "-----BEGIN PRIVATE KEY-----\nM<some-very-private-stuff>\n",
  "client_email": "google-operator@steam-verve-386214.iam.gserviceaccount.com",
  "client_id": "123456789012345678901",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/google-operator%40<project-name>.iam.gserviceaccount.com"
}

There’s a client_email field in there, along the lines of google-operator@<google-project-name>.iam.gserviceaccount.com. Copy this email to your clipboard; you’ll need it in the next step.
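
If you would rather not copy the address by hand, a short Python snippet can pull it out of the key file. This is only a minimal sketch: the function name read_client_email and the file path are placeholders chosen for illustration, not part of DataLab or Google's tooling.

```python
import json
from pathlib import Path


def read_client_email(key_path):
    """Return the client_email field from a service account JSON key file."""
    info = json.loads(Path(key_path).read_text(encoding="utf-8"))
    # Service account key files carry "type": "service_account".
    if info.get("type") != "service_account":
        raise ValueError("This does not look like a service account key file")
    return info["client_email"]


# Hypothetical path: adjust it to wherever your browser saved the key file.
key_file = Path.home() / "Downloads" / "service-account-key.json"
if key_file.exists():
    print(read_client_email(key_file))
else:
    print(f"No key file found at {key_file}")
```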

Share Google Drive files with the service account

Your service account can only access Google Drive files that have been shared with it, so you need to go through the files in your Google Drive and share each one you want to use with the service account email that you copied in the previous step. If you only want to read the files, "Viewer" access is enough.

Create a new workbook

Click this link to create a workbook in your own account that contains example Python code to connect to Google Drive, list all the files the service account has access to, and download an example CSV file.

Store service account credentials in DataLab

We'll use environment variables to securely store the service account credentials JSON in your workbook.

In your new workbook, click on "Environment", and click on "+" next to "Environment variables":

  • Set Name to GOOGLE_JSON

  • Set Value to the full contents of the service account JSON file that was downloaded. You can do this by opening the JSON file, selecting all, copying it to your clipboard, and then pasting it in the Value field.

  • Set the “Environment Variable Set Name” to “Google Service Account” (this can be anything, really)

After filling in all fields, click “Create,” “Next,” and finally, “Connect.” Your workbook session will restart, and GOOGLE_JSON will now be available as an environment variable in your workbook. You can verify this by creating a Python cell with the following code and running it:
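
A minimal sketch of such a check is shown here. It assumes the variable is named GOOGLE_JSON, as set above; the helper name load_service_account_info is made up for this example. It prints only non-secret fields, so the private key never ends up in the cell output.

```python
import json
import os


def load_service_account_info(env=os.environ, name="GOOGLE_JSON"):
    """Parse the service account JSON stored in an environment variable."""
    raw = env.get(name)
    if not raw:
        raise KeyError(f"Environment variable {name} is not set or is empty")
    info = json.loads(raw)
    if info.get("type") != "service_account":
        raise ValueError(f"{name} does not contain a service account key")
    return info


try:
    info = load_service_account_info()
    # Print only non-secret fields; never print the private key.
    print("Project:", info.get("project_id"))
    print("Service account:", info.get("client_email"))
except (KeyError, ValueError) as err:
    # json.JSONDecodeError is a ValueError, so malformed JSON lands here too.
    print("Credentials not available:", err)
```

If the cell prints the service account email you copied earlier, the environment variable is connected correctly.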

If you want to reuse the same service account credentials in another workbook, you don’t need to set up the environment variable again: you can connect the existing environment variable to your other workbooks as well.

List files in Google Drive

Use the Python code snippets in the workbook that you created before (with this link) to install the necessary packages, list all the files in your Google Drive that the service account has access to, and download an example CSV file, all from Python!
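
For reference, the listing and download steps can be sketched roughly as follows. This is not necessarily the code in the workbook. It assumes the google-api-python-client and google-auth packages are installed (for example with pip install google-api-python-client google-auth) and that the credentials are stored in GOOGLE_JSON as described above. The helper names build_drive_service, list_files, and download_file are invented for this example.

```python
import io
import json
import os

# Read-only scope: enough for listing and downloading files.
SCOPES = ["https://www.googleapis.com/auth/drive.readonly"]


def build_drive_service(info):
    """Create a Drive v3 client from a parsed service account key."""
    # Imported here so the other helpers load without these packages.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    creds = service_account.Credentials.from_service_account_info(info, scopes=SCOPES)
    return build("drive", "v3", credentials=creds)


def list_files(service, page_size=100):
    """Return every file the service account can see, following pagination."""
    files, token = [], None
    while True:
        response = service.files().list(
            pageSize=page_size,
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=token,
        ).execute()
        files.extend(response.get("files", []))
        token = response.get("nextPageToken")
        if not token:
            return files


def download_file(service, file_id):
    """Download a binary file (for example a CSV) and return its bytes."""
    from googleapiclient.http import MediaIoBaseDownload

    buffer = io.BytesIO()
    downloader = MediaIoBaseDownload(buffer, service.files().get_media(fileId=file_id))
    done = False
    while not done:
        _, done = downloader.next_chunk()
    return buffer.getvalue()


# Only runs when the credentials from the previous section are available.
if "GOOGLE_JSON" in os.environ:
    drive = build_drive_service(json.loads(os.environ["GOOGLE_JSON"]))
    for f in list_files(drive):
        print(f["id"], f["name"], f["mimeType"])
```

To fetch a CSV, pass one of the printed IDs to download_file(drive, file_id) and decode the returned bytes. Note that get_media works for uploaded files such as CSVs; Google-native documents like Sheets need files().export_media instead.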
