Here we describe how to do auth with a package that uses gargle, without requiring any user interaction. This comes up in a wide array of contexts, ranging from simple rendering of a local R Markdown document to deploying a data product on a remote server.
We assume the wrapper package uses the design described in
vignette("gargle-auth-in-client-package")
. Examples
include:
- bigrquery
- googledrive
- googlesheets4
- gmailr note: gmailr does not use the built-in tidyverse OAuth client
Full details on gargle::token_fetch()
, which powers this
strategy, are given in
vignette("how-gargle-gets-tokens")
.
Provide a token or pre-authorize token discovery
The main principle for auth that does not require user interaction:
Provide a token directly or take advance measures that indicate you want a token to be discovered.
We present several ways to achieve this, basically in order of preference.
Sidebar 1: Deployment
First, a word about deployed environments. If this doesn’t apply to you, skip this section.
Let’s identify a specific type of project: it is developed in one place, with interactivity – such as your local computer – and then deployed elsewhere, where it must run without further interaction – such as on Posit Connect or shinyapps.io. In this context, it may make sense to depart from gargle’s default behaviour, which is to store tokens outside the project, and to embed them in the project instead. An example at the end of this vignette demonstrates the use of a project-level OAuth cache. A service account token could also be stored in the project.
When you embed tokens in the project and deploy, remember, that, by
default, they are no more secure or hidden than the other source files
in the project. vignette("managing-tokens-securely")
describes a method for embedding an encrypted token in the project,
which is an extra level of care needed if you want to access credentials
within, e.g., a continuous integration service, such as GitHub
Actions.
Sidebar 2: I just want my .Rmd
to render
TL;DR is that you need to successfully authenticate once in an interactive session and then, in your code, give gargle permission to use a token it finds in the cache. These sorts of commands achieve that:
# Approach #1: use an option.
# Either specify the user:
options(gargle_oauth_email = "jenny@example.com")
# Or, if you only use one Google identity, you can be more vague:
options(gargle_oauth_email = TRUE)
# Or, you can specify the identity to use at the domain level:
options(gargle_oauth_email = "*@example.com")
# Approach #2: call PACKAGE_auth() proactively.
library(googledrive)
# Either specify the user:
drive_auth(email = "jenny@example.com")
# Or, if you only use one Google identity, you can be more vague:
drive_auth(email = TRUE)
# Or, you can specify the identity to use at the domain level:
drive_auth(email = "*@example.com")
At the end of this article, this scenario is explained in detail, if you want to understand why this works.
Embrace credentials available in certain cloud settings
In certain cloud computing contexts, a service account token may be ambiently available (or you can arrange for that to be true). Think about it: if your workload is running on Google Compute Engine (GCE), it’s already “inside the Google house”. It seems like there should be a way to avoid another round of auth and that is indeed the case.
Another advantage of these cloud auth workflows is that there is never any need to download and carefully manage a file that contains sensitive information. This is why they are often described as “keyless”. If you can use one of these methods, you should seriously consider doing so.
Google Compute Engine
This section applies to code running on a GCE instance, either literally, or on another Google Cloud product built on top of GCE. You should consider Google’s own documentation to be definitive, but we’ll try to give a useful summary here and to explain how gargle works with GCE:
https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-instances
A Google Cloud Platform (GCP) project generally has a GCE default service account and, by default, a new GCE instance runs as that service account. (If you wish, you can use a different service account by taking explicit steps when you create an instance or by modifying it while it’s stopped.) The main point is that, for an application running on GCE, a service account identity is generally available.
GCE allows applications to get an OAuth access token from its
metadata server and this is what gargle::credentials_gce()
does (which is one of functions tried by
gargle::token_fetch()
, which is called by wrapper
packages). This token request can be made for specific scopes and, in
general, most wrapper packages will indeed be asking for specific scopes
relevant to the API they access. Consider the signature of
googledrive::drive_auth()
:
drive_auth <- function(email = gargle::gargle_oauth_email(),
path = NULL,
scopes = "https://www.googleapis.com/auth/drive",
cache = gargle::gargle_oauth_cache(),
use_oob = gargle::gargle_oob_default(),
token = NULL) { ... }
The googledrive package asks for a token with "drive"
scope, by default. This brings up one big gotcha when using packages
like googledrive or googlesheets4 on GCE.
By default, a GCE instance will be running as the default service
account, with the "cloud-platform"
scope and this will,
generally speaking, allow the service account to work with various Cloud
products. However, the "cloud-platform"
scope does not
permit operations with non-Cloud APIs, such as Drive and Sheets. If you
want the service account identity for your GCE instance to be able to
get an access token for use with Drive and Sheets, you will need to
explicitly add, e.g., the "drive"
scope when you create the
instance (or stop the instance and add that scope). (Note that, in
contrast, BigQuery is considered a Cloud product and therefore bigrquery
can operate with the "cloud-platform"
scope.)
Be aware that you might also need to explicitly grant the service account an appropriate level of access (e.g. read or write) to any Drive files you intend to work on.
Finally, if you want to opt-out of using the default service account
and, instead, auth as a normal user, even though you are on GCE, that is
also possible. One way to achieve that is to remove
credentials_gce()
from the set of auth functions tried by
gargle::token_fetch()
by executing this command before any
explicit or implicit auth happens:
# removes `credentials_gce()` from gargle's registry
gargle::cred_funs_add(credentials_gce = NULL)
You can make a similar change in more scoped way with the helpers
gargle::with_cred_funs()
or
gargle::local_cred_funs()
.
Workload Identity on Google Kubernetes Engine (GKE)
Here we discuss how gargle’s GCE auth can work for a related service, Google Kubernetes Engine (GKE), using Workload Identity. This is more complicated that direct usage of GCE and some extra configuration is needed to make a service account’s metadata available for the GKE instance to discover. GKE is the underlying technology behind Google’s managed Airflow service, Cloud Composer, so this also applies to R docker files being called in that environment.
Workload Identity is the recommended way to do authentication on GKE and other places, if possible, since it eliminates the use of a file that holds the service key, which is a potential security risk.
- Following the Workload
Identity docs, you create a service account as normal and give it
permissions and scopes needed to, say, upload to BigQuery. Imagine that
my-service-key@my-project.iam.gserviceaccount.com
has thehttps://www.googleapis.com/auth/bigquery
scope. - Instead of downloading a JSON key, you instead migrate that permission by adding a policy binding to another service account within Kubernetes.
- Create the service account within Kubernetes, ideally within a new namespace:
# create namespace
kubectl create namespace my-namespace
# Create Kubernetes service account
kubectl create serviceaccount --namespace my-namespace bq-service-account
- Bind that Kubernetes service account to the service account outside of Kubernetes you created in step 1, and assign it an annotation:
# Create IAM policy binding betwwen k8s SA and GSA
gcloud iam service-accounts add-iam-policy-binding my-service-key@my-project.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:my-project.svc.id.goog[my-namespace/bq-service-account]"
# Annotate k8s SA
kubectl annotate serviceaccount bq-service-account \
--namespace my-namespace \
iam.gke.io/gcp-service-account=my-service-key@my-project.iam.gserviceaccount.com
This key will now be available to add to pods within the cluster. For
Airflow, you can pass them in using the Python code
GKEStartPodOperator(...., namespace='my-namespace', service_account_name='bq-service-account')
.
Documentation around GKEStartPodOperator()
within Cloud
Composer can be found here.
- In order for the R function
gargle::gce_credentials()
do the right thing, you need to do two things:
- Set
"gargle.gce.use_ip"
option toTRUE
, in order to use the metadata server that’s relevant on GKE. - Specify the target service account, i.e. you can’t just passively
accept the default, which is to use the
"default"
service account.gce_instance_service_accounts()
can be helpful, e.g., if you want to know which service accounts your Docker container can see.
Here is example code that you might execute in your Docker container:
options(gargle.gce.use_ip = TRUE)
t <- gargle::credentials_gce("my-service-key@my-project.iam.gserviceaccount.com")
# ... do authenticated stuff with the token t ...
Let’s assume that PKG is an R package that implements gargle auth in
the standard way, such as bigrquery or googledrive. At the time of
writing the service_account
argument is not exposed in the
usual, high-level PKG_auth()
function (https://github.com/r-lib/gargle/issues/249. So if you
need to use a non-default
service account, you need to call
credentials_gce()
directly and pass that token to
PKG_auth()
: Here’s an example of how that might look:
library(PKG)
options(gargle.gce.use_ip = TRUE)
t <- gargle::credentials_gce(
"my-service-key@my-project.iam.gserviceaccount.com", # use YOUR service account
scopes = "https://www.googleapis.com/auth/PKG" # use REAL scopes
)
PKG_auth(token = t)
# ... do authenticated stuff...
AWS
Keyless auth is even possible from non-Google cloud platforms, using Workload identity federation.
This is implemented in the experimental function
credentials_external_account()
, which currently only
supports AWS.
Provide a service account token directly
When two computers are talking to each other, possibly with no human involvement, the most appropriate type of token to use is a service account token. If you’re not working in cloud context with automatic access to a service account (see previous section), you can still use a service account, but it will require more explicit effort.
- Create a service account and then download its credentials as a JSON
file. This is described in
vignette("get-api-credentials")
, specifically in the Service account token section. - Call the wrapper package’s main auth function proactively and provide the path to this JSON file.
Example using googledrive:
library(googledrive)
drive_auth(path = "/path/to/your/service-account-token.json")
If this code is running on, e.g., a continuous integration service
and you need to use an encrypted token, see
vignette("managing-tokens-securely")
.
For certain APIs, service accounts are inherently awkward, because
you often want to do things on behalf of a specific user. Gmail
is a good example. If you are sending email programmatically, you
probably want to send it as yourself (or from some other specific email
account) instead of from
zestybus-geosyogl@fuffapster-654321.iam.gserviceaccount.com
.
This is, in fact, possible but is described as “impersonation”, which
should tip you off that Google does not exactly encourage this workflow.
Some details:
- This requires “delegating domain-wide authority” to the service account.
- It is only possible in the context of a G Suite domain and only an administrator of the domain can set this up.
- The domain-wide authority is granted only for specific scopes, so those can be as narrow as possible. This may make a domain administrator more receptive to the idea.
- This is documented in a few different places, such as:
- Delegating domain-wide authority to the service account from Google Identity Platform docs
- Perform G Suite Domain-Wide Delegation of Authority from G Suite Admin SDK docs
- The
subject
argument ofcredentials_service_account()
andcredentials_app_default()
is available to specify which user to impersonate, e.g.subject = "user@example.com"
. This argument first appeared in gargle 0.5.0, so it may not necessarily be exposed yet in user-facing auth functions likedrive_auth()
. If you needsubject
in a client package, that is a reasonable feature request. It is also possible to get a token with an explicit call to, e.g.,credentials_service_account()
and then pass that token to the auth function:
t <- gargle::credentials_service_account(
path = "/path/to/your/service-account-token.json",
scopes = ...,
subject = "user@example.com"
)
googledrive::dive_auth(token = t)
If delegation of domain-wide authority is impossible or unappealing, you must use an OAuth user token, as described below.
Rig a service or external account for use with Application Default Credentials
Wrapper packages that use gargle::token_fetch()
in the
recommended way have access to the token search strategy known as
Application Default Credentials.
You need to put the JSON corresponding to your service or external account in a very specific location or, alternatively, record the location of this JSON file in a specific environment variable.
Full details are in the credentials_app_default()
section of vignette("how-gargle-gets-tokens")
.
If you have your token rigged properly, you do not
need to do anything else, i.e. you do not need to call
PACKAGE_auth()
explicitly. Your token should just get
discovered upon first need.
For troubleshooting purposes, you can set a gargle option to see
verbose output about the execution of
gargle::token_fetch()
:
options(gargle_verbosity = "debug")
withr-style convenience helpers also exist:
with_gargle_verbosity()
and
local_gargle_verbosity()
.
Provide an OAuth token directly
If you somehow have the OAuth token you want to use as an R object,
you can provide it directly to the token
argument of the
main auth function. Example using googledrive:
library(googledrive)
my_oauth_token <- # some process that results in the token you want to use
drive_auth(token = my_oauth_token)
gargle caches each OAuth user token it obtains to an
.rds
file, by default. If you know the filepath to the
token you want to use, you could use readRDS()
to read it
and provide as the token
argument to the wrapper’s auth
function. Example using googledrive:
# googledrive
drive_auth(token = readRDS("/path/to/your/oauth-token.rds"))
How would you know this filepath? That requires some attention to the location of gargle’s OAuth token cache folder, which is described in the next section.
Full details are in the credentials_byo_oauth2()
section
of vignette("how-gargle-gets-tokens")
.
Arrange for an OAuth user token to be re-discovered
This is the least recommended strategy, but it appeals to many users, because it doesn’t require creating a service account. Just remember that the perceived ease of using the token you already have (an OAuth user token) is quickly cancelled out by the greater difficulty of managing such tokens for non-interactive use. You might be forced to use this strategy with certain APIs, such as Gmail, that are difficult to use with a service account.
Two main principles:
- Take charge of – or at least notice – the folder where OAuth tokens are being cached.
- Make sure exactly one cached token will be identified and pre-authorize its use.
There are many ways to do this. We’ll work several examples using that convey the range of what’s possible.
I just want my .Rmd
to render
Step 1: Get that first token. You must run your code at least once, interactively, do the auth dance, and allow gargle to store the token in its cache.
library(googledrive)
# do anything that triggers auth
drive_find(n_max)
Step 2: Revise your code to pre-authorize the use of
that token next time. Now your .Rmd
can be rendered or your
.R
script can run, without further interaction.
You have two choices to make:
- Set the
gargle_oauth_email
option or callPACKAGE_auth(email = ...)
.- The option-based approach can be implemented in each
.Rmd
or.R
or in a user-level or project level.Rprofile
startup file.
- The option-based approach can be implemented in each
- Authorize the use of the “matching token”:
-
email = TRUE
works if we’re only going to find, at most, 1 token, i.e. you always auth with the same identity -
email = "jane@example.com"
pre-authorizes use of a token associated with a specific identity -
email = "*@example.com"
pre-authorizes use of a token associated with an identity from a specific domain; good for code that might be executed on the machines of bothalice@example.com
andbob@example.com
-
This sets an option that allows gargle to use cached tokens whenever there’s a unique match:
options(gargle_oauth_email = TRUE)
This sets an option to use tokens associated with a specific email address:
options(gargle_oauth_email = "jenny@example.com")
This sets an option to use tokens associated with an email address with a specific domain:
options(gargle_oauth_email = "*@example.com")
This gets a token right now and allows the use of a matching token, using googledrive as an example:
drive_auth(email = TRUE)
This gets a token right now, for the user with a specific email address:
drive_auth(email = "jenny@example.com")
This gets a token right now, first checking the cache for a token associated with a specific domain:
drive_auth(email = "*@example.com")
Project-level OAuth cache
This is like the previous example, but with an added twist: we use a project-level OAuth cache. This is good for deployed data products.
Step 1: Obtain the token intended for non-interactive use and make sure it’s cached in a (hidden) directory of the current project. Using googledrive as an example:
library(googledrive)
# designate project-specific cache
options(gargle_oauth_cache = ".secrets")
# check the value of the option, if you like
gargle::gargle_oauth_cache()
# trigger auth on purpose --> store a token in the specified cache
drive_auth()
# see your token file in the cache, if you like
list.files(".secrets/")
Do this setup once per project.
Another way to accomplish the same setup is to specify the desired cache location directly in the call to the auth function:
library(googledrive)
# trigger auth on purpose --> store a token in the specified cache
drive_auth(cache = ".secrets")
Step 2: In all downstream use, announce the location of the cache and pre-authorize the use of a suitable token discovered there. Continuing the googledrive example:
library(googledrive)
options(
gargle_oauth_cache = ".secrets",
gargle_oauth_email = TRUE
)
# now use googledrive with no need for explicit auth
drive_find(n_max = 5)
Setting the option gargle_oauth_email = TRUE
says that
googledrive is allowed to use a token that it finds in the cache,
without interacting with a user, as long as it discovers EXACTLY one
matching token. This option-setting code needs to appear in each script,
.Rmd
, or app that needs to use this token
non-interactively. Depending on the context, it might be suitable to
accomplish this in a startup file, e.g. project-level
.Rprofile
.
Here’s a variation where we say which token to use by explicitly specifying the associated email. This is handy if there’s a reason to have more than one token in the cache.
library(googledrive)
options(
gargle_oauth_cache = ".secrets",
gargle_oauth_email = "jenny@example.com"
)
# now use googledrive with no need for explicit auth
drive_find(n_max = 5)
Here’s another variation where we specify the necessary info directly in an auth call, instead of in options:
library(googledrive)
drive_auth(cache = ".secrets", email = TRUE)
# now use googledrive with no need for explicit auth
drive_find(n_max = 5)
Here’s one last variation that’s applicable when the local cache could contain multiple tokens:
library(googledrive)
drive_auth(cache = ".secrets", email = "jenny@example.com")
# now use googledrive with no need for explicit auth
drive_find(n_max = 5)
Be very intentional about paths and working directory. Personally I
would use here::here(".secrets)"
everywhere above, to make
things more robust.
For troubleshooting purposes, you can set a gargle option to see
verbose output about the execution of
gargle::token_fetch()
:
options(gargle_verbosity = "debug")
withr-style convenience helpers also exist:
with_gargle_verbosity()
and
local_gargle_verbosity()
.
For a cached token to be considered a “match”, it must match the current request with respect to user’s email, scopes, and OAuth client (client ID or key and secret). By design, these settings have very low visibility, because we usually want to use the defaults. If your token is not being discovered, consider if any of these fields might explain the mismatch.