vignettes/articles/managing-tokens-securely.Rmd
managing-tokens-securely.Rmd
Testing presents special challenges for packages that wrap an API. Here we tackle one of those problems: how to deal with auth in a non-interactive setting on a remote machine. This affects gargle itself and will affect any client package that relies on gargle for auth.
This article documents the token management approach taken in gargle. We wanted it to be relatively easy to have a secret, such as an auth token, that we can:
all while keeping the secret secure.
The approach uses symmetric encryption, where the shared key is stored in an environment variable. Why? This works well with existing conventions for local R usage. Most CI services offer support for secure environment variables. And R-hub accepts environment variables via the env_vars
argument of rhub::check()
.
This is based on an approach originally worked out in bigrquery.
secret_*()
functions
gargle’s approach to managing test tokens is implemented through several functions that all start with the secret_
prefix. These functions are not (currently?) exported. This may seem odd, since others might want to use these functions. But note they are only needed during setup or at test time. This sort of usage is compatible with others calling internal gargle functions and possibly inlining a version of a couple test helpers.
One way to make the secret_*()
functions available for local experimentation is to call devtools::load_all()
, which exposes all internal objects in a package:
devtools::load_all("path/to/source/of/gargle/")
The approach I’ll take in this article is to call these functions via :::
.
usethis::use_package("sodium", "Suggests")
if you like.GARGLE_PASSWORD
. Store as an environment variable.secret_pw_name()
creates a name of the form “PACKAGE_PASSWORD”, a convention baked into the secret_*()
family of functions.
(pw_name <- gargle:::secret_pw_name("gargle"))
#> [1] "GARGLE_PASSWORD"
In real life, you should keep the output of secret_pw_gen()
to yourself! We reveal it here as part of the exposition.
(pw <- gargle:::secret_pw_gen())
#> [1] "Ya4axP2Nbm8wu6wUZM25gesSmpwayJkBboqyzR3Drcw91d53Sx"
.Renviron
Combine the name and value to form a line like this in your user-level .Renviron
file:
GARGLE_PASSWORD=Ya4axP2Nbm8wu6wUZM25gesSmpwayJkBboqyzR3Drcw91d53Sx
usethis::edit_r_environ()
can help create or open this file. We strongly recommend using the user-level .Renviron
, as opposed to project-level, because this makes it less likely you will share sensitive information by mistake. If you don’t take our advice and choose to store the PASSWORD in a file inside a Git repo, you must make sure that file is listed in .gitignore
. This still would not prevent leaking your secret if, for example, that project is in a directory that syncs to DropBox.
Make sure .Renviron
ends in a newline; the lack of this is a notorious cause of silent failure. Remember you’ll need to restart R or call readRenviron("~/.Renviron")
for the newly defined environment variable to take effect.
Define the environment variable as an encrypted secret in your repo:
Use the secrets context to expose a secret as an environment variable in your workflows. That will look like like so, in some appropriate place in your workflow file:
env:
PACKAGE_PASSWORD: ${{ secrets.PACKAGE_PASSWORD }}
Remember that the secret, and therefore the associated environment variable, is not available when workflows are triggered via an external pull request. This is another reason to carry on gracefully when token decryption is not possible (see below).
Define the environment variable in your repo settings via the browser UI:
https://docs.travis-ci.com/user/environment-variables/#defining-variables-in-repository-settings
Alternatively, you can use the Travis command line interface to configure the environment variable or even define an encrypted environment variable in .travis.yml
.
Regardless of how you define it, remember that private environment variables are not available to external pull requests, which is another reason to carry on gracefully when token decryption is not possible (see below).
You may also need something like this in .travis.yml
so that the sodium R package can be installed:
Define the environment variable in the Environment page of your repo’s Settings. Make sure to request variable encryption and to click “Save” at the bottom. In the General page, you probably want to check “Enable secure variables in Pull Requests from the same repository only” and, again, explicitly “Save”.
As with Travis, it is also possible to encrypt the password using your AppVeyor account’s public key and inline the value in appveyor.yml
. There is a helpful web UI for that does the encryption and generates the lines to add to your config:
https://ci.appveyor.com/tools/encrypt
This can also be found via Settings > Encrypt YAML.
Send the environment variable in your calls to rhub::check()
and friends:
rhub::check(env_vars = Sys.getenv(gargle:::secret_pw_name("gargle"), names = TRUE))
secret_write()
takes 3 arguments:
package
name. Processed through secret_pw_name()
in order to retrieve the PASSWORD from an appropriately named environment variable.name
of the encrypted file to write. The location is below inst/secret
in the source of package
.data
, either a file path to the unencrypted secret file or the data to be encrypted as a raw vector. In the case of a secret file, we strongly recommend that its primary home on your local computer is outside your package and, generally, outside of any folder that syncs regularly to a remote, e.g. GitHub or DropBox. This decreases the chance of accidental leakage.Example of a call to secret_write()
, where gargle-testing.json
is a JSON file downloaded for a service account managed via the Google API / Cloud Platform console:
gargle:::secret_write(
package = "gargle",
name = "gargle-testing.json",
input = "a/very/private/local/folder/gargle-testing.json"
)
This writes an encrypted version of gargle-testing.json
to inst/secret/gargle-testing.json
relative to the current working directory, which is presumably the top-level directory of gargle’s source. This encrypted file should be committed and pushed.
Now you need to rig your tests or their setup around this encrypted token. You need to plan for two scenarios:
We recommend that you actively check your package under the “no decryption, no token” scenario, so that you discover problems before CRAN or your contributors do. In fact, this should probably be the default situation for your CI jobs and you only supply the secret to a single, flagship job – probably the check with the current R release and your favorite operating system.
Here’s the simplified build matrix from the R CMD check
GitHub Actions workflow file used by gargle (note: we redacted a very long rspm
URL that’s irrelevant here):
strategy:
matrix:
config:
- {os: macOS-latest, r: 'devel', gargle_auth: GARGLE_NOAUTH}
- {os: macOS-latest, r: '4.0', gargle_auth: GARGLE_PASSWORD}
- {os: windows-latest, r: '4.0', gargle_auth: GARGLE_NOAUTH}
- {os: ubuntu-16.04, r: '4.0', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
- {os: ubuntu-16.04, r: '3.6', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
- {os: ubuntu-16.04, r: '3.5', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
- {os: ubuntu-16.04, r: '3.4', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
- {os: ubuntu-16.04, r: '3.3', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
env:
GARGLE_PASSWORD: ${{ secrets[matrix.config.gargle_auth] }}
Notice how GARGLE_PASSWORD
will only be available when checking against the released version of R on macOS-latest.
bigrquery implements the same idea with a different approach: it does not provide BIGRQUERY_PASSWORD
to the main R CMD check
GitHub Actions workflow at all. Instead there is a separate “live api” workflow that only has one job and that accesses the secret.
The tidyverse / r-lib team is transitioning from Travis/AppVeyor to GitHub Actions. But in case it is helpful, here’s a simplified excerpt from a .travis.yml
file used by gargle in the past. The main r: release
build accesses GARGLE_PASSWORD
implicitly as an encrypted environment variable, but R CMD check
runs for the other builds with GARGLE_PASSWORD
explicitly unset:
matrix:
include:
- r: release
# <stuff about code coverage, pkgdown build & deploy, etc.>
- r: release
env: GARGLE_PASSWORD=''
- r: devel
env: GARGLE_PASSWORD=''
- r: oldrel
env: GARGLE_PASSWORD=''
Regardless of your CI platform, your absolute best bet for writing configuration files is to look at what other R package developers are doing in their public source repos. All of the above is static, simplified, and (probably) stale and will never reflect the current state-of-the-art.
In a wrapper package, you could determine decrypt-ability at the start of the test run. Here’s representative code from googledrive’s tests/testthat/helper.R
file, but something similar can be seen in bigrquery and googlesheets4:
if (gargle:::secret_can_decrypt("googledrive")) {
json <- gargle:::secret_read("googledrive", "googledrive-testing.json")
drive_auth(path = rawToChar(json))
}
Versions of secret_can_decrypt()
and secret_read()
are defined here in gargle. drive_auth()
is a function specific to googledrive that loads a token for use downstream (in multiple tests, in this case). Note that it can clearly accept a JSON string, as an alternative to a filepath, and that’s very favorable for our workflow. We’ll come back to this below.
But what if secret_can_decrypt()
returns FALSE
and no token is loaded? That’s where you rely on a custom test skipper. Here is the test skipper from googledrive:
# googledrive
skip_if_no_token <- function() {
testthat::skip_if_not(drive_has_token(), "No Drive token")
}
googledrive::drive_has_token()
returns TRUE
if a token is available and FALSE
otherwise. By calling the skipper at the start of tests that require auth, you arrange for your package to cope gracefully when the token cannot be decrypted, e.g., on CRAN and in pull requests. It is typical to define such a skipper in tests/testthat/helper.R
or similar.
gargle’s usage of the testing token is a bit different, still evolving, and less relevant to the maintainers of wrapper packages. Therefore it’s not featured here.
Once you dig into the secret_*()
family, you will notice there are two recurring sources of friction:
secret_*()
functions.Functions useful for these conversions:
bigrquery and googledrive, which both use this approach.
bigrquery/tests/testthat/helper-auth.R
googledrive/tests/testthat/helper.R
googledrive/vignettes/articles/multiple-files.Rmd#L10-L29
“Managing secrets” vignette of httr:
Vignettes of the sodium package, especially the parts relating to symmetric encryption:
The cyphr package, which smooths over frictions like those identified above relating to “file vs. object?” and “character vs. raw?”: