Managing tokens securely
Source: vignettes/articles/managing-tokens-securely.Rmd
Testing presents special challenges for packages that wrap an API. Here we tackle one of those problems: how to deal with auth in a non-interactive setting on a remote machine. This affects gargle itself and will affect any client package that relies on gargle for auth.
This article documents the token management approach taken in gargle. We wanted it to be relatively easy to have a secret, such as an auth token, that we can:
- Use locally
- Use with continuous integration (CI) services, such as GitHub Actions
- Use with R-hub
all while keeping the secret secure.
The approach uses symmetric encryption, where the shared key is stored in an environment variable. Why? This works well with existing conventions for local R usage. Most CI services offer support for secure environment variables. And R-hub accepts environment variables via the env_vars argument of rhub::check().
This is based on an approach originally worked out in bigrquery.
Accessing the secret_*() functions
gargle’s approach to managing test tokens is implemented through several functions that all start with the secret_ prefix.
These functions are not (currently?) exported. This may seem odd, since
others might want to use these functions. But note they are only needed
during setup or at test time. This sort of usage is compatible with
others calling internal gargle functions and possibly inlining a version
of a couple test helpers.
One way to make the secret_*() functions available for local experimentation is to call devtools::load_all(), which exposes all internal objects in a package:
devtools::load_all("path/to/source/of/gargle/")
The approach I’ll take in this article is to call these functions via the ::: operator.
Overview of the approach
- Add the sodium package to Suggests in your DESCRIPTION, via usethis::use_package("sodium", "Suggests") if you like.
- Generate a random PASSWORD and give it a self-documenting name, e.g. GARGLE_PASSWORD. Store as an environment variable.
- Identify a secret file of interest, such as the JSON representing a service account token. This is presumably stored outside your package.
- Use the PASSWORD to apply a method for symmetric encryption to the target file. Store the resulting encrypted file in a designated location within your package.
- Store or pass the PASSWORD as an environment variable everywhere you’ll need to decrypt the secret.
  - Check that the platform has support for keeping the PASSWORD concealed.
  - Make sure you don’t do anything in your own code that would dump it to, e.g., a log file.
- Rig your tests to determine if the key is available and, therefore, whether decryption is going to be possible.
  - If “no”, carry on gracefully with any tests that don’t require auth.
  - If “yes”, decrypt the secret and put the associated token into force globally for the test run or on an “as needed” basis in individual tests.
Annotated code-through
Generate a name for the PASSWORD
secret_pw_name() creates a name of the form “PACKAGE_PASSWORD”, a convention baked into the secret_*() family of functions.
(pw_name <- gargle:::secret_pw_name("gargle"))
#> [1] "GARGLE_PASSWORD"
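As a rough illustration of this naming convention, the helper below builds such a name in base R. The function name make_pw_name() is hypothetical, and the handling of dots is an assumption; this is a sketch, not necessarily how secret_pw_name() is implemented.

```r
# Hypothetical helper, for illustration only: derive an environment variable
# name of the form "PACKAGE_PASSWORD" from a package name.
make_pw_name <- function(package) {
  # Assumption: replace dots with underscores so the result is a
  # conventional environment variable name
  paste0(gsub(".", "_", toupper(package), fixed = TRUE), "_PASSWORD")
}

make_pw_name("gargle")
#> [1] "GARGLE_PASSWORD"
```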
Generate a random PASSWORD
In real life, you should keep the output of secret_pw_gen() to yourself! We reveal it here as part of the exposition.
(pw <- gargle:::secret_pw_gen())
#> [1] "cLldBLiCfJepovI0nPXuE6BKmU0N6TYntYwjROMa5f230gwuQE"
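For a sense of what generating such a PASSWORD can involve, here is a minimal base R sketch that produces a random alphanumeric string of the same length as the one above. The helper make_pw() is hypothetical and is not presented as gargle's implementation. Note that sample() relies on R's general-purpose random number generator.

```r
# Hypothetical helper, for illustration only: build a random string of
# n characters drawn from lowercase letters, uppercase letters, and digits.
make_pw <- function(n = 50) {
  chars <- c(letters, LETTERS, 0:9)  # digits are coerced to character
  paste0(sample(chars, n, replace = TRUE), collapse = "")
}

pw_demo <- make_pw()
nchar(pw_demo)
#> [1] 50
```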
Define environment variable in .Renviron
Combine the name and value to form a line like this in your user-level .Renviron file:
GARGLE_PASSWORD=cLldBLiCfJepovI0nPXuE6BKmU0N6TYntYwjROMa5f230gwuQE
usethis::edit_r_environ() can help create or open this file. We strongly recommend using the user-level .Renviron, as opposed to project-level, because this makes it less likely you will share sensitive information by mistake. If you don’t take our advice and choose to store the PASSWORD in a file inside a Git repo, you must make sure that file is listed in .gitignore. This still would not prevent leaking your secret if, for example, that project is in a directory that syncs to DropBox.
Make sure .Renviron ends in a newline; the lack of this is a notorious cause of silent failure. Remember you’ll need to restart R or call readRenviron("~/.Renviron") for the newly defined environment variable to take effect.
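One way to confirm that a variable has taken effect, without printing its value, is to check that it is non-empty. The sketch below uses a throwaway file and a made-up variable name (EXAMPLE_PKG_PASSWORD) so that it does not touch your real .Renviron.

```r
# Write a throwaway file in .Renviron format; name and value are made up.
# writeLines() terminates the line, so the file ends in a newline.
renviron <- tempfile()
writeLines("EXAMPLE_PKG_PASSWORD=not-a-real-secret", renviron)

# Load it into the current session, as you would with "~/.Renviron"
readRenviron(renviron)

# Check that the variable is defined without revealing its value
pw_is_set <- nzchar(Sys.getenv("EXAMPLE_PKG_PASSWORD"))
pw_is_set
#> [1] TRUE
```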
Provide environment variable to other services
GitHub Actions:
Define the environment variable as an encrypted secret in your repo.
Use the secrets context to expose a secret as an environment variable in your workflows. That will look like so, in some appropriate place in your workflow file:
env:
  PACKAGE_PASSWORD: ${{ secrets.PACKAGE_PASSWORD }}
Remember that the secret, and therefore the associated environment variable, is not available when workflows are triggered via an external pull request. This is another reason to carry on gracefully when token decryption is not possible (see below).
Travis-CI
Define the environment variable in your repo settings via the browser UI:
https://docs.travis-ci.com/user/environment-variables/#defining-variables-in-repository-settings
Alternatively, you can use the Travis command line interface to configure the environment variable or even define an encrypted environment variable in .travis.yml.
Regardless of how you define it, remember that private environment variables are not available to external pull requests, which is another reason to carry on gracefully when token decryption is not possible (see below).
You may also need something like this in .travis.yml so that the sodium R package can be installed:
addons:
  apt:
    sources:
      - sourceline: 'ppa:chris-lea/libsodium'
    packages:
      - libsodium-dev
AppVeyor
Define the environment variable in the Environment page of your repo’s Settings. Make sure to request variable encryption and to click “Save” at the bottom. In the General page, you probably want to check “Enable secure variables in Pull Requests from the same repository only” and, again, explicitly “Save”.
As with Travis, it is also possible to encrypt the password using your AppVeyor account’s public key and inline the value in appveyor.yml. There is a helpful web UI that does the encryption and generates the lines to add to your config:
https://ci.appveyor.com/tools/encrypt
This can also be found via Settings > Encrypt YAML.
R-hub
Send the environment variable in your calls to rhub::check() and friends:
rhub::check(env_vars = Sys.getenv(gargle:::secret_pw_name("gargle"), names = TRUE))
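The names = TRUE argument matters here: it makes Sys.getenv() return a named character vector even for a single variable, so the variable's name travels along with its value. A small base R illustration, using a made-up variable:

```r
# Made-up variable, set only for this illustration
Sys.setenv(EXAMPLE_PKG_PASSWORD = "not-a-real-secret")

# names = TRUE keeps the variable name attached to the value
env_vars <- Sys.getenv("EXAMPLE_PKG_PASSWORD", names = TRUE)
names(env_vars)
#> [1] "EXAMPLE_PKG_PASSWORD"
```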
Encrypt the secret file
secret_write() takes 3 arguments:
- package name. Processed through secret_pw_name() in order to retrieve the PASSWORD from an appropriately named environment variable.
- name of the encrypted file to write. The location is below inst/secret in the source of package.
- input, either a file path to the unencrypted secret file or the data to be encrypted as a raw vector. In the case of a secret file, we strongly recommend that its primary home on your local computer is outside your package and, generally, outside of any folder that syncs regularly to a remote, e.g. GitHub or DropBox. This decreases the chance of accidental leakage.
Example of a call to secret_write(), where gargle-testing.json is a JSON file downloaded for a service account managed via the Google API / Cloud Platform console:
gargle:::secret_write(
  package = "gargle",
  name = "gargle-testing.json",
  input = "a/very/private/local/folder/gargle-testing.json"
)
This writes an encrypted version of gargle-testing.json to inst/secret/gargle-testing.json relative to the current working directory, which is presumably the top-level directory of gargle’s source. This encrypted file should be committed and pushed.
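To make the encryption step concrete, here is a minimal sketch of a comparable round trip written directly against the sodium package. It is not presented as gargle's implementation: the password, the file contents, and the temporary paths are made up, and deriving the key with sodium::sha256() is an assumption. The block does nothing when sodium is not installed.

```r
if (requireNamespace("sodium", quietly = TRUE)) {
  # Assumption: derive a 32-byte key by hashing the PASSWORD.
  # In practice the PASSWORD would come from Sys.getenv(), not a literal.
  key <- sodium::sha256(charToRaw("not-a-real-password"))
  nonce <- sodium::random(24)  # must be supplied again, unchanged, to decrypt

  # A made-up "secret file", read in as a raw vector
  plain_path <- tempfile(fileext = ".json")
  writeLines('{"type": "service_account"}', plain_path)
  plain <- readBin(plain_path, what = "raw", n = file.size(plain_path))

  # Encrypt, then drop the "nonce" attribute so that writeBin() sees a
  # plain raw vector, and write the ciphertext to disk
  enc <- sodium::data_encrypt(plain, key = key, nonce = nonce)
  attr(enc, "nonce") <- NULL
  enc_path <- tempfile()
  writeBin(enc, enc_path)

  # Read the ciphertext back and decrypt it with the same key and nonce
  enc_in <- readBin(enc_path, what = "raw", n = file.size(enc_path))
  dec <- sodium::data_decrypt(enc_in, key = key, nonce = nonce)
  roundtrip_ok <- identical(rawToChar(dec), rawToChar(plain))
}
```

Only the encrypted file would live in the package; the key never does, which is why it has to be supplied separately wherever decryption is needed.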
Test setup
Now you need to rig your tests or their setup around this encrypted token. You need to plan for two scenarios:
- Decryption is going to work. This is where you actually get to test package functionality against the target API, with auth.
- Decryption is not going to work, either because the Suggested sodium package is not available or (much more likely) because the environment variable that represents the key is not available.
  - This will be the case on CRAN, by definition, because there is no way to share an encrypted secret.
  - This will be the case for external contributors, on their personal machines and when their GitHub pull requests are checked via CI services, such as GitHub Actions, Travis-CI, or AppVeyor.
CI configuration
We recommend that you actively check your package under the “no decryption, no token” scenario, so that you discover problems before CRAN or your contributors do. In fact, this should probably be the default situation for your CI jobs and you only supply the secret to a single, flagship job – probably the check with the current R release and your favorite operating system.
Here’s the simplified build matrix from the R CMD check GitHub Actions workflow file used by gargle (note: we redacted a very long rspm URL that’s irrelevant here):
strategy:
  matrix:
    config:
      - {os: macOS-latest, r: 'devel', gargle_auth: GARGLE_NOAUTH}
      - {os: macOS-latest, r: '4.0', gargle_auth: GARGLE_PASSWORD}
      - {os: windows-latest, r: '4.0', gargle_auth: GARGLE_NOAUTH}
      - {os: ubuntu-16.04, r: '4.0', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
      - {os: ubuntu-16.04, r: '3.6', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
      - {os: ubuntu-16.04, r: '3.5', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
      - {os: ubuntu-16.04, r: '3.4', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
      - {os: ubuntu-16.04, r: '3.3', gargle_auth: GARGLE_NOAUTH, rspm: "..."}
env:
  GARGLE_PASSWORD: ${{ secrets[matrix.config.gargle_auth] }}
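Notice how GARGLE_PASSWORD will only be available when checking against the released version of R on macOS-latest. Every other job looks up a secret named GARGLE_NOAUTH, which presumably is not defined, so GARGLE_PASSWORD ends up empty there.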
bigrquery implements the same idea with a different approach: it does not provide BIGRQUERY_PASSWORD to the main R CMD check GitHub Actions workflow at all. Instead there is a separate “live api” workflow that only has one job and that accesses the secret.
The tidyverse / r-lib team is transitioning from Travis/AppVeyor to GitHub Actions. But in case it is helpful, here’s a simplified excerpt from a .travis.yml file used by gargle in the past. The main r: release build accesses GARGLE_PASSWORD implicitly as an encrypted environment variable, but R CMD check runs for the other builds with GARGLE_PASSWORD explicitly unset:
matrix:
  include:
  - r: release
    # <stuff about code coverage, pkgdown build & deploy, etc.>
  - r: release
    env: GARGLE_PASSWORD=''
  - r: devel
    env: GARGLE_PASSWORD=''
  - r: oldrel
    env: GARGLE_PASSWORD=''
Regardless of your CI platform, your absolute best bet for writing configuration files is to look at what other R package developers are doing in their public source repos. All of the above is static, simplified, and (probably) stale and will never reflect the current state-of-the-art.
Testthat configuration
In a wrapper package, you could determine decrypt-ability at the start of the test run. Here’s representative code from googledrive’s tests/testthat/helper.R file, but something similar can be seen in bigrquery and googlesheets4:
if (gargle:::secret_can_decrypt("googledrive")) {
  json <- gargle:::secret_read("googledrive", "googledrive-testing.json")
  drive_auth(path = rawToChar(json))
}
Versions of secret_can_decrypt() and secret_read() are defined here in gargle.
drive_auth() is a function specific to googledrive that loads a token for use downstream (in multiple tests, in this case). Note that it can clearly accept a JSON string, as an alternative to a filepath, and that’s very favorable for our workflow. We’ll come back to this below.
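The decrypt-ability check itself comes down to the two conditions described earlier: the sodium package must be installed and the PASSWORD environment variable must be set. A minimal sketch of such a check follows; can_decrypt() and the variable name are hypothetical, and this is not presented as the definition of secret_can_decrypt().

```r
# Hypothetical helper, for illustration only: TRUE only if sodium is
# installed AND the named environment variable holds a non-empty value.
can_decrypt <- function(pw_name) {
  requireNamespace("sodium", quietly = TRUE) && nzchar(Sys.getenv(pw_name))
}

# With the (made-up) variable unset, decryption is reported as impossible
Sys.unsetenv("EXAMPLE_PKG_PASSWORD")
can_decrypt("EXAMPLE_PKG_PASSWORD")
#> [1] FALSE
```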
But what if secret_can_decrypt() returns FALSE and no token is loaded? That’s where you rely on a custom test skipper. Here is the test skipper from googledrive:
# googledrive
skip_if_no_token <- function() {
  testthat::skip_if_not(drive_has_token(), "No Drive token")
}
googledrive::drive_has_token() returns TRUE if a token is available and FALSE otherwise. By calling the skipper at the start of tests that require auth, you arrange for your package to cope gracefully when the token cannot be decrypted, e.g., on CRAN and in pull requests. It is typical to define such a skipper in tests/testthat/helper.R or similar.
gargle’s usage of the testing token is a bit different, still evolving, and less relevant to the maintainers of wrapper packages. Therefore it’s not featured here.
Known sources of friction
Once you dig into the secret_*() family, you will notice there are two recurring sources of friction:
- File or object? You almost certainly store your secrets in files. But the sodium functions for data encrypt and decrypt work with R objects. So, for example, it is convenient if token ingest can accept an R object as opposed to only a file path.
- Raw vectors. You might think of the PASSWORD or even the secret file itself (e.g., JSON) in terms of plain text. But the sodium functions for data encrypt and decrypt work with raw vectors, not character vectors. Be prepared to see related conversions in the secret_*() functions.
Functions useful for these conversions:
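As a minimal sketch, base R's charToRaw(), rawToChar(), readBin(), and writeBin() cover both kinds of conversion. The JSON string below is a made-up stand-in for a secret.

```r
json_text <- '{"type": "service_account"}'  # made-up stand-in for a secret

# character <-> raw
json_raw <- charToRaw(json_text)
identical(rawToChar(json_raw), json_text)
#> [1] TRUE

# file <-> raw
path <- tempfile(fileext = ".json")
writeBin(json_raw, path)
from_file <- readBin(path, what = "raw", n = file.size(path))
identical(from_file, json_raw)
#> [1] TRUE
```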
Resources
bigrquery and googledrive, which both use this approach.
- bigrquery/tests/testthat/helper-auth.R
- googledrive/tests/testthat/helper.R
- Setup chunk of a pkgdown article that is rendered and deployed via CI: googledrive/vignettes/articles/multiple-files.Rmd#L10-L29
“Managing secrets” vignette of httr:
Vignettes of the sodium package, especially the parts relating to symmetric encryption:
- https://cran.r-project.org/web/packages/sodium/vignettes/crypto101.html
- https://cran.r-project.org/web/packages/sodium/vignettes/intro.html
The cyphr package, which smooths over frictions like those identified above relating to “file vs. object?” and “character vs. raw?”: