Managing Untrusted Metadata

This article describes a semi-automatic process for managing untrusted SAML metadata using a Shibboleth LocalDynamicMetadataProvider and a complementary set of command-line tools.

First configure a Shibboleth LocalDynamicMetadataProvider. In particular, configure a sourceDirectory to serve as a local repository of metadata. This directory is referred to as $sourceDirectory in the code fragments below.

Install the SAML Library of command-line tools. Note that BIN_DIR and LIB_DIR are environment variables created during the installation process. These environment variables are used repeatedly in the code fragments below.

Identify a metadata source location to be managed. Perform the following sequence of steps for each metadata source location:

1. Prime the cache with a copy of the metadata
2. Filter the metadata into the source directory of the LocalDynamicMetadataProvider
3. Check the metadata on the server
4. If the metadata on the server differs from the metadata in the cache, investigate the differences
5. If the differences are acceptable, update the cache with fresh metadata
6. Filter the metadata into the source directory of the LocalDynamicMetadataProvider
7. Go to step 3

The following examples illustrate the basic process.

Example 1: IRBManager

We start with a relatively simple example of remote metadata:

https://shibboleth.irbmanager.com/metadata.xml

A non-InCommon Shibboleth SP that consumes InCommon metadata
Last-Modified: Tue, 28 Jul 2015 13:32:54 GMT
Supports HTTP Conditional GET
See the relevant discussion thread on the mailing list
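
Support for HTTP Conditional GET can be checked by hand. The following is a minimal sketch, assuming curl is installed; the function names are hypothetical and are not part of the SAML Library. A server that honors the request answers 304 (Not Modified) when the resource has not changed since the given date:

```shell
#!/bin/bash

# Hypothetical helper: send a conditional GET and print the HTTP status code.
# $1 = URL, $2 = HTTP date (for example, a previously seen Last-Modified value)
conditional_get_status() {
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --header "If-Modified-Since: $2" "$1"
}

# Hypothetical helper: interpret the status code of a conditional GET.
interpret_conditional_status() {
    case "$1" in
        304) echo "not modified" ;;
        200) echo "modified, or conditional requests not supported" ;;
        *)   echo "unexpected status: $1" ;;
    esac
}
```

For example, the Last-Modified value listed above could be passed as the date:

    status=$(conditional_get_status https://shibboleth.irbmanager.com/metadata.xml 'Tue, 28 Jul 2015 13:32:54 GMT')
    interpret_conditional_status "$status"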

If you trust the SP owner to do the Right Thing, and the reliance on commercial TLS is not a concern, configure a Shibboleth FileBackedHTTPMetadataProvider to refresh the metadata at least daily:

Example 1: Configure a FileBackedHTTPMetadataProvider

<MetadataProvider id="IRBManager" xsi:type="FileBackedHTTPMetadataProvider" 
    metadataURL="https://shibboleth.irbmanager.com/metadata.xml" 
    backingFile="%{idp.home}/metadata/IRBManager.xml" maxRefreshDelay="P1D">

    <!-- filter all but the listed entity -->
    <MetadataFilter xsi:type="Predicate" direction="include">
        <Entity>https://shibboleth.irbmanager.com/</Entity>
    </MetadataFilter>

</MetadataProvider>

If, on the other hand, security or interoperability is a concern, manage the metadata as illustrated below.

Given the HTTP location of the metadata to be managed, and the source directory of a Shibboleth LocalDynamicMetadataProvider, initialize both the cache and the source directory as follows:

Initialize the cache

# Steps 1 and 2
$ md_location=https://shibboleth.irbmanager.com/metadata.xml
$ $BIN_DIR/md_refresh.bash $md_location \
    | $BIN_DIR/md_tee.bash $sourceDirectory \
    > /dev/null

Some time later, check whether the metadata resource has been modified on the server since the cache was primed:

Check the cache

# Step 3
$ $BIN_DIR/http_cache_check.bash $md_location && echo "cache is up-to-date" || echo "cache is dirty"
cache is dirty
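
Note that the one-liner above reports any nonzero exit status as a dirty cache. The cron script later in this example is more careful: it treats status 1 as a stale cache and any greater status as a failure of the tool itself. A small sketch of that distinction follows; the function name is hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: describe an exit status using the convention of the
# cron script (0 = up-to-date, 1 = dirty, anything greater = tool failure).
describe_cache_status() {
    case "$1" in
        0) echo "cache is up-to-date" ;;
        1) echo "cache is dirty" ;;
        *) echo "check failed with status $1" ;;
    esac
}
```

It might be used interactively like this:

    $BIN_DIR/http_cache_check.bash $md_location; describe_cache_status $?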

If the cache is dirty, manually inspect the differences between the metadata on the server and the metadata in the cache:

Inspect the file differences

# Step 4
$ $BIN_DIR/http_cache_diff.bash $md_location

If the differences are acceptable, update both the cache and the source directory with the new metadata:

Update the cache

# Steps 5 and 6
# force a metadata refresh
$ $BIN_DIR/md_refresh.bash -F $md_location \
    | $BIN_DIR/md_tee.bash $sourceDirectory \
    > /dev/null

To semi-automate the above process, implement a cron job that executes the command in step 3:

Example 1: Cron job to check the cache

#!/bin/bash

# environment variables
# (also export TMPDIR if it doesn’t already exist)
export BIN_DIR=/tmp/bin
export LIB_DIR=/tmp/lib
export CACHE_DIR=/tmp/http_cache
export LOG_FILE=/tmp/bash_log.txt

# the name of this script
script_name=${0##*/}

# specify the HTTP resource
location=https://shibboleth.irbmanager.com/metadata.xml

# check the cache against the server
$BIN_DIR/http_cache_check.bash $location >&2
status_code=$?
if [ $status_code -eq 1 ]; then
	echo "WARN: $script_name: cache is NOT up-to-date for resource: $location" >&2
elif [ $status_code -gt 1 ]; then
	echo "ERROR: $script_name: http_cache_check.bash failed ($status_code) on location: $location" >&2
fi

exit $status_code

Example 2: Amazon Web Services

The AWS documentation entitled How to Use Shibboleth for Single Sign-On to the AWS Management Console shows how to use a FileBackedHTTPMetadataProvider to consume AWS metadata. What the documentation doesn't say, however, is that the AWS server does not support HTTP conditional requests, so every time the metadata provider runs, it loads fresh metadata even if the metadata has not changed on the server.

Moreover, the NameIDFormat elements in AWS metadata are bogus and must be removed from the metadata for the integration to succeed. Simply downloading and editing a static copy is not advisable, however, since AWS metadata includes a @validUntil attribute and a static copy would eventually expire.

https://signin.aws.amazon.com/static/saml-metadata.xml

Last-Modified date unknown
Does not support HTTP Conditional GET (no ETag in response)
Unauthorized URN-based entityID (urn:amazon:webservices)
Includes @validUntil attribute (expires annually)
No encryption certificate
NameIDFormat is wrong (showstopper)

Current NameIDFormat values in metadata:

urn:oasis:names:tc:SAML:2.0:nameid-format:transient
urn:oasis:names:tc:SAML:2.0:nameid-format:persistent

Login apparently works fine when these two NameIDFormat values are removed from metadata
This might work: urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress

Role-based attribute release is tricky (see the AWS documentation and search the Shibboleth archives for details)
See relevant discussion thread on the mailing list

As in the previous example, initialize both the cache and the source directory, but this time filter the NameIDFormat elements from the metadata before copying to the source directory:

Initialize the cache

# Steps 1 and 2
$ md_location=https://signin.aws.amazon.com/static/saml-metadata.xml
# log a warning if the metadata will expire within 5 days
$ $BIN_DIR/md_refresh.bash $md_location \
   | $BIN_DIR/md_require_valid_metadata.bash -E P5D \
   | /usr/bin/xsltproc $LIB_DIR/remove_NameIDFormat.xsl - \
   | $BIN_DIR/md_tee.bash $sourceDirectory \
   > /dev/null
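
The content of remove_NameIDFormat.xsl is not shown in this article. As a rough illustration only, a stylesheet with the same effect could be an identity transform with one extra rule that drops md:NameIDFormat elements. The sketch below writes such a stylesheet to a file; the function name and the stylesheet itself are assumptions, not the library's actual file:

```shell
#!/bin/bash

# Hypothetical stand-in for remove_NameIDFormat.xsl: write an XSLT 1.0
# identity transform that omits every md:NameIDFormat element to file $1.
write_stylesheet() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata">

  <!-- copy every node and attribute by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop NameIDFormat elements (empty template produces no output) -->
  <xsl:template match="md:NameIDFormat"/>

</xsl:stylesheet>
EOF
}
```

The resulting file would be passed to xsltproc in the same position as $LIB_DIR/remove_NameIDFormat.xsl in the pipeline above.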

Since the server does not support HTTP Conditional GET, the tool used in the previous example (http_cache_check.bash) will not work. Here we use a diff-like tool that compares the file on the server to the cached file byte-by-byte:

Compare the files

# Step 3
$ $BIN_DIR/http_cache_diff.bash -Q $md_location && echo "cache is up-to-date" || echo "cache is dirty"
cache is dirty
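
The idea of a quiet byte-by-byte comparison can be sketched with curl and cmp. This is not how http_cache_diff.bash is implemented (its internals are not shown here); the function names and the cached-file argument are hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: compare two files byte-by-byte without output.
# Exit status follows cmp: 0 = identical, 1 = different, 2 = trouble.
same_bytes() {
    cmp -s "$1" "$2"
}

# Hypothetical helper: download $1 to a temporary file and compare it
# with the cached copy at $2. Returns 2 if the download fails.
remote_matches_cache() {
    local url=$1 cached=$2 tmp status
    tmp=$(mktemp) || return 2
    if curl --silent --fail --output "$tmp" "$url"; then
        same_bytes "$tmp" "$cached"
        status=$?
    else
        status=2
    fi
    rm -f "$tmp"
    return "$status"
}
```

The exit statuses mirror the convention used by the cron scripts in this article: 0 when nothing has changed, 1 when the files differ, and a greater value on error.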

Manually inspect the differences between the metadata on the server and the metadata in the cache:

Inspect the file differences

# Step 4
$ $BIN_DIR/http_cache_diff.bash $md_location

If the new metadata is acceptable, update both the cache and the source directory with the new metadata:

Update the cache

# Steps 5 and 6
# force a metadata refresh
$ $BIN_DIR/md_refresh.bash -F $md_location \
   | $BIN_DIR/md_require_valid_metadata.bash -E P5D \
   | /usr/bin/xsltproc $LIB_DIR/remove_NameIDFormat.xsl - \
   | $BIN_DIR/md_tee.bash $sourceDirectory \
   > /dev/null

To semi-automate the above process, implement a cron job that executes the command in step 3:

Example 2: Cron job to compare files

#!/bin/bash

# environment variables
# (also export TMPDIR if it doesn’t already exist)
export BIN_DIR=/tmp/bin
export LIB_DIR=/tmp/lib
export CACHE_DIR=/tmp/http_cache
export LOG_FILE=/tmp/bash_log.txt

# the name of this script
script_name=${0##*/}

# specify the HTTP resource
location=https://signin.aws.amazon.com/static/saml-metadata.xml

# quietly diff the cached file against the file on the server
$BIN_DIR/http_cache_diff.bash -Q $location >&2
status_code=$?
if [ $status_code -eq 1 ]; then
	echo "WARN: $script_name: cache is NOT up-to-date for resource: $location" >&2
elif [ $status_code -gt 1 ]; then
	echo "ERROR: $script_name: http_cache_diff.bash failed ($status_code) on location: $location" >&2
fi

exit $status_code

Implement a separate cron job that periodically checks the source directory for expired or soon-to-be-expired metadata:

Example 2: Cron job to sweep the source directory

#!/bin/bash

# environment variables
# (also export TMPDIR if it doesn’t already exist)
export BIN_DIR=/tmp/bin
export LIB_DIR=/tmp/lib
export CACHE_DIR=/tmp/http_cache
export LOG_FILE=/tmp/bash_log.txt

# the name of this script
script_name=${0##*/}

# specify the source directory
sourceDirectory=/path/to/source/dir

# remove expired metadata from the source directory
# log a warning if a document will expire within two weeks
$BIN_DIR/md_sweep.bash -E P2W $sourceDirectory >&2
status_code=$?
if [ $status_code -ne 0 ]; then
	echo "ERROR: $script_name: md_sweep.bash failed ($status_code) on source directory: $sourceDirectory" >&2
fi

exit $status_code

Note that the above script removes all expired metadata from the source directory, not just AWS metadata.
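
The -E option used above takes a duration such as P5D or P2W as its warning threshold. How the library evaluates that threshold is not shown here; the following sketch only illustrates the underlying date arithmetic on a @validUntil value. It assumes GNU date (for the -d option), and both function names are hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: print the number of whole days (truncated toward zero)
# from $2 (default: now) until $1. Both are ISO 8601 timestamps.
days_until() {
    local expiry now
    expiry=$(date -u -d "$1" +%s) || return 2
    now=$(date -u -d "${2:-now}" +%s) || return 2
    echo $(( (expiry - now) / 86400 ))
}

# Hypothetical helper: succeed if the validUntil timestamp $1 falls within
# $2 days of the reference time $3 (default: now).
expires_within_days() {
    [ "$(days_until "$1" "$3")" -lt "$2" ]
}
```

With a two-week threshold, as in P2W, a check might read:

    expires_within_days "$validUntil" 14 && echo "WARN: metadata expires soon" >&2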