This is a summary of the origins, issues, and SP best practices associated with the eduPersonTargetedID attribute.
While extending SAML, the Liberty Alliance developed a concept they called "federated" identifiers. In their terms, federating a user is accomplished by creating an opaque identifier for the user that is specific to both an IdP and an SP, and is shared only between them. This was termed a "federation handle" and was designed to preserve privacy and prevent correlation of user activity across unrelated services.
Conceptually, a federation handle consisted of three required pieces of data:
- the entityID of the IdP
- the entityID of the SP
- an actual user identifier
An SP was also permitted to attach an optional "alias" for the user, which it could rely on the IdP to pass back each time the user was referenced, so that the SP's systems could avoid both rekeying and secondary indexing.
In SAML 2.0, the same concept evolved into something called a "persistent" NameID format, the term "federated" having political connotations unacceptable at the time to at least one of the companies participating in the process. The wire-level syntax is different in SAML than in Liberty, and some of the rules are a bit more precise, but the concept is identical.
While the SAML standard was evolving, the Shibboleth project identified a need for something analogous to the Liberty concept in the work it was doing with SAML 1.1. With a focus on SAML attributes, and because of limitations in the older SAML schema, it was decided to define this concept as an attribute called "eduPersonTargetedID" within the eduPerson specification and develop a syntax for it as a SAML attribute.
In formally defining the attribute, the same "triple" of data was used, and the notion of an SP alias was dropped as unnecessarily complicated. A few more precise rules were written as to the processes that would go into managing the attribute, but it was largely envisioned as a SAML 1.1 Attribute "equivalent" to the Liberty (and eventually SAML 2.0) concept.
The "Botch" Propagated 'Round the World
Unfortunately, during this phase of the project, shipping code often preceded formalization, and a significant mistake was accidentally propagated into the Shibboleth releases: the syntax of the example implementations of this attribute was botched. Instead of fully representing the three pieces of data mentioned above, a "simplified" version of the attribute was used in which a DNS domain (e.g. osu.edu) replaced the full entityID of the IdP. This so-called "scoped" syntax became common among early adopters and has led to a number of interoperability and complexity problems as a result.
XML Syntax and Formal Names
Were it not for the "botch" mentioned above, this would be a simple story, but unfortunately it's not.
As a starting point, let me outline the formal/XML representations recognized by the Shibboleth project/community. Each representation is of the same underlying data.
Note that regardless of whether the resulting <NameID> appears in a SAML attribute (eduPersonTargetedID) or in the <Subject> of an assertion, using it to perform SAML queries or other forms of communication with the IdP would depend on the IdP's ability to "reverse" the identifier back into the local user identity.
In Shibboleth, the two are essentially interchangeable because the "reversibility" property is dependent solely on the use of a <PrincipalConnector> in the attribute resolver. Other implementations are likely to be more limited and tend to draw distinctions between attributes and subject identifiers.
SAML 2.0 NameID
This is the usually recommended approach to passing an eduPersonTargetedID to SAML 2.0 SPs, including Shibboleth 2.x. Instead of using a SAML attribute, the information is passed in the subject of the assertion:
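To make the shape of the data concrete, here is a minimal Python sketch that parses an illustrative persistent NameID and pulls out the three pieces of data described earlier. The entityIDs and the opaque value are invented for illustration, and the sketch assumes both the NameQualifier and SPNameQualifier attributes are present; it is not how any Shibboleth component is implemented.

```python
import xml.etree.ElementTree as ET

SAML2_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
PERSISTENT = "urn:oasis:names:tc:SAML:2.0:nameid-format:persistent"

# Hypothetical example values; none of them come from a real deployment.
EXAMPLE_NAMEID = f"""
<saml2:NameID xmlns:saml2="{SAML2_NS}"
    Format="{PERSISTENT}"
    NameQualifier="https://idp.example.org/idp/shibboleth"
    SPNameQualifier="https://sp.example.org/shibboleth">84e411ea-7daa-4a57-bbf6-b5cd52a7b32b</saml2:NameID>
"""


def parse_persistent_nameid(xml_text):
    """Return (idp_entity_id, sp_entity_id, user_id) from a persistent NameID."""
    elem = ET.fromstring(xml_text.strip())
    if elem.tag != f"{{{SAML2_NS}}}NameID":
        raise ValueError("not a SAML 2.0 NameID element")
    if elem.get("Format") != PERSISTENT:
        raise ValueError("NameID is not in the persistent format")
    # NameQualifier carries the IdP entityID, SPNameQualifier the SP entityID,
    # and the element content is the opaque user identifier.
    return (
        elem.get("NameQualifier"),
        elem.get("SPNameQualifier"),
        (elem.text or "").strip(),
    )


idp, sp, user_id = parse_persistent_nameid(EXAMPLE_NAMEID)
```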
SAML 2.0 Attribute
As an alternative, it's possible to embed the same syntax above inside a SAML attribute with the formal name "urn:oid:1.3.6.1.4.1.5923.1.1.1.10". The main reason for doing this would be to preserve the ability to pass a different kind of identifier in the assertion subject. One use case for this is to support the use of computed/non-reversible values for the "targeted" ID, but use transient, reversible values in the subject to support attribute queries or logout.
SAML 1.1 Attribute
This is the recommended approach to passing an eduPersonTargetedID to SAML 1.1 SPs, including Shibboleth 1.3.x. It is very similar to the attribute-based syntax above and uses the same formal name.
See the IdPAddAttribute topic for information on producing this result. The main requirement is to attach an "AttributeEncoder" of type SAML1XMLObjectAttributeEncoder to the source attribute.
Incorrect SAML 1.1 Attribute
Finally, the original "botched" syntax is below. Unfortunately, this is probably by far the most common approach one will see today. It is, however, incorrect.
Both branches of the SP software have extensive support for parsing and communicating the various forms of eduPersonTargetedID to applications.
The newer SP supports all of the forms discussed above and includes additional plugins to ease the job of migrating eduPersonTargetedID consumers to the proper syntax.
The three proper forms that rely on a SAML 2.0 NameID element are by default (in the attribute-map) interchangeably handled and reflected to applications as an SP attribute called "persistent-id". By default, the attribute is expressed by concatenating the three pieces of data together with a bang (!) symbol as follows (using the example data above).
However, this string can be modified as required by the application to include or exclude each component and use any separator desired. For details, see the documentation for the NameIDAttributeDecoder.
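As a rough illustration of that string form (not the decoder's actual code), the following Python sketch joins the three components with a configurable separator. The function name and the example values are hypothetical.

```python
def format_persistent_id(idp_entity_id, sp_entity_id, user_id, separator="!"):
    """Join the IdP entityID, SP entityID, and opaque ID into one string."""
    parts = (idp_entity_id, sp_entity_id, user_id)
    for part in parts:
        # If a component contained the separator, the result could not be
        # split back into its three pieces unambiguously.
        if separator in part:
            raise ValueError(f"component contains separator {separator!r}")
    return separator.join(parts)


# Made-up example values.
persistent_id = format_persistent_id(
    "https://idp.example.org/idp/shibboleth",
    "https://sp.example.org/shibboleth",
    "84e411ea-7daa-4a57-bbf6-b5cd52a7b32b",
)
```

An application that only ever talks to one SP could reasonably drop the SP component; the point is that the IdP's full entityID, not a DNS scope, qualifies the value.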
The broken form by default is mapped to an SP attribute called "targeted-id" (to help delineate it from the proper form) and is expressed as a "scoped" string with the following @-delimited form:
As a migration feature to encourage adoption of the proper syntax, an alternate plugin can be enabled for consuming the deprecated syntax that produces the same result as the proper syntax would, by dropping the "scope", and substituting the entityID of the IdP source of the value. This feature is called a NameIDFromScopedAttributeDecoder for reasons that are hopefully clear at this point.
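The transformation that plugin performs can be sketched in a few lines of Python. This is only a model of the behavior described above, under the assumption that the SP already knows the entityID of the IdP that issued the value; the function name and sample values are invented.

```python
def upgrade_scoped_value(scoped_value, issuer_entity_id, sp_entity_id, separator="!"):
    """Turn 'id@scope' into the IdP!SP!ID form, discarding the scope."""
    # Split on the last '@' so the opaque ID is kept whole.
    user_id, at, _scope = scoped_value.rpartition("@")
    if not at or not user_id:
        raise ValueError("value is not in scoped 'id@scope' form")
    # The scope is dropped; the issuing IdP's entityID takes its place.
    return separator.join((issuer_entity_id, sp_entity_id, user_id))


upgraded = upgrade_scoped_value(
    "84e411ea@example.edu",                      # deprecated scoped value
    "https://idp.example.edu/idp/shibboleth",    # hypothetical IdP entityID
    "https://sp.example.org/shibboleth",         # hypothetical SP entityID
)
```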
The older SP only supports SAML 1.1 and therefore can only process the two SAML 1.1 attribute forms. Both the broken form and the proper form are built into the default configuration using some custom attribute handling code and a pair of rules in the AAP file.
When processing the two forms, the SP will "publish" the attribute information to the application using a different string form depending on which syntax is found. It lacks the flexibility and customization of the newer SP and consumers of eduPersonTargetedID are urged to upgrade.
The proper form was handled by concatenating the three pieces of data together with a bang (!) symbol as follows (using the example data above). This was a hard-coded approach that can't easily be altered without plugging in additional code or altering the source code while building it.
The broken form was handled by treating the attribute as "scoped", resulting in the following @-delimited form:
While both forms can be handled at once, and they obviously will never overlap, the specific data seen for a given user will depend on which form the IdP chooses to use. This is in contrast to the migration capability added to the newer SP and discussed above.
Having plowed through all that background, what the heck are you supposed to do? I'll try and be succinct because I hope that the mess above explains why this is the kind of thing that needs a clear approach to deal with.
In my opinion, an SP that wants a robust service should follow these guidelines:
- If you're running the 1.3 SP, upgrade. Now. There is no good answer until you do.
- Adopt some form of the IdP!SP!ID syntax, with whatever order and separator you prefer to use. If your application/database currently contains records keyed by the scoped syntax, simply rekey it by writing a script to convert them. This requires mapping the various scope domains to the equivalent IdP entityID value, but this information is generally available in federation metadata. It's not a terribly hard problem and should be a few hours' work, tops.
- Configure your SP to "upgrade" the deprecated SAML 1.1 syntax by enabling the NameIDFromScopedAttributeDecoder rule in your attribute mapping.
That's it. If you follow those steps, you should be able to ignore the problems with the broken syntax and leave it to the IdPs you deal with to fix their systems on whatever timeline they can fix them. None of these steps requires significant coordination with anybody, and the rekeying process should be safe to perform.
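For the rekeying step, a script along the following lines is roughly what I have in mind. It is a sketch only: it assumes the records live in a simple mapping keyed by the scoped value, and that you have already built a scope-to-entityID table by hand or from federation metadata. All names and values here are hypothetical.

```python
def rekey_records(records, scope_to_entity_id, sp_entity_id, separator="!"):
    """Return (rekeyed, skipped): records under IdP!SP!ID keys, plus old keys
    that could not be converted because the scope is unknown or malformed."""
    rekeyed, skipped = {}, []
    for old_key, record in records.items():
        user_id, at, scope = old_key.rpartition("@")
        idp_entity_id = scope_to_entity_id.get(scope) if at else None
        if not user_id or idp_entity_id is None:
            skipped.append(old_key)  # leave these for manual review
            continue
        new_key = separator.join((idp_entity_id, sp_entity_id, user_id))
        rekeyed[new_key] = record
    return rekeyed, skipped


# Hypothetical data: one convertible record, one with an unmapped scope.
records = {
    "abc123@example.edu": {"name": "first user"},
    "xyz789@unknown.example": {"name": "second user"},
}
scope_map = {"example.edu": "https://idp.example.edu/idp/shibboleth"}

rekeyed, skipped = rekey_records(
    records, scope_map, "https://sp.example.org/shibboleth"
)
```

Collecting the unconvertible keys rather than failing outright lets the script run to completion and leaves a short list to chase down afterward.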
A Note About Length
One of the pressures that I've observed in some of the resistance to getting this problem fixed is that the broken syntax is much shorter than the proper syntax can generally be, and is also more "understandable", since it resembles other identifiers like eduPersonPrincipalName. While this is true, I think it's safe to say that no user is going to want to look at the value in any case, and treating it as anything but an internal identifier is a mistake anyway.
The length argument, on the other hand, is legitimate and not perfectly solved at this point. Unfortunately, for historical and frankly stupid reasons, SAML places very few limits on the possible length of the various components. An entityID can in theory be huge, up to 1024 characters. The identifier portion is more limited, but can still be up to 256 characters.
Obviously it's completely impractical to allow for an identifier over 1K, let alone 2K. Unfortunately, the best I can offer right now is that in practice, this doesn't come up. Nobody uses an entityID that long, and limiting the size to 512 or even 256 is almost certainly good enough.
The downside is that you have to at least allow for the chance you'll get one that's too long and deal with that error condition, but I would suspect that the chances of that error are very small, and likely to be "global" to an entire site such that some kind of accommodation can be made at the time this is noticed. It shouldn't happen on a per-user basis.
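Handling that error condition can be as simple as the following Python sketch, which rejects an over-long key before it reaches storage. The 512-character limit and the exception name are assumptions chosen for illustration; pick whatever matches your own schema.

```python
MAX_KEY_LENGTH = 512  # assumed storage limit, not a value mandated by SAML


class IdentifierTooLong(ValueError):
    """Raised when a combined identifier exceeds the storage limit."""


def check_key_length(key, max_length=MAX_KEY_LENGTH):
    """Return the key unchanged, or raise if it will not fit."""
    if len(key) > max_length:
        raise IdentifierTooLong(
            f"identifier is {len(key)} characters; limit is {max_length}"
        )
    return key
```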
Hashing is of course another option, but a hash isn't reversible, and this will create problems when tracking back or communicating with the IdP about a user.