The most confusing aspect of the SP software for beginners, aside from all the SAML and federation concepts, is how the software relates to the applications and resources it's being used to protect. Early use tends to lead to a lot of common questions:
- Why do I need to install the software on every web server? (Answer here)
- Can I protect multiple "applications" at the same time?
- What if I'm just hosting a few static web pages?
- Can I use multiple virtual hosts? Do I have to use separate virtual hosts?
- What if my application is running across multiple physical servers?
To answer these questions, you have to understand how the software is designed to interact with and relate to the resources it's protecting, the URL "space" of the server in other words, and how it exposes this relationship to the IdPs you hook it up to. These aren't things that any standard, including SAML, dictates, and this is not the only way to implement an SP. But it is how this SP works.
Logical and Physical SPs
A single installation of the SP software can act as many logical, distinct "services", and a single logical "service" can span any number of physical hosts.
The first point to make is that the term "Service Provider" (SP) gets thrown around a lot in the documentation and in email, and sometimes it means slightly different things based on context. Like a lot of things in computing, there's the physical part (the software bits you're installing) and the logical part (the notion of a service).
For SAML purposes, an SP is simply any system that's accepting authentication from an IdP. This "system" could be a hundred web servers, a single web server, or a single directory on a web server. That's not defined. The SP concept just represents some collection of resources that makes up a coherent "service".
Furthermore, each SP has an entityID, which is usually a URL and looks like a web address but is actually just an identifier that labels the SP.
There aren't any hard and fast rules for what makes up a service and when two different things are really two services and not one. That's a fuzzy thing and depends a lot on who's answering. But what is explicit in Shibboleth is that we don't allow for distinctions between resources to be visible to IdPs unless the resources are associated with different logical services, or in SAML terms, different SPs. By extension, these services each have a different entityID.
So to put this concretely, if you have a server hosting two directories called "foo" and "bar", then the only way you can get an IdP to treat a request for authentication to "foo" and "bar" differently is to make the two directories logically distinct SPs, each with its own unique entityID. By "differently", we might mean releasing different SAML attributes to each one, or presenting a distinct login page in each case, or even refusing to respond at all.
Of course, that doesn't mean you want requests for authentication for "foo" and "bar" to be treated differently. In many, if not most cases, all resources on a single virtual host are and should be treated as a unit. This is discussed in more detail below. But if you do want there to be a difference, assigning them to separate SPs is the only way to achieve it.
Going in the other direction, provided some solution is used to cluster the software, any number of installed, physical SPs may be part of the same logical SP and act together as a unit. (Of course, most of the time this assumes that the physical systems are themselves linked and are acting as a single set of resources.)
Applications and Resources
The previous section essentially deals with the view from "outside" the SP software. This section is the opposite; here, we're talking about the view from "inside" that boundary.
With some technologies, Java servlets and ASP.NET to name a couple of examples, there's a built-in notion of "application" layered on top of the document tree. Typically a directory is bound to a single "context" and all the resources at that point in the tree are a unit. But web servers in general don't have this concept. It's often convenient to group things by directories, certainly, or sometimes just by virtual host, but it's not a requirement.
As a result, the SP software has to manage the grouping of resources itself, even though this adds configuration overhead. If it did not do this, then at best the level of granularity would be the virtual host, and that doesn't meet everybody's needs. An additional problem is that every web server is slightly different in how it supports (or doesn't support, in the case of IIS) the ability to attach customized settings to requests based on the URL or the physical directory/file. So, the end result is more complexity in the software in return for the ability to support a lot of different environments and tools at the same time.
The SP software calls a grouping of resources that are meant to be accessed as a unit an "application". This term is used in a precise way and doesn't necessarily refer to something you would actually draw a line around as a "web application", although certainly it can and often does. An application defined to the SP software has a number of important qualities. These are inviolate, and are literally built-in to the software:
- It acts as a barrier between user sessions managed by the software, such that a session with one application will not be shared with other applications. Crossing an application boundary between requests results in separate round trips to the IdP, separate assertions created, and separate attributes cached. It may or may not result in a separate login; that's mostly up to the IdP (assuming the SP doesn't request forced re-authentication).
- It is usually associated with one logical SP. To the outside world, the application is part of that SP, named by its entityID, and is not meant to be distinguished from any other part of that logical SP. With V3, a bit more flexibility is supported, in that an application may polymorphically assign itself different entityIDs at runtime, typically based on the virtual host. This is a new feature that avoids the need for defining multiple applications to support a one-SP-per-vhost scenario, common to virtual hosting platforms. It doesn't avoid the need for metadata for each SP, but it reduces the amount of SP configuration required.
- It uses a common configuration for all of the SAML-related behavior the software carries out, such as metadata and trust, security policy, attribute processing, etc (save for the multiple entityID feature noted in the prior bullet point).
- It includes a unique handlerURL location specific to the application and associated only with it. Requests to SP handlers, such as Assertion Consumer Services and Session Initiators, are always prefixed with this URL and are grouped together into the set of resources that make up the application. Usually this URL will contain the path /Shibboleth.sso. There is usually only a single such URL, but if an application spans multiple virtual hosts, then each of those virtual hosts will have its own (usually automatically generated) handlerURL.
Any two resources protected by the same physical SP software (or a cluster) can be aggregated into an application. They don't have to live in the same directory or even the same virtual host. Of course, it's common for that to be the case, and generally a good idea, for obvious reasons. But it's not a technical limitation.
The meat of the software configuration is divided across two sections of the shibboleth2.xml file: the <RequestMapper> and the <ApplicationDefaults> elements. In the case of Apache, the former is generally omitted in favor of Apache-specific commands.
Assigning Resources to Applications
The SP software uses an applicationId property to assign web resources to the application definition they belong to. Out of the box, all resources on all virtual hosts are assigned to a fixed "base" application called "default" that uses a single set of configuration options.
As discussed earlier, the SP software cannot generally rely on the web server alone to determine at runtime which application a given resource belongs to. Instead, the software requires that you, the deployer, make that determination by associating a content setting called applicationId with each request, provided that you intend there to be more than one application (otherwise you would simply leave everything bound to the default behavior).
The details are described here, but in general this is done in one of two ways:
- using the ShibRequestSetting Apache command in an appropriate place
- adding it to an appropriate child element inside the <RequestMapper> element
The applicationId is assigned at the virtual host or path level, as you would expect. This is only half the job (the rest is described below; see "Application Configuration").
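As a rough illustration of the <RequestMapper> approach, the sketch below assigns two directories on one virtual host to separate applications. The host name sp.example.org, the paths "foo" and "bar", and the applicationId values are all hypothetical placeholders, and the fragment assumes the native request mapper is in use; treat it as a starting point rather than a complete configuration.

```xml
<!-- Sketch only: host, paths, and applicationId values are made-up examples -->
<RequestMapper type="Native">
    <RequestMap>
        <Host name="sp.example.org">
            <!-- Requests under /foo are bound to the application "foo-app" -->
            <Path name="foo" applicationId="foo-app"/>
            <!-- Requests under /bar are bound to the application "bar-app" -->
            <Path name="bar" applicationId="bar-app"/>
            <!-- Anything else on this host stays in the "default" application -->
        </Host>
    </RequestMap>
</RequestMapper>
```

Each non-default applicationId named here still needs a matching application definition in shibboleth2.xml before it will work.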
Other Per-Resource Settings
It's a good idea to review the ContentSettings topic, because you'll find a variety of useful settings there that in some cases historically required defining multiple applications to the SP software in order to use.
For example, it used to be common to use multiple applications to hard-wire different virtual hosts or directories to use particular IdPs. This is not needed; you can simply define an entityID property based on the content, without needing the overhead of defining additional applications and complicating the SP's metadata.
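A minimal sketch of that idea follows, assuming the settings are placed in the <RequestMapper> section of shibboleth2.xml. The host names and IdP entityIDs are invented for illustration. Note that in this context the entityID content setting names the IdP to use, not the SP, and both hosts remain in the single default application.

```xml
<!-- Sketch only: hosts and IdP entityIDs are hypothetical -->
<RequestMapper type="Native">
    <RequestMap>
        <!-- Requests to this host are sent to one IdP... -->
        <Host name="a.example.org"
              entityID="https://idp-a.example.org/idp/shibboleth"/>
        <!-- ...and requests to this host are sent to another,
             without defining any additional applications -->
        <Host name="b.example.org"
              entityID="https://idp-b.example.org/idp/shibboleth"/>
    </RequestMap>
</RequestMapper>
```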
With V3, there's a new setting, entityIDSelf, which attacks the opposite problem, defining each virtual host as its own logical SP with a pattern-based entityID derived from the virtual host name. The goal is to eliminate as much as practical any need to define overrides at all.
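The following sketch shows roughly what such a pattern might look like when applied to every virtual host at once. It assumes the $hostname substitution token described in the ContentSettings documentation, and the "/shibboleth" suffix is only an example naming convention; check the exact syntax against the documentation for your SP version before relying on it.

```xml
<!-- Sketch only: assumes the $hostname token is supported by entityIDSelf -->
<RequestMapper type="Native">
    <!-- Each virtual host would present itself as its own SP, for example
         https://one.example.org/shibboleth for requests to one.example.org -->
    <RequestMap entityIDSelf="https://$hostname/shibboleth"/>
</RequestMapper>
```

Metadata for each resulting entityID still has to be supplied to the IdPs involved.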
Application Configuration
Once you assign a non-default applicationId property to some set of content, you have to complete the configuration by defining an "application override" in the shibboleth2.xml file. Defining a new application requires at minimum an <ApplicationOverride> element just inside the closing </ApplicationDefaults> tag. Its id attribute must match the applicationId used in the resource mapping step above. The rest depends on what's intended to be different about the application from the default settings.
By default, most of the SP configuration is derived from the information supplied inside the <ApplicationDefaults> element. This includes a large number of XML attributes and elements that make up the SAML configuration to use, including the metadata, credentials to use, session policy, and the various handlers that do the technical work.
In most cases, the majority of these settings will be common to all the applications you define, so by design the software inherits settings as much as possible from the default level down into any overrides you create. This is however subject to some complications, as described here.
One of the most common things to do when creating an override is to assign it a special entityID, making it a distinct logical SP living inside the same physical installation. This is done by adding an entityID property to the <ApplicationOverride> element. With V3, this can even be avoided by assigning each virtual host an entityIDSelf content setting that allows the system to derive its own name at runtime based on the virtual host accessed.
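A minimal override of this kind might look like the sketch below. The id "foo-app" and both entityID values are hypothetical, and the comment stands in for whatever default settings your configuration already contains.

```xml
<!-- Sketch only: the id and entityID values are made-up examples -->
<ApplicationDefaults entityID="https://sp.example.org/shibboleth">

    <!-- existing default settings (sessions, metadata, credentials,
         and so on) remain here and are inherited by the override -->

    <!-- The id must match the applicationId assigned to the content;
         the entityID makes this application a distinct logical SP -->
    <ApplicationOverride id="foo-app"
                         entityID="https://foo.example.org/shibboleth"/>

</ApplicationDefaults>
```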
The other common task depends on whether the application is intended to take up the whole of a different virtual host, or is part of a web tree on a virtual host that includes multiple applications. Separation by virtual host is the recommended approach because it allows the new application to inherit the default application's handlerURL of "/Shibboleth.sso" and greatly limits the amount of additional configuration work. It's also more secure.
Dividing up a virtual host, on the other hand, requires that you supply at minimum a new <Sessions> element with all of the necessary settings inside, particularly a distinct handlerURL that will be unique to, and be part of, the new application. This is discussed in more detail here.
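As a hedged sketch of the divided-virtual-host case, the override below carries its own <Sessions> element with a handlerURL that sits beneath the application's path. The "/bar" path, the id, the entityID values, and the session timings are illustrative assumptions; the handlers you actually need inside <Sessions> depend on your deployment.

```xml
<!-- Sketch only: the path, ids, entityIDs, and timings are hypothetical -->
<ApplicationOverride id="bar-app"
                     entityID="https://sp.example.org/bar/shibboleth">
    <!-- handlerURL lives under /bar so that handler requests fall
         inside the set of resources making up this application -->
    <Sessions lifetime="28800" timeout="3600"
              handlerURL="/bar/Shibboleth.sso" handlerSSL="true"
              cookieProps="https">
        <!-- Handlers are declared again here, inside the new element -->
        <SSO entityID="https://idp.example.org/idp/shibboleth">SAML2</SSO>
        <Logout>SAML2 Local</Logout>
    </Sessions>
</ApplicationOverride>
```

The SP's metadata for this entityID would need to list endpoints under the /bar/Shibboleth.sso location rather than the default one.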