DRAFT - In Progress
To better inform discussion about a next generation SP design, this is a high level summary of the current design, components, and some of the key decisions that have to be made if we go at this again.
To reiterate some of the main problems with the current design:
- The volume of C/C++ code needs to be greatly reduced to be anything close to sustainable with the skill sets available today from people not near retirement.
- Reliance on the currently used XML libraries has proven to be too big a risk going forward because they are essentially unmaintained. Alternatives may exist if a rewrite were done, but wouldn't address point 1.
- The footprint is not amenable to a lot of "modern" approaches to web application deployment.
- Packaging has been a significant effort, partly due to the sheer number of different libraries involved.
That said, there are some key requirements we probably have some consensus on that a new design probably would have to maintain:
- Support for Apache 2.4 and IIS 7+, ideally in a form that leads more easily to other options in the future.
- Support for the current very general integration strategy for applications that relies on server variables or headers.
- Some relatively straightforward options for clustering.
The new design being considered would offload the majority of "not every request" processing to a central hub that would be implemented in Java on top of the existing code base used by the IdP, including its SAML support, XML Security, metadata, etc. Long term it could potentially expand to more generic support for other protocols that fit the same general pattern. The intention would be to maintain at least some degree of scalability by ensuring that post-login "business as usual" requests could be handled locally as much as possible, which is where a lot of the interesting design decisions lie.
There would not be a 1:1 relationship between agents and hub obviously, so one hub could potentially serve lots of agents but other deployments might prefer to run one locally.
I have embedded a number of editorital comments and opinions in blockquotes (there are lots of opinions throughout, but these are moreso personal musings).
High Level Design
The SP is a set of web server modules that sit on top of a number of libraries created for the project, along with a large number of "unusual" dependencies. The modules are supplemented by a stand-alone daemon process that people gravitate to, thinking it is the SP, when in fact it's really a processing assistant and a state management tool for the work being done by the modules. In fact, the internal design is such that it's conceptually possible to build a version of the SP in a single binary that has no separate process, but that would create state management problems for sessions and would add significant symbol contamination such that conflicts inside Apache would be more common on some platforms because of how shared libraries work on most non-Windows systems.
So, in practice the system is split into those two pieces but they share a common configuration and the code on both sides of the boundary actually comes from the same set of libraries, built in two separate ways so that different code is included (or more accurately, one half contains everything and the other "lite" half contains much less code). At runtime the libraries are initialized with feature signals that inform the code which components to create and make available, and that includes "InProcess" and "OutOfProcess" designations that tell the code which half of the system it's in. Much of the functional code therefore contains conditional sections that evaluate at runtime how to behave based on these signals.
The SP modules and "shibd" executable wrapper sit on top of the "libshibsp" library, which in turn sits on top of the "libsaml" and "libxmltooling" libraries, the latter two conventionally considered to make up the C++ version of OpenSAML. There's a substantial amount of code in the modules because that's where all the web server touchpoints are (Apache in particular requires a lot), but the "shibd" executable is really just a thin wrapper that does a little bit of set up work and then runs a "Listener" plugin, the server half of the remoting layer that supports IPC between the two halves. The listener runs in a socket select loop waiting for connections and starting job threads to handle them, all of which is handled by the "libshibsp" library itself and all its dependencies.
Notably, only the "shibd" process is linked to "libsaml" and the full "libxmltooling" and on to "libxml-security-c", "libcurl", and OpenSSL. Thus, all the security work, SAML processing, signature/encryption code, backchannel communication, etc. is all handled by the portions of library code linked to "shibd". The web server modules don't link directly to those libraries, and instead make remote calls to "shibd" to process messages. Both halves are linked to the Xerces XML parser and to a logging library. The XML linkage allows the module to support the XML-based configuration and to do some kinds of XML message processing in isolated cases.
It doesn't matter all that much for discussion purposes which part of the job all the different libraries do other than to say that in the end, most of this code has to somehow be replaced by "something" in order to substanially reduce the native code footprint.
Like most modern code bases, there's a substantial amount of modular design, with C++ used to define abstract component interfaces that are implemented by supporting components. There is in fact a lot of boilerplate code in "libxmltooling" the provides for both portable dynamic loading of extensions and for a plugin management layer that allows components to register themselves with a plugin manager of a given type, and then allows the configuration to instantiate a plugin of a particular type at runtime. This plugin model extends to a range of components, both very large (whole "service" abstractions") and small (individual handlers that know how to process requests of a certain sort). With a few exceptions for the "mainline" processing of requests, XML wrappers, and the touchpoints with Apache and IIS, essentially most of the objects in the code tend to be plugins of some sort or another.
The configuration is split between XML files and a bridge back to Apache (for that server module obviously). The configuration object is largely a big "scaffold" built by a couple of large, complex classes that store off a lot of information and build lots of plugins that do other work and hang them off the configuration class. It's basically the outside-in approach that dependency injection solves in other systems. The noteworthy thing is that once this is built, apart from reloading changed configuration, the object tree is (with one recent exception) fully "processed" rather than constantly parsed in real time.
It should be possible (whether it's practical or not...) to actually pass a configuration file to a remote service to process and return some kind of processed result. Representing the result is clearly not simple, but would potentially allow some amount of temporary compatibility and maybe long term avoidance of XML parsing even if the format remains XML. It's attractive to consider the idea that the entire SP configuration could stay "local" and avoid any part of it being held by a central hub, even if the hub does work to parse it.
Reloading of configurations is actually the responsibility of each component itself to manage. Typically these classes inherit from a common base class that mediates this with some complex locking semantics to allow for shared read access but single writer access to swap in new implementations of the underlying service. This allows reloads to be triggered in the background, with all processing done "out of band" of requests and then swapped in briefly with a pointer swap under the exclusive lock.
Ideally I'd like all that code to go because it's extremely complex but it may prove difficult to get rid of entirely though I would hope/expect that very few configuration files would be directly parsed by the SP.
The current remoting layer between the server modules and "shibd" started life as a true ONC RPC interface but was replaced in V2 by a simple (and still unsecured) socket protocol with simple length-prefixed messages. There are plugins for both Unix domain and TCP sockets. The messages are actually passed as XML (this is an internal detail and not exposed to the rest of the code) and are serialized trees that are pretty similar to JSON in flavor (simple values of different types, arrays, and structures of named members), but predates JSON by many years.
The programming interface to this is a bit messy (it was ported from C to C++ and has some "odd" object reference vs. value semantics), but functionally it provides a very easy way to construct trees of data dynamically, and pass the trees across the process boundary. A convention of operation, input, and output is used to represent a remoted call. The tree is mutated by the call and the caller and callee just manipulate and read the tree to exchange data and do work.
There is no actual "API" in a documented sense. Instead, components typically exist in runtime form in both the calling and callee processes and a component registers the message names it listens for at an "address" based on the type of component and on context supplied from the configuration when it builds the components to keep addresses unique (e.g., if there are two identical handlers living at different paths, the path is supplied from outside via the constructor to make the addresses unique). The code is thus "talking to itself" across the boundary and the client and server logic is usually in the same code unit, where changes to the interface are made "all at once" in a new version.
This design allows for any portion of a component's work to be farmed out to "shibd" by splitting it into client and server portions, and some creative class designs allow this to be modeled in ways that allow the code to "collapse" back into one process and cut out the remoting, though this has never really been used.
The handlers tend to be good examples of this pattern, so the classes in cpp-sp/shibsp/handler/impl are good samples for what this looks like. Notably, and particularly with handlers, there are base classes that provide support for actually remoting the bulk of an HTTP Request and Response so that it's possible to pass an entire request, cookies, headers, etc. to "shibd" for work and a response to be returned and then mapped back into Apache or IIS API calls to issue a response, without the calling half even knowing what's being done.
This has pretty clear applicability to doing the same kind of thing to offload entire processing steps to a remote service. For example, a SAML ACS handler could effectively implement itself without knowing anything about SAML, the parameter names, etc. This effectively means that one handler might work for lots of protocols, all SAML bindings, etc., which would be a big improvement in the design and code size.
A note about threads: the calling code uses a client socket pool to reuse connections. The server code has a very primitive approach: a bound socket is passed off to a job thread that runs for the life of the socket, and terminates if the socket is closed or stops working. The job threads are not pooled or limited, because the intent was that jobs would be completed fast enough to complete work and free up the connection for a client thread in the caller to use it again, limiting the number of job threads needed. This works well except for Apache pre-fork mode where hundreds of child processes exist, each one connecting and starting its own thread with a gigantic default stack size. This is why the SP doesn't work with pre-fork mode under load, which mattered way longer than it should have, and is ridiculous to use now.
The threading issues should be moot in any redesign, but in terms of code reuse there is potentially some value here. Security would probably have to come from TLS and that's still a dependency challenge I don't have a good answer for. It may end up being the case that tunneling naked traffic over stunnel could be the best solution, but obviously adds moving parts.
There's a lengthy list of components that should be possible to fully eliminate from the code by offloading the work:
- All SAML specifics, including metadata, message handling, policy rules, artifact resolution. I wouldn't anticipate a single SAML reference in the code, ideally.
- SAML Attribute processing, extraction, decoding, filtering, resolution of additional data, etc.
I would think one big win would be leveraging the IdP's AttributeRegistry, AttributeResolver (and filter engine of course) to do all this work for us, and in particularly to instantly add the ability to supplement incoming data with database, LDAP, web service, etc. lookups and transformations, returning all of the results to the SP agent. For some deployments, that alone may be enough to incentivize converting to this approach, though it's possible that many of those deployments would (and have) just proxied SSO already anyway.
- Credential handling and trust engines
- SOAP client
- Protocol and security policy "providers" that supply a lot of low-level configuration details
- Replay cache
Most of this code would be either unnecessary to a redesign or already implemented in Java, modulo that configuring it may be very different or would have to be wired up in code based on the existing SP configuration syntax.
Like Apache or IIS, there's a "handler" concept that acts as a plugin point for any code that has to fully process requests and issue responses to do work for the SP, as opposed to the more "filter"-oriented parts that operate during normal web server request handling to enforce sessions, export attributes, do access control, etc. Handlers obviously work portably across any SP-supporting environment.
It's very unlikely given logout and other requirements that the actual concept of a handler goes away. The downside of that fact is that it could leave behind a decent amount of supporting code to allow for pluggability since all of that is pretty cross-functional and is kind of the price of supporting extensibility.
These start the process of establishing sessions by either issuing login requests to an IdP or by doing discovery to identify the IdP. They typically ran in "chains" that allowed discovery to run if needed or drop through to protocol-specific handlers that issued requests, allowing different SAML versions and WS-Federation to co-exist in whatever precedence was desired.
I would expect these to be eliminated and the hub would perform all this work. There are some very old plugins for discovery with cookies and forms that probably got little use but ultimately that can all be farmed out to "something" doing the discovery process.
All SAML-based at the moment, these are really generically any endpoint that handles a login response to establish a new session. They also do a lot of the work of the SP calling other components to extract, filter, resolve attributes, etc. Notably they also create the session with the client in conjunction with the Session Cache.
I would expect these to be eliminated and the hub doing most of this work but the cross-over into a session is TBD depending on some of the tricker questions.
These start (and sometimes finish) a logout operation, and are separated out so as to allow for issuing SAML LogoutRequests to be separate from processing inbound LogoutRequest and LogoutResponse messages.
Clearly much of this code has to go (the SAML parts), but probably a core amount of it will have to stay to support logout. I suspect two big piles of code that that will have to stay will be some form of session cache and logout. Probably the logout would be forwarded into the hub, where it would determine whether to "finish" or issue a LogoutRequest, and return either HTTP response to the SP.
I believe only the SAML 2 (and possibly WS-Federation) LogoutHandler is actually an extant example of this, but they would process some kind of formalized logout request coming from the IdP. It's extremely complex code with weird rules for dealing with the session, locating other applicable sessions because of how SAML SLO works, implementing application notification hooks, and finally having to issue LogoutResponses.
Again, there are clearly SAML portions that would have to be offloaded, but it will be some work to tease apart what aspects are which and maintain this kind of functionality, and the remaining code will be complex and a challenge for somebody to maintain.
The rest of the handlers are a grab bag of stuff.
- AssertionLookup – this was a back channel hook to pass the full SAML assertion into applications that wanted it. Not clear to me this would be worth keeping.
- DiscoveryFeed – the discovery feed, this would clearly go away though might have to migrate into the hub in some form if we intended to maintain the EDS.
- AttributeChecker – basically a pre-session authorization tool, probably would need to stay in some form
- ExternalAuth – this was a backdoor to create sessions as a trick, I doubt we'd keep it but it would take substantial offloading to do it
- MetadataGenerator – gone, obviously, but probably replaced by something else, possibly somewhere else
- SAML 2 ArtifactResolution – this is for issuing outbound artifacts, I can't imagine we'd bother keeping it, offloaded or otherwise, but we could
- SAML 2 NameIDMgmt – if we kept this, it would probably need to morph into some more generic idea of updating active session data via external input
- SessionHandler – debugging, still needed
- StatusHandler – debugging, still needed I imagine
- AttributeResolver – this was a trick to do post-session acquisition of data using the SP AttributeResolver/etc. machinery and dump it as JSON; if we kept this it would have to be offloaed obviously, and we'd likely have to discuss the actual requirements with someone
This is the most complex piece of code in the SP (and not coincidentally the IdP). Partly this is because it's a component that tends to start life as a "self-contained" component but ends up having to solve so many problems that the final result isn't so modular anymore, and didn't get decomposed into smaller portions. Sessions in general are just the hardest part of implementing one of these modules and in some sense is the only reason to do it. In a web platform that handles sessions, it's going to make more sense to implement identity inside that platform and not generically, because the application is already stuck using that platform and will be an instant legacy debt nightmare regardless of how you do identity.
With Apache, and with IIS if you exclude .NET, you have nothing. Implementing identity pretty much means you have to implement sessions, and that's a really hard thing to do because of state. IIS and Apache (the latter moreso) tend to use multiple, constantly cycling processes, so any sort of cache has to address that.
Anyway, for now, current state is that the SP primarily relies on a separate process to hold an in-memory session cache, though there are plugins that can extend that. The "shibd" process was meant to be colocated with every module instance to hold the in-memory sessions across web server process changes, with the assumption that application clustering would usually dictate sticky load balancing anyway. That has become less true now because people (who aren't me) have typically started putting application state into databases, something I don't know how to make work reliably because of database (lack of) locking.
The cache is actually a two-level architecture, and like many of the components at the SP layer, is internally remoted between the module and "shibd" in some very complex ways. The lower layer of the cache is a StorageService plugin to hold the sessions "persistently", though in common usage "persistent" here just means in-memory until "shibd" is restarted. The higher layer of the cache is a second buffer for sessions held within each web server process by the "in process" portion of the cache component. Essentially the first time a session is used within a child process, it's queried across the remoting boundary and read out of storage, deserialized, and returned across the process boundary for caching in memory in a simple hash table. Eventually entries do get pruned out if not used.
The sessions are not live objects within "shibd", so they're effectively stored in a serialized form or in-process as active objects. There's a trick involved in order to support session timeout. For that to work across process boundaries, every session access has to update a shared timestamp. This is actually done with a hack of leveraging the expiration of the session records in storage, substracting off some knowable values, to recover the last time used. When the session is accessed, there is always a "touch" operation that remotes to "shibd" and updates the record expiration. That touch also applies a timeout check so that every child process shares that view of the session and enforces a consistent timeout.
Astute readers will ask why on earth we would spend this much effort on timeouts. Reviewing the list, or talking to a few clueless security people, will give you a pretty good idea. You either deal with timeouts or spend half your waking moments justifying, explaining, or arguing with people about the lack of timeouts. The only difference between logout and timeout is that the former is just impossible to do reliably while the latter is merely very hard and painful.
The session cache does a number of other, related things, such as actually managing the session cookie used to associate the session with the client, making that detail "internal" to the implementation of the cache, simplifying other sections of code at the expense of the cache code.
Another thing it does is implement various logout tracking requirements because of SAML's horrendous decision to implement logout by NameID value. A lot of code is needed to support that and actually track logout messages for weird "logout arrives ahead of assertion" race conditions.
Finally, the session cache also implements a V3 feature copied from the IdP to implement encrypted cookies to actually persist sessions and recover them even across server nodes. The intention of that feature was that it would be used only occasionally, with sticky load balancing minimizing the need to do it, but it has the obvious potential to be part of a revamped design that could store sessions exclusively that way. That requires giving up on timeouts because it can't be tracked client-side (we can't decrypt, update, and encrypt on every request, obviously) and makes logout entirely at user-discretion unless some kind of revocation cache is used (one can just put the cookie back in place and voila, session's back).
There are two storage plugins that were designed mainly to offload session state from "shibd", memcache and ODBC. The memcache option always seemed strange since it's not really any more persistent, and adds unreliable network requests (the clients don't work well, to nobody's surprise) to the mix, but as a replacement for "shibd" entirely to track sessions, it probably makes more sense. Notably, there's no obviously maintained memcache client on Windows. The ODBC option has been totally unreliable because Linux has no stable and free ODBC driver manager or drivers anymore, but again as purely a session store with no "shibd" involved, there's some continued relevance at least on Windows where ODBC is reliable.
The one advantage of moving the second-level cache to the hub would be the potential of implementing other storage options in Java rather than C++, which will open up a lot more options. A possible hybrid would be to just implement session cache store operations remotely but not actually implement a real cache itself in the hub, just have it passthrough data between storage back-ends and the SP, essentially just relaying between a REST API we design and the Java libraries to talk to Hazelcast or what have you.
I suspect that timeout is just done for with virtually any practical design we could come up with. Having to touch the world on every single access is just not a very practical design, and the way others deal with that in a distributed fashion is some kind of polling/update with hidden frames. That has no future with browsers today.
There are two state mechanisms I have never tried to implement, and that do tend to crop up in others' Apache modules: shared memory and files. The former aren't portable and I've never cracked exactly how to lay out sessions in a way that works with shared memory blocks, but it clearly could be done. Files just seemed non-scalable to me and difficult to manage contention around. I'm not sure that "non-scalable" remains a sound argument in a design that might be offloading a lot of the up front work over a network anyway, so this probably needs some thought. Using files might be a simpl-ish way to deal with the multiple child process issue and possibly in conjunction with the encrypted cookie idea could provide a pretty reasonable to implement solution for the majority of sites?
This summarizes the general functionality from each dependent library and its likely future.
Boost is a giant grab-bag of advanced C++ libraries, mostly template-based so they just compile in rather than get linked in. Much of the usage now are things that are part of C++ 11 but weren't available at the time.
Generally the goal here should be to eliminate this if possible, mostly just to simplify the code, but it's not the end of the world if it's needed.
This is a fork of "log4cpp" to address a lot of threading bugs that may or may not have been fixed upstream but by then it became academic to go back to it.
At this point, given the history of startup/shutdown issues with it, I would be happier to see this replaced with native log calls. There's a bit of a shim around logging provided today in the request abstraction that provides a lot of the shared code between the Apache and IIS modules, so maintaining that approach and layering the logging left directly on Apache and on the Windows Event Log may make sense to try and eliminate this piece.
This is providing the cryptography support to the other libraries as well as direct support for loading keys and certificates and so on.
This ideally would be eliminated as a dependency to be successful because any touchpoints to it would imply a significant amount of complex native code remains. The most likely fallback would be that it's left as a libcurl dependency, which could lead to problems in some systems due to symbol/version conflicts with Apache but that's unlikely on all but Solaris or macOS these days.
The most feature-rich HTTP client available, this handles any use of the back-channel as well as the transport layer for loading certain files remotely, though in practice that's only commonly done with SAML metadata.
While it would be very nice to eliminate this as a dependency, that may prove difficult if the design ends up relying on web service calls to offload work. It may be necessary to consider a lower-level socket interface vs. a full HTTP interface since that code already exists, but that would complicate securing the connection as well as actually implementing and deploying the offloading service.
This is the common low-level compression library common to Linux and ported to Windows. It's used internally by OpenSSL and libcurl to implement compression support, and is also used by OpenSAML to implement DELFATE/INFLATE support in a couple of places.
I would hope all this goes away with the offloading of function.
Xerces is the XML parser and DOM API we standardized on, mainly because the Xerces-based XML Security code available (eventually) was much more flexible and "fit" the design of the SP than the comparable code based on libxml2. Nevertheless, we bet wrong and Xerces is unmaintained in practice and full of known and unknown security vulnerabilities. To this point, most of them have tended to lie in the DTD support, which I added options around to disable fully in our builds.
Getting rid of Xerces is probably among the more paramount goals we have. I would rather not replace it with anything else, which means either replacing the configuration or offloading the consumption of it.
The "Santuario" library in C++ is the XML Signature and Encryption (and C14N) code. It's also unmaintained except by us, and while it's in better shape than Xerces, it's still somewhat hard to maintain and it has some serious performance issues on large files.
This must be eliminated as a dependency to be successful because of the complexity and the dependencies on Xerces and OpenSSL.
This is the lowest layer of the SP "proper", and one of the libraries that makes up OpenSAML. It's a bit of a grab bag that started life as the main XML abstraction layer on top of the DOM and grew to include lots of other utilities, I/O interfaces, the SOAP layer around "libcurl", and other stuff, including security and trust abstractions modeled loosely after work done in Java. Because there's so much in it, it has to be used by both "shibd" and the server modules, so it builds as both full and "lite" versions, with the lite build excluding all the security-related code.
Below is a preliminary assessment of the future state of some of this code.
While the designs diverged to some degree, the C++ code shares some common ideas with the Java code and has a layer of XMLObject interfaces and base classes that handle most of the XML processing in the code in a way that allows programmatic access to information at a higher level than the DOM but without breaking signatures and with a high degree of DOM fidelity.
All this code has to go for a rewrite to be beneficial.
There are a lot of utility functions and classes here, some of which are probably still going to be needed, but it will be the last pile of code to go. Some of the code is relatively simple abstraction classes to wrap locking, mutexes and condition variables for portability, and may be worth keeping. One of the big things in this layer are the various helpers that manage UTF-16 and UTF-8 conversions between single and double-byte encoding because Xerces is natively UTF-16.
Since Xerces is actually the API by which conversions are being done, it's inherently going to be impossible to do that conversion so we probably have to avoid UTF-16 entirely, as I'm fairly certain C++ doesn't provide conversion.
The new version should be natively relying on ASCII and UTF-8 whenever possible and on C++ 11's support for native UTF-16 where necessary.
The persistence layer used by the rest of the code base is defined here, and the basic in-memory implementation is here.
Many of the current use cases for this are clearly out of scope for a replacement and have to be offloaded (e.g. the ReplayCache). The big unknown here is sessions; it may be that that piece if it stays would be rebased on some internal approach to storage rather than a shared interface but this doesn't seem all that much better and the basic implementation is quite simple anyway.
There's a port of the IdP's DataSealer idea here that supports encrypted storage of data under a shared key for cross-node purposes.
Something probably needs to provide this functionality, but offloading it should be practical.
There's a template engine in this layer that's used to produce error pages and the interstitial pages needed to implement SAML handoffs. The latter is clearly going away and the former probably should, in favor some kind of more systematic approach to relaying error information to custom code, which is the only thing that's practical to do anyway.
HTTP Request/Response Abstractions
The core interfaces that ended up being reused across the SP to model HTTP requests and responses for portability ended up in this library. I doubt they're going away but they would be part of an eventual "single" library supporting the SP in the future, i.e., would migrate into "libshibsp".
Key and Certificate Handling, Trust, Signatures, Encryption
The layer of code that implements loading of keys and evaluation of peer keys and certificates (generically, not in a SAML-specific way) is here, as are a lot of supporting code that bridges between Santuario and the XMLObject layer.
Notwithstanding that securing any remote channel to offload work is TBD, this code needs to go and the functions have to be offloaded.
SOAP and HTTP Clients
A limited HTTP client that has additional support for simple SOAP usage needed by SAML is implemented here on top of "libcurl" for security, with some very low-level callbacks into OpenSSL-aware code to do trust.
This code all needs to go, and any functionality needed has to be offloaded.
There's a very complex base class here that supports the managed reloading of configuration state used across a number of components in the code, including the SAML metadata support.
For complexity reasons alone, this should really go, but that will depend on being able to offload configuration processing.
This is the "proper" OpenSAML library itself and does not have a "lite" build because it's used directly only by "shibd" today. All of the SAML XMLObject wrappers are here, along with metadata support, message encoders and decoders, and some policy objects that implement some of the SAML processing rules. Some of this was patterned after early Java approaches, but is now significantly different from it.
Going into details here is largely beside the point; all this code has to go.
All the remaining non-Apache/IIS/etc. code is inside "libshibp" in the top-level project along with a few other add-in libraries that implement some lesser-used features.
Some of these pieces are discussed earlier, some of the remanining code is noted below.
This would probably have to end up centralized on the hub to be of much value, but possibly with some minimal session lifecycle events logged on the far side.
The RequestMapper is a portable way of mapping requests to settings, essentially a very stripped down version of the Apache Location feature (i.e. it's strictly URL-driven, never aware of physical files or directories). Originally it was used heavily with Apache also, but has since been reworked into a layered design that allows you to combine Apache commands with the XML syntax (Apache wins out) so that it's somewhat seamless.
This is a serious annoyance that's pretty much necessary everywhere but Apache because nothing else has the kind of maturity Apache does when it comes to attaching settings to requests. It's minimally needed for IIS if for nothing else. XML is a pretty natural syntax for this because it's already hierarchical, and Apache's weirdness often derive from the fact that its syntax isn't.
This will be a significant issue to deal with, i.e. maybe offload to the hub to parse, or perhaps come up with an alternative syntax.
This is another thing that exists primarily because not everything is Apache and has a real authorization layer, so this was a portable alternative. It's essentially tied into the Request Mapper, though implemented separately.
I would doubt this can be eliminated since it's not really enforced at session start, but per-request based on content. In practice, I never saw much value in static rules like this and it seems as though it never actually works since people don't want to just get an error page, they want to actually force a new session to try and get a different outcome. That's something that can't really be done with Apache's authorization model, but probably could be done if it were levelled up into the request processing steps.
Applications and Overrides
This is discussed pretty thoroughly in ApplicationModel, but is basically a way of layering segregation of content on top of an undifferentiated web document space. It's tied heavily into the Request Mapper component (which maps requests to the proper application ID). Since most of the settings in this layer are really more SAML-level details, the implication is that maintaining feature would require that the hub know about it intimately to be able to segregate its own configuration in the same way.
I deeply want to dump this concept, which has not worked. People can't handle or understand the concept. But I don't know how to just ignore it. Maybe ignoring it at the path level would fly but I can't really see losing it at the host level. In some sense, every virtual host is truly virtual to the hub. Securing all of this probably has to rely on some kind of security mapping between the virtual host and a certificate to "authorize" requests to the hub for a particular entityID and vhost. So in a sense, it's the norm to think of every vhost as its own "thing" in this new world and it sort of doesn't matter which vhosts share a physical server. Really too early to think about the implications yet.