The Shibboleth V1 software has reached its End of Life and is no longer supported. This documentation is available for historical purposes only.

ImprovedErrorHandling

Many concerns and complaints have been expressed about error handling, error messages, etc., and much discussion has ensued as 1.3 is being coded.

Current State

Extensive, relatively haphazard logging using log4j/log4cpp (mostly to flat files with timestamped entries and thread/context labels). Minimal structured transaction logging to track key session activities. Fairly similar mechanisms are used in Java and C++. Most of this is about the C++...

Error handling mostly via exceptions. Key exception types:

$ SAMLException: Internal core errors are mostly wrapped by SAMLException or its subclasses, which currently carry a nest/sequence of XML codes and a message.

$ ShibTargetException: C++ exception used for errors within the outer SP libraries, or to propagate SAMLExceptions out of certain functional units. Currently wraps a small set of error codes, and a structured set of (theoretically) support-oriented data determined based on the IdentityProvider involved in a transaction. The data is mostly used to populate error templates, so if putting that into the templates doesn't help, the type is probably not helping.

$ RPCError: C-style structure used to carry error information across the web server / external shar boundary, generally corresponds to a ShibTargetException and can be built from or turn into one, or be fed into error templates. Also has some limited intelligence based on code about whether an error is retryable or not. In practice, the only retryable error is a session timing out or something similar.

Initial Suggestions

Simplify

We need one error encapsulation mechanism, not three. The SAMLException design was originally modeled closely on the SAML StatusCode structure, but it shouldn't be limited by that. We should create a more flexible exception object that can carry either simple or complex information and can be nested. Such objects should continue to have tight value semantics so that copying/nesting is easy and cheap, which means no embedded pointers.
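As a rough illustration of the single-mechanism idea, here is a minimal sketch of one exception type with value semantics. Every name in it (ErrorEntry, UnifiedException, the numeric codes) is hypothetical and not part of the Shibboleth or OpenSAML API. Nesting is modeled as a flat chain of entries stored by value, which is only one possible way to satisfy the "no embedded pointers" constraint.

```cpp
#include <exception>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; not an existing Shibboleth/OpenSAML type.
struct ErrorEntry {
    int code;                          // simple numeric error code
    std::vector<std::string> params;   // optional string context
};

class UnifiedException : public std::exception {
public:
    explicit UnifiedException(int code, std::vector<std::string> params = {}) {
        chain_.push_back(ErrorEntry{code, std::move(params)});
    }

    // Wrap a lower-level error: the new entry becomes the outermost one and
    // the inner chain is copied by value behind it.
    UnifiedException(int code, const UnifiedException& inner,
                     std::vector<std::string> params = {}) {
        chain_.push_back(ErrorEntry{code, std::move(params)});
        chain_.insert(chain_.end(), inner.chain_.begin(), inner.chain_.end());
    }

    int code() const { return chain_.front().code; }      // outermost code
    int rootCode() const { return chain_.back().code; }   // innermost code
    const std::vector<ErrorEntry>& chain() const { return chain_; }

    const char* what() const noexcept override { return "UnifiedException"; }

private:
    std::vector<ErrorEntry> chain_;    // owned by value; no raw pointers
};
```

Because the class holds only value types, the compiler-generated copy operations are sufficient, and a wrapped exception can be copied or rethrown without any ownership concerns.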

Richer Structure

It should be able to carry some kind of simple (i.e. numeric) error code, and optionally the contact-oriented data currently spread out among the classes. It should interface to a message catalog system and provide translatable code-to-message services when output is necessary. It can carry some string context parameters for the messages, but we shouldn't go overboard there...leave the detailed pieces in the log. We should try to keep things simple by encoding error context data into sets of strings, and then interpreting the strings based on the error condition. Perhaps the I2MI diagnostics stuff could assist here...?
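The code-to-message lookup could look something like the following sketch. MessageCatalog, the "{0}"-style placeholders, and the sample code 100 are all assumptions made for illustration; a real catalog would presumably be loaded per locale rather than filled in by hand.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical catalog: numeric code -> message template.
class MessageCatalog {
public:
    void add(int code, const std::string& tmpl) { templates_[code] = tmpl; }

    // Replace "{0}", "{1}", ... with the string parameters in order.
    // Unknown codes fall back to a generic message. Values are substituted
    // one index at a time, so a value that itself contains a later
    // placeholder would also be expanded; acceptable for a sketch.
    std::string format(int code, const std::vector<std::string>& params) const {
        std::map<int, std::string>::const_iterator it = templates_.find(code);
        if (it == templates_.end())
            return "Unknown error (code " + std::to_string(code) + ")";
        std::string out = it->second;
        for (std::size_t i = 0; i < params.size(); ++i) {
            const std::string token = "{" + std::to_string(i) + "}";
            std::string::size_type pos = 0;
            while ((pos = out.find(token, pos)) != std::string::npos) {
                out.replace(pos, token.size(), params[i]);
                pos += params[i].size();   // skip past the inserted value
            }
        }
        return out;
    }

private:
    std::map<int, std::string> templates_;
};
```

Keeping only a code and a few strings in the error object means the text can be chosen (and translated) at the point of output, while the detailed context stays in the log.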

The tricky bit is that it needs to be easy to pickle across the RPC boundary. We could use XML for this, that's probably one way to support real structure, nesting, etc. The problem is that most of the information we might want to carry around "just in case" happens across the boundary.
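To make the pickling problem concrete, here is a sketch of a round trip for one error code plus its string parameters. Note that this deliberately uses a flat, length-prefixed text encoding rather than the XML mentioned above, purely to keep the example short; it does not capture nesting. WireError, encode and decode are hypothetical names.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wire form of one error: a code plus string parameters.
struct WireError {
    int code;
    std::vector<std::string> params;
};

// Encode as "<code> <count> <len>:<bytes><len>:<bytes>...".
// Length prefixes let parameters contain any characters.
std::string encode(const WireError& e) {
    std::ostringstream os;
    os << e.code << ' ' << e.params.size() << ' ';
    for (std::size_t i = 0; i < e.params.size(); ++i)
        os << e.params[i].size() << ':' << e.params[i];
    return os.str();
}

// Returns false on malformed or truncated input.
bool decode(const std::string& wire, WireError& out) {
    std::istringstream is(wire);
    std::size_t count = 0;
    if (!(is >> out.code >> count)) return false;
    if (is.get() != ' ') return false;
    out.params.clear();
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t len = 0;
        if (!(is >> len) || is.get() != ':') return false;
        if (len > wire.size()) return false;   // reject impossible lengths
        std::string s(len, '\0');
        if (len > 0 && !is.read(&s[0], static_cast<std::streamsize>(len)))
            return false;
        out.params.push_back(s);
    }
    return true;
}
```

An XML form would handle nesting more naturally, at the cost of needing a parser on both sides of the boundary.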

(This is obviously easier in Java without having to cross language boundaries.)

Messages

Obviously need to extract all the error messages and use codes until the actual text is needed. Don't need to do this for the logging.

Logging

Some of the logging in C++ is lazy about printing system internals because a lot has to be transcoded from Unicode to be printed. May want to look into a logging extension to simplify that process.
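One shape such a logging extension might take is a small wrapper that transcodes only when it is actually written to a stream. The sketch below assumes the internal strings are UTF-16 held in a std::u16string and that the logger accepts anything streamable to a std::ostream; toUtf8, AsUtf8 and formatLogLine are hypothetical names, not part of log4cpp.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Convert UTF-16 to UTF-8; unpaired surrogates become U+FFFD.
std::string toUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Wrapper that defers transcoding until the value is streamed.
struct AsUtf8 {
    const std::u16string& text;
    explicit AsUtf8(const std::u16string& t) : text(t) {}
};

inline std::ostream& operator<<(std::ostream& os, const AsUtf8& v) {
    return os << toUtf8(v.text);
}

// Example: build a log line containing a UTF-16 value.
std::string formatLogLine(const std::u16string& providerId) {
    std::ostringstream line;
    line << "request from provider " << AsUtf8(providerId);
    return line.str();
}
```

With a wrapper like this, printing an internal value costs one extra token at the call site, which may remove some of the incentive to leave it out of the log.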

More Speculations

Once a full system of error codes is in place, we should define error flow out of the system using the codes so that web scripts can interpret the conditions. Codes could maybe be routed in a configurable way by sending some to the IdentityProvider's error page.

Howard's suggestion of tracing activity using displayable logging should be explored more fully. Some code for that is in CVS; we should look at porting it to C++ if possible and at exploring the privacy implications.

Need to define a set of status/diagnostic hooks via HTTP requests, along with some simple access control (disable, open for specific IPs, etc.).
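The "simple access control" for such hooks could be as small as the following sketch. StatusAcl and its three modes are hypothetical, and addresses are compared as exact strings, which ignores netmasks and IPv6 normalization.

```cpp
#include <set>
#include <string>

// Hypothetical access policy for a status/diagnostic URL.
enum StatusAccess { STATUS_DISABLED, STATUS_OPEN, STATUS_ALLOW_LIST };

class StatusAcl {
public:
    StatusAcl() : mode_(STATUS_DISABLED) {}   // closed unless configured

    void setMode(StatusAccess m) { mode_ = m; }
    void allow(const std::string& addr) { allowed_.insert(addr); }

    // Decide whether a request from clientAddr may see the status output.
    bool permits(const std::string& clientAddr) const {
        switch (mode_) {
            case STATUS_OPEN:       return true;
            case STATUS_ALLOW_LIST: return allowed_.count(clientAddr) > 0;
            default:                return false;
        }
    }

private:
    StatusAccess mode_;
    std::set<std::string> allowed_;
};
```

Defaulting to disabled keeps the hook from exposing diagnostics on a deployment that never configured it.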

I like the idea of capturing "non-errors" that have application impact, such as filtering of attributes, and pushing that into a new interface for some applications to look at. Not sure how to represent it (again, this is much easier in Java than in Apache/etc.) Not everything has to be feature-equivalent of course...