Good Operational Software – Part 2

This is the second part of my thoughts on good operational software. For part one, see the previous post.

The second point is separating configuration management from software development. Like logging, this one is easy on the surface; it’s pretty trivial to move the main configuration parameters into a configuration file rather than hard-coding everything. The most interesting thing about this, however, is that it exposes dependencies explicitly. For example, if an application depends on a database connection, then there must be configuration for that database: the host, port, database name, user, and (hopefully encrypted or obscured) password.
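As a rough sketch of what this looks like in practice, here is the database example pulled out into an INI-style configuration and read with Python's standard configparser module. The section name, the keys, the hostname, and the idea of pointing at a password file rather than storing the password inline are all assumptions made for illustration, not a prescription.

```python
import configparser
from dataclasses import dataclass

# Hypothetical configuration text. In practice this would live in a file
# (for example app.ini) that operations staff can edit without touching code.
EXAMPLE_CONFIG = """
[database]
host = db.example.internal
port = 5432
name = orders
user = app_user
password_file = /etc/myapp/db_password
"""


@dataclass(frozen=True)
class DatabaseConfig:
    """Everything the application needs to know to reach its database."""
    host: str
    port: int
    name: str
    user: str
    password_file: str  # assumed: path to a secret kept outside the config


def load_database_config(text: str) -> DatabaseConfig:
    """Parse the [database] section of an INI-style configuration string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["database"]
    return DatabaseConfig(
        host=section["host"],
        port=section.getint("port"),
        name=section["name"],
        user=section["user"],
        password_file=section["password_file"],
    )


db = load_database_config(EXAMPLE_CONFIG)
print(f"{db.user}@{db.host}:{db.port}/{db.name}")
```

The useful side effect is that the [database] section doubles as a statement of the dependency: anyone reading the file can see that the application needs a database and exactly where it expects to find it.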

The difficult part, however, is keeping the configuration simple and, to the extent possible, uniform. In the same way that maintaining code in many different languages is difficult, maintaining configuration in many different languages is non-trivial. Because operations staff are often responsible for all the software in the production environment, this quickly becomes unwieldy. Further, a more complex configuration file is more error-prone; operations staff shouldn’t have to debug configuration files if it can be avoided.
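One way to spare operations staff from debugging a configuration file is to have the software check it up front and describe every problem in plain language. The following is a minimal sketch, assuming the same hypothetical INI layout with a [database] section; the list of required keys is invented for the example.

```python
import configparser

# Assumed for illustration: the sections and keys this application requires.
REQUIRED_KEYS = {"database": ["host", "port", "name", "user"]}


def validate_config(text: str) -> list[str]:
    """Return human-readable problems; an empty list means the config is usable."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return [f"could not parse configuration: {exc}"]

    problems = []
    for section, keys in REQUIRED_KEYS.items():
        if not parser.has_section(section):
            problems.append(f"missing section [{section}]")
            continue
        for key in keys:
            if not parser.has_option(section, key):
                problems.append(f"[{section}] is missing required key '{key}'")

    # Check the one value that has a type and a range, if it is present.
    if parser.has_option("database", "port"):
        raw = parser.get("database", "port", raw=True)
        if not raw.isdecimal() or not 1 <= int(raw) <= 65535:
            problems.append(
                f"[database] port must be a number from 1 to 65535, got '{raw}'"
            )
    return problems


broken = "[database]\nhost = db.example.internal\nport = abc\n"
for problem in validate_config(broken):
    print(problem)
```

Reporting all of the problems at once, rather than failing on the first, means a single pass through the file is usually enough to fix it.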

The main benefit of separating the two is that configuration and code can then be managed independently. This lets operations staff evolve the software along with the production environment without requiring developers to get involved, which matters most when operations staff must keep up with infrastructure changes and keep the software running without developer support.

The third point is empowering all technical staff. While the first two points both support this, there is a bit more to do, specifically around documentation and training. Documentation is noticeably not in vogue right now, and I’ve heard a number of clichés for not doing it. The two most common are aimed at developers (“The code is the documentation”) and at application users (“If the application is so complicated as to need documentation, it should be refactored to be made more intuitive”). However, neither of these addresses technical staff who aren’t developers. There is value in spending some time on their documentation needs or, at the very least, substituting training for it.

The documentation needs of operations staff are different from those of both users and developers. In particular, the main focus is on dependencies and exceptions: What specific systems and services are you dependent on? How are they configured, in case they are moved, migrated, updated, or deprecated? What is the user experience like when these services are unavailable? What happens when pieces of your own architecture fail unexpectedly? Having these sorts of issues is not a sign of bad software; it is a fact of life in an operational environment. Further, good software doesn’t necessarily prepare for every possible exception case; it recognizes that exceptions happen and equips those who will have to deal with them to handle the problem as efficiently as possible.
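Some of this documentation can live next to the software in a form that is both readable and checkable. The sketch below is one possible approach, not a standard one: each dependency is recorded with a note on what users see when it is down, and a small report function turns that into a status listing. The Dependency fields, the report format, and the example entries are all hypothetical; the reachability probe uses only socket.create_connection from the standard library.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One external system the software relies on (fields are illustrative)."""
    name: str
    host: str
    port: int
    impact: str  # what users experience when this dependency is unavailable


def is_reachable(dep: Dependency, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the dependency can be opened."""
    try:
        with socket.create_connection((dep.host, dep.port), timeout=timeout):
            return True
    except OSError:
        return False


def report(deps, probe=is_reachable) -> list[str]:
    """Build one status line per dependency, including the impact when down."""
    lines = []
    for dep in deps:
        where = f"{dep.name} ({dep.host}:{dep.port})"
        if probe(dep):
            lines.append(f"OK    {where}")
        else:
            lines.append(f"DOWN  {where}: {dep.impact}")
    return lines


# Hypothetical entries; real ones would come from the configuration file.
DEPENDENCIES = [
    Dependency("database", "db.example.internal", 5432,
               "orders cannot be placed or viewed"),
    Dependency("mail relay", "smtp.example.internal", 25,
               "confirmation emails are delayed, orders still succeed"),
]
```

Calling report(DEPENDENCIES) from a status command or startup check gives whoever is on call the same answers the documentation would, without having to find the documentation first.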