With large, distributed Guidewire installations, how do you manage and monitor performance and scalability?

There are many well-known processes and procedures for measuring the baseline performance of a single Guidewire application, and they should be used throughout its development.
But once the Guidewire application is released for production use, the metrics and methodologies used during development will not guarantee its performance over time.
This is not a problem with the Guidewire applications or a shortcoming of the current development processes and procedures; it is the nature of distributed systems.

Distributed systems consist of multiple computers connected by integrations or network segments which perform automated work processes. Work processes in distributed systems can be of any type, such as asynchronous, synchronous, or transactional. In any system with multiple work processes of different types, there are points in the system which work in tandem with other points, points which work independently of any other point, and points which operate across multiple other points. With large variance in work processes and work process types, it should be expected that the distributed system will exhibit inconsistent or unexpected performance from time to time. To be more specific, the nature of a distributed system dictates that there is not one baseline for performance, but many baselines for specific system functions.
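The idea of many baselines rather than one can be sketched as follows. This is a minimal illustration, assuming hypothetical function names and baseline numbers; it is not taken from any actual Guidewire metric set.

```python
# Minimal sketch: tracking a separate performance baseline per system
# function rather than one global baseline. Function names, latencies,
# and the tolerance factor are all illustrative assumptions.
from statistics import mean

# Hypothetical per-function baselines (seconds), captured during testing
baselines = {
    "policy_lookup": 0.20,      # synchronous UI request
    "batch_renewal": 45.0,      # asynchronous batch work
    "payment_commit": 0.80,     # transactional integration call
}

def check_against_baseline(function_name, observed_samples, tolerance=1.5):
    """Return True if the mean observed latency for this function is
    within its own baseline times the tolerance factor."""
    baseline = baselines[function_name]
    return mean(observed_samples) <= baseline * tolerance

print(check_against_baseline("policy_lookup", [0.18, 0.22, 0.25]))  # within baseline
print(check_against_baseline("batch_renewal", [90.0, 95.0, 88.0]))  # degraded
```

Each function is judged against its own history, so a batch job that has always taken 45 seconds is not flagged while a lookup that doubles from 0.2 to 0.4 seconds is.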

Brewer's CAP Theorem is a statement about large distributed systems which defines three desirable properties of those systems:

- Consistency: every read sees the most recent write
- Availability: every request receives a response
- Partition tolerance: the system continues to operate despite network failures between its nodes

The theorem goes on to provide an important reality about the CAP rules: a distributed system can guarantee at most two of the three properties at any one time.

Brewer's CAP Theorem is telling us that there are unavoidable compromises in large distributed systems. If the system cannot adhere to all CAP rules, then some variability in system behavior should be expected.
For example, if our system is not 100% available then it will be necessary to monitor availability to determine when there is a problem and how to rectify it.
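A simple way to act on that example is to record the outcome of periodic health probes and compute observed availability against a target. The sketch below assumes a simulated probe history and an arbitrary "three nines" alert threshold; a real deployment would probe actual application endpoints.

```python
# Minimal sketch of availability monitoring: record probe outcomes
# and compute the observed availability fraction. The probe history
# and alert threshold here are simulated, illustrative values.
def availability(probe_results):
    """Return availability as a fraction from a list of probe outcomes."""
    if not probe_results:
        return 0.0
    return sum(1 for ok in probe_results if ok) / len(probe_results)

# Simulated probe history: True = healthy response, False = timeout/error
history = [True] * 97 + [False] * 3

pct = availability(history)
print(f"observed availability: {pct:.2%}")
if pct < 0.999:  # hypothetical "three nines" target
    print("availability below target -- investigate")
```

Monitoring like this is what turns "the system is not 100% available" from an abstract CAP compromise into a measurable signal with a defined response.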

Distributed applications tend to behave in ways that pre-release benchmarking cannot predict. At the time you release a distributed application which has been benchmarked, you know its performance based on the 'new' database, the initial number of users, and the results observed during testing.
After the distributed application has been in production for some time, it will start to behave differently, usually in small ways at first.
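That gradual drift can be detected by comparing a recent window of response times against the window recorded at release. The numbers below are illustrative, not measurements from a real system.

```python
# Minimal sketch: detecting gradual performance drift by comparing a
# recent window of response times against the window captured at
# release. All latency values here are illustrative.
from statistics import mean

release_window = [0.20, 0.21, 0.19, 0.20]   # seconds, at go-live
recent_window  = [0.27, 0.29, 0.28, 0.30]   # seconds, months later

def drift_ratio(baseline_window, recent):
    """How much slower the recent window is relative to release."""
    return mean(recent) / mean(baseline_window)

ratio = drift_ratio(release_window, recent_window)
print(f"current latency is {ratio:.1f}x the release baseline")
if ratio > 1.25:  # hypothetical drift threshold
    print("performance drift detected")
```

Because the change is small at first, a threshold-based comparison like this catches it long before users report the application as "slow".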

The key to measuring the performance of your Guidewire application is to monitor the various functions that the distributed system can perform, and to monitor the related servers, networks, and database instances.
By proactively monitoring a distributed application, problems can be detected sooner and more time will be available to troubleshoot the issue in a production environment.
Keep in mind that undetected changes in performance or scalability eventually show up as a system failure, an application crash, or an out-of-resource condition (memory, stack, disk, etc.).
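Catching an out-of-resource condition before it becomes a crash means checking headroom, not waiting for failure. The sketch below uses the standard library's disk-usage call; the warning threshold and the simulated memory reading are assumptions for illustration.

```python
# Minimal sketch: checking resource headroom so exhaustion is caught
# before it becomes a failure. The 20% warning threshold and the
# simulated memory figure are illustrative assumptions.
import shutil

def disk_headroom(path="/"):
    """Return free disk space as a fraction of total capacity."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def check_resource(name, free_fraction, warn_at=0.20):
    """Warn while there is still time to act, not after failure."""
    if free_fraction < warn_at:
        return f"{name}: WARNING, only {free_fraction:.0%} free"
    return f"{name}: ok ({free_fraction:.0%} free)"

print(check_resource("disk", disk_headroom()))
print(check_resource("memory", 0.12))   # simulated low-memory reading
```

In practice these checks would run on a schedule against every server in the distributed system, so a resource trending toward exhaustion is raised as a warning rather than discovered as an outage.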
