Monday, September 19, 2005
Availability, External System Connectivity, and SOA
Web Services and SOA are getting a lot of press, but the technology and platforms for building truly highly available systems by composing services that other enterprise systems expose as web services are still in their infancy. The relevant specifications, such as WS-Reliability from OASIS, are still being developed by various standards bodies. In the meantime, we need to build failover and reliability into the architectural framework of the application. Some fundamental principles for an architectural framework that provides external system access using web services are:
- Create a bounded pool of connections to the remote system. This is important: it prevents all application server threads from being used up waiting for a response when the backend is unresponsive or crashes halfway through processing a request. We have seen scenarios where an application server stops responding completely because all of its threads are stuck waiting for a response from one or two critical backend services.
- Age connections, i.e. periodically close connections that have been open for a while, for better load balancing. This is very useful with hardware-based load balancers, which balance load only at HTTP connection setup time.
- Enable and set appropriate network-level connection and read timeouts to prevent stuck applications. This is extremely important because of HTTP's request/response semantics. We have seen many transactions get stuck forever when the backend or external service becomes unavailable and the caller never receives a TCP reset (connection termination) message. The result is a half-open TCP connection, and the sender waits for a response forever unless read timeouts are implemented.
- Close and discard a connection after an error instead of returning it to the connection pool.
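
The four principles above can be sketched together in a few lines. This is only a minimal illustration in Python, not a production pool: the class name `BoundedConnectionPool`, the helper `open_backend_socket`, the host name, and all the sizes and timeout values are hypothetical choices made for the example.

```python
import queue
import socket
import time
from contextlib import contextmanager


def open_backend_socket(host="backend.example.internal", port=8080,
                        connect_timeout_s=3.0, read_timeout_s=10.0):
    """Hypothetical factory: open a socket with connect and read timeouts."""
    sock = socket.create_connection((host, port), timeout=connect_timeout_s)
    # Later recv() calls raise a timeout error instead of waiting forever
    # on a half-open connection.
    sock.settimeout(read_timeout_s)
    return sock


class BoundedConnectionPool:
    """Bounded pool with connection aging and discard-on-error (sketch)."""

    def __init__(self, factory, max_size=10, max_age_s=300.0,
                 clock=time.monotonic):
        self._factory = factory      # callable that returns a new connection
        self._max_age_s = max_age_s
        self._clock = clock
        # One slot per allowed connection. None means "free slot, nothing
        # open yet". LIFO order prefers recently returned connections.
        self._slots = queue.LifoQueue(maxsize=max_size)
        for _ in range(max_size):
            self._slots.put(None)

    @staticmethod
    def _close_quietly(conn):
        try:
            conn.close()
        except OSError:
            pass

    def _acquire(self, wait_timeout_s):
        # Bounded wait: raises queue.Empty instead of parking the calling
        # thread indefinitely when every connection is in use.
        entry = self._slots.get(timeout=wait_timeout_s)
        if entry is not None:
            conn, created = entry
            if self._clock() - created < self._max_age_s:
                return conn, created
            # Aged out: close it so the replacement goes back through the
            # load balancer's connection setup.
            self._close_quietly(conn)
        try:
            return self._factory(), self._clock()
        except BaseException:
            self._slots.put(None)    # hand the slot back if connecting fails
            raise

    @contextmanager
    def connection(self, wait_timeout_s=2.0):
        conn, created = self._acquire(wait_timeout_s)
        try:
            yield conn
        except BaseException:
            # After any error, close and discard rather than reuse.
            self._close_quietly(conn)
            self._slots.put(None)
            raise
        else:
            self._slots.put((conn, created))
```

A caller would construct the pool once, for example `pool = BoundedConnectionPool(open_backend_socket, max_size=20)`, and wrap each backend call in `with pool.connection() as sock:`. When the pool is exhausted the caller gets `queue.Empty` after `wait_timeout_s` and can fail fast, which keeps the remaining application server threads free.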