Monday, September 19, 2005
Availability, External System Connectivity, and SOA
Web Services and SOA are getting a lot of press, but the technology and platform for building truly highly available systems by composing services that other enterprise systems expose as web services is still in its infancy. The relevant specifications, such as WS-Reliability from OASIS, are still being developed by various standards bodies. In the meantime, we need to build failover and reliability into the architectural framework of the application. Some of the fundamental principles of an architectural framework that provides access to external systems using web services are:
- Create a bounded pool of connections to the remote system: This is important because it prevents all application server threads from being used up waiting for a response when the backend is non-responsive or crashes halfway through processing a request. We have seen scenarios where the application server stops responding completely because all of its threads are stuck waiting for responses from one or two critical backend services.
- Periodically close connections that have been open for a while (i.e. age connections) for better load balancing: This is especially useful with hardware-based load balancers, which make their balancing decision when the HTTP connection is set up, so a long-lived connection stays pinned to the same server.
- Enable and set appropriate network-level connection and read timeouts to prevent stuck applications: This is extremely important because of HTTP's request/response semantics. We have seen many transactions get stuck forever when the backend/external service becomes unavailable and the client never receives a TCP reset (connection termination) message. This leaves a half-open TCP connection, and the sender waits for a response forever unless read timeouts are implemented.
- Close and discard connections from the connection pool after an error, so that a broken connection is not handed to the next request.
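The first, second and fourth principles above can be combined in one small component. Below is a minimal sketch, not a production pool: it assumes Java 8 or later, and the connection type `C` and the `factory` that opens connections are hypothetical placeholders for whatever client the remote system needs. A semaphore caps the number of connections in use, callers wait only a bounded time for a free slot, connections older than a maximum age are closed instead of reused, and a connection released after an error is always discarded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch of a bounded, aging connection pool. C is a hypothetical connection type. */
final class BoundedPool<C extends AutoCloseable> {
    /** A pooled connection plus the time it was opened. */
    static final class Lease<T> {
        final T conn;
        final long createdAtMillis;
        Lease(T conn, long createdAtMillis) {
            this.conn = conn;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final Semaphore permits;                         // at most maxConnections in use
    private final Deque<Lease<C>> idle = new ArrayDeque<>(); // connections ready for reuse
    private final long maxAgeMillis;                         // retire connections older than this
    private final Supplier<C> factory;                       // hypothetical: opens a new connection

    BoundedPool(int maxConnections, long maxAgeMillis, Supplier<C> factory) {
        this.permits = new Semaphore(maxConnections, true);
        this.maxAgeMillis = maxAgeMillis;
        this.factory = factory;
    }

    /** Returns a connection, or null if no slot frees up within timeoutMillis. */
    Lease<C> acquire(long timeoutMillis) {
        try {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return null; // pool exhausted: fail fast instead of parking the thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            while (true) {
                Lease<C> lease;
                synchronized (idle) {
                    lease = idle.pollFirst();
                }
                long now = System.currentTimeMillis();
                if (lease == null) {
                    return new Lease<>(factory.get(), now); // nothing idle: open a new one
                }
                if (now - lease.createdAtMillis < maxAgeMillis) {
                    return lease; // young enough to reuse
                }
                closeQuietly(lease.conn); // too old: close it so a fresh one is load balanced
            }
        } catch (RuntimeException e) {
            permits.release(); // opening a connection failed: give the slot back
            throw e;
        }
    }

    /** Returns a connection to the pool; pass failed=true after any error so it is discarded. */
    void release(Lease<C> lease, boolean failed) {
        try {
            boolean tooOld = System.currentTimeMillis() - lease.createdAtMillis >= maxAgeMillis;
            if (failed || tooOld) {
                closeQuietly(lease.conn);
            } else {
                synchronized (idle) {
                    idle.addFirst(lease);
                }
            }
        } finally {
            permits.release();
        }
    }

    private static void closeQuietly(AutoCloseable c) {
        try {
            c.close();
        } catch (Exception ignored) {
            // best effort: the connection is being thrown away anyway
        }
    }
}
```

A caller would acquire a lease, use `lease.conn`, and call `release(lease, true)` from a catch block when the remote call throws. Returning null when the pool is exhausted lets the caller fail fast (for example, report the backend as unavailable) instead of tying up yet another application server thread.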
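For the timeout principle, here is a minimal sketch using the standard `HttpURLConnection`, whose `setConnectTimeout` and `setReadTimeout` methods exist from Java 5 onward (on Java 1.4, Sun's JDK offered the `sun.net.client.defaultConnectTimeout` and `sun.net.client.defaultReadTimeout` system properties for a similar purpose). The `TimeoutClient` class and its method names are illustrative, and the timeout values are passed in by the caller because appropriate values depend on the backend.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

/** Hypothetical helper showing connection and read timeouts on HttpURLConnection. */
final class TimeoutClient {
    /** Opens (without connecting) an HTTP connection with both timeouts set. */
    static HttpURLConnection open(String endpoint, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // bounds TCP connection setup
            conn.setReadTimeout(readTimeoutMs);       // bounds each blocking read of the response
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the HTTP status code, or -1 if the backend did not answer in time. */
    static int statusOrTimeout(String endpoint, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection conn = open(endpoint, connectTimeoutMs, readTimeoutMs);
        try {
            return conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            return -1; // backend hung: free the thread instead of waiting forever
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            conn.disconnect(); // ask the JDK to close the socket instead of keeping it alive
        }
    }
}
```

Treating a timeout as "backend unavailable" (the -1 return here) hands the thread back to the application server instead of letting it wait on a half-open connection.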
Java Net, DNS Caching and Availability
Many organizations are deploying global load balancers to balance load across geographically distributed data centers. This also improves service availability, since one data center can be taken offline for maintenance without disrupting service. These global server load balancers (GSLBs) use DNS resolution to direct traffic to server farms. To ensure that a system that uses other backend web services is highly available, can handle failovers, and can recover from a failover without requiring a server restart, do the following:
- Use a URL with a host name (not a hard-coded IP address) to connect to the service endpoint, so that a DNS lookup determines the endpoint's IP address.
- Set the Java DNS cache TTL to a reasonable value. By DEFAULT, Java caches DNS-to-IP resolution FOREVER. After the initial successful DNS-to-IP resolution, the only way to force Java to make a new DNS query is to recycle the JVM. Obviously this is not good for building highly available systems.
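The host-name principle can be sketched at the socket level. This hypothetical `HostFailover` helper (Java 7 or later) resolves the host name on every call, so it picks up DNS changes once the JVM's cached entry expires, and it tries each address the name resolves to, with a connect timeout, until one accepts the connection. Trying every returned address is an extra failover technique added for illustration, not something the text above prescribes.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Hypothetical helper: connect by host name, failing over across resolved addresses. */
final class HostFailover {
    static Socket connect(String host, int port, int connectTimeoutMs) throws IOException {
        IOException lastFailure = null;
        // getAllByName consults the JVM's DNS cache and queries DNS once the entry is stale.
        for (InetAddress address : InetAddress.getAllByName(host)) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(address, port), connectTimeoutMs);
                return socket; // first reachable address wins
            } catch (IOException e) {
                lastFailure = e; // remember why this address failed, then try the next one
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // nothing useful to do; the socket never connected
                }
            }
        }
        throw lastFailure != null ? lastFailure : new UnknownHostException(host);
    }
}
```

How long a stale address can keep being returned still depends on the JVM's DNS cache TTL, which is why the two principles go together.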
The DNS-name-to-IP-address resolution capability is provided by the InetAddress class (part of java.net, the core networking package of the Java platform). The default implementation caches DNS-to-IP resolution FOREVER. InetAddress also caches unsuccessful DNS-to-IP resolutions, for 10 seconds by default.
Java 1.4 and later provide properties to modify DNS caching behavior: networkaddress.cache.ttl sets the cache TTL (time-to-live), and networkaddress.cache.negative.ttl sets the negative cache TTL (i.e. for failed resolutions). Note that these are security properties (set in the java.security file or through java.security.Security), not command-line system properties. They are documented here: http://java.sun.com/j2se/1.4.2/docs/guide/net/properties.html.
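Besides editing the java.security file, the two security properties can be set programmatically through `java.security.Security`. A minimal sketch, assuming the code runs at the very start of `main()`, before anything resolves a host name: in Sun's implementation the cache policy is read once, when it is first initialized, so setting it later may have no effect. The `DnsCacheConfig` class is hypothetical; the property names are the JDK's own.

```java
import java.security.Security;

/** Hypothetical startup helper that sets the JVM's DNS cache TTLs (in seconds). */
final class DnsCacheConfig {
    static void configure(int ttlSeconds, int negativeTtlSeconds) {
        // Successful lookups: -1 caches forever, 0 disables caching, >0 is a TTL in seconds.
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(ttlSeconds));
        // Failed lookups, with the same interpretation of the value.
        Security.setProperty("networkaddress.cache.negative.ttl", Integer.toString(negativeTtlSeconds));
    }
}
```

For example, `DnsCacheConfig.configure(60, 10)` would keep successful lookups for a minute and failed ones for 10 seconds; the right TTL depends on how quickly the global load balancer is expected to shift traffic.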
Unfortunately there is no standard or formally documented way of changing DNS caching behavior in Java 1.3 and earlier. However, there is a non-standard, Sun-proprietary system property that can be set on the Java command line to change the behavior (the Java 1.4 documentation actually includes this property name). The property is:
sun.net.inetaddr.ttl
The system property is specified on the command line as:
java -Dsun.net.inetaddr.ttl=0
Values are interpreted as:
-1 (default) => Cache FOREVER
0 => Disable caching. Every address resolution then requires a DNS query.
+integer => TTL of a cache entry in seconds, i.e. the time after which the entry is stale. After this time, a DNS-to-IP resolution results in a new DNS query.
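To make these value semantics concrete, here is a small, illustrative model of a TTL cache that follows the same rules (a negative value such as the default -1 caches forever, 0 never caches, a positive value is a lifetime in seconds). It is not the JDK's actual InetAddress cache; the `TtlCache` class is hypothetical, and the clock is injected so the expiry behavior can be checked without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Illustrative model of the TTL rules; not the JDK's real InetAddress cache. */
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis; // Long.MAX_VALUE means "never expires"
        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlSeconds;          // negative: forever, 0: no caching, >0: seconds
    private final LongSupplier clockMillis; // e.g. System::currentTimeMillis

    TtlCache(long ttlSeconds, LongSupplier clockMillis) {
        this.ttlSeconds = ttlSeconds;
        this.clockMillis = clockMillis;
    }

    /** Returns the cached value while fresh; otherwise calls lookup (the "DNS query"). */
    synchronized V get(K key, Function<K, V> lookup) {
        long now = clockMillis.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.value;
        }
        V value = lookup.apply(key);
        if (ttlSeconds != 0) { // ttlSeconds == 0: never store, so every call does a lookup
            long expiresAt = ttlSeconds < 0 ? Long.MAX_VALUE : now + ttlSeconds * 1000L;
            entries.put(key, new Entry<>(value, expiresAt));
        }
        return value;
    }
}
```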


