Configuring stuck connections in the IBM WAS 8.5.5 connection pool

Recently we started getting complaints from our client about database connections from IBM WAS. Our first step was to look at the logs the client sent us, where we found the following errors in the application logs:
  • Error 404: Database connection problem: IO Error: Got minus one from a read call DSRA0010E: SQL State = 08006, Error Code = 17,002
  • java.sql.SQLException: The back-end resource is currently unavailable. Stuck connections have been detected.
With a quick Google search I found PMR 34250 004 000 on the IBM support site, which also affects IBM WAS 8.* versions. Since we are using a third-party web portal engine (BackBase), it was not trivial to track down the problem, so we decompiled some code to make sure that all data source connections were being closed properly. After some research I asked the production support team for the database statistics and the data source configuration. The statistics surprised me: every connection on the database was in use, and the IBM application server could not get a new connection to complete requests.
On the Oracle database, the maximum number of connections was set to 6000, and we have more than 32 application servers, each with Maximum Connections set to 200. It was a serious mistake. The formula for sizing the connection pool of an IBM cluster is as follows:
Maximum Number of Connections per Node * Number of Nodes < Max Connections set on the Database
In our case the rule was violated:
200 * 32 = 6400, which is greater than 6000
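The sizing rule is easy to check in code before a configuration goes to production. The following is a minimal sketch, not part of WAS; the class and method names are made up for illustration, and the numbers are the ones quoted in this post (200 connections per server, 32 servers, database limits of 6000 and 10 000).

```java
// Hypothetical helper: checks the cluster-wide pool sizing rule from this post.
final class PoolSizingCheck {

    /** Worst-case number of connections the whole cluster can open at once. */
    static long worstCaseConnections(int maxConnectionsPerNode, int nodeCount) {
        return (long) maxConnectionsPerNode * nodeCount;
    }

    /** True when the worst case stays strictly below the database limit. */
    static boolean fitsDatabaseLimit(int maxConnectionsPerNode, int nodeCount,
                                     int databaseMaxConnections) {
        return worstCaseConnections(maxConnectionsPerNode, nodeCount) < databaseMaxConnections;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseConnections(200, 32));      // 6400
        System.out.println(fitsDatabaseLimit(200, 32, 6000));   // false: the original setup
        System.out.println(fitsDatabaseLimit(200, 32, 10000));  // true: after raising the limit
    }
}
```

Since we have more than 32 servers, 6400 is only a lower bound on the worst case; the check should be run with the real node count.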
We sent a request to increase the connection limit in Oracle to 10 000. But what to do about the stuck connections? I checked the IBM WAS advanced connection pool properties and noticed that the stuck connection properties were not configured at all.
Let's look at what a stuck connection is.
A stuck connection is an active connection that is not responding or returning to the connection pool. Stuck connections are controlled by three properties: Stuck time, Stuck threshold and Stuck timer interval.
Stuck time
  • How long a single active connection can be in use against the backend resource before it is considered stuck.
  • For example, if the stuck time is 120 seconds and a connection has been waiting on the database for more than 120 seconds, the connection is marked as stuck.
Stuck threshold
  • The number of connections that must be considered stuck for the pool to go into stuck mode.
  • For example, if the threshold is 10, then once 10 connections are considered stuck, the whole pool for that data source is considered stuck.
Stuck timer interval
  • How often the connection pool checks for stuck connections.
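With the above information I configured the following stuck connection properties:
Stuck timer interval : 120 secs
Stuck time : 240 secs
Stuck threshold : 100 connections (maximum connections 200)
With the above configuration, when will the connection pool be declared stuck? Every 120 seconds the pool checks its active connections, and once 100 of the 200 connections have been in use for more than 240 seconds, the pool is declared stuck.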
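To make the interaction of stuck time and stuck threshold concrete, here is a simplified model of the check the pool performs on each timer tick. It is only a sketch based on the property descriptions above, not the actual WAS implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical model of the stuck check; not the WAS implementation.
final class StuckPoolModel {
    private final long stuckTimeSeconds;
    private final int stuckThreshold;

    StuckPoolModel(long stuckTimeSeconds, int stuckThreshold) {
        this.stuckTimeSeconds = stuckTimeSeconds;
        this.stuckThreshold = stuckThreshold;
    }

    /** A connection counts as stuck once it has been in use longer than the stuck time. */
    long countStuck(List<Long> inUseSeconds) {
        return inUseSeconds.stream().filter(s -> s > stuckTimeSeconds).count();
    }

    /** The pool is stuck once the number of stuck connections reaches the threshold. */
    boolean isPoolStuck(List<Long> inUseSeconds) {
        return countStuck(inUseSeconds) >= stuckThreshold;
    }

    public static void main(String[] args) {
        // Values from this post: stuck time 240 s, stuck threshold 100.
        StuckPoolModel model = new StuckPoolModel(240, 100);

        // 100 connections busy for 300 s, 50 connections busy for 10 s.
        List<Long> inUse = new ArrayList<>(Collections.nCopies(100, 300L));
        inUse.addAll(Collections.nCopies(50, 10L));

        System.out.println(model.countStuck(inUse));   // 100
        System.out.println(model.isPoolStuck(inUse));  // true
    }
}
```

In the real pool this evaluation happens once per stuck timer interval (120 seconds here), so a pool can be detected as stuck, or as recovered, up to one interval late.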
What happens when the pool is declared stuck?
  • A resource exception is given to all new connection requests until the pool is unstuck.
  • An application can explicitly catch this exception and continue processing.
  • If the number of stuck connections drops below the stuck threshold, the pool will detect this during its periodic checks and begin servicing requests again.
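On the application side, catching the exception and continuing could look roughly like the sketch below. It assumes the failure surfaces as a java.sql.SQLException, as in the log message quoted at the top of this post; the Query interface and runOrFallback method are hypothetical stand-ins for whatever data access code borrows a pooled connection.

```java
import java.sql.SQLException;

// Hypothetical wrapper: degrade gracefully instead of failing the whole request.
final class StuckPoolFallback {

    /** Stand-in for a data access call that borrows a connection from the pool. */
    interface Query<T> {
        T run() throws SQLException;
    }

    /** Runs the query; on any SQLException logs it and returns the fallback value. */
    static <T> T runOrFallback(Query<T> query, T fallback) {
        try {
            return query.run();
        } catch (SQLException e) {
            // Note: this catches every SQLException, not only the stuck-pool case.
            System.err.println("Database unavailable, using fallback: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        String result = runOrFallback(() -> {
            // Simulates the error a stuck pool gives to new connection requests.
            throw new SQLException("The back-end resource is currently unavailable. "
                    + "Stuck connections have been detected.");
        }, "service temporarily unavailable");
        System.out.println(result); // service temporarily unavailable
    }
}
```

Whether a fallback value makes sense depends on the request; for writes it is usually better to report the failure to the caller than to continue silently.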
It is also very useful to check periodically for inactive connections in the Oracle database; if a connection is hung and inactive, you can drop it manually.
Here is a pseudo query to find inactive connections in the DB:

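-- Sessions idle for more than 24 hours that still hold segments in TEMP
SELECT
  s.username,
  s.status,
  s.sid || ',' || s.serial# AS p_sid_serial  -- form expected by ALTER SYSTEM KILL SESSION
FROM v$session s, v$sort_usage t, dba_tablespaces tbs
WHERE s.saddr = t.session_addr               -- tie each session to its own temp usage
  AND t.tablespace = tbs.tablespace_name
  AND t.tablespace = 'TEMP'
  AND s.status = 'INACTIVE'
  AND (s.last_call_et / 60) > 1440;          -- last_call_et is in seconds, so this is 1440 minutes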
I hope the above information helps somebody fix this problem quickly in IBM WAS.
