Introduction:

Almost a year ago, I was asked to tune HBase read and write performance for a time series product. The cluster had 10 data nodes on AWS i2.4xlarge instances and 15 compute nodes, of which 10 were continuously writing and 5 were reading and running batch jobs against these data nodes. Most of the compute nodes were r3.xlarge instances.

Our goal was to achieve an ingestion rate of 10 million metrics per minute while also serving reads. Some batch jobs ran inside coprocessors on these data nodes over the past 10 minutes and 1 hour of data. Tuning the entire cluster was required.

In this article, I am only going to focus on HBase client tuning: the issues we faced on the client side and how we tuned them away. Of course, this also required tuning the HBase server side, which I am going to cover in another article.

This article will cover the following three client-side changes:

  1. Do not use single puts; batch them instead
  2. How to control the number of HBase client connections/sockets to each regionServer
  3. Use a common connection and share the pool

I will explain all three below, along with when and why we made those changes.

And the load test begins

I ran a load test at 2 million metrics/min and let it run for an hour or so, hoping this would just work. But like most assumptions, this one was short-lived and ended in failure. The system was unable to handle the pressure.

What happened?

When I checked the logs, I saw that almost every Compute Node was throwing this error:

java.util.concurrent.ExecutionException: java.io.IOException: Call queue is full, is ipc.server.max.callqueue.size too small?

Looks like the calls are queuing up. Should I increase this size?

One thing I have learned from experience, and must share: “To solve a problem, always look at the big picture first.” Here is why.

From the exception, it looks like increasing this size should resolve the issue. But you need to be able to justify the change: why is the default not enough? When I read about this parameter, I realized that increasing it has side effects on region server memory, and the region server can go out of memory. The region server does protect itself by dropping new calls once the queue grows beyond 1 GB, but it can still go OOM.

So, before making this change, let’s find out why the queue is getting full, or why this queue is even being used.

Theoretically, this can happen if you are making a lot of IPC calls to a regionServer, or maybe all the calls are ending up on one region server: hot spotting.

In our case, the data was evenly distributed and we saw no hot spotting on this cluster. So I suspected we were simply making too many calls to the server.

Then I dug into our code and saw this:

Put p = new Put(Bytes.toBytes("xxxxxx"));
p.add(Bytes.toBytes("xxxx"), Bytes.toBytes("someQualifier"),
      Bytes.toBytes("Some Value"));
table.put(p);

From the snippet above, you can see how many put operations the entire cluster makes in a minute: one RPC per metric, so the number of calls is O(n) in the number of metrics we send. At a load of 2 million metrics/min, we were calling put 2 million times per minute, and each read was using its own set of IPCs as well. It is now obvious why we were getting the exception on every Compute Node.

The change was to batch the puts and re-run the test. Instead of writing one Put at a time, we buffered them and flushed every 10 seconds using the call below.

table.put(List<Put> puts)
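For illustration, here is a minimal sketch of the batching approach, assuming the 0.98 client API that appears in the stack traces later in this article; the class name, 10-second interval, and error handling are placeholders rather than our exact production code.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;

public class BatchedMetricWriter {

    private final HTableInterface table;
    private final List<Put> buffer = new ArrayList<Put>();
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public BatchedMetricWriter(HTableInterface table) {
        this.table = table;
        // Flush whatever has accumulated every 10 seconds.
        flusher.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    flush();
                } catch (IOException e) {
                    // log and decide whether to retry in real code
                }
            }
        }, 10, 10, TimeUnit.SECONDS);
    }

    // Buffer a Put instead of issuing one RPC per metric.
    public synchronized void write(Put p) {
        buffer.add(p);
    }

    // One batched call replaces buffer.size() individual RPCs.
    public synchronized void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        table.put(new ArrayList<Put>(buffer));
        table.flushCommits();
        buffer.clear();
    }
}

The important part is that a single table.put(List<Put>) call replaces thousands of individual RPCs, so each region server’s call queue sees far fewer, larger requests.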

We nailed this problem, and since the change we have never seen this exception again.

NOTE: You might need to tune the batch size. Do not batch blindly: the size of your payload also plays a role here. Check the link below before introducing this change.

http://stackoverflow.com/questions/28754077/is-hbase-batch-put-putlistput-faster-than-putput-what-is-the-capacity-of

Is this the only problem?

At 2 million metrics/min it worked. Yay!!! Now, let’s up the game and run a 5 million metrics/min load. I started the load and all looked well: no more queue size warnings or errors. But, hey, it FAILED AGAIN!!!

Ahhhh! Why?

The cluster experienced multiple problems this time. Because these issues are all related, I am going to cover them in one shot.

Here is the list of 4 issues faced:

  1. HBase clients/Compute Nodes got stuck forever while writing, with HBase client threads sitting in TIMED_WAITING indefinitely.
  2. Under high load, Compute Nodes started throwing socket timeouts on writes and reads.
  3. Slow write and read throughput.
  4. Unable to reach the 10 million per minute mark.

We started getting the following exception on all the Compute Nodes. The exception here is from a read, but the same stack appeared for writes as well.

2016-01-22 22:10:47.257 HBase READ-23 WARN o.a.h.hbase.client.ScannerCallable – Ignore, probably already closed
java.net.SocketTimeoutException: Call to ip-xx-xxx-xxx-xxx/xx.xxx.xxx.xxx:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xxx.xxx.xxx:44064 remote=ip-xx.xxx.xxx.xxx/xx.xxx.xxx.xxx:60020]
at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1486) ~[hbase-client-0.98.12.1-hadoop2.jar:0.98.12.1-hadoop2]
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1461) ~[hbase-client-0.98.12.1-hadoop2.jar:0.98.12.1-hadoop2]
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661) ~[hbase-client-0.98.12.1-hadoop2.jar:0.98.12.1-hadoop2]
at ...


Why SocketTimeoutException?

Here are the two reasons I could think of for this exception:

  1. Reads or writes are returning slowly from the HBase server
  2. The client is unable to connect to the server and times out. Thread/connection congestion?

If you are experiencing “1”, then increasing hbase.rpc.timeout might be your solution, but you will most probably still end up at “2” as well. In our case, “1” was not the cause. How could we tell? We enabled DEBUG logging on a regionServer, which prints how long each scan took; calls were hardly taking 5-10 ms. So for us, HBase server response time was not the problem.
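For completeness, here is a tiny sketch of how a client could raise that timeout; the 120-second value and the helper class are purely illustrative assumptions, and we did not end up needing this change ourselves.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcTimeoutTuning {
    public static Configuration clientConf() {
        Configuration conf = HBaseConfiguration.create();
        // Raise the client RPC timeout from the 60,000 ms default.
        // 120000 is an illustrative value only; tune it for your workload.
        conf.setInt("hbase.rpc.timeout", 120000);
        return conf;
    }
}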

Let’s now focus on “2”, which most of you will also run into if you are using the default hbase-client properties.

After looking into one of the Compute Nodes, I noticed that the HBase client by default creates only one connection per regionServer. While the load was up and running, I used the following command to see the number of HBase connections ESTABLISHED to a regionServer.

netstat -an | grep 60020 | grep EST

To my surprise, the process had made only one connection to each regionServer. This explained the timeouts. Only one connection/socket? That seems to be the default HBase client behavior. Not sure why, yet.

I then searched for the HBase connection settings and found the following properties:

<property>
   <name>hbase.client.ipc.pool.type</name>
   <value>RoundRobinPool</value>
</property>
<property>
   <name>hbase.client.ipc.pool.size</name>
   <value>20</value>
</property>

What does the hbase.client.ipc.pool.size property do?

With these settings, the client creates up to 20 connections to each regionServer as the load demands, and uses them in round-robin fashion instead of funneling everything through a single socket. For this test bed, the above values were good enough.
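As a rough sketch (again assuming the 0.98 client API), the same pool settings can also be applied programmatically and combined with a single shared connection per process, which is what point 3 in the list at the top refers to; the table name below is just a placeholder.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;

public class PooledClient {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Same values as the hbase-site.xml snippet above.
        conf.set("hbase.client.ipc.pool.type", "RoundRobinPool");
        conf.setInt("hbase.client.ipc.pool.size", 20);

        // One shared connection for the whole process; tables are lightweight
        // handles created from it.
        HConnection connection = HConnectionManager.createConnection(conf);
        HTableInterface table = connection.getTable("metrics"); // placeholder table name
        try {
            // reads and writes from all worker threads share the pooled sockets
        } finally {
            table.close();
            connection.close();
        }
    }
}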

And we nailed it again: we re-ran the test and have never seen the socket timeout exception since.

One last hurdle: deadlocks

Here is a thread dump for the HBase read threads; the write threads looked the same.

"HBase READ-4" prio=10 tid=0x00007f2ad8021800 nid=0x3bd in Object.wait() [0x00007f2a9dbda000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1452)
        - locked <0x0000000225072a48> (a org.apache.hadoop.hbase.ipc.RpcClient$Call)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31392)
        at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:318)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:163)
        at ...

The threads are locked/deadlocked.

TO BE CONTINUED….

Some Useful links

You should read this bug report to get more details on ipc.server.max.callqueue.size:

https://issues.apache.org/jira/browse/HBASE-5190
