I have a two-node Hazelcast cluster configured, and it seems to work fine most of the time, except when clients are connected to both nodes. In that case, messages sent by a user on one node stop reaching users connected to the other node. This does not happen every time, but when it does, the logs fill up with messages like:
java.util.concurrent.TimeoutException
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.resolveResponse(InvocationImpl.java:450)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:298)
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:66)
at com.jivesoftware.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:333)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:586)
at org.jivesoftware.openfire.SessionManager.getConnectionsCount(SessionManager.java:894)
at org.jivesoftware.openfire.plugin.StatCollector.run(StatCollector.java:94)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014.09.18 11:39:00 com.jivesoftware.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.resolveResponse(InvocationImpl.java:450)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:298)
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:66)
at com.jivesoftware.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:333)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:586)
at org.jivesoftware.openfire.SessionManager.getConnectionsCount(SessionManager.java:894)
at org.jivesoftware.openfire.plugin.StatCollector.run(StatCollector.java:94)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014.09.18 11:39:30 com.jivesoftware.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.resolveResponse(InvocationImpl.java:450)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:298)
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:66)
at com.jivesoftware.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:333)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:586)
at org.jivesoftware.openfire.SessionManager.getConnectionsCount(SessionManager.java:894)
at org.jivesoftware.openfire.plugin.StatCollector.run(StatCollector.java:94)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014.09.18 11:39:35 com.jivesoftware.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.resolveResponse(InvocationImpl.java:450)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:298)
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:66)
at com.jivesoftware.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:333)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:586)
at org.jivesoftware.openfire.SessionManager.getConnectionsCount(SessionManager.java:894)
at org.jivesoftware.openfire.plugin.StatCollector.run(StatCollector.java:94)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014.09.18 11:39:45 com.jivesoftware.util.cache.ClusteredCacheFactory - Failed to execute cluster task within 30 seconds
java.util.concurrent.TimeoutException
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.resolveResponse(InvocationImpl.java:450)
at com.hazelcast.spi.impl.InvocationImpl$InvocationFuture.get(InvocationImpl.java:298)
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:66)
at com.jivesoftware.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:333)
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:586)
at org.jivesoftware.openfire.SessionManager.getConnectionsCount(SessionManager.java:894)
at org.jivesoftware.openfire.plugin.StatCollector.run(StatCollector.java:94)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
In the web interface on both servers, I can also see that they fail to list remote sessions (sessions on the other cluster node). This makes me think that the cluster is timing out when requesting the remote sessions, which causes each node to ignore them and fail to deliver messages to them.
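To illustrate the pattern I suspect is behind this, here is a minimal sketch using only java.util.concurrent. It is not Openfire's or the Hazelcast plugin's actual code: the class name, the method doSynchronousTask, and the choice to return null on timeout are all my assumptions, and a local executor stands in for the remote node. It only shows how a blocking Future.get with a deadline can give up on a slow reply, which would leave the caller with no remote data.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ClusterTaskTimeoutSketch {

    // Hypothetical stand-in for a synchronous cluster task: submit the work,
    // then block for at most timeoutMillis waiting for the result.
    // Returning null on timeout is an assumption made for this sketch.
    public static <T> T doSynchronousTask(ExecutorService remote, Callable<T> task, long timeoutMillis) {
        Future<T> future = remote.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The reply did not arrive in time: cancel the task and give up on this call.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            // The task itself failed.
            return null;
        }
    }

    public static void main(String[] args) {
        // A local single-thread executor plays the role of the other node.
        ExecutorService remote = Executors.newSingleThreadExecutor();
        try {
            // A task that answers quickly returns its value.
            Integer fast = doSynchronousTask(remote, () -> 42, 1000);
            // A task that takes longer than the deadline yields null.
            Integer slow = doSynchronousTask(remote, () -> { Thread.sleep(5000); return 7; }, 100);
            System.out.println("fast=" + fast + ", slow=" + slow); // prints "fast=42, slow=null"
        } finally {
            remote.shutdownNow();
        }
    }
}
```

If the real code behaves like this, a node that cannot get an answer from its peer within the 30 seconds mentioned in the log would simply see no remote sessions, which would match what I observe in the web interface.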
Is someone else having this issue? Is there a fix or workaround? Any help or tips would be greatly appreciated.
I am using Openfire 3.9.2 with Hazelcast 3.1.7.