Another strange problem came up at work last week, and I've finally solved it. To be honest, the problem itself is not so strange, but the way it showed up and the errors we got were. As with most problems, once you discover the root cause, it stops being rare and you think: "It's obvious!" :-)
I like to be concise when explaining these kinds of things, so to sum up: we have a Java server that acts as an authenticator for other applications. This authentication server uses the javax.security.auth.login Java package (JAAS) to connect to and authenticate against an LDAP server.
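A minimal sketch of the kind of LDAP bind such a login module performs, assuming a plain JNDI simple bind. The class LdapBindSketch and its methods are hypothetical; this is not our actual LdapLoginModule. It uses the standard javax.naming API and the com.sun.jndi.ldap.LdapCtxFactory provider that appears in the stack trace further down. The detail worth noticing is that the JNDI LDAP provider starts a thread for each new connection (com.sun.jndi.ldap.Connection does so in its constructor), so the sketch closes the context in a finally block to release it as soon as the bind is done.

```java
import java.util.Hashtable;
import javax.naming.AuthenticationException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.ldap.InitialLdapContext;
import javax.naming.ldap.LdapContext;

// Hypothetical helper: a plain JNDI simple bind, not the original LdapLoginModule.
final class LdapBindSketch {

    // Builds the JNDI environment for a simple bind against the given LDAP URL.
    static Hashtable<String, Object> buildEnv(String url, String userDn, String password) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, url);
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, userDn);
        env.put(Context.SECURITY_CREDENTIALS, password);
        return env;
    }

    // Returns true if the bind succeeds, false if the credentials are rejected.
    static boolean authenticate(String url, String userDn, String password) throws NamingException {
        // A simple bind with an empty password is an unauthenticated bind that
        // many LDAP servers accept, so reject it before contacting the server.
        if (password == null || password.isEmpty()) {
            return false;
        }
        LdapContext ctx = null;
        try {
            // Opening the context performs the bind; JNDI starts a connection thread here.
            ctx = new InitialLdapContext(buildEnv(url, userDn, password), null);
            return true;
        } catch (AuthenticationException e) {
            // Wrong DN or password (LDAP result code 49).
            return false;
        } finally {
            // Always close the context so the connection and its thread can be released.
            if (ctx != null) {
                ctx.close();
            }
        }
    }
}
```

For example, `authenticate("ldap://ldap.example.com:389", "uid=jdoe,ou=people,dc=example,dc=com", password)` returns false for wrong credentials; the URL and DN are placeholders.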
Besides that, an intensive process retrieves information from another application through a web service that requires authentication. To authenticate, this process uses the authentication server described above, but after 300 calls or so within one or two minutes we started seeing this Java exception stack trace:
04:13:07.682  ERROR com.xxxxx.xxxxx.server.services.XXXXXXXXServiceHelper.login(): Authentication error : java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:691)
    at com.sun.jndi.ldap.Connection.<init>(Connection.java:231)
    at com.sun.jndi.ldap.LdapClient.<init>(LdapClient.java:136)
    at com.sun.jndi.ldap.LdapClient.getInstance(LdapClient.java:1600)
    at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2698)
    at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:316)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:193)
    at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:211)
    at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154)
    at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84)
    at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:684)
    at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:307)
    at javax.naming.InitialContext.init(InitialContext.java:242)
    at javax.naming.ldap.InitialLdapContext.<init>(InitialLdapContext.java:153)
    at com.xxxxx.xxxxx.server.ldap.LdapConnection.open(LdapConnection.java:115)
    at com.xxxxx.xxxxx.server.ldap.LdapConnection.open(LdapConnection.java:97)
    at com.xxxxx.xxxxx.server.ldap.LdapLoginModule.attemptAuthentication(LdapLoginModule.java:325)
    at com.xxxxx.xxxxx.server.ldap.LdapLoginModule.login(LdapLoginModule.java:175)
    at sun.reflect.GeneratedMethodAccessor10194.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at javax.security.auth.login.LoginContext.invoke(LoginContext.java:784)
    at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:698)
    at javax.security.auth.login.LoginContext$4.run(LoginContext.java:696)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:695)
    at javax.security.auth.login.LoginContext.login(LoginContext.java:594)
At first, people thought our authentication server was running out of heap memory, but it wasn't, because it never stopped working. After 3 or 4 minutes it worked fine again, and no errors appeared until the next run of the intensive process. Then one of the development teams involved in the issue asked me to help them investigate the problem.
I was sure it wasn't a heap memory problem in the authentication server, because the OutOfMemoryError appeared in a Java log trace of the application itself, not of the application server where it is deployed, and the application kept working… not really fine, but it worked! Another possible cause I considered was LDAP itself, but our LDAP server is not Java-based, and its log files didn't show any errors either.
The key to the problem was right in front of our eyes: the message attached to the OutOfMemoryError, "unable to create new native thread".
After reading a bit on several pages, I found that this problem can be tackled in different ways. What is certain is that it can't be solved by increasing the JVM heap size, because the heap memory was fine.
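Since the problem is native threads rather than heap, a quick way to confirm it is to watch how many threads the JVM holds while the intensive process runs. A minimal sketch using the standard java.lang.management API; the class name ThreadUsageSketch is hypothetical, and logging the snapshot periodically is an assumption, not what we actually did:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic helper built on the standard ThreadMXBean.
final class ThreadUsageSketch {

    // Number of live threads (daemon and non-daemon) in this JVM right now.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // One-line summary suitable for a log statement.
    static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return String.format("live=%d peak=%d daemon=%d",
                threads.getThreadCount(),        // currently alive
                threads.getPeakThreadCount(),    // highest count since JVM start (or last reset)
                threads.getDaemonThreadCount()); // alive daemon threads
    }
}
```

Logging `snapshot()` every few seconds during a run shows whether the live count climbs with each login; comparing the peak with the operating system limits discussed below shows how close the JVM gets to them.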
I'm not a Linux system administrator, but after investigating a bit, I discovered that it was the machine running the application server (which hosts our authentication server) that was running out of resources for creating operating system threads. Our system administrator realized that the Linux machine still had its default configuration, so the limits on running processes and on file descriptors were too low. In the end I asked the system administrator to increase those values. This can be done by modifying the nofile and nproc values in the file /etc/security/limits.conf:
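On Linux, the nproc and nofile settings correspond to the RLIMIT_NPROC and RLIMIT_NOFILE resource limits, shown as "Max processes" and "Max open files" in /proc/<pid>/limits. Linux counts every thread against the per-user RLIMIT_NPROC limit, which is why a low nproc value surfaces in Java as "unable to create new native thread". A process inherits its limits when it is started, so changes to limits.conf typically only affect processes launched from a new login session; it is therefore worth checking what the running JVM actually got. A minimal, Linux-only sketch (the class and method names are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper that reads the resource limits of the current process (Linux only).
final class ProcessLimitsSketch {

    // Finds a row such as "Max processes" in /proc/<pid>/limits content.
    // Returned fields: [0] = soft limit, [1] = hard limit, [2] = units (absent for some rows).
    static Optional<String[]> limitFor(List<String> lines, String name) {
        for (String line : lines) {
            if (line.startsWith(name + " ")) {
                String rest = line.substring(name.length()).trim();
                return Optional.of(rest.split("\\s+"));
            }
        }
        return Optional.empty();
    }

    // Reads the limits of the JVM this code runs in; throws on systems without /proc.
    static List<String> readOwnLimits() throws IOException {
        Path limits = Paths.get("/proc/self/limits");
        return Files.readAllLines(limits);
    }
}
```

For instance, `limitFor(ProcessLimitsSketch.readOwnLimits(), "Max processes")` returns the soft and hard nproc limits of the running JVM; logging them at startup makes it easy to confirm that new values took effect after a restart.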
# - nofile - max number of open files
# - nproc - max number of processes
The safest approach would be to raise the limits only for the user involved, not for the whole machine, but that must be a system administrator's decision.