Java webapp performance riddle

We have two servers: acceptance and production.

One day I noticed that the average page load time was 2 seconds on production and 0.5 seconds on acceptance.

Hmm, the fucking DB guys are fucking around with the fucking DB, I thought.

I enabled log4jdbc and checked the DB query timings. They were the same as on acceptance.

Then I added more logging to the controller and found that the controller itself was fast. Page rendering was slow.

I added more logging to the web layer. There was no clear bottleneck in the application; it was damn slow all around. I checked the garbage collector: it was fine.

Then I checked all the logic spread around the application: permissions, logging, i18n. I found and fixed numerous bugs and did a pile of performance optimizations. But in the end prod was still 2 times slower than acceptance.

Then I thought it was the hardware. I checked memory, CPU and HDD. The servers were the same.

I wrote a small arithmetic progression summing algorithm without any third-party dependencies and ran it with plain java, outside of Tomcat, on both servers. They were equally fast.
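The original test program is not shown here. A minimal sketch of such a dependency-free check, assuming it simply sums an arithmetic progression in a plain loop and times each round (the class name, method name and iteration counts are hypothetical), could look like this:

```java
public class ProgressionBench {

    // Sums `count` terms of the progression start, start + step, start + 2*step, ...
    // with a plain loop, so the work is done by (hopefully JIT-compiled) bytecode.
    public static long sumProgression(long start, long step, int count) {
        long sum = 0;
        long term = start;
        for (int i = 0; i < count; i++) {
            sum += term;
            term += step;
        }
        return sum;
    }

    public static void main(String[] args) {
        int rounds = 20;          // hypothetical: number of timed rounds
        int count = 10_000_000;   // hypothetical: terms summed per round
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            long t0 = System.nanoTime();
            checksum += sumProgression(1, 1, count);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("round " + r + ": " + ms + " ms");
        }
        // Print the checksum so the JIT cannot discard the loop as dead code.
        System.out.println("checksum = " + checksum);
    }
}
```

Run it with plain `java ProgressionBench` on each server to compare the bare JVMs; calling `sumProgression` from a throwaway controller gives the in-Tomcat number. If the per-round times inside Tomcat never drop after the first few rounds, that may hint the method is never being JIT-compiled.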

Running the same algorithm inside Tomcat was 4 times slower on prod.

Then I decided: it must be the JVM.

I checked the JVM parameters and found that acceptance had the following flags, but production did not:

-XX:ReservedCodeCacheSize=256m -XX:+UseCodeCacheFlushing -XX:CodeCacheFlushingMinimumFreeSpace=20m
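Comparing startup scripts by eye is easy to get wrong. A sketch of asking the running JVM directly is below; it assumes a HotSpot-based JDK (`com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific) and uses a hypothetical class name. Run from inside the webapp (for example from an admin page), the same calls report the flags of the Tomcat JVM itself rather than of a separate process.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagsDump {

    // The code cache flags from the post; newer JDKs may not know all of them.
    static final String[] FLAGS = {
        "ReservedCodeCacheSize",
        "UseCodeCacheFlushing",
        "CodeCacheFlushingMinimumFreeSpace"
    };

    // Returns the effective value and origin (default, command line, ...) of a
    // HotSpot flag, or a note if this JVM does not recognize the flag.
    public static String describe(HotSpotDiagnosticMXBean hotspot, String name) {
        try {
            VMOption opt = hotspot.getVMOption(name);
            return name + " = " + opt.getValue() + " (" + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return name + " is not supported by this JVM";
        }
    }

    public static void main(String[] args) {
        // Arguments exactly as they were passed to this JVM.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);

        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : FLAGS) {
            System.out.println(describe(hotspot, flag));
        }
    }
}
```

Flags the running JDK no longer recognizes are reported as unsupported instead of crashing the check; `CodeCacheFlushingMinimumFreeSpace` may end up in that category on newer JDKs.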

Adding these flags on production fixed a problem that had been around for half a year.

Looking back, I realize there was no way to figure out that the code cache region was being exhausted.

HotSpot is supposed to print a warning in this case:

Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.

But I never saw anything like that in the logs.
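In hindsight, the code cache can also be watched from inside the JVM instead of waiting for that warning, which HotSpot typically writes to its own console output (for Tomcat usually catalina.out) rather than through the application's logging. Below is a minimal sketch using the standard `java.lang.management` API. It assumes HotSpot's pool names ("Code Cache" on JDK 8, "CodeHeap '...'" with the segmented code cache on JDK 9+); the class name and the 90% threshold are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCacheWatch {

    // Percentage of `max` taken by `used`, or -1 when the pool has no defined max.
    public static double percentUsed(long used, long max) {
        return max > 0 ? 100.0 * used / max : -1;
    }

    // Assumes HotSpot naming: "Code Cache" (JDK 8) or "CodeHeap '...'" (JDK 9+).
    public static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && pool.getName().startsWith("Code")) {
                result.add(pool);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double threshold = 90.0; // hypothetical alert level, in percent
        for (MemoryPoolMXBean pool : codeCachePools()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            double pct = percentUsed(usage.getUsed(), usage.getMax());
            String line = String.format("%s: used=%d KB, max=%d KB, %.1f%%",
                pool.getName(), usage.getUsed() / 1024, usage.getMax() / 1024, pct);
            System.out.println(pct >= threshold ? "WARNING " + line : line);
        }
    }
}
```

Running the same check periodically from a background task in the webapp, or just looking at these memory pools in jconsole, would likely have shown the code cache sitting at its maximum long before anyone started blaming the DB.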


UseCodeCacheFlushing. Remember. Always. Especially when you use Groovy in your application.


