Per-class heap plots with py-jstack

Having to profile a webapp that was filling the Java heap, I wrote a simple Python module that plots the heap size consumed by each class.

It uses the files generated by #jmap -histo $(pidof java), which tracks per-class memory consumption.

Once you generate your files with something like:

while sleep 5; do
    jmap -histo $(pidof java) > /tmp/histo.$(date +%s)
done
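
Each snapshot is a plain-text table whose data rows look like `1: 452588 18103520 java.lang.String` (rank, #instances, #bytes, class name). Just to show what jplot has to digest, here is a minimal, illustrative parsing sketch (jplot ships its own parser, and parse_histo is a made-up name):

def parse_histo(path):
    """Map class name -> (#instances, #bytes) for one jmap -histo snapshot."""
    sizes = {}
    for line in open(path):
        fields = line.split()
        # data rows start with "N:"; the header, separator and Total rows don't
        if len(fields) == 4 and fields[0].endswith(':'):
            rank, instances, nbytes, classname = fields
            sizes[classname] = (int(instances), int(nbytes))
    return sizes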

You can load the jplot module included in https://github.com/ioggstream/py-jstack/.
Using ipython makes things even easier!

#git clone https://github.com/ioggstream/py-jstack/ 
#cd py-jstack;
#ipython;
ipython$ import jplot

Once you have loaded the module, list the files to parse
and generate a table containing the classes and their memory usage over time,
limited to the 30 greediest classes.

ipython$ files = ! ls /tmp/histo.*
ipython$ table = jplot.jhisto(files, limit=30, delta=False)

What does the `table` dictionary contain? For each class, a list of (#instances, memory size) pairs over time:

ipython$ cls = 'java.lang.String'
ipython$ print(table[cls][:10]) 
[(452588.0, 18103520.0), # values in 1st file
 (186198.0, 7447920.0), # values in 2nd file
 (229789.0, 9191560.0), # values in 3rd file
...]
ipython$ memory_for_string = list(zip(*table[cls]))[1]
ipython$ max_memory_for_string = max(memory_for_string)
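
From the same table you can rank classes by the peak memory they reached across the snapshots. A minimal sketch, assuming `table` is shaped as shown above:

ipython$ peaks = {cls: max(m for n, m in table[cls]) for cls in table}
ipython$ top = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
ipython$ print([(cls, peak / 2**20) for cls, peak in top[:5]])  # top 5 classes, in MiB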

Using matplotlib we can plot the table too, and get a glimpse of which class is misbehaving…

ipython$ jplot.plot_classes(table, limit=10)

Statistics 101 with ipython

Today I needed to process some log files in search of relations between the data. After parsing the log files I got the following table.

data = [ 
('timestamp', 'elapsed', 'error', 'retry', 'size', 'hosts'),
(1379603191, 0.12, 2, 1, 123, 2313),
(1379603192, 12.43, 0, 1, 3223, 2303),
...
(1379609000, 0.43, 0, 1, 3223, 2303)
]

I easily converted this into a column-oriented dict:

table = dict(zip( data[0], zip(*data[1:]) ))
{
'timestamp' : ( 1379603191, 1379603192, ..., 1379609000),
'elapsed': (0.12, 12.43, ..., 0.43),
...
}

In this way it was very easy to run basic stats:

from statistics import mean, stdev
print([(k, max(v), min(v), mean(v), stdev(v)) for k, v in table.items()])

Check the data distributions:

from matplotlib import pyplot
pyplot.hist(table['elapsed'])
pyplot.show()

And even look for basic correlation between columns:

from itertools import combinations
from scipy.stats import pearsonr
for f1, f2 in combinations(table.keys(), 2):
    r, p_value = pearsonr(table[f1], table[f2])
    print("the correlation between %s and %s is: %s" % (f1, f2, r))
    print("the probability of a given distribution (see manual) is: %s" % p_value)

Or draw scatter plots:

from matplotlib import pyplot
for f1, f2 in combinations(table.keys(), 2):
    pyplot.scatter(table[f1], table[f2], label="%s_%s" % (f1, f2))
    # add legend and other labels
    r, p = pearsonr(table[f1], table[f2])
    pyplot.title("Correlation: %s v %s, %s" % (f1, f2, r))
    pyplot.xlabel(f1)
    pyplot.ylabel(f2)
    pyplot.legend(loc='upper left') # show the legend in a suitable corner
    pyplot.savefig(f1 + "_" + f2 + ".png")
    pyplot.close()

You spin me right round (like a hard disk)

You can get some info about your disk workload with bb-iostat.

Another nice tool – distributed with the bonnie++ suite – is zcav. zcav measures read throughput as a function of the position (block offset) on the disk.

As the disk spins at a constant angular speed (e.g. 7200 RPM, aka 120 Hz) and data is stored on the platters at various distances from the spindle, the linear speed of the outer part of the disk (the first sectors) is higher than that of the inner part.

Ex. An audio CD instead reads data at a constant linear speed. To achieve that, the spindle spins at ~8 Hz at the beginning of the disc (the inner part) and slows down to ~3.5 Hz at the end (the outer part).

Ex. On a 2.5-inch HD (platter diameter ≈ 63 mm, outer radius ≈ 32 mm) spinning at 120 Hz:
* the outer-edge data speed is 2 · π · 32mm · 120Hz ≈ 24 m/s
* halfway to the spindle it halves: 2 · π · 16mm · 120Hz ≈ 12 m/s
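
The same v = 2 · π · r · f arithmetic in a few lines of Python, to see how the linear speed falls off toward the spindle (a throwaway sketch):

from math import pi

f = 120.0                      # 7200 RPM = 120 Hz
for r_mm in (32, 24, 16):      # outer edge down to half radius
    v = 2 * pi * (r_mm / 1000.0) * f
    print("r = %2d mm -> %4.1f m/s" % (r_mm, v))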

To get some numbers, run zcav on two distant partitions:

sudo zcav -c1 -lsda1 -f/dev/sda1;
sudo zcav -c1 -lsda8 -f/dev/sda8;

and nicely plot that data with gnuplot:

#gnuplot

unset autoscale x
set autoscale xmax
unset autoscale y
set autoscale ymax
set xlabel "Block offset (GB)"
set ylabel "MB/s"
set key right bottom
plot "sda1" title "/other", "sda8" title "/home"

And here’s an extract from my sda8 test. The read-performance drop is considerable!

#block offset (GiB), MiB/s, time
0.00 63.13 4.055
10.00 61.27 4.178
20.00 57.69 4.438
30.00 60.13 4.257
40.00 54.08 4.733
50.00 56.34 4.544
60.00 55.91 4.579
….
140.00 42.94 5.962
150.00 38.96 6.571
160.00 38.62 6.628
170.00 34.91 7.333
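
A quick sanity check on how big that loss is, comparing the first and the last samples above (a throwaway one-liner):

first, last = 63.13, 34.91  # MiB/s at block offsets 0 and 170 GiB
print("read throughput drop: %.0f%%" % ((first - last) / first * 100))  # ~45%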

Live debugging java with btrace

Today I had to do some live debugging on a Java application. I found a nice tool named btrace (no, it’s not that btrace… but THIS btrace ;).

Btrace seems magical:

  1. it is the Java version of SystemTap or DTrace;
  2. it doesn’t require a JVM started in debug mode, nor any further setup;
  3. it doesn’t stop your application.

Once downloaded, you have to:

  1. create a small java program specifying what to trace;
  2. find the pid of your target jvm instance;
  3. add the required classpath to btrace.

Then run it, with:

# ./btrace -cp $YOUR_CLASSPATH  $JVM_PID MyClassTracer.java

The MyClassTracer.java is something like this (example taken from here):

import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;
import java.io.File;

// every time new File(String) is called, print the file name
@BTrace
public class HelloBTrace {

    @OnMethod(
        clazz = "java.io.File", method = "<init>"
    )
    public static void onNewFile(File self, String name) {
        print("new file ");
        println(name);
    }
}

Log for the bare necessities, the simple bare necessities

Why did you do that if you weren’t asked to? Improving performance means doing just the minimum required work. As stated in the “Performance” section of http://logging.apache.org/log4j/1.2/manual.html , we’d better check whether we really need to create tons of strings in Java. So

if (log.isDebugEnabled()) {
    log.debug("Create" + "this" + message + "string");
}

prevents building those unneeded debug/trace strings when the DEBUG level is disabled.