Proleptic apoplexy

I spent a few hours this week trying to figure out why some date manipulation methods I was writing weren't working. In my test case, I was comparing two instances of GregorianCalendar: an original and one that had been round-tripped through some conversion methods via oracle.jbo.domain.Timestamp. The equals method returned false. I used the toString method to see if there was something obviously different, but the strings were exactly the same. I even went so far as to step through equals in the debugger, until it disappeared into a JDK implementation class for which the source was unavailable. Even stranger, I discovered that compareTo returned 0. I set about printing each property of the instances until I discovered the problem: the GregorianChange property.

In the tests, I was using XMLGregorianCalendar to take an ISO 8601 date string (e.g., 2006-09-22T00:00:00.000-00:00) and create a GregorianCalendar. The intention of this was to allow users to create statements like "if Order.timestamp > DateLib.from("2006-09-22T00:00:00.000-00:00"), do something". Below is the reduction of what was happening in the test and application code:

import java.util.GregorianCalendar;
import javax.xml.datatype.DatatypeFactory;
 
public class CalTest {
    public static void main(String[] args) throws Exception {
 
        String dtstring = "2006-09-22T00:00:00.000-00:00";
        GregorianCalendar gcFromXmlgc =
            DatatypeFactory.newInstance()
                .newXMLGregorianCalendar(dtstring).toGregorianCalendar();
        GregorianCalendar gcFromCons = new GregorianCalendar();
 
        gcFromCons.setTimeZone(gcFromXmlgc.getTimeZone());
        gcFromCons.setTimeInMillis(gcFromXmlgc.getTimeInMillis());
 
        System.out.println("equals? " + gcFromXmlgc.equals(gcFromCons));
        System.out.println("compareTo? " + gcFromXmlgc.compareTo(gcFromCons));
        System.out.println("XML GC: " + gcFromXmlgc.getGregorianChange());
        System.out.println("new GC: " + gcFromCons.getGregorianChange());
 
    }
}

Running this produces the following output:

equals? false
compareTo? 0
XML GC: Sun Dec 02 08:47:04 PST 292269055
new GC: Thu Oct 04 16:00:00 PST 1582

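equals is false, but compareTo is 0? That's curious. The last two lines of output show why: the two calendars have different Gregorian change dates. The rest of this post explains what that means.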

The core of the problem is that XMLGregorianCalendar is intended to represent the W3C XML Schema 1.0 date/time datatypes. The date/time type in XML Schema essentially implements ISO 8601, with a few minor exceptions. The important part here is that it uses a "proleptic Gregorian calendar".

The Gregorian calendar is the calendar used almost exclusively in the West and by most of the global business world. Most people in the West don't even know the calendar has a name; they just know it as "the calendar". The Gregorian calendar was adopted in 1582 to correct the gradual drift of the previous Julian calendar. The change skipped 10 days (4 October 1582 was followed directly by 15 October) and added a more astronomically appropriate leap year rule. The "proleptic" Gregorian calendar is then the Gregorian rules applied to dates before the calendar was adopted, as if the calendar had always been in effect. Therefore, all Western dates after 1582 are according to the Gregorian calendar, but any date before this can be in either the Julian calendar or the proleptic Gregorian calendar, so which one is meant needs to be stated explicitly.

Knowing this, one would then assume that the class GregorianCalendar implements a proleptic Gregorian calendar. However, belying its name, it actually represents a hybrid Julian-Gregorian calendar: each of the two calendars has its own set of rules, and a configurable date defines when the rules cut over from one to the other. The well-known Joda-Time library names these correctly, with a proleptic Gregorian, a postleptic Julian, and a hybrid GregorianJulian. The JDK 1.5 documentation describes this, but the description isn't really useful unless you already know the semantics behind it. GregorianCalendar.toString() doesn't print the cutover date, so you have to know it exists and examine it explicitly when debugging. For most code using dates, the cutover date doesn't matter, since most software isn't going to be used for dates in the distant past, and if it is, there's hopefully a domain expert on hand to specify this sort of thing.
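
To make the cutover concrete, here is a minimal sketch (not part of the original test code; the class name CutoverDemo and its helper methods are made up for illustration). It asks a default hybrid calendar and a pure proleptic one, obtained by setting the change date to new Date(Long.MIN_VALUE), for the date fields of the same instant one day before the default cutover:

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class CutoverDemo {
    // Default GregorianCalendar: Julian rules before the 1582 cutover, Gregorian after.
    static GregorianCalendar hybrid() {
        return new GregorianCalendar(TimeZone.getTimeZone("UTC"));
    }

    // Pure proleptic Gregorian calendar: the cutover is pushed back as far as possible.
    static GregorianCalendar proleptic() {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setGregorianChange(new Date(Long.MIN_VALUE));
        return cal;
    }

    // Formats the year-month-day that the given calendar assigns to the instant.
    static String fields(GregorianCalendar cal, long millis) {
        cal.setTimeInMillis(millis);
        return cal.get(Calendar.YEAR) + "-" + (cal.get(Calendar.MONTH) + 1)
            + "-" + cal.get(Calendar.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        long oneDay = 24L * 60 * 60 * 1000;
        long dayBeforeCutover = hybrid().getGregorianChange().getTime() - oneDay;

        System.out.println(fields(hybrid(), dayBeforeCutover));    // 1582-10-4  (Julian)
        System.out.println(fields(proleptic(), dayBeforeCutover)); // 1582-10-14 (Gregorian)
    }
}
```

Both calendars hold the same millisecond value; they only disagree on what to call that day.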

A default instance of GregorianCalendar implements a non-proleptic calendar: dates up to 4 October 1582 are Julian, and later dates are Gregorian. However, XMLGregorianCalendar implements a proleptic Gregorian calendar as specified by ISO 8601, so it sets the change date to Long.MIN_VALUE. The Timestamp class uses millis since the epoch (1970), so the round trip always gives us a representation of the same date, but not the same GregorianCalendar. In Sun JDK 1.5, the only difference in equals between Calendar and GregorianCalendar is the change date:

    public boolean equals(Object obj) {
        return obj instanceof GregorianCalendar &&
            super.equals(obj) &&
            gregorianCutover == ((GregorianCalendar)obj).gregorianCutover;
    }

GregorianCalendar inherits compareTo (implementing Comparable) from Calendar, which simply compares the millis since the epoch. GNU Classpath 0.18 has a different implementation, as it doesn't take into account the change date:

  public boolean equals(Object o) {
    if (! (o instanceof GregorianCalendar))
      return false;
 
    GregorianCalendar cal = (GregorianCalendar) o;
    return (cal.getTimeInMillis() == getTimeInMillis());
  }
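
The Sun behaviour can be reproduced without XMLGregorianCalendar or Timestamp. The following is a minimal sketch, assuming a JDK whose GregorianCalendar.equals compares the cutover as in the Sun source quoted above (an implementation like the GNU Classpath one would report equality in both cases); the class name EqualsVsCompareTo and the helper at are hypothetical:

```java
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class EqualsVsCompareTo {
    // Builds a UTC calendar at the given instant; a null cutover keeps the default (1582).
    static GregorianCalendar at(long millis, Date cutover) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        if (cutover != null) {
            cal.setGregorianChange(cutover);
        }
        cal.setTimeInMillis(millis);
        return cal;
    }

    public static void main(String[] args) {
        long instant = 1158883200000L; // 2006-09-22T00:00:00Z
        GregorianCalendar hybrid = at(instant, null);
        GregorianCalendar proleptic = at(instant, new Date(Long.MIN_VALUE));

        System.out.println(hybrid.equals(proleptic));    // false: the cutovers differ
        System.out.println(hybrid.compareTo(proleptic)); // 0: same instant

        // Aligning the cutover removes the only remaining difference.
        proleptic.setGregorianChange(hybrid.getGregorianChange());
        System.out.println(hybrid.equals(proleptic));    // true
    }
}
```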

In writing my tests, I had made a huge error: I was testing for object equality when I should have been testing for value ordering. I shouldn't have cared whether the objects were the same; what I really cared about was that there was no distinguishable ordering between them, that is, that both objects represented the same value.

I didn't write the tests from scratch; I had translated them from other date tests written in Oracle Business Rules RL, which has semantics similar, but not identical, to Java's. The relevant difference here is that you can use the relational operators on Comparable implementors, which are then converted to calls to compareTo at runtime. I made the mistake of regex-replacing a bunch of obj1 != obj2 to !obj1.equals(obj2), when I should have done obj1.compareTo(obj2) != 0. If these had been any of the other relational operators, I would have used compareTo, but != took me to !equals. Big mistake.
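
One way to make such a translation mechanical is a set of small helpers that map each operator onto compareTo. This is only a sketch: the class Rel and its methods are hypothetical names, not part of RL or the JDK. BigDecimal is used in the demonstration because it is another well-known class where equals is stricter than compareTo:

```java
import java.math.BigDecimal;

class Rel {
    // a == b in the RL sense: no distinguishable ordering between the values.
    static <T extends Comparable<? super T>> boolean eq(T a, T b) {
        return a.compareTo(b) == 0;
    }

    // a != b in the RL sense.
    static <T extends Comparable<? super T>> boolean ne(T a, T b) {
        return a.compareTo(b) != 0;
    }

    // a > b.
    static <T extends Comparable<? super T>> boolean gt(T a, T b) {
        return a.compareTo(b) > 0;
    }

    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1.0");
        BigDecimal alsoOne = new BigDecimal("1.00");

        System.out.println(!one.equals(alsoOne)); // true: equals also compares the scale
        System.out.println(ne(one, alsoOne));     // false: same numeric value
    }
}
```

Because the bound is Comparable<? super T>, the same helpers accept GregorianCalendar, which implements Comparable<Calendar>.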

I learned a lot from this distraction, not just about calendars, but more importantly about equals and compareTo. I recently started reading the new version of Effective Java, so the sections on comparison and equality have new and deeper meaning for me now.

Aggregated code coverage with Emma and Groovy

This post describes a script I wrote to take Emma's XML output and produce multi-package aggregated statistics. One of the drawbacks of Emma's HTML reporting is that it does not allow you to get aggregated coverage information across packages. For instance, if I have packages "com.foobar.sdk.interface" and "com.foobar.sdk.impl", there's no automated way to get coverage information for all packages starting with "com.foobar.sdk". Most larger projects are logically grouped like this, so having these "superpackage" groupings is useful. My previous method of getting this was to cut-and-paste the HTML from a browser into a text file, run a Ruby script on it to convert it to CSV, import the CSV into Excel, and add the necessary formulas to the sheet to get the measurements I wanted. Having it simply printed out at the end of the Emma run is much simpler.

First, the setup of Emma. Inside the <report> tag, I put the following output descriptions:

<html outfile="${emma.coverage.dir}/foobar/coverage.html"
    columns="name,class,method,block,line"
     sort="+name,+class,+method,+block,+line" depth="method"/>
<xml outfile="${emma.coverage.dir}/foobar_coverage.xml"
    columns="name,class,method,block,line"
     sort="+name,+class,+method,+block,+line" depth="method"/>

These create both the full Emma HTML report and an XML document with the same results. After calling the report target that includes this, I then use the <groovy> Ant task to call a script which parses the Emma XML and produces some output.

<echo message="------------EMMA Summary----------------" />
<groovy src="${test.scripts.dir}/EmmaParser.groovy">
  <arg value="${emma.coverage.dir}/foobar_coverage.xml" />
  <arg value="com.foobar.sdk:SDK,com.foobar.tools:SDK,com.foobar.engine:ENGINE,com.thirdparty:ENGINE"/>
</groovy>
<echo message="----------------------------------------" />

The second argument is a comma-delimited set of pairs, each a Java package prefix and the "superpackage" name under which we want its coverage aggregated, separated by a colon. In the above example, all packages that start with "com.foobar.sdk" and "com.foobar.tools" are grouped into the "SDK" aggregate, and "com.foobar.engine" and "com.thirdparty" are grouped into "ENGINE". For each superpackage, the total number of lines, number of lines covered, and percentage covered are printed.
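
For readers more at home in Java than Groovy, the same argument format can be sketched as follows. The class SuperpackageConfig, its methods, and the package names "com.thirdparty.util" and "org.example" are made up for illustration. One assumption differs from the script: here the first matching prefix wins, whereas the Groovy script adds a package to every prefix that matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SuperpackageConfig {
    // Parses "prefix:NAME,prefix:NAME,..." into an ordered prefix -> superpackage map.
    static Map<String, String> parse(String config) {
        Map<String, String> prefixToName = new LinkedHashMap<>();
        for (String entry : config.split(",")) {
            int colon = entry.lastIndexOf(':');
            // Skip malformed entries that lack a prefix or a name.
            if (colon > 0 && colon < entry.length() - 1) {
                prefixToName.put(entry.substring(0, colon).trim(),
                                 entry.substring(colon + 1).trim());
            }
        }
        return prefixToName;
    }

    // Returns the superpackage of the first prefix that matches, or null if none does.
    static String superpackageFor(Map<String, String> prefixToName, String pkg) {
        for (Map.Entry<String, String> e : prefixToName.entrySet()) {
            if (pkg.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse(
            "com.foobar.sdk:SDK,com.foobar.tools:SDK,com.foobar.engine:ENGINE,com.thirdparty:ENGINE");
        System.out.println(superpackageFor(map, "com.foobar.sdk.impl")); // SDK
        System.out.println(superpackageFor(map, "com.thirdparty.util")); // ENGINE
        System.out.println(superpackageFor(map, "org.example"));         // null
    }
}
```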

Below is the Groovy script which does the Emma XML work. A few comments on it:

  • The Groovy XmlParser class was a joy to use and vastly simplified accessing the XML document.
  • The regex was the hardest part to get right. I most commonly write regexes in vim, which requires different escaping than Groovy. The expression involves both captures and literal parentheses. In Groovy regexes, you escape the parens you want matched literally and don't escape the capture parens. This really tripped me up on the next Groovy project after this one, where I reversed the meaning when looking at this regex.
  • Closures are such a nice feature to have when parsing with XmlParser like this. Their use in iteration and assignment of local variables makes the code much shorter to read and understand.
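
The escaping rule from the second point can be tried in isolation. Below is a minimal Java sketch of the same expression; the class CoverageValue and its methods are hypothetical, and the sample strings such as "75%  (3/4)" are assumed to resemble Emma's line coverage values, inferred only from the regex in the script. In a Java string literal each backslash is doubled, but the rule is the same: escaped parens match literally, unescaped parens capture.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CoverageValue {
    // \( and \) match literal parentheses; the unescaped ( ) pairs are capture groups.
    static final Pattern VALUE =
        Pattern.compile("\\d+%\\s+\\((\\d+\\.*\\d*)/(\\d+)\\)");

    // Returns {covered, total}, or null if the value does not match.
    static double[] parse(String value) {
        Matcher m = VALUE.matcher(value);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))
        };
    }

    // Formats an aggregate percentage with two decimals.
    static String percent(double covered, double total) {
        return String.format(Locale.ROOT, "%.2f%%", covered * 100 / total);
    }

    public static void main(String[] args) {
        double[] a = parse("75%  (3/4)");
        double[] b = parse("50%  (10.5/21)");
        // Aggregate over both packages: 13.5 of 25 lines covered.
        System.out.println(percent(a[0] + b[0], a[1] + b[1])); // 54.00%
    }
}
```

Summing covered and total lines before dividing, as the script does, weights each package by its size; averaging the per-package percentages would not.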

The script:

def filename = args[0]
def config = args[1]

def pkgmap = [:]
def spkgs = [:]
def cmap = [:]
def tmap = [:]

// split the config string by comma, then by colon
config.split(',').each { entry ->
  (entry =~ /(.+):(.+)/).each { all, pkg, spkg ->
      pkgmap[pkg] = spkg
      spkgs[spkg] = ''
  }
}

// init the package map
pkgmap.each { k, v -> cmap[v] = 0; tmap[v] = 0; }

// parse the report
def report = new XmlParser().parse(new File(filename))

// get the stats for the "line" coverage for each package
// packages are non-bundling, so pkg.foo does not contain stats for pkg.foo.bar
report.data[0].all[0].'package'.each() { pkg ->
  pkgmap.each { pkgname, sname ->
      if ((pkg.'@name').startsWith(pkgname) ) {

          (pkg.coverage[3].'@value' =~ /\d+%\s+\((\d+\.*\d*)\/(\d+)\)/ ).each {
              all, cov, total ->
                  cmap[sname] += Float.valueOf(cov)
                  tmap[sname] += Integer.valueOf(total)
          }
      }
  }
}

// print summary stats for each super-package
spkgs.each { sname,x ->
  if (tmap[sname] > 0) println "," + sname + "," +
     String.format("%.2f",cmap[sname]*100/tmap[sname]) + "%," +
        cmap[sname] + "," + tmap[sname]
  else println "," + sname + ",0%,0,0"
}