Unicode in Java: some Groovy pieces (part 7)

One of the common tasks Java developers use Groovy for is testing. One of the common idioms I use is to create a list of strings and use the "each" method to assert that an output file contains them. When testing Unicode, this means both the output files and the Groovy source files contain Unicode characters. For example, the code may contain:

        def contents = new File(outputFile).getText("UTF-8")

        [ "D'fhuascail Íosa Úrmhac na hÓighe Beannaithe pór Éava agus Ádhaimh",
          'イロハニホヘト チリヌルヲ ワカヨタレソ ツネナラム',
          'เป็นมนุษย์สุดประเสริฐเลิศคุณค่า'
        ].each { assertTrue(contents.contains(it), "${it} not in ${outputFile}") }

The first point is that we can no longer use the File#text property, because it gives us no way to say how the file is encoded. Instead, we need to use the getText method that takes a character encoding argument.
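For comparison, here is a rough sketch of the same check in plain Java. The class and method names (Utf8ContentCheck, missingFrom) are made up for illustration, and the strings are written with \u escapes so the example does not itself depend on the source file encoding. The important part is that the bytes are decoded with an explicit charset rather than whatever the platform happens to default to.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Utf8ContentCheck {

    // Hypothetical helper: returns the expected strings that do NOT occur
    // in the file, decoding the file's bytes explicitly as UTF-8.
    static List<String> missingFrom(Path file, List<String> expected) throws IOException {
        String contents = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        List<String> missing = new ArrayList<>();
        for (String s : expected) {
            if (!contents.contains(s)) {
                missing.add(s);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file so the example is self-contained.
        Path file = Files.createTempFile("unicode-check", ".txt");
        String written = "\u00CDosa \u30A4\u30ED\u30CF"; // Irish and Japanese sample text
        Files.write(file, written.getBytes(StandardCharsets.UTF_8));

        // The first string is in the file; the second (Thai) is not.
        List<String> expected = Arrays.asList("\u30A4\u30ED\u30CF", "\u0E40\u0E1B\u0E47\u0E19");
        System.out.println("Missing entries: " + missingFrom(file, expected).size());

        Files.delete(file);
    }
}
```

In a real test you would assert that the returned list is empty and include its contents in the failure message.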

The second point is that when Java or Groovy source files contain Unicode characters, we need to tell the compiler what the encoding for those files is. In this case, we've saved our source files in UTF-8 encoding. As with the JVM, javac and groovyc will default to using the platform default encoding if none is specified, which would give us odd errors when the garbled and non-printable characters that resulted from incorrectly decoding the UTF-8 were fed to the compiler.
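To make the failure mode concrete, here is a small Java sketch of what happens when UTF-8 bytes are decoded with the wrong charset. I've picked ISO-8859-1 as the "wrong" encoding purely as an assumption for the demo; your platform default may be something else. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MisdecodeDemo {

    // Encodes the text as UTF-8, then decodes those bytes with a different
    // charset, which is what a compiler does when it guesses the encoding wrong.
    static String misdecode(String text, Charset wrong) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    public static void main(String[] args) {
        String original = "\u00CDosa"; // "Iosa" with an acute accent on the I
        String garbled = misdecode(original, StandardCharsets.ISO_8859_1);

        // Print each resulting character as a code point.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X%n", (int) garbled.charAt(i));
        }
        // Prints U+00C3, U+008D, U+006F, U+0073, U+0061:
        // the single accented letter became two characters, and U+008D is a
        // non-printable control character.
    }
}
```

The two UTF-8 bytes for the accented letter (0xC3 0x8D) are each treated as a separate character, so a four-character string turns into five characters, one of them a control code. That is the kind of input the compiler ends up seeing when the encoding is not specified.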

When I call groovyc from Ant, this is the code I use:

         <groovyc srcdir="." includes="com/example/**/*.groovy" destdir="${twork}" encoding="UTF-8">
            <classpath refid="example.common.class.path"/>
         </groovyc>

For more on Groovy and Unicode, Guillaume has an excellent post, "Heads-up on File and Stream groovy methods".
