
Category Archives: Uncategorized

ClamAVj: a Java library for accessing the ClamAV clamd daemon

I wrote some code last week to scan files against the ClamAV antivirus scanner using the clamd daemon. It's up now on Google Code under the Apache 2.0 license.
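ClamAVj's own API isn't shown in this excerpt, so what follows is only a hypothetical sketch of the clamd side of the conversation, not the library's code. It assumes clamd is listening on TCP (3310 is the conventional port, but check the `TCPSocket` setting in your clamd.conf) and uses clamd's `INSTREAM` command: the client sends `zINSTREAM` followed by a NUL byte, then chunks that are each prefixed with a 4-byte big-endian length, then a zero-length chunk. clamd answers with something like `stream: OK` or `stream: <name> FOUND`. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of clamd's INSTREAM protocol; NOT ClamAVj's actual API.
public class ClamdInstreamSketch {

    // Sends "zINSTREAM" plus a NUL, then length-prefixed chunks, then a zero-length chunk.
    static void writeInstream(InputStream data, OutputStream out, int chunkSize) throws IOException {
        DataOutputStream dos = new DataOutputStream(out);
        dos.write("zINSTREAM\0".getBytes(StandardCharsets.US_ASCII));
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = data.read(buf)) > 0) {
            dos.writeInt(n);      // 4-byte length in network (big-endian) byte order
            dos.write(buf, 0, n); // the chunk itself
        }
        dos.writeInt(0);          // a zero-length chunk terminates the stream
        dos.flush();
    }

    // Opens a TCP connection (host and port are assumptions), streams the data,
    // and returns clamd's NUL-terminated reply, e.g. "stream: OK".
    static String scan(String host, int port, InputStream data) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            writeInstream(data, socket.getOutputStream(), 2048);
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1 && b != 0) {
                reply.write(b);
            }
            return new String(reply.toByteArray(), StandardCharsets.US_ASCII);
        }
    }
}
```

Keeping the framing in `writeInstream` separate from the socket code means it can be checked against a `ByteArrayOutputStream` without a running daemon. Note that clamd also enforces a `StreamMaxLength` limit from its config and will reject streams larger than that.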

C3P0

It's a little scary that this is valid Java:

public class Main {

    static class OX {
        public static double C3P0;
    }

    public static void main(String[] args) {
        OX.C3P0 = 0X.C3P0;
[...]
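The excerpt cuts off before the explanation, but the likely trick is that the right-hand side starts with the digit zero, not the letter O. `0X.C3P0` is a hexadecimal floating-point literal: hex significand `.C3` and binary exponent `P0` (times 2 to the 0), which is 0xC3 / 256 = 195 / 256 = 0.76171875. A self-contained version (the class name `C3P0Demo` is just for illustration):

```java
public class C3P0Demo {
    static class OX {                  // class name: letter O, letter X
        public static double C3P0;
    }

    public static void main(String[] args) {
        OX.C3P0 = 0X.C3P0;             // right side: digit zero + X, a hex float literal
        System.out.println(OX.C3P0);              // 0.76171875
        System.out.println(0X.C3P0 == 195.0 / 256); // true
    }
}
```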

Multi-extends in generified types

In Effective Java, I came across a language construct I'd never seen before:

public class Foo<T extends List & Comparator> {
    <U extends List & Comparator> void foo(U x) { }
}

This declares that T (and, in the method, U) must extend or implement both List and Comparator. I've never had occasion to use this, but I [...]
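A small illustration of where an intersection bound earns its keep: a method that needs both `length()` from CharSequence and `compareTo()` from Comparable. Per the Java language rules, only the first bound may be a class; every bound after `&` must be an interface. The JDK's own `Collections.max` is declared as `<T extends Object & Comparable<? super T>>`, where the leading `Object` keeps the erased return type as `Object`. The method name `longest` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class IntersectionBoundDemo {

    // T must implement BOTH CharSequence (for length) and Comparable (for tie-breaking).
    // Returns the longest item; ties go to the smallest in natural order; null if empty.
    static <T extends CharSequence & Comparable<? super T>> T longest(List<T> items) {
        T best = null;
        for (T item : items) {
            if (best == null
                    || item.length() > best.length()
                    || (item.length() == best.length() && item.compareTo(best) < 0)) {
                best = item;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "apple", "grape");
        System.out.println(longest(words)); // apple
    }
}
```

String satisfies both bounds; a type such as StringBuilder (a CharSequence that is only Comparable from Java 11 on) shows how the bound narrows what callers can pass.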

Unicode in Java: some Groovy pieces (part 7)

One of the common tasks Java developers use Groovy for is testing. One idiom I use often is to create a list of strings and use the "each" method to assert that an output file contains each of them. When testing Unicode, this means both the output files and the Groovy source files [...]
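For comparison, here is a rough Java equivalent of that Groovy idiom, sketched under the assumption that the output file is UTF-8. The for-each loop stands in for Groovy's `each`, and the expected strings are written as `\u` escapes so the test source stays pure ASCII and compiles the same way whatever encoding javac assumes. The class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutputContainsCheck {

    // Decodes the whole file as UTF-8 and returns the expected strings it does NOT contain.
    static List<String> missingFrom(Path file, List<String> expected) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        List<String> missing = new ArrayList<>();
        for (String s : expected) {
            if (!content.contains(s)) {
                missing.add(s);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        List<String> expected = Arrays.asList(
                "\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35", // Thai greeting
                "\u30AB\u30BF\u30AB\u30CA");            // the word "katakana" in katakana
        List<String> missing = missingFrom(Path.of(args[0]), expected);
        if (!missing.isEmpty()) {
            throw new AssertionError(missing.size() + " expected string(s) missing");
        }
    }
}
```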

Unicode in Java: bytes and charsets (part 6)

In this part, I'll discuss some of the lower-level APIs for converting byte arrays to characters and a bit more about the Charset and CharsetDecoder classes.
The String class has two constructors that will decode a byte[] using a specified charset: String(byte[] bytes, String charsetName) and String(byte[] bytes, Charset charset). Likewise, it has two instance methods [...]
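A sketch contrasting the two levels of API: `new String(byte[], Charset)` always replaces malformed input with the replacement character (U+FFFD), while a `CharsetDecoder` configured with `CodingErrorAction.REPORT` throws instead. The Charset overloads are also handy because they don't throw the checked `UnsupportedEncodingException` that the String-name overloads do. The method names below are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // String's constructor silently substitutes the replacement character for bad input.
    static String lenientDecode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // A CharsetDecoder set to REPORT throws instead of silently substituting.
    static String strictDecode(byte[] bytes, Charset cs) throws CharacterCodingException {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes
        // Decoding UTF-8 bytes as ISO-8859-1 yields mojibake: two Latin-1 chars for one.
        System.out.println(lenientDecode(utf8, StandardCharsets.ISO_8859_1).length()); // 5
        byte[] bad = { (byte) 0xFF }; // 0xFF never appears in valid UTF-8
        try {
            strictDecode(bad, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder rejected the input: " + e);
        }
    }
}
```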

Unicode in Java: sample data (part 5)

When testing Unicode with your application, you need some examples. Most people don't have Thai or Katakana files sitting around, so finding test data is hard.
I've been playing around with JavaScript and jQuery recently, so I thought I'd build a small app that would render Unicode characters from a variety of languages [...]
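If you'd rather generate sample strings directly in a Java test, one simple approach (a sketch, not the app described above) is to walk a Unicode block's code-point range and keep only assigned code points. The ranges used here are the Thai block (U+0E00 to U+0E7F) and the Katakana block (U+30A0 to U+30FF); `sampleRange` is a hypothetical helper. Printing the result only looks right if the console's encoding can display it.

```java
public class UnicodeSamples {

    // Collects every assigned code point in [start, end] into a String, skipping
    // reserved positions so the test data contains only real characters.
    static String sampleRange(int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int cp = start; cp <= end; cp++) {
            if (Character.isDefined(cp)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Thai:     " + sampleRange(0x0E00, 0x0E7F));
        System.out.println("Katakana: " + sampleRange(0x30A0, 0x30FF));
    }
}
```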

Unicode in Java: Default Charset (part 4)

In this part, I will discuss the default Charset and how to change it.
The default character set (technically a character encoding) is set when the JVM starts. Every platform has a default default, but the default can also be configured explicitly. For example, Windows XP 32-bit (English) defaults to "windows-1252", which is the [...]
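A quick way to see what a given JVM picked is to print `Charset.defaultCharset()` alongside the `file.encoding` system property. Launching with `java -Dfile.encoding=UTF-8 DefaultCharsetDemo` is the usual way to override it, though older JDKs never officially supported setting that property. Also note that on JDK 18 and later, JEP 400 made UTF-8 the default charset on every platform, so the output depends on which JDK runs it. The class name is illustrative.

```java
import java.nio.charset.Charset;

public class DefaultCharsetDemo {

    // The charset the JVM chose at startup.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("defaultCharset = " + defaultCharsetName());
        // The system property the default is usually derived from.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```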

Unicode in Java: Readers and Writers (part 3)

In the previous parts, I've discussed Unicode, encodings, and which encodings Java uses internally. In this part, I'll discuss using Readers and Writers in a Unicode-compliant way. In short, never use FileReader or FileWriter. This is a particularly important thing to understand because I don't feel any of the Java [...]
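The usual replacement is to wrap a byte stream in an `InputStreamReader` or `OutputStreamWriter` and name the charset explicitly, so nothing depends on the platform default. (Java 7 also added `Files.newBufferedReader(Path, Charset)`, and Java 11 added FileReader and FileWriter constructors that take a Charset; the advice above targets the older default-charset constructors.) A minimal sketch with illustrative method names:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ExplicitCharsetIO {

    // Writes each line followed by '\n', encoding with the given charset (never the default).
    static void writeLines(File file, List<String> lines, Charset cs) throws IOException {
        try (Writer w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), cs))) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    // Reads the file back line by line, decoding with the given charset.
    static List<String> readLines(File file, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream(file), cs))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Written as UTF-8, the two lines "caf\u00e9" and four katakana characters take 6 + 13 = 19 bytes, and they read back unchanged on any platform.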

Unicode in Java: primitives and encodings (part 2)

In the last part, I discussed how Unicode is a consistent naming scheme for characters, how character encodings such as UTF-8 map Unicode code points to bytes, and how fonts describe how code points should be visually displayed. In this part, I discuss the specific things you need to know about using Unicode in [...]
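One concrete consequence for Java's primitives: a `char` is a 16-bit UTF-16 code unit, not a code point, so characters outside the Basic Multilingual Plane occupy two chars (a surrogate pair). The sketch below compares `length()`, `codePointCount()` and UTF-8 byte counts for U+00E9 and U+1F600; the helper name `sizes` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CodeUnitsDemo {

    // Returns {UTF-16 code units, code points, UTF-8 bytes} for a string.
    static int[] sizes(String s) {
        return new int[] {
            s.length(),
            s.codePointCount(0, s.length()),
            s.getBytes(StandardCharsets.UTF_8).length
        };
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";                            // all four chars in the BMP
        String grin = new String(Character.toChars(0x1F600)); // one code point, two chars

        int[] a = sizes(cafe);
        int[] b = sizes(grin);
        System.out.println(a[0] + " " + a[1] + " " + a[2]);   // 4 4 5
        System.out.println(b[0] + " " + b[1] + " " + b[2]);   // 2 1 4
        System.out.println(Integer.toHexString(grin.charAt(0))); // d83d (high surrogate)
    }
}
```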

Unicode in Java: introduction (part 1)

The bad old days
A long time ago, things were much easier for programmers. The only computers anyone cared about were in the US, and these computers only needed to render "normal" letters like "a" and "Q". Then the internet came along, and we realized that there were all of these other people in [...]