Tuesday, October 27, 2009
In Effective Java, I came across a language construct I'd never seen before:
public class Foo<T extends List & Comparator> {
    <U extends List & Comparator> void foo(U x) { }
}
This declares that T must extend or implement both List and Comparator. I've never had occasion to use this, but I [...]
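To see why a multiple bound can be useful, here is a minimal sketch. It swaps in Number and Comparable for List and Comparator, purely because Integer and Double already satisfy both; the class and method names (BoundedMax, max, maxAsDouble) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class BoundedMax {
    // T must satisfy BOTH bounds: it is a Number and it is Comparable to itself.
    static <T extends Number & Comparable<T>> T max(List<T> values) {
        T best = values.get(0);
        for (T v : values) {
            // compareTo is available because of the Comparable bound.
            if (v.compareTo(best) > 0) {
                best = v;
            }
        }
        return best;
    }

    // doubleValue() is available because of the Number bound.
    static <T extends Number & Comparable<T>> double maxAsDouble(List<T> values) {
        return max(values).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(max(Arrays.asList(3, 9, 4)));           // prints 9
        System.out.println(maxAsDouble(Arrays.asList(1.5, 0.25))); // prints 1.5
    }
}
```

Inside the method body, members of both bounds can be called on the same value, which a single bound would not allow without a cast.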
Tuesday, October 27, 2009
One of the common tasks Java developers use Groovy for is testing. One of the common idioms I use is to create a list of strings and use the "each" method to assert that an output file contains them. When testing Unicode, this means both the output files and the Groovy source files [...]
In this part, I'll discuss some of the lower-level APIs for converting byte arrays to characters and a bit more about the Charset and CharsetDecoder classes.
The String class has two constructors that will decode a byte[] using a specified charset: String(byte[] bytes, String charsetName) and String(byte[] bytes, Charset charset). Likewise, it has two instance methods [...]
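As a rough illustration of the two constructors, the sketch below decodes the same two bytes first as UTF-8 and then as ISO-8859-1. The class name, helper names, and sample bytes are assumptions chosen for the example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class DecodeBytes {
    // U+00E9 (e with acute accent) encoded as UTF-8 is the two bytes 0xC3 0xA9.
    static final byte[] UTF8_E_ACUTE = { (byte) 0xC3, (byte) 0xA9 };

    // Uses String(byte[] bytes, String charsetName); an unknown name raises a checked exception.
    static String decodeByName(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Uses String(byte[] bytes, Charset charset); no checked exception is involved.
    static String decodeByCharset(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String right = decodeByCharset(UTF8_E_ACUTE, Charset.forName("UTF-8"));
        String wrong = decodeByName(UTF8_E_ACUTE, "ISO-8859-1");
        System.out.println(right.length()); // prints 1: a single char, U+00E9
        System.out.println(wrong.length()); // prints 2: U+00C3 followed by U+00A9
    }
}
```

Decoding with the wrong charset does not fail; it quietly produces different characters, which is why the charset argument matters.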
When testing Unicode with your application, you need some examples. Most people don't have Thai or Katakana files sitting around, so finding test data is hard.
I've been playing around with JavaScript and jQuery recently, so I thought I'd build a small app that would render Unicode characters from a variety of languages [...]
Saturday, October 24, 2009
In this part, I will discuss the default Charset and how to change it.
The default character set (technically a character encoding) is set when the JVM starts. Every platform has a default default, but the default can also be configured explicitly. For example, Windows XP 32-bit (English) defaults to "windows-1252", which is the [...]
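A quick way to check what a given JVM picked is to print the default at startup. The sketch below is only an inspection aid; the class and method names are hypothetical, and the output depends on the platform and JVM version, so none is shown. It assumes the usual way of overriding the default, the file.encoding system property passed on the command line (for example -Dfile.encoding=UTF-8); newer JVMs may treat that property differently.

```java
import java.nio.charset.Charset;

public class DefaultCharsetInfo {
    // Reports the charset the JVM resolved at startup, plus the raw system property.
    static String describe() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once normally and once with the property set on the command line shows whether the override took effect on that JVM.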
Wednesday, October 21, 2009
In the previous parts, I've discussed Unicode, encodings, and which encodings are used for Java internally. In this part, I'll discuss using Readers and Writers in a Unicode-compliant way. In short, never use FileReader or FileWriter. This is a particularly important thing to understand because I don't feel any of the Java [...]
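One common alternative to FileReader and FileWriter is to wrap the file streams in an InputStreamReader and OutputStreamWriter with an explicit charset. The sketch below assumes UTF-8 and uses made-up names (ExplicitCharsetIo, write, read, roundTrip); it is one way to do it, not necessarily the approach the full post recommends.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class ExplicitCharsetIo {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Writes text with an explicit charset instead of the platform default.
    static void write(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), UTF8)) {
            out.write(text);
        }
    }

    // Reads the whole file back using the same explicit charset.
    static String read(File file) throws IOException {
        try (Reader in = new InputStreamReader(new FileInputStream(file), UTF8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    // Convenience helper for experiments: write to a temp file, then read it back.
    static String roundTrip(String text) {
        try {
            File tmp = File.createTempFile("charset-demo", ".txt");
            try {
                write(tmp, text);
                return read(tmp);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String thai = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35";
        System.out.println(thai.equals(roundTrip(thai))); // prints true
    }
}
```

Because both sides name the charset, the round trip gives the same result regardless of the platform default.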
Tuesday, October 20, 2009
In the last part, I discussed how Unicode is a consistent naming scheme for graphemes, how character encodings such as UTF-8 map Unicode code points to bits, and how fonts describe how code points should be visually displayed. In this part, I discuss the specific things you need to know about using Unicode in [...]
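To make the code point versus encoding distinction concrete, this small sketch counts how many UTF-8 bytes and how many Java chars (UTF-16 code units) a single code point needs. The class and method names are hypothetical, and the sample code points are arbitrary picks.

```java
import java.nio.charset.Charset;

public class CodePointBytes {
    // Number of bytes the code point occupies when encoded as UTF-8.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(Charset.forName("UTF-8")).length;
    }

    // Number of Java chars (UTF-16 code units) needed for the code point.
    static int utf16Units(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = { 0x61, 0xE9, 0x30AB, 0x1D11E }; // a, e-acute, katakana KA, musical G clef
        for (int cp : samples) {
            System.out.println(String.format("U+%04X utf8=%d utf16=%d",
                    cp, utf8Length(cp), utf16Units(cp)));
        }
        // prints:
        // U+0061 utf8=1 utf16=1
        // U+00E9 utf8=2 utf16=1
        // U+30AB utf8=3 utf16=1
        // U+1D11E utf8=4 utf16=2
    }
}
```

The same code point takes a different number of units depending on the encoding, and the last sample needs two Java chars.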
The bad old days
A long time ago, things were much easier for programmers. The only computers anyone cared about were in the US, and these computers only needed to render "normal" letters like "a" and "Q". Then the internet came along, and we realized that there were all of these other people in [...]
Snow Leopard isn't the typical big leap most Mac OS X versions have been, and I think it's smart marketing on the part of Apple to name it a variation on the previous version rather than waste a whole cat on it. Most of the features I don't really understand (Grand Central Dispatch) or care about [...]