
Unicode in Java: Default Charset (part 4)

In this part, I will discuss the default Charset and how to change it.

The default character set (technically a character encoding) is determined when the JVM starts. Each platform has its own default, but the default can also be configured explicitly. For example, 32-bit English Windows XP defaults to "windows-1252", the CP1252 encoding, which covers most Western European languages.

The default charset can be printed by calling:

System.out.println(java.nio.charset.Charset.defaultCharset());
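The one-liner above can be wrapped in a small self-contained program that also prints the "file.encoding" property, so the two can be compared when the JVM is launched with different options. This is a minimal sketch; the class name ShowDefaultCharset is only illustrative.

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    // Returns the canonical name of the charset the JVM chose at startup.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The default used by FileReader, String.getBytes() with no argument, etc.
        System.out.println("defaultCharset = " + defaultCharsetName());
        // The system property that can be used to set the default at startup.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
    }
}
```

Running it as "java ShowDefaultCharset" and then as "java -Dfile.encoding=UTF-8 ShowDefaultCharset" should show how the startup property changes the result on platforms whose native default is not UTF-8.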

When the JVM is started, the default charset can be set with the property "file.encoding", e.g., "-Dfile.encoding=utf-8". Some IDEs do this automatically; NetBeans, for example, uses this property to set the charset to UTF-8. The drawback is that code using a class like FileReader, which relies on the default encoding, may handle Unicode correctly in the development environment but break in an environment with a different default encoding. The developer should not rely on the user to set the encoding for the code to work correctly.
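Before Java 11, FileReader had no constructor that accepts a charset, so the usual way to avoid the default is to wrap an InputStream in an InputStreamReader with an explicit charset. The sketch below (class and method names are hypothetical) decodes the same UTF-8 bytes twice: once with UTF-8 named explicitly, and once with the default charset, which is what FileReader would use. Only the explicit version is guaranteed to match on every platform.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {
    // Reads all text from a stream using the given charset, never an implicit default.
    static String readAll(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(in, cs))) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9 \u65e5\u672c"; // "café 日本"
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Explicit charset: decodes correctly regardless of platform.
        String explicit = readAll(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        // Default charset (FileReader's behavior): correct only if the default is UTF-8.
        String implicit = readAll(new ByteArrayInputStream(utf8), Charset.defaultCharset());

        System.out.println("explicit UTF-8 matches: " + explicit.equals(original));
        System.out.println("default (" + Charset.defaultCharset() + ") matches: "
                + implicit.equals(original));
    }
}
```

For a file, the same approach applies with a FileInputStream in place of the ByteArrayInputStream.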

One might also think the system property "file.encoding" could simply be changed programmatically. However, changing it after the JVM has started has no reliable effect, because by then the system classes that rely on this value have already cached it.
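A quick way to see the caching is to read the default, change the property at runtime, and read the default again. This sketch (names are illustrative) restores the original property afterwards so it does not disturb the rest of the program.

```java
import java.nio.charset.Charset;

public class SetPropertyTooLate {
    // Returns {before, after}: the default charset name around a runtime property change.
    static String[] tryToChangeDefault(String newEncoding) {
        String saved = System.getProperty("file.encoding");
        String before = Charset.defaultCharset().name();
        try {
            // Setting the property succeeds, but the cached default is not recomputed.
            System.setProperty("file.encoding", newEncoding);
            String after = Charset.defaultCharset().name();
            return new String[] { before, after };
        } finally {
            // Put the original value back.
            if (saved == null) {
                System.clearProperty("file.encoding");
            } else {
                System.setProperty("file.encoding", saved);
            }
        }
    }

    public static void main(String[] args) {
        String[] r = tryToChangeDefault("UTF-16");
        System.out.println("before: " + r[0] + ", after: " + r[1]);
    }
}
```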

On Linux/Unix, you can also set the LC_ALL environment variable to affect the default encoding. For example, on one Linux box I have, the default is US-ASCII; after "export LC_ALL=en_US.UTF-8", the default encoding is UTF-8.

The environment variables LANG and LC_CTYPE have a similar effect.
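To check which of these variables is in effect on a given machine, a small diagnostic can print them next to the resulting default charset. This is a sketch; the class name LocaleEnvReport is hypothetical.

```java
import java.nio.charset.Charset;

public class LocaleEnvReport {
    // Lists the locale variables in POSIX precedence order
    // (LC_ALL overrides LC_CTYPE, which overrides LANG), then the default charset.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String name : new String[] { "LC_ALL", "LC_CTYPE", "LANG" }) {
            // getenv returns null when the variable is not set.
            sb.append(name).append('=').append(System.getenv(name)).append('\n');
        }
        sb.append("defaultCharset=").append(Charset.defaultCharset().name()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Comparing "LC_ALL=C java LocaleEnvReport" with "LC_ALL=en_US.UTF-8 java LocaleEnvReport" should reproduce the US-ASCII versus UTF-8 difference described above. Starting with Java 18 the default charset is UTF-8 regardless of locale, so the difference is only visible on older JDKs.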

In summary, many classes use the default charset whenever a character set is not specified explicitly, but that default should not be relied upon when your application is supposed to handle Unicode; name the charset explicitly instead.
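As a minimal sketch of that advice, assuming Java 7 or later for java.nio.file, the helper below writes text to a temporary file and reads it back, passing StandardCharsets.UTF_8 on both sides so the result does not depend on the platform default. The class and method names are only illustrative.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileRoundTrip {
    // Writes text to a temp file and reads it back, naming UTF-8 explicitly both ways.
    static String roundTripViaTempFile(String text) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("unicode-demo", ".txt");
            try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                w.write(text);
            }
            StringBuilder sb = new StringBuilder();
            try (BufferedReader r = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
                int c;
                while ((c = r.read()) != -1) {
                    sb.append((char) c); // surrogate pairs arrive as two chars and stay intact
                }
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try { Files.deleteIfExists(tmp); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        String text = "\u00fcber \u65e5\u672c\u8a9e \uD83D\uDE00"; // "über 日本語" plus an emoji
        System.out.println(roundTripViaTempFile(text).equals(text));
    }
}
```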
