Unicode in Java: Readers and Writers (part 3)

In the previous parts, I've discussed Unicode, encodings, and which encodings Java uses internally. In this part, I'll discuss using Readers and Writers in a Unicode-compliant way. In short, never use FileReader or FileWriter. This is particularly important to understand because none of the Java books I have state it explicitly enough; I didn't understand it until I encountered it in the field.

The various Reader and Writer classes in Java almost never do the correct thing by default. Not because they're not well-designed, but because it's largely up to the user to specify what "the correct thing" is. For example, FileReader and FileWriter will always use the default character encoding. This varies widely between platforms: for example, Windows XP 32-bit defaults to CP1252 (a variant of ISO-8859-1), many Linuxes default to US-ASCII, and MacOS X defaults to MacRoman. If you expect your users to input Unicode characters, any character the default encoding can't represent will be garbled. It is possible to change the default character encoding (which we'll discuss later), but you shouldn't rely on your users to set their environments up in a certain way, particularly when your users are non-technical.
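
If you're curious what the default is on your own machine, a minimal sketch like the following will print it. The class and method names are made up for illustration; Charset.defaultCharset() is the standard call.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Name of the platform default charset, i.e. what you get
    // when you don't specify an encoding yourself.
    static String defaultEncodingName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultEncodingName());
    }
}
```

Run it on a couple of different machines and you'll likely see why relying on the default is a bad idea.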

If your application has control over a set of files, it needs to explicitly specify the character encoding and always use that encoding. Instead of using FileReader and FileWriter, you must use InputStreamReader and OutputStreamWriter with the constructors that take a stream and a charset name string, e.g. "UTF-8". This is a bit confusing, since it is referred to as a "charset", even though it's technically a character encoding. Here is what the code should look like:

InputStream istream = ...;
BufferedReader reader = new BufferedReader(new InputStreamReader(istream, "UTF-8"));
OutputStream ostream = ...;
Writer writer = new OutputStreamWriter(ostream, "UTF-8");

If you're reading and writing files, you can use the FileInputStream and FileOutputStream implementations for the InputStream and OutputStream instances. The *Stream classes only read and write bytes, so it's the Reader or Writer that actually applies an encoding to map the bytes to chars or vice versa. You can pretty much just grep your code for FileReader and FileWriter to find places where support for Unicode will break.
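
Here's a fuller sketch of the same idea as a self-contained round trip through a temp file. It assumes Java 8 or later (try-with-resources, StandardCharsets, and UncheckedIOException are all newer than the snippet above), and the class and method names are just mine for illustration.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8FileRoundTrip {

    // Writes text to the file as UTF-8, regardless of the platform default.
    static void write(File file, String text) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the whole file back, decoding the bytes as UTF-8.
    static String read(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Convenience wrapper: write to a temp file, then read it back.
    static String roundTrip(String text) {
        try {
            File file = File.createTempFile("unicode", ".txt");
            file.deleteOnExit();
            write(file, text);
            return read(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself pure ASCII.
        String text = "caf\u00e9 \u00a3 \u2603";
        System.out.println(text.equals(roundTrip(text)));
    }
}
```

Because both sides name UTF-8 explicitly, the text should come back unchanged no matter what the platform default is.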

The javadoc for these classes isn't much help unless you're already aware of the issues. The FileOutputStream javadoc says "FileOutputStream is meant for writing streams of raw bytes such as image data. For writing streams of characters, consider using FileWriter." This is misleading, since if you're unaware of the issues with Unicode support, you might think that FileWriter will "just work" if your code expects to handle Unicode. The FileWriter javadoc says "The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable." If you know what that means, you're okay. But a more useful warning would be "This will almost never write anything other than American English correctly, so don't use it!". I say American English because, for example, the British pound symbol £ isn't included in ASCII.

Now, go and find all of the places in your code where this is broken and fix it.

In the next part, I'll discuss more about the default character set.

Unicode in Java: primitives and encodings (part 2)

In the last part, I discussed how Unicode is a consistent naming scheme for graphemes, how character encodings such as UTF-8 map Unicode code points to bits, and how fonts describe how code points should be visually displayed. In this part, I discuss the specific things you need to know about using Unicode in Java code.

Java primitives and Unicode

The two most commonly used character encodings for Unicode are UTF-8 and UTF-16. Java uses UTF-16 for char values, and as a result for Strings, since these are just an object wrapper for a char array. UTF-8 is most commonly used when writing files, particularly XML. UTF-16 stores nearly all characters as a sequence of 16 bits, even the ones that could be stored in only 8 bits (e.g., characters in the ASCII range). UTF-8 uses a variable-length encoding scheme that stores ASCII-range characters in 8 bits and other characters in 2 to 4 bytes, depending on the character. For example, the letter "a" (Latin small letter a, U+0061) is represented with 8 bits; "á" (Latin small letter A with acute, U+00E1) is represented with 16 bits, and our beloved snowman (☃) is represented with 24 bits. As I mentioned before, files encoded using ASCII can be read as if they were encoded using UTF-8, and files written using UTF-8 that only contain characters in the ASCII range can be read by Unicode-ignorant programs as if they were ASCII (usually). UTF-16 is also a variable-width encoding, but it uses increments of 16 bits instead of 8.
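
The sizes above are easy to check for yourself. This is a small sketch (the class and method names are mine, and StandardCharsets assumes Java 7 or later) that counts the encoded bytes for the three example characters:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLengths {

    // Number of bytes the string occupies in the given encoding.
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // U+0061 (a), U+00E1 (a with acute), U+2603 (snowman)
        String[] samples = {"a", "\u00e1", "\u2603"};
        for (String s : samples) {
            System.out.println(Integer.toHexString(s.charAt(0))
                    + " UTF-8 bytes: " + byteCount(s, StandardCharsets.UTF_8)
                    + ", UTF-16BE bytes: " + byteCount(s, StandardCharsets.UTF_16BE));
        }
    }
}
```

I used UTF-16BE rather than plain UTF-16 here so the byte order mark doesn't inflate the counts.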

From bytes to Strings

The character encoding describes how to map a byte array (byte[]) to a char array (char[]), and vice versa. Strings are just wrappers around char[]s, so this applies to Strings also. The important part of the mapping is how it handles cases where more than one byte in the array maps to a single char value. Because a char is 16 bits wide, it can represent any Unicode code point from U+0000 to U+FFFF. This range is known as the Basic Multilingual Plane and includes every language that a general-purpose Java application can be expected to support. If your app needs to support Cuneiform or Phoenician, you probably need to read something other than a blog post.
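
As a rough illustration of that mapping, the sketch below decodes three UTF-8 bytes into a single char, and then shows what happens with a code point outside the Basic Multilingual Plane (I picked U+12000 from the Cuneiform block). The class and method names are hypothetical; the String and Character calls are standard.

```java
import java.nio.charset.StandardCharsets;

public class BytesToStrings {

    // byte[] -> String, with the encoding stated explicitly.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // String -> byte[], with the encoding stated explicitly.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three bytes in UTF-8 map to one char: the snowman, U+2603.
        byte[] snowmanBytes = {(byte) 0xE2, (byte) 0x98, (byte) 0x83};
        String snowman = decodeUtf8(snowmanBytes);
        System.out.println(snowman.length());               // 1
        System.out.println(snowman.charAt(0) == '\u2603');  // true

        // A code point above U+FFFF doesn't fit in one char,
        // so Java stores it as a pair of chars.
        String cuneiform = new String(Character.toChars(0x12000));
        System.out.println(cuneiform.length());                               // 2
        System.out.println(cuneiform.codePointCount(0, cuneiform.length()));  // 1
        System.out.println(encodeUtf8(cuneiform).length);                     // 4
    }
}
```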

Encoding support

Every Java implementation must support US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16 (with byte order mark). US-ASCII and UTF-8 you should recognize. ISO-8859-1 is commonly referred to as Latin-1 and is usually used when only "Western European" languages need to be supported. It's related to the Windows-1252 encoding used by default on older Windows OSes. UTF-16BE and UTF-16LE encode each 16-bit unit as big endian or little endian respectively, which will give a speedup for certain platforms. The default UTF-16 scheme includes the code point U+FEFF as the first two bytes of a document (called the byte order mark), the order of which determines if the rest of the document is big endian or little endian.

However, most Java implementations support a lot more. For instance, MacOS X Java 6 supports: Big5, Big5-HKSCS, EUC-JP, EUC-KR, GB18030, GB2312, GBK, IBM-Thai, IBM00858, IBM01140, IBM01141, IBM01142, IBM01143, IBM01144, IBM01145, IBM01146, IBM01147, IBM01148, IBM01149, IBM037, IBM1026, IBM1047, IBM273, IBM277, IBM278, IBM280, IBM284, IBM285, IBM297, IBM420, IBM424, IBM437, IBM500, IBM775, IBM850, IBM852, IBM855, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866, IBM868, IBM869, IBM870, IBM871, IBM918, ISO-2022-CN, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-KR, ISO-8859-1, ISO-8859-13, ISO-8859-15, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, JIS_X0201, JIS_X0212-1990, KOI8-R, KOI8-U, MacRoman, Shift_JIS, TIS-620, US-ASCII, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF-8, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, windows-31j, x-Big5-Solaris, x-euc-jp-linux, x-EUC-TW, x-eucJP-Open, x-IBM1006, x-IBM1025, x-IBM1046, x-IBM1097, x-IBM1098, x-IBM1112, x-IBM1122, x-IBM1123, x-IBM1124, x-IBM1381, x-IBM1383, x-IBM33722, x-IBM737, x-IBM834, x-IBM856, x-IBM874, x-IBM875, x-IBM921, x-IBM922, x-IBM930, x-IBM933, x-IBM935, x-IBM937, x-IBM939, x-IBM942, x-IBM942C, x-IBM943, x-IBM943C, x-IBM948, x-IBM949, x-IBM949C, x-IBM950, x-IBM964, x-IBM970, x-ISCII91, x-ISO-2022-CN-CNS, x-ISO-2022-CN-GB, x-iso-8859-11, x-JIS0208, x-JISAutoDetect, x-Johab, x-MacArabic, x-MacCentralEurope, x-MacCroatian, x-MacCyrillic, x-MacDingbat, x-MacGreek, x-MacHebrew, x-MacIceland, x-MacRomania, x-MacSymbol, x-MacThai, x-MacTurkish, x-MacUkraine, x-MS932_0213, x-MS950-HKSCS, x-mswin-936, x-PCK, x-SJIS_0213, x-UTF-16LE-BOM, X-UTF-32BE-BOM, X-UTF-32LE-BOM, x-windows-50220, x-windows-50221, x-windows-874, x-windows-949, x-windows-950, x-windows-iso2022jp.
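
A list like the one above can be produced with Charset.availableCharsets(). The sketch below (class and method names are mine) also checks the six required encodings and peeks at the byte order mark Java writes when encoding with plain UTF-16:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ListCharsets {

    // The encodings every Java implementation is required to support.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    static boolean allRequiredSupported() {
        for (String name : REQUIRED) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    // When encoding with "UTF-16", Java writes a big-endian byte order mark first.
    static boolean utf16StartsWithBom() {
        byte[] bytes = "a".getBytes(StandardCharsets.UTF_16);
        return bytes.length == 4 && bytes[0] == (byte) 0xFE && bytes[1] == (byte) 0xFF;
    }

    public static void main(String[] args) {
        System.out.println("Required charsets present: " + allRequiredSupported());
        System.out.println("UTF-16 output starts with BOM: " + utf16StartsWithBom());
        // Everything this particular JVM supports, sorted by canonical name.
        for (String name : Charset.availableCharsets().keySet()) {
            System.out.println(name);
        }
    }
}
```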

In the next part, I'll discuss using Readers and Writers with Unicode.

Unicode in Java: introduction (part 1)

The bad old days

A long time ago, things were much easier for programmers. The only computers anyone cared about were in the US, and these computers only needed to render "normal" letters like "a" and "Q". Then the internet came along, and we realized that there were all of these other people in the world that had other languages with crazy letters like ð and ß and བོ , and even symbols that represent entire words like 中 and 말.

Back then, most programmers only needed to worry about [0-9a-zA-Z], which were most commonly represented as ASCII. All of the characters were encoded as 7 bits and padded with one extra bit to make an 8 bit sequence, so only a total of 128 characters were represented.

Unfortunately, 8 bits can't represent the thousands of basic units of a language used throughout the world. We use the word grapheme to describe these basic units because they vary widely between languages. For example, in English this could be a letter like "A" and in Chinese it could be an ideograph like 中. Before Unicode, there were dozens of other schemes in common use that covered different subsets of the problem, but none of which provided a unified approach. For example, ISO 8859-1 and ISO 8859-2 were commonly used for Western European languages that use diacritics (commonly called "accented" characters); ISO 8859-7 for Greek; KOI-8, ISO 8859-5, and CP1251 for Cyrillic alphabets (e.g., Russian and Ukrainian); EUC and Shift-JIS for Japanese; BIG5 for traditional Chinese characters (Taiwan); GB for simplified Chinese characters (China).

If you wanted to mix these together in the same text string, good luck.

Unicode to the rescue

To solve this issue, Unicode and a series of encodings were created. Unicode is only a consistent way of naming the graphemes and does not describe how they should be encoded into a bit pattern.

Each Unicode character is referred to by a hexadecimal number of at least four digits prefixed by "U+", so "A" is represented by U+0041 and described as "LATIN CAPITAL LETTER A", and U+2603 is "SNOWMAN" (not kidding: ☃). ASCII had so few characters that the description of which character is which and the bit encoding of the characters aren't separated. In Unicode they are, so you don't have to describe the Icelandic character ð as "that d with the slash in it", and can instead refer to it by a standardized code, U+00F0. It gets even messier when referring to some Asian languages that share what are essentially the same grapheme, but written in different ways (see Han unification). There are also a significant number of symbol-like things in Unicode, so the casual observer would not be able to tell ☸ (wheel of dharma, U+2638) from ⎈ (helm symbol, U+2388). Unicode makes it very explicit which grapheme is which.

To reiterate, Unicode doesn't describe how the character should be represented in bits (encoded) nor does it describe what the character should actually look like when displayed. It's only providing a mapping between numbers (called code points) like U+0041 and U+2603 and abstract things, like English letters, Chinese ideographs, and snowpersons.
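
To see the code-point-to-name mapping in action, here's a small Java sketch. It assumes Java 7 or later, where Character.getName is available; the class and method names are made up for this example.

```java
import java.util.Locale;

public class CodePointNames {

    // Formats a code point the way Unicode charts do: U+XXXX followed by its name.
    static String describe(int codePoint) {
        return String.format(Locale.ROOT, "U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0041)); // U+0041 LATIN CAPITAL LETTER A
        System.out.println(describe(0x00F0)); // U+00F0 LATIN SMALL LETTER ETH
        System.out.println(describe(0x2603)); // U+2603 SNOWMAN
    }
}
```

Notice that nothing here involves bytes or fonts: it's purely numbers and names.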

Character encoding

The next issue is, how do we physically store these Unicode code points as bits? This is referred to as a character encoding, and describes a mapping between the code points and a sequence of bits (although it probably should be referred to as grapheme encoding). In ASCII, each character is stored in 8 bits, but 8 bits limit the number of characters that can be represented to 256. To represent the thousands of Unicode code points, we need to have an encoding that uses more than 8 bits. However, we already have millions of files that are encoded in 8 bits with ASCII. Ideally we'd like our new encoding to be backwards compatible, so we don't have our legacy ASCII files garbled if they were read as if they were in our new encoding. This is where UTF-8 comes in.

UTF-8 is an encoding for Unicode code points, hence its acronym Unicode Transformation Format. UTF-8 is known as a variable-length encoding because some code points are represented by 8 bits and others by 16 bits (or more). The cool thing is that all of the characters which can be represented in ASCII have the same bit encodings in ASCII and UTF-8, so trying to read an ASCII-encoded file as UTF-8 will just work. Trying to read a UTF-8 encoded file as if it were ASCII (as many Unicode-ignorant programs do) results in characters encoded in 16 bits being read as if they were two 8 bit characters, so instead of a Chinese character, you get a capital Q and an ASCII beep.
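
Both halves of that claim can be sketched in a few lines of Java. One assumption to flag: I use ISO-8859-1 as a stand-in for a "one byte per character" program, since strict ASCII has nothing above 127. The class and method names are mine.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompat {

    // True if the text encodes to exactly the same bytes in ASCII and UTF-8.
    static boolean sameBytes(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII),
                             s.getBytes(StandardCharsets.UTF_8));
    }

    // What a one-byte-per-character program sees when handed UTF-8 bytes.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("Hello, world"));    // true
        System.out.println(sameBytes("\u00e1"));          // false
        // U+00E1 is the two bytes C3 A1 in UTF-8, which get read as two characters.
        System.out.println(misread("\u00e1").length());   // 2
    }
}
```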

UTF-16 is similar to UTF-8, but instead of encoding characters as multiples of 8 bits, all characters are encoded as multiples of 16 bits. The drawback here is that if the text primarily consists of characters in the ASCII range, it takes up twice the amount of storage space. Also, files which contain mostly ASCII can't be read at all in editors which don't understand UTF-16, rather than just incorrectly displaying characters outside of the ASCII range.


The final piece of this is fonts. A font describes how a character (code point) should be displayed on the screen. Useful fonts look like glyphs people recognize. Before Unicode was prevalent and we could use U+2620 to represent a skull and crossbones (☠), there were fonts like Wingdings that displayed a symbol in place of a letter. For example, "N" in Wingdings is a skull and crossbones, but it's still (technically) an N; it's just that no one would recognize it as such. It's very important to recognize the difference between the code point, the character encoding, and the font describing the visual display.

In the next part, we'll discuss how Unicode and character encodings are used in Java.

Additional Resources

Joel Spolsky's great intro to Unicode in general, which sounds a lot like this post
Jukka K. Korpela's tutorial on character code issues

How not to do ebooks and customer service

The ever-observant Pro JavaFX Platform. So far, the content of the book is very good. The problem I have is with its packaging.

The main reasons to buy ebooks (e.g., PDF) over dead-tree books are that you can keyword search them and cut-and-paste the code examples. However, Apress (a subsidiary of Springer-Verlag) has decided to disable this. Most of the text can be cut-and-pasted, but the code samples cannot. None of the text is searchable.

When I contacted Apress about this, I got this courteous but useless response:

Hi Phil,

Thanks for contacting Apress Customer Support.

We're sorry about the inconvenience when trying to cut and paste code samples. One way around this though is to use the snapshot tool in Adobe Acrobat.

Using the snapshot tool, simply drag the tool around the area of code you want to cut and paste and take a snapshot and then you'll be able to past that code into word as an image.

Please contact us if you have questions concerning any of our other books.

Apress Customer Support

Since NetBeans has the new "paste an image to text feature", this is great (sarcasm). The decision to publish their books with these restrictions (including password protection) was an intentional one, so there should be some explicit policy on why this is that their customer support reps can refer to; otherwise they're stuck making useless recommendations like the above and discouraging customers from buying Apress ebooks in the future. But, it's probably not very appealing for them to explicitly say "we don't trust our customers, so we've decided to annoy the honest ones."

I have a few Pragmatic Press books, which do allow full searching and cut and paste. Their model of ebook publishing is that they trust their customers and simply generate a PDF for you that has your name and email on each page. Totally non-intrusive to honest users, and enough to discourage marginally dishonest ones. A much better model, and one I hope Apress will soon adopt.

Snow Leopard: what I really like

Snow Leopard isn't the typical big leap most MacOS X versions have been, and I think it's smart marketing on the part of Apple to name it a variation on the previous version rather than waste a whole cat on it. Most of the features I don't really understand (Grand Central Dispatch) or care about (Exchange support), but three have really stood out for me:

Offline printing works! I typically have my MBP disconnected when I send something to print, and then plug it in later. This never really worked correctly in Leopard, with the vague "Operation error printer-not-supported-no-one-understands-this-message" popup. A combination of plugging and unplugging the printer and attempting to resume the print job usually did the trick. Now it just works — amazing!

Annotations in Preview. Probably the reason 99% of people buy Acrobat Pro is so they can annotate PDFs. Now you can do it in Preview.

Faster AirPort connecting. I don't know if this is an update to AirPort or my new AirPort Express (hooked into my stereo, great!), but connecting is really fast now after opening my MBP; no more Try Again or waiting.

Overall, just a lot of fixes and polish. Well done.

Java: Localized number formatting

The other day, I had an NLS bug to respond to, and realized I didn't know how numbers were formatted for any locales other than English and French. Quick, to the JVM:

ar    3,141.59
   ar_AE    3,141.59
   ar_BH    3,141.59
   ar_DZ    3,141.59
   ar_EG    3,141.59
   ar_IQ    3,141.59
   ar_JO    3,141.59
   ar_KW    3,141.59
   ar_LB    3,141.59
   ar_LY    3,141.59
   ar_MA    3,141.59
   ar_OM    3,141.59
   ar_QA    3,141.59
   ar_SA    3,141.59
   ar_SD    3,141.59
   ar_SY    3,141.59
   ar_TN    3,141.59
   ar_YE    3,141.59
be    3 141,59
   be_BY    3 141,59
bg    3 141,59
   bg_BG    3 141,59
ca    3.141,59
   ca_ES    3.141,59
cs    3 141,59
   cs_CZ    3 141,59
da    3.141,59
   da_DK    3.141,59
de    3.141,59
   de_AT    3.141,59
   de_CH    3'141.59
   de_DE    3.141,59
   de_LU    3.141,59
el    3.141,59
   el_CY    3.141,59
   el_GR    3.141,59
en    3,141.59
   en_AU    3,141.59
   en_CA    3,141.59
   en_GB    3,141.59
   en_IE    3,141.59
   en_IN    3,141.59
   en_MT    3,141.59
   en_NZ    3,141.59
   en_PH    3,141.59
   en_SG    3,141.59
   en_US    3,141.59
   en_ZA    3,141.59
es    3.141,59
   es_AR    3.141,59
   es_BO    3.141,59
   es_CL    3.141,59
   es_CO    3.141,59
   es_CR    3,141.59
   es_DO    3,141.59
   es_EC    3.141,59
   es_ES    3.141,59
   es_GT    3,141.59
   es_HN    3,141.59
   es_MX    3,141.59
   es_NI    3,141.59
   es_PA    3,141.59
   es_PE    3.141,59
   es_PR    3,141.59
   es_PY    3.141,59
   es_SV    3,141.59
   es_US    3,141.59
   es_UY    3.141,59
   es_VE    3.141,59
et    3 141,59
   et_EE    3 141,59
fi    3 141,59
   fi_FI    3 141,59
fr    3 141,59
   fr_BE    3.141,59
   fr_CA    3 141,59
   fr_CH    3'141.59
   fr_FR    3 141,59
   fr_LU    3 141,59
ga    3,141.59
   ga_IE    3,141.59
   hi_IN    ?,???.??
hr    3.141,59
   hr_HR    3.141,59
hu    3 141,59
   hu_HU    3 141,59
in    3.141,59
   in_ID    3.141,59
is    3.141,59
   is_IS    3.141,59
it    3.141,59
   it_CH    3'141.59
   it_IT    3.141,59
iw    3,141.59
   iw_IL    3,141.59
ja    3,141.59
   ja_JP    3,141.59
   ja_JP_JP    3,141.59
ko    3,141.59
   ko_KR    3,141.59
lt    3 141,59
   lt_LT    3 141,59
lv    3 141,59
   lv_LV    3 141,59
mk    3.141,59
   mk_MK    3.141,59
ms    3,141.59
   ms_MY    3,141.59
mt    3,141.59
   mt_MT    3,141.59
nl    3.141,59
   nl_BE    3.141,59
   nl_NL    3.141,59
no    3 141,59
   no_NO    3 141,59
   no_NO_NY    3 141,59
pl    3 141,59
   pl_PL    3 141,59
pt    3.141,59
   pt_BR    3.141,59
   pt_PT    3.141,59
ro    3.141,59
   ro_RO    3.141,59
ru    3 141,59
   ru_RU    3 141,59
sk    3 141,59
   sk_SK    3 141,59
sl    3.141,59
   sl_SI    3.141,59
sq    3.141,59
   sq_AL    3.141,59
sr    3.141,59
   sr_BA    3.141,59
   sr_CS    3.141,59
   sr_ME    3.141,59
   sr_RS    3.141,59
sv    3 141,59
   sv_SE    3 141,59
th    3,141.59
   th_TH    3,141.59
   th_TH_TH    ?,???.??
tr    3.141,59
   tr_TR    3.141,59
uk    3.141,59
   uk_UA    3.141,59
vi    3.141,59
   vi_VN    3.141,59
zh    3,141.59
   zh_CN    3,141.59
   zh_HK    3,141.59
   zh_SG    3,141.59
   zh_TW    3,141.59

Code for generating it:

import java.text.NumberFormat;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class Class1 {
   public static void main(String[] args) {
       // A TreeSet keeps the lines sorted, so each language is followed by its country variants
       Set<String> list = new TreeSet<String>();
       for (Locale locale : Locale.getAvailableLocales())
           list.add(locale + "\t" + NumberFormat.getInstance(locale).format(3141.59));
       for (String s : list) {
           // Indent locales that have a country or variant (e.g. ar_AE) under their language
           if (s.contains("_")) System.out.println("\t" + s);
           else System.out.println(s);
       }
   }
}
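
If you only care about a handful of locales, a smaller sketch like this one may be handier. The class and method names are mine. One thing worth checking, and the reason for the second method: as far as I can tell, the "space" used as a grouping separator in locales like French is a non-breaking variant rather than a plain ASCII space, and the exact character can differ between JDK versions.

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleNumbers {

    // Formats a number using the conventions of the given locale.
    static String format(Locale locale, double value) {
        return NumberFormat.getInstance(locale).format(value);
    }

    // True if the locale groups thousands with a plain ASCII space (U+0020).
    static boolean groupsWithAsciiSpace(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator() == ' ';
    }

    public static void main(String[] args) {
        System.out.println(format(Locale.US, 3141.59));
        System.out.println(format(Locale.GERMANY, 3141.59));
        System.out.println(format(Locale.FRANCE, 3141.59));
        System.out.println("French uses ASCII space: " + groupsWithAsciiSpace(Locale.FRANCE));
    }
}
```

That separator detail matters if you ever compare formatted output against a string literal typed with the space bar.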

JavaFX: binding property values to anonymous function calls

Summary: When binding to anonymous functions in JavaFX, make sure you bind to the value of evaluating the function rather than the function itself.

One of the most useful aspects of JavaFX is property binding. This allows a more declarative description of how the various UI and model components interact, and frees the user from needing to manually maintain the synchronization between them. A powerful aspect of this is to bind a property value to a value derived from calling a function with other property values.

I'm working on a small cellular automata simulator (Game of Life), which involves displaying rectangles on the screen that have a state and a color, where the color is derived from the state. The code looks like:

function state2color(b:Boolean) { if (b) { Color.SEAGREEN; } else { Color.WHITESMOKE; }}
class Cell {
    var state:Boolean;
    var color:Color = bind state2color(state);
}

Because 'color' is bound to the state2color function call, it's updated any time the value of 'state' is changed. However, state2color isn't used anywhere else, so I wanted to make it an anonymous function. The first thing I tried was this:

var color:Color = bind function() { if (state) { Color.SEAGREEN; } else { Color.WHITESMOKE; }};

but I got the error:

incompatible types:
found:    function():javafx.scene.paint.Color
required: javafx.scene.paint.Color

The problem here is that I'm trying to bind the 'color' property to the anonymous function, rather than binding the property to the value of _evaluating_ the anonymous function.

The simple fix is to add parens after the function declaration, indicating that it's the return value we want:

class Cell {
    var color:Color = bind function(){ if (state) { Color.SEAGREEN; } else { Color.WHITESMOKE; }}();
}

or alternatively:

class Cell {
    var color:Color = bind function(b){ if (b) { Color.SEAGREEN; } else { Color.WHITESMOKE; }}(state);
}

Java Map interface impl for composition

One of the mantras of object-oriented programming is that you should always favor composition over inheritance. However, it's always annoying when you want to implement Map but don't want to copy-paste-edit a bunch of methods that are just pass-through to the underlying map. This is code to do that, so I won't be tempted to just extend HashMap anymore.

  private final Map<String,String> m_map = new HashMap<String,String>();
  // begin Map impl
  public void clear(){ m_map.clear(); }
  public boolean containsKey(Object key){ return m_map.containsKey(key); }
  public boolean containsValue(Object value){ return m_map.containsValue(value); }
  public Set<Map.Entry<String,String>> entrySet(){ return m_map.entrySet(); }
  public boolean equals(Object o){ return m_map.equals(o); }
  public String get(Object key){ return m_map.get(key);}
  public int hashCode(){ return m_map.hashCode(); }
  public boolean isEmpty(){ return m_map.isEmpty(); }
  public Set<String> keySet(){ return m_map.keySet(); }
  public String put(String key, String value){ return m_map.put(key,value);}
  public void putAll(Map<? extends String,? extends String> m){ m_map.putAll(m) ;}
  public String remove(Object key){ return m_map.remove(key);}
  public int size(){ return m_map.size(); }
  public Collection<String> values(){ return m_map.values(); }
  // end Map impl
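
To show where this pays off, here is a sketch of a complete class built on the same pass-through methods. CaseInsensitiveMap is a hypothetical example of mine, not something from a library: it customizes the handful of methods that take a key and delegates everything else untouched.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical example: a String map that ignores the case of its keys.
public class CaseInsensitiveMap implements Map<String, String> {
    private final Map<String, String> m_map = new HashMap<String, String>();

    // Lower-cases String keys; anything else (including null) passes through.
    private static Object normalize(Object key) {
        return (key instanceof String) ? ((String) key).toLowerCase(Locale.ROOT) : key;
    }

    // Customized methods: normalize the key before touching the underlying map.
    @Override public boolean containsKey(Object key) { return m_map.containsKey(normalize(key)); }
    @Override public String get(Object key) { return m_map.get(normalize(key)); }
    @Override public String put(String key, String value) { return m_map.put((String) normalize(key), value); }
    @Override public String remove(Object key) { return m_map.remove(normalize(key)); }
    @Override public void putAll(Map<? extends String, ? extends String> m) {
        for (Map.Entry<? extends String, ? extends String> e : m.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    // Plain pass-through methods.
    @Override public void clear() { m_map.clear(); }
    @Override public boolean containsValue(Object value) { return m_map.containsValue(value); }
    @Override public Set<Map.Entry<String, String>> entrySet() { return m_map.entrySet(); }
    @Override public boolean equals(Object o) { return m_map.equals(o); }
    @Override public int hashCode() { return m_map.hashCode(); }
    @Override public boolean isEmpty() { return m_map.isEmpty(); }
    @Override public Set<String> keySet() { return m_map.keySet(); }
    @Override public int size() { return m_map.size(); }
    @Override public Collection<String> values() { return m_map.values(); }

    public static void main(String[] args) {
        CaseInsensitiveMap headers = new CaseInsensitiveMap();
        headers.put("Content-Type", "text/plain");
        System.out.println(headers.get("content-type"));
    }
}
```

Had I extended HashMap instead, I'd have to know which of its methods call each other internally to be sure every code path normalized the key. With composition, only the methods I wrote can reach the underlying map.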

Books, January 2009

Thinking in Java by Bruce Eckel — I've literally been reading this book for 9 years. This was the book I learned Java from many years ago when I was a student, through the graciousness of Eckel making it available for free electronically. I accidentally bought two copies, so I now have one at work and one at home. I finally got around to reading the chapters on annotations, enums, inner classes, and generics.

Getting Things Done: The Art of Stress-Free Productivity by David Allen — Mr. Allen, where have you been all my life? As he freely admits, a lot of the stuff in the book is common sense, but it's the implementation that people get hung up on. I've started writing down everything I need to do, which alone has made me more productive with lower stress. Highly recommended.

The Little Schemer by Daniel P. Friedman and Matthias Felleisen — This series is one of the weirdest. The Socratic style is off-putting to some, including me, at least initially. Some of the text borders on cheesy, but I'm growing my appreciation of whimsy. I still don't understand the Y combinator, so I'm going to need to revisit this in a few months.

Practices of an Agile Developer: Working in the Real World (Pragmatic Programmers) (Jul 1, 2005) by Venkat Subramaniam and Andy Hunt — Another great book in the "Pragmatic" tradition. I was expecting a book with more on agile software development methods, but this is more individual things developers can do to increase their productivity. While I already do a lot of these, it's always good to be reminded of what you're naturally doing and why it works.

After The Gold Rush by Steve McConnell — Recommended for anyone interested in how software development can become a true engineering discipline instead of the craft that it currently is in most of the industry.

Waltzing With Bears: Managing Risk on Software Projects by Tom DeMarco and Timothy Lister — From the authors who wrote "Peopleware". While the word "agile" isn't used in the book, a lot of the topics here are strongly related to agile practices.

Bargaining for Advantage: Negotiation Strategies for Reasonable People 2nd Edition by G. Richard Shell — An excellent "soft skills" book. I've read "Getting to Yes" and "Getting Past No", and this book was an excellent complement to those two. Lots of concrete strategies for negotiation and guidance for what works in what situations.

Books, December 2008


Effective Java by Joshua Bloch — THE book for honing your Java-fu. If you haven't read this, stop reading and go buy it now. There's so much good stuff in this book, I savored it over about three months.

Peopleware: Productive Projects and Teams by Tom DeMarco and Timothy Lister — A classic in software development management. Much of what Joel Spolsky has used to build Fog Creek and which he writes about is derived from this book. A warning though, if you work in a cube for a large company, you will probably be tempted to quit :)

Groovy in Action by Dierk Koenig, Andrew Glover, Paul King, and Guillaume Laforge — Groovy is a fantastic language, built around the desire to be useful and productive. This is a good introduction from some of the project leads. I would like to have more content on Groovy's metaprogramming capabilities. The MarkupBuilder section of the book was very handy when I was re-implementing a mess of a Java/XSLT report generator, ending up with about half the lines of code and far greater readability.

The Productive Programmer by Neal Ford — Following in the tradition of "The Pragmatic Programmer", this book provides a bevy of ideas of how to improve your programming. The treatment of many of the subjects is relatively shallow, but everything from window launchers to code coverage is somehow hit. Ford also mentions specific software packages for various tasks, which takes the great risk of quickly dating the book but makes it much more useful for the reader.

Not Recommended:

Differentiate or Die: Survival in Our Era of Killer Competition by Jack Trout and Steve Rivkin — I was hoping for a book on how to differentiate your product in the marketplace based on creating innovative products, but this is more "how to lie to people about your product when it's just like everybody else's." "It's Toasted" indeed.