Why does a tuning fork sound different than a piano, even if they’re playing the same note?

The objective aspects of a sound, like frequency and amplitude, can be measured. A tuning fork and a piano may be playing exactly the same note at exactly the same volume, and may be perceived by a listener to have the same pitch and loudness, but the two will never sound "the same" to a listener. The combination of objective and subjective aspects that distinguishes two sounds from one another is the timbre.

Listen to this recording of several different instruments all playing A440 and notice how they all sound different:

Timbre is a combination of both the objective physical properties of a sound wave and the subjective psychoacoustic perception of the listener. Objective aspects are those that can be definitively measured, and are usually related to the physical propagation of the sound. There are several of these, but the two most important are:

  1. The instantaneous combination of frequencies, including the fundamental tone and its related harmonics (also known as partials or overtones)
  2. The change in the frequency and amplitude over time, typically referred to as the attack, decay, sustain, release (ADSR)

Any physical instrument is not only going to play the fundamental but also harmonics. These harmonics are frequencies in the sound that are integer multiples of the fundamental tone. For example, the A above middle C has a frequency of 440Hz, so it will generate harmonics of 880Hz (x2), 1320Hz (x3), 1760Hz (x4), and on and on. Theoretically they're infinite, but in most practical situations the higher harmonics are too soft to be heard or noticed over the louder lower harmonics.

For example, in this frequency graph of one instant of a Grand Piano playing A440, we see peaks at exactly these integer multiples:

Because all of these frequencies exist at the same time, they combine to form a complex waveform:

So instead of creating a simple sine wave, like a tuning fork or a single cathedral organ pipe does, the harmonics create complex waves that human ears perceive as "interesting".

In addition to the instantaneous frequencies, the waveform can change over time in the ADSR cycle — attack (when the note is first struck), decay (from the initial strike down to the sustain), sustain (the long part of the note), and release (when the note ends).

Attack and Decay

Sustain

Release

You can see the harmonics and the ADSR cycle demonstrated in this video. It looks best in 720p HD and full screen. Watch the frequency graph at the bottom of the video change over time:

These instruments were played in GarageBand, so each is a software simulation of an instrument and not exactly perfect, but the main point here is that even approximations of real instruments are extremely complicated. It's also interesting to note how some of the "recordings" have very different left and right stereo tracks, which is an attempt by the software instrument to sound more like a real instrument, even though a recording of an actual instrument would likely have exactly the same waveforms on both tracks.

In the audio recordings below, you can hear sounds with different timbres and compare their frequency graphs and waveforms during the sustain.

Grand Piano

Cathedral Organ

This is exactly the same timbre as a tuning fork, as one is a simple vibrating rod and the other is a simple vibrating column of air.

Grand Organ

Acoustic Guitar

French Horns

Clarinet

Analog Mono Lead

This one is actually very complicated, as the waveform changes in a cycle over a period of seconds.

Electric Buzz

Conclusion

A tuning fork sounds different than a piano because a tuning fork has only one fundamental note with a uniform waveform throughout its playing, whereas a piano has complex harmonics and great variation throughout its attack, decay, sustain, and release, which makes their timbres very different. While it's very complex to analyze a sound wave and break it into its combinations of frequencies and amplitudes, it's easy for most humans to hear even small differences in timbre. The timbre is determined both by the physical properties of the sound wave that we've described here and by the perception of the listener. Timbre is what makes sounds interesting.

Resources

Multi-extends in generified types

In Effective Java, I came across a language construct I'd never seen before:

// T must be a type that implements both List and Comparator
public class Foo<T extends List & Comparator> {
    // A method-level type parameter can declare multiple bounds too
    <U extends List & Comparator> void foo(U x) { }
}

This declares that T must extend or implement both List and Comparator. I've never had occasion to use this, but I can imagine it would be useful. The example Bloch gives in the book is when T is derived from one class and implements an interface.
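
As a sketch of that class-plus-interface case (the Widget and Paintable types below are my own invented illustration, not the example from the book), the class bound must come first and any interface bounds follow after the &:

// Hypothetical types, for illustration only.
class Widget { }
interface Paintable { void paint(); }

// T must both extend Widget (the class bound, listed first)
// and implement Paintable (an interface bound).
class Canvas<T extends Widget & Paintable> {
    void render(T item) {
        item.paint();   // allowed because T implements Paintable
    }
}

Inside Canvas, an instance of T can be used as both a Widget and a Paintable without any casting, which is the whole point of the multiple bound.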

Unicode in Java: some Groovy pieces (part 7)

One of the common tasks Java developers use Groovy for is testing. One of the common idioms I use is to create a list of strings and use the "each" method to assert that an output file contains them. When testing Unicode, this means both the output files and the Groovy source files contain Unicode characters. For example, the code may contain:

        def contents = new File(outputFile).getText("UTF-8")

        [ "D'fhuascail Íosa Úrmhac na hÓighe Beannaithe pór Éava agus Ádhaimh",
          'イロハニホヘト チリヌルヲ ワカヨタレソ ツネナラム',
          'เป็นมนุษย์สุดประเสริฐเลิศคุณค่า'
        ].each { assertTrue(contents.contains(it), "${it} not in ${outputFile}") }

The first point is that we can no longer use the File#text method; we need to use the getText method that takes a character encoding scheme argument.

The second point is that when Java or Groovy source files contain Unicode characters, we need to specify what the encoding of those files is. In this case, we've saved our source files in UTF-8 encoding. As with the JVM, javac and groovyc will default to using the platform default encoding if none is specified, which would give us odd errors when the non-printable characters that resulted from incorrectly decoding the UTF-8 were fed to the compiler.

When I call groovyc from Ant, this is the code I use:

         <groovyc srcdir="." includes="com/example/**/*.groovy" destdir="${twork}" encoding="UTF-8">
            <classpath refid="example.common.class.path"/>
         </groovyc>

For more on Groovy and Unicode, Guillaume has an excellent post, "Heads-up on File and Stream groovy methods".

Unicode in Java: bytes and charsets (part 6)

In this part, I'll discuss some of the lower-level APIs for converting byte arrays to characters and a bit more about the Charset and CharsetDecoder classes.

The String class has two constructors that will decode a byte[] using a specified charset: String(byte[] bytes, String charsetName) and String(byte[] bytes, Charset charset). Likewise, it has two instance methods for doing the opposite: byte[] getBytes(String charsetName) and byte[] getBytes(Charset charset). It is almost always wrong to use the String(byte[]) constructor or the byte[] getBytes() method, since these will use the default platform encoding. It is nearly always better to choose a consistent encoding to use within your application, typically UTF-8, unless you have a good reason to do otherwise.
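
As a minimal sketch of the explicit-charset variants (the snowman text is just an illustration):

import java.nio.charset.Charset;

public class RoundTrip {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");

        // Encode the string to bytes using an explicit charset...
        byte[] bytes = "snowman: \u2603".getBytes(utf8);

        // ...and decode those bytes back using the same charset.
        String decoded = new String(bytes, utf8);

        // Prints the original string (assuming your console can display it).
        System.out.println(decoded);
    }
}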

In the previous part, we used the Charset class to retrieve the default character encoding. We can also use it to retrieve the Charset instance for a given string name with the static method Charset.forName(String charsetName), e.g., Charset.forName("UTF-8"). In addition to String having methods that take either the string name of the encoding or the Charset instance, most of the Reader classes do too. In my previous examples I showed the version where "UTF-8" is specified, but the better way would be to have a static final attribute that contains the value of Charset.forName("UTF-8") and use that. It eliminates the need to repeatedly look up the Charset, and it prevents a typo in the charset name from creating a hard-to-find bug.
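
A minimal sketch of that constant (the class name is my own illustration):

import java.nio.charset.Charset;

public class Encodings {
    // Looked up once; a typo in the name fails immediately at class
    // initialization instead of surfacing as a hard-to-find bug later.
    public static final Charset UTF8 = Charset.forName("UTF-8");
}

Code elsewhere can then pass Encodings.UTF8 to the String constructors and getBytes methods described above.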

The CharsetDecoder class is provided for when you need more control over the decoding process than the String methods provide. This definitely falls into the "advanced" category, so I'm not going to cover it here. Aaron Elkiss has a good writeup, as does the javadoc.

Unicode in Java: sample data (part 5)

When testing Unicode with your application, you need some examples. Most people don't have Thai or Katakana files sitting around, so finding test data is hard.

I've been playing around with JavaScript and jQuery recently, so I thought I'd build a small app that would render Unicode characters from a variety of languages in a variety of scripts. You can cut and paste the examples into your own test files, or since the HTML file contains the characters themselves (instead of the HTML escape codes), you could even use the file itself as test data. It even has Klingon :)

unicode_app

Markus Kuhn has a lot of good examples, including "quick brown fox" examples in many languages (unfortunately Chinese is not among them).

Unicode in Java: Default Charset (part 4)

In this part, I will discuss the default Charset and how to change it.

The default character set (technically a character encoding) is set when the JVM starts. Every platform has its own default, but the default can also be configured explicitly. For example, Windows XP 32-bit (English) defaults to "windows-1252", which is the CP1252 encoding that provides for encoding most Western European languages.

The default charset can be printed by calling:

System.out.println(java.nio.charset.Charset.defaultCharset());

When the JVM is started, the default charset can be set with the property "file.encoding", e.g., "-Dfile.encoding=utf-8". Some IDEs will do this automatically; for example, NetBeans uses this property to explicitly set the charset to UTF-8. The drawback to this is that code that uses a class like FileReader, which relies on the default encoding, may work correctly when handling Unicode in the development environment but then break when used in an environment that has a different default encoding. The developer should not rely on the user to set the encoding for the code to work correctly.

Also, one might think they could just alter the system property "file.encoding" programmatically. However, this cannot be set after the JVM starts, as by that time all of the system classes which rely on this value have already cached it.
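
A minimal sketch of why that fails (the class name is my own; the first value printed depends on your platform):

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        // Prints whatever the JVM picked up at startup, e.g. windows-1252.
        System.out.println(java.nio.charset.Charset.defaultCharset());

        // Changing the property now has no effect on the default charset;
        // the system classes cached the value when the JVM started.
        System.setProperty("file.encoding", "UTF-8");
        System.out.println(java.nio.charset.Charset.defaultCharset());
    }
}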

In Linux/Unix, you can also set the LC_ALL environment variable to affect the default encoding. For example, on one Linux box I have, the default is US-ASCII. When I set "export LC_ALL=en_US.UTF-8", the default encoding becomes UTF-8.

The environment variables LANG and LC_CTYPE will also have a similar effect (more here).

In summary, the default charset is used by many classes when a character set is not explicitly specified, but this charset should not be relied upon to work correctly when your application is supposed to handle Unicode.

Unicode in Java: Readers and Writers (part 3)

In the previous parts, I've discussed Unicode, encodings, and which encodings Java uses internally. In this part, I'll discuss using Readers and Writers in a Unicode-compliant way. In short, never use FileReader or FileWriter. This is a particularly important thing to understand because none of the Java books I have states it explicitly enough; I didn't understand it until I encountered it in the field.

The various Reader and Writer classes in Java almost never do the correct thing by default. Not because they're not well-designed, but because it's largely up to the user to specify what "the correct thing" is. For example, FileReader and FileWriter will always use the default character encoding. This varies widely between platforms: Windows XP 32-bit defaults to CP1252 (a variant of ISO-8859-1), many Linux distributions default to US-ASCII, and Mac OS X defaults to MacRoman. If you expect your users to input Unicode characters, this will cause them to be garbled. It is possible to change the default character encoding (which we'll discuss later), but you shouldn't rely on your users to set their environments up in a certain way, particularly when your users are non-technical.

If your application has control over a set of files, it needs to explicitly specify the character encoding and always use that encoding. Instead of using FileReader and FileWriter, you must use InputStreamReader and OutputStreamWriter with the constructors that take a stream and a charset name string, e.g. "UTF-8". This is a bit confusing, since it is referred to as a "charset" even though it's technically a character encoding. Here is what the code should look like:

InputStream istream = ...;   // e.g. a FileInputStream
BufferedReader reader = new BufferedReader(new InputStreamReader(istream, "UTF-8"));

OutputStream ostream = ...;  // e.g. a FileOutputStream
Writer writer = new OutputStreamWriter(ostream, "UTF-8");

If you're reading and writing files, you can use the FileInputStream and FileOutputStream implementations for the InputStream and OutputStream instances. The *Stream classes only read and write bytes, so it's the Reader or Writer that actually applies an encoding to map the bytes to chars, or vice versa. You can pretty much just grep your code for FileReader and FileWriter to find places where support for Unicode will break.
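
As a minimal sketch following those rules (the file names are just an illustration), reading a file and writing it back out explicitly as UTF-8 looks like this:

import java.io.*;

public class Utf8Copy {
    public static void main(String[] args) throws IOException {
        // Wrap a FileInputStream in an InputStreamReader with an explicit encoding.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream("in.txt"), "UTF-8"));

        // Likewise, wrap a FileOutputStream in an OutputStreamWriter.
        Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream("out.txt"), "UTF-8"));

        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.write('\n');
        }
        reader.close();
        writer.close();
    }
}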

The javadoc for these classes isn't much help unless you're already aware of the issues. The FileOutputStream javadoc says "FileOutputStream is meant for writing streams of raw bytes such as image data. For writing streams of characters, consider using FileWriter. " This is misleading, since if you're naive to the issues with Unicode support, you might think that FileWriter will "just work" if your code expects to handle Unicode. The FileWriter javadoc says "The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable." If you know what that means, you're okay. But a more useful warning would be "This will almost never write anything other than American English correctly, so don't use it!". I say American English because, for example, the British pound symbol £ isn't included in ASCII.

Now, go and find all of the places in your code where this is broken and fix it.

In the next part, I'll discuss more about the default character set.

Unicode in Java: primitives and encodings (part 2)

In the last part, I discussed how Unicode is a consistent naming scheme for graphemes, how character encodings such as UTF-8 map Unicode code points to bits, and how fonts describe how code points should be visually displayed. In this part, I discuss the specific things you need to know about using Unicode in Java code.

Java primitives and Unicode

The two most commonly used character encodings for Unicode are UTF-8 and UTF-16. Java uses UTF-16 for char values, and as a result for Strings, since these are just an object wrapper around a char array. UTF-8 is most commonly used when writing files, particularly XML. UTF-16 stores nearly all characters as a sequence of 16 bits, even the ones that could be stored in only 8 bits (e.g., characters in the ASCII range). UTF-8 uses a variable-length encoding scheme that stores ASCII-range characters in a single byte (8 bits) and other characters in two to four bytes, depending on the character. For example, the letter "a" (Latin small letter a, U+0061) is represented with 8 bits; "á" (Latin small letter a with acute, U+00E1) is represented with 16 bits; and our beloved snowman (☃) is represented with 24 bits. As I mentioned before, files encoded using ASCII can be read as if they were encoded using UTF-8, and files written using UTF-8 that only contain characters in the ASCII range can usually be read by Unicode-ignorant programs as if they were ASCII. UTF-16 uses a variable-width scheme similar to UTF-8's, but in increments of 16 bits instead of 8.
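
A minimal sketch of those byte counts (the class name is my own illustration):

public class Utf8Lengths {
    public static void main(String[] args) throws java.io.UnsupportedEncodingException {
        System.out.println("a".getBytes("UTF-8").length);       // 1 byte
        System.out.println("\u00E1".getBytes("UTF-8").length);  // 2 bytes for "á"
        System.out.println("\u2603".getBytes("UTF-8").length);  // 3 bytes for the snowman

        // In UTF-16 (Java's internal form), each of these fits in a single 16-bit char.
        System.out.println("\u2603".length());                  // 1
    }
}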

From bytes to Strings

The character encoding describes how to map a byte array (byte[]) to a char array (char[]), and vice versa. Strings are just wrappers around char[]s, so this applies to Strings as well. The important part of the mapping is how it handles the cases where more than one byte in the array maps to a single char value. This is what allows a char to represent any Unicode code point from U+0000 to U+FFFF. This range is known as the Basic Multilingual Plane and includes every language that a general-purpose Java application can be expected to support. If your app needs to support Cuneiform or Phoenician, you probably need to read something other than a blog post.

Encoding support

Every Java implementation must support US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16 (with byte order mark). US-ASCII and UTF-8 you should recognize. ISO-8859-1 is commonly referred to as Latin-1 and is usually used when only "Western European" languages need to be supported. It's related to the Windows-1252 encoding used by default on older Windows OSes. UTF-16BE and UTF-16LE encode as either big endian or little endian, which gives a speedup on certain platforms. The default UTF-16 scheme includes the code point U+FEFF as the first two bytes of a document (called the byte order mark), whose byte order determines whether the rest of the document is big endian or little endian.
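
A small sketch of the difference (the output comments assume the documented Java behavior of writing a big-endian byte order mark when encoding with "UTF-16"):

public class BomDemo {
    public static void main(String[] args) throws java.io.UnsupportedEncodingException {
        byte[] withBom    = "A".getBytes("UTF-16");    // byte order mark included
        byte[] withoutBom = "A".getBytes("UTF-16BE");  // byte order fixed by the name, no mark

        // Prints "FE FF 00 41" -- the byte order mark followed by 'A' in big-endian UTF-16.
        for (byte b : withBom) System.out.printf("%02X ", b);
        System.out.println();

        // Prints "00 41" -- just the character.
        for (byte b : withoutBom) System.out.printf("%02X ", b);
        System.out.println();
    }
}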

However, most Java implementations support a lot more. For instance, MacOS X Java 6 supports: Big5, Big5-HKSCS, EUC-JP, EUC-KR, GB18030, GB2312, GBK, IBM-Thai, IBM00858, IBM01140, IBM01141, IBM01142, IBM01143, IBM01144, IBM01145, IBM01146, IBM01147, IBM01148, IBM01149, IBM037, IBM1026, IBM1047, IBM273, IBM277, IBM278, IBM280, IBM284, IBM285, IBM297, IBM420, IBM424, IBM437, IBM500, IBM775, IBM850, IBM852, IBM855, IBM857, IBM860, IBM861, IBM862, IBM863, IBM864, IBM865, IBM866, IBM868, IBM869, IBM870, IBM871, IBM918, ISO-2022-CN, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-KR, ISO-8859-1, ISO-8859-13, ISO-8859-15, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, JIS_X0201, JIS_X0212-1990, KOI8-R, KOI8-U, MacRoman, Shift_JIS, TIS-620, US-ASCII, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF-8, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, windows-31j, x-Big5-Solaris, x-euc-jp-linux, x-EUC-TW, x-eucJP-Open, x-IBM1006, x-IBM1025, x-IBM1046, x-IBM1097, x-IBM1098, x-IBM1112, x-IBM1122, x-IBM1123, x-IBM1124, x-IBM1381, x-IBM1383, x-IBM33722, x-IBM737, x-IBM834, x-IBM856, x-IBM874, x-IBM875, x-IBM921, x-IBM922, x-IBM930, x-IBM933, x-IBM935, x-IBM937, x-IBM939, x-IBM942, x-IBM942C, x-IBM943, x-IBM943C, x-IBM948, x-IBM949, x-IBM949C, x-IBM950, x-IBM964, x-IBM970, x-ISCII91, x-ISO-2022-CN-CNS, x-ISO-2022-CN-GB, x-iso-8859-11, x-JIS0208, x-JISAutoDetect, x-Johab, x-MacArabic, x-MacCentralEurope, x-MacCroatian, x-MacCyrillic, x-MacDingbat, x-MacGreek, x-MacHebrew, x-MacIceland, x-MacRomania, x-MacSymbol, x-MacThai, x-MacTurkish, x-MacUkraine, x-MS932_0213, x-MS950-HKSCS, x-mswin-936, x-PCK, x-SJIS_0213, x-UTF-16LE-BOM, X-UTF-32BE-BOM, X-UTF-32LE-BOM, x-windows-50220, x-windows-50221, x-windows-874, x-windows-949, x-windows-950, x-windows-iso2022jp.

In the next part, I'll discuss using Readers and Writers with Unicode.