
Ogg Vorbis for President?

It does indeed look a lot like an election campaign... And 'Ogg Vorbis' is the Open Source audio compression format that stands for freedom: freedom from the patents that so strongly characterize the other candidates, among them mp3 and Microsoft's WMA. As you probably know, the hugely popular 'mp3' originally arose as a component (the 'layer 3') of MPEG-1, a joint effort by a number of companies to develop a powerful video compression format. Although the contributions were patented, the mp3 format was long regarded as 'free'. That changed once it had become so widespread that its designers, the Fraunhofer Institute and Thomson, smelled big money and started enforcing licenses.

Open Source compression

This prompted a group of Open Source developers to define a new multimedia format from the ground up, one that would involve no patents at all. A big advantage was that they could also use the latest insights, so the quality could be considerably higher than that of mp3, which is by now 13 years old. 'Ogg' denotes the general format, which can contain various components; 'Vorbis' is its 'lossy' audio part. As with mp3, part of the sound information is lost. However, because the format exploits the psychoacoustic properties of our hearing, you will hardly be able to hear the difference.

The site vorbis.com keeps a detailed record of all progress. Like all Open Source projects, Ogg Vorbis is a work in progress, and you always have access to the latest developments. The current version is 1.0, and it is available for Windows, Mac and Linux. There was even a good port for BeOS, the multimedia operating system that has since been killed off. It is this platform independence that attracts many developers, including game developers. The financial demands of license holders, by contrast, are inherently unpredictable and can multiply from one moment to the next.

Sites with further information about Ogg Vorbis

Informative article in Linux Journal
Interview with Chris Montgomery, one of the first developers of Ogg Vorbis
Interview with Jack Moffitt, a co-developer
What is Open Source?

With the author's permission we have included the following article by Graham Mitchell. It gives a clear picture of the principles on which audio compression in general is based, and of where Ogg Vorbis fits in. It also gives a lucid account of the technical considerations that played an important role in designing the format.

An introduction to compressed audio with Vorbis

By Graham Mitchell

Music is made up of waves. When a violin player bows a string, the string vibrates at a certain frequency and creates a sound wave, which travels through the air, hits your eardrum and causes it to vibrate. Your brain interprets the signals coming from your eardrum and "hears" a sound.

Likewise, everything else you hear comes from something vibrating and creating sound waves. In a trumpet, it's a column of air. With an electric guitar, the vibrating strings produce a signal that travels through the amplifier and makes a speaker cone vibrate in the same manner as the original string. When you speak or sing, it's your vocal cords vibrating. All of these things generate sound waves.

The properties of these waves affect how they sound. The frequency of a wave refers to how many times per second the wave transitions from its highest point to its lowest point and back again. This is typically measured in hertz (Hz), or number of cycles per second. The frequency of a wave determines its pitch. High frequency waves have a high pitch, and low frequency waves have a low pitch. The average human can hear frequencies from 15 or 20 Hz to roughly 20,000 Hz (20 kHz).

The amplitude of a wave is half the distance between its highest point and its lowest. The larger the amplitude of a wave, the louder it sounds; sound level is typically measured in decibels (dB). The decibel range for human hearing is complicated and depends on the frequency of the sound in question, but it roughly spans 0 to 120 dB. As a rule of thumb, each increase of 10 dB is heard as roughly a doubling of loudness, even though the physical sound intensity increases tenfold.
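To make the decibel arithmetic concrete, here is a small Python sketch; the function names are illustrative, not from any library. It uses the standard definitions: a change of d dB means an intensity (power) ratio of 10^(d/10) and an amplitude (sound pressure) ratio of 10^(d/20). The "twice as loud per 10 dB" relation in the text is only a perceptual rule of thumb, modeled here, as an assumption, by 2^(d/10).

```python
# Illustrative helpers (hypothetical names) for decibel ratios.

def intensity_ratio(db_change):
    """Physical power/intensity ratio for a level change of db_change decibels."""
    return 10 ** (db_change / 10)

def amplitude_ratio(db_change):
    """Sound-pressure (amplitude) ratio for the same level change."""
    return 10 ** (db_change / 20)

def perceived_loudness_ratio(db_change):
    """Rule of thumb only: every +10 dB is heard as roughly twice as loud."""
    return 2 ** (db_change / 10)

for change in (10, 20, 60, 120):
    print(f"+{change:3d} dB: intensity x{intensity_ratio(change):.0e}, "
          f"amplitude x{amplitude_ratio(change):.0e}, "
          f"loudness x{perceived_loudness_ratio(change):.0f}")
```

The gap between the last two columns is why a 120 dB range sounds far less extreme than a trillion-fold intensity range suggests.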

Digital Audio

As early as World War II, engineers were experimenting with digital audio, converting the analog waves of sound into discrete values. This was accomplished by "sampling" the sound wave many times per second, with each sample recording the amplitude of the wave at that point (including whether the wave was "up" or "down"). By the Nyquist theorem, the sample rate (number of samples per second) must be more than twice the highest frequency being recorded. Otherwise the higher frequencies are "aliased": they turn up in the recording as spurious lower tones.
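The artifact the Nyquist theorem guards against is called aliasing. The following Python sketch (helper names are hypothetical) shows it for cosine tones. Sampled at 44.1 kHz, a 25 kHz tone produces the same sample values as a 19.1 kHz tone, so after sampling the two can no longer be told apart.

```python
import math

FS = 44_100  # CD sample rate, Hz

def sample_cosine(freq_hz, n_samples, fs=FS):
    """Sample cos(2*pi*f*t) at fs samples per second."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

def alias_frequency(freq_hz, fs=FS):
    """Frequency between 0 and fs/2 at which a tone of freq_hz shows up after sampling."""
    folded = freq_hz % fs
    return fs - folded if folded > fs / 2 else folded

too_high = sample_cosine(25_000, 100)                    # above the 22,050 Hz Nyquist limit
impostor = sample_cosine(alias_frequency(25_000), 100)   # 19,100 Hz
identical = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, impostor))
print(alias_frequency(25_000), identical)
```

This is why recorders filter out everything above half the sample rate before sampling.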

So, in the 1970s, when Philips and Sony began looking for a way to improve audio quality for recorded music, they turned to digital sampling. A sample rate of 44,100 samples per second (44.1 kHz) was chosen for two reasons. It exceeded the target sample rate of 40 kHz (twice the highest frequency humans can hear, 20 kHz). And that's how much information could be stored on a video tape, the storage medium of choice until the little silver plastic discs we know as CDs were perfected.
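The video-tape connection can be checked with a little arithmetic. The figures below are the commonly cited ones, not taken from this article. Digital audio was recorded onto video tape with 3 samples per usable video line, using either 245 lines per field at 60 fields per second (NTSC-style) or 294 lines per field at 50 fields per second (PAL-style). Both work out to exactly 44,100 samples per second.

```python
SAMPLES_PER_LINE = 3  # commonly cited figure for PCM-on-video-tape adaptors

def samples_per_second(lines_per_field, fields_per_second, samples_per_line=SAMPLES_PER_LINE):
    """Audio samples per second that fit in the usable lines of a video signal."""
    return samples_per_line * lines_per_field * fields_per_second

ntsc = samples_per_second(245, 60)
pal = samples_per_second(294, 50)
nyquist_target = 2 * 20_000  # twice the highest audible frequency

print(ntsc, pal, ntsc > nyquist_target)
```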

Each "sample" is a 16-bit number, ranging from -32,768 to 32,767. This number indicates the amplitude of the wave at the instant of sampling. Thus a sampled wave oscillating back and forth from -32,768 to 32,767 would be the loudest wave this format could represent, a wave changing from -1 to 1 would be the quietest, and a bunch of zeroes in a row would indicate complete silence. This range of values for the amplitude is fairly fine-grained, which allows even subtle volume differences to be accurately represented. Sampling audio in this digital fashion is known as Pulse Code Modulation (PCM), and is the most popular method of digital sampling.
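Here is a minimal sketch of 16-bit PCM. It assumes the sound starts out as floating-point values between -1.0 and 1.0, a common convention rather than something the CD format prescribes. Each value is scaled, rounded and clipped to the signed 16-bit range. The samples are then packed as little-endian 16-bit integers, which is how PCM samples are laid out in a WAV file's data section. The tone and helper names are hypothetical, and the example is mono for simplicity.

```python
import math
import struct

INT16_MIN, INT16_MAX = -32_768, 32_767

def quantize(x):
    """Scale a value in [-1.0, 1.0] to a 16-bit integer, clipping anything out of range."""
    return max(INT16_MIN, min(INT16_MAX, round(x * INT16_MAX)))

def pcm16_tone(freq_hz, duration_s, fs=44_100, level=0.5):
    """Hypothetical helper: a mono sine tone as a list of 16-bit PCM samples."""
    n = int(duration_s * fs)
    return [quantize(level * math.sin(2 * math.pi * freq_hz * i / fs)) for i in range(n)]

samples = pcm16_tone(440, 0.01)                    # about 10 ms of an A4 tone at half scale
raw = struct.pack(f"<{len(samples)}h", *samples)   # "<h" = little-endian signed 16-bit
print(len(samples), len(raw), min(samples), max(samples))
```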

PCM digital audio produces quite an accurate picture of the "live" sound, and only the keenest listeners with good equipment can distinguish between it and the original.

The Size Problem

It is possible (and fairly easy) to "rip" the audio data from a CD and store it in "WAV" files on a computer, and these files can be played back on demand. So ideally, you'd want to hear your music at this quality everywhere, since it's the highest quality you can typically purchase. You'd want copies of your music, at this quality, in your car, on your computer, in your portable music player, and in your stereo. Why is this not currently feasible? The answer is size.

A little math reveals the space required to store sound information at this quality. Each sample is 16 bits, or two bytes. There are 44,100 samples each second, and since modern music is recorded in stereo, there is both a left and a right channel. This results in (2 * 44,100 * 2) = 176,400 bytes to store one second's worth of samples. That means 10,584,000 bytes, or approximately 10 megabytes, to store just one minute of CD-quality audio. This may not sound too alarming, given that many people have hard drives holding tens and even hundreds of gigabytes, but it adds up quickly.

My personal music collection currently consists of 1307 songs on 102 different albums (a couple of these are double albums). The total playing time of all 1307 songs combined is 5,423 minutes and 23 seconds (over three days and eighteen hours) and so would require an estimated 53 gigabytes of hard drive space to store in perfect CD-quality!!! With a little cash, a personal computer could have that much storage for now, but most portable music players have less than 1% this much space.
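Here is the arithmetic of the last two paragraphs as a small Python sketch (names are illustrative). The "53 gigabytes" figure corresponds to binary gigabytes of 2^30 bytes; in decimal gigabytes the collection comes to about 57 GB.

```python
BYTES_PER_SAMPLE = 2      # 16-bit samples
SAMPLE_RATE = 44_100      # samples per second, per channel
CHANNELS = 2              # stereo

def pcm_bytes(seconds, channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE, rate=SAMPLE_RATE):
    """Size of uncompressed PCM audio of the given duration, in bytes."""
    return bytes_per_sample * rate * channels * seconds

per_second = pcm_bytes(1)
per_minute = pcm_bytes(60)
collection_seconds = 5_423 * 60 + 23          # the author's 5,423 min 23 s
collection = pcm_bytes(collection_seconds)
days, rest = divmod(collection_seconds, 86_400)
hours = rest // 3_600

print(f"{per_second:,} bytes/s, {per_minute:,} bytes/min")
print(f"playing time: {days} days {hours} hours")
print(f"collection: {collection:,} bytes = {collection / 1e9:.1f} GB = {collection / 2**30:.1f} GiB")
```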

As video DVDs with "surround sound" audio become more popular, this will only become more of a problem: such audio typically contains 5 channels (left, right, left rear, right rear, and center), nearly tripling the space requirements! And for DVD Audio discs it's even worse: up to six channels with 24-bit samples at 96 kHz, requiring almost ten times the space!
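The multichannel figures follow from the same multiplication. This sketch assumes the 5-channel case uses CD resolution (16-bit, 44.1 kHz) for every channel, which the text does not state.

```python
def bits_per_second(channels, bits_per_sample, rate_hz):
    """Uncompressed PCM data rate."""
    return channels * bits_per_sample * rate_hz

cd = bits_per_second(2, 16, 44_100)
surround = bits_per_second(5, 16, 44_100)    # assumption: CD resolution per channel
dvd_audio = bits_per_second(6, 24, 96_000)   # six channels, 24-bit, 96 kHz

print(f"5-channel: {surround / cd:.1f}x CD, DVD-Audio: {dvd_audio / cd:.1f}x CD")
```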

Clearly, for the near future (at least until portable music players have hundreds of gigs of storage), you won't be able to carry around your entire music collection.

Or can you?

Fortunately, there is a solution. Compression is the technique of making a file take up less space while keeping its content intact, either exactly or closely enough that the difference doesn't matter. There are two categories of compression: lossless and lossy.

Lossless Compression

Lossless compression means that the compressed, smaller file can be expanded back into the original file without losing any information whatsoever. That is: take a file, compress it, and uncompress it again. If the result is bit-for-bit identical to the original, 100% of the time, for any given input file, then the compression scheme is lossless. No information is lost.
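The definition can be tested directly. This sketch runs Python's standard zlib module (the same DEFLATE algorithm gzip uses) on a synthetic 16-bit tone and checks that decompression returns the input bit for bit. A pure tone is far more regular than real music, so the compression ratio it prints says nothing about typical results on music; only the round trip is the point.

```python
import math
import struct
import zlib

fs = 44_100
# Synthetic test signal (an assumption): 0.1 s of a 440 Hz tone as 16-bit mono PCM.
tone = [round(10_000 * math.sin(2 * math.pi * 440 * n / fs)) for n in range(fs // 10)]
original = struct.pack(f"<{len(tone)}h", *tone)

compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes; identical:", restored == original)
```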

Unfortunately, compressing audio losslessly is hard. General-purpose compression programs like WinZip and gzip only manage about 5% on average. Even "next-generation" utilities like WinRAR and bzip2 only manage a few percent more.

There are special-purpose compressors (like flac) which were designed solely for losslessly compressing audio, but even they only manage about a 50% reduction in filesize on average. While this is enough for some, for music files to be truly portable they must be even smaller.
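Audio-specific lossless coders exploit the fact that consecutive samples tend to be similar. The sketch below shows only the first step of that idea: a fixed order-1 predictor, which predicts each sample as the previous one and stores only the prediction error. FLAC offers this among its fixed predictors. Real encoders also use higher-order linear prediction and entropy-code the residuals (FLAC uses Rice codes), which is where the actual space saving happens. The helper names are hypothetical.

```python
import math

def encode_residuals(samples):
    """Order-1 fixed prediction: store each sample minus the one before it."""
    residuals, previous = [], 0
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals

def decode_residuals(residuals):
    """Undo the prediction by running-summing the residuals."""
    samples, previous = [], 0
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples

fs = 44_100
# Hypothetical test signal: 0.1 s of a 440 Hz tone.
tone = [round(10_000 * math.sin(2 * math.pi * 440 * n / fs)) for n in range(fs // 10)]
residuals = encode_residuals(tone)

assert decode_residuals(residuals) == tone             # still lossless
print(max(map(abs, tone)), max(map(abs, residuals)))   # residuals are much smaller numbers
```

Smaller numbers need fewer bits once entropy-coded. This is why such coders beat general-purpose tools on audio, though real recordings are far noisier than this tone.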

Lossy Compression

Lossy compression is any compression which causes information to be lost. Compressing and then uncompressing a file results in something similar, but not identical, to the original file. This is no good for programs, which must be interpreted by a computer, but is often just fine for humans. The trick is to remove little bits of information in places where it can't be perceived.

Lossy audio compression works using a psychoacoustic model. That is, by modeling how your ears (and your brain) hear sound, it is possible to find places to remove information that you wouldn't have perceived anyway. A full treatment of these techniques is beyond the scope of this document, but here are two simple examples:

Though humans can technically hear tones up to 20 kHz in pitch, most can't hear anything above 15 kHz, especially when other sounds are present. However, most CD-quality audio contains information for reproducing these tones anyway. By filtering out tones outside this range, you reduce the amount of information that has to be stored without affecting the perceived sound quality. (And even humans that can hear such tones wouldn't have heard them anyway on cheap computer speakers unable to produce such frequencies.)
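Removing content above a cutoff is easiest to see in the frequency domain. The brute-force Python sketch below (hypothetical helper names) runs a naive discrete Fourier transform on one short frame and zeroes every frequency bin above 15 kHz. It only illustrates the idea. Real encoders, Vorbis included, work on overlapping frames with a modified discrete cosine transform (MDCT) and fast algorithms, not this quadratic-time DFT.

```python
import cmath
import math

FS = 44_100   # sample rate in Hz
N = 64        # frame length; DFT bins are FS / N, about 689 Hz, apart

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def lowpass(frame, cutoff_hz, fs=FS):
    """Zero every frequency bin above cutoff_hz, then transform back."""
    spectrum = dft(frame)
    n = len(frame)
    for k in range(n):
        freq = min(k, n - k) * fs / n   # bins above n/2 mirror the negative frequencies
        if freq > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)

# Hypothetical test frame: a tone in bin 2 (about 1.4 kHz) plus a quieter one in bin 28 (about 19.3 kHz).
audible = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
very_high = [0.3 * math.sin(2 * math.pi * 28 * t / N) for t in range(N)]
filtered = lowpass([a + b for a, b in zip(audible, very_high)], cutoff_hz=15_000)

print(max(abs(f - a) for f, a in zip(filtered, audible)))   # tiny: only the audible tone is left
```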

Similarly, suppose a piece of music contains a loud bass drum hit (as most rock and roll does, a couple of times each second). The eardrum is then too busy reacting to the percussive hit to register any other sounds at all for a few milliseconds. By simply omitting the samples immediately after such sounds, less information can be stored while still maintaining the same perceived sound quality. (Note: this is merely an example. I am not aware of any encoder which actually does this.)

Using sophisticated techniques such as these, lossy audio compression formats such as Ogg Vorbis and mp3 can produce files at a mere 10 to 20% of the original size. In blind listening tests, listeners generally cannot distinguish these files from the original CD-quality sound.

And what's even better, being more aggressive with these techniques can result in files which are less than 5% of the original size but still sound quite good on normal equipment (think FM radio quality).

A Bit On Bitrates

The ultimate size of such files is determined by their bitrate: how many bits the compressor (a.k.a. encoder) uses to represent each second of audio. Uncompressed CD-quality audio uses 176,400 bytes, or 1,411,200 bits, to store each second. This is roughly 1411 kilobits per second, or 1411 kbps. Typical lossy formats use anywhere from 64 to 256 kbps to store the "same" information.
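The bitrate arithmetic is easy to check in Python. This sketch assumes 1 kbps = 1,000 bits per second, the usual convention for audio bitrates. It prints the storage per minute and the size relative to CD audio for a few common settings.

```python
CD_BITRATE_KBPS = 2 * 16 * 44_100 / 1000   # 1411.2 kbps: stereo, 16-bit, 44.1 kHz

def bytes_per_minute(kbps):
    """Storage for one minute of audio at a given bitrate (1 kbps = 1000 bit/s)."""
    return kbps * 1000 * 60 / 8

for kbps in (64, 128, 256):
    print(f"{kbps:3d} kbps: {bytes_per_minute(kbps):>12,.0f} bytes/min, "
          f"{kbps / CD_BITRATE_KBPS:.1%} of CD size")
```

At 128 kbps this gives 960,000 bytes per minute, which is where the familiar "about 1 megabyte per minute" estimate comes from.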

The problem is that bitrates only speak to the size of the file, not its quality. For example, one could write a compression format that achieves a 256 kbps bitrate by taking only the first 256,000 out of every 1,411,200 bits (18%) in any given second. Although some foolish people might assume a song encoded in this format would sound better than a typical 128 kbps mp3, any listening test would be able to easily prove the inferiority of such a technique.

The mp3 format, developed by Fraunhofer and Thomson, is heavily patented and was ground-breaking in its time. Because it was the first widely adopted lossy audio compression codec, people associate certain bitrates with certain levels of quality.

However, even within the aging mp3 format, and even within a single bitrate (say, 128 kbps), the sound quality of various encoders varies drastically. The Xing encoder is fast but produces poor-sounding files even at 128 kbps. The Lame encoder is a bit slower but produces markedly better-sounding files at the same bitrate.

Newer lossy audio compression codecs like WMA and Ogg Vorbis use different psychoacoustic models and noticeably improve sound quality at a given bitrate even over the best mp3 encoder.

CBR? VBR? ABR?

To further muddy the waters, not even bitrate alone will tell the whole story. Early mp3 encoders (and most to this day) used what is called an "average bit rate" (ABR). If, for example, we encode a file at 128 kbps, then the encoder will use 128 kilobits to encode each second of the song, no matter what. So the first measure (consisting of perhaps two drum clicks) will use 128 kilobits and will represent that second nearly exactly. Halfway through the song, though, the lead guitar may be ripping into a solo, the drummer going crazy on the cymbals and the bass guitar playing a funky groove. The encoder will still have to use 128 kilobits to encode that second, where it could have used, say, 300. Thus that second will be represented rather poorly.

Newer mp3 encoders (like Lame) support what is called "variable bit rate" mode (VBR). This gives the encoder the freedom to save bits on simple sections that don't need as many bits to represent them well and thus have some "extra" bits left over to use for sections that really need them. This usually results in files which are slightly smaller than ABR files even at the same target bitrate, but which sound much better in the busy sections.
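A toy model may help picture the difference. Assume, purely hypothetically, that we know how many kilobits each second of a song needs to sound transparent. A fixed per-second budget (what the text calls ABR) gives every second the same 128 kilobits. An idealized VBR encoder gives each second what it needs. The numbers are invented for illustration; a real encoder estimates "need" with its psychoacoustic model.

```python
# Hypothetical kilobits each second of an 8-second excerpt needs to sound transparent.
needed = [40, 60, 90, 128, 300, 250, 80, 50]

def fixed_budget(needed, kbps):
    """Same allocation every second, regardless of content."""
    return [kbps] * len(needed)

def idealized_vbr(needed):
    """Give each second exactly what it needs (a quality target, not a bitrate target)."""
    return list(needed)

def shortfall(needed, allocated):
    """Kilobits missing in each second; anything above zero means audible degradation."""
    return [max(0, n - a) for n, a in zip(needed, allocated)]

for name, alloc in (("fixed 128", fixed_budget(needed, 128)), ("VBR", idealized_vbr(needed))):
    average = sum(alloc) / len(alloc)
    print(f"{name:9s}: average {average:.2f} kbps, shortfall per second {shortfall(needed, alloc)}")
```

In this example the VBR allocation averages slightly less than 128 kbps yet never falls short. This mirrors the text's point that VBR files come out slightly smaller but sound better in busy passages.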

Occasionally you may see mp3s referred to as CBR, for "constant bit rate". This would mean that every frame of the encoded file must use exactly the same number of bits. In reality, mp3 uses a bit reservoir to average bit rates over a small period of time, so technically ABR is being used. It is unlikely that any compressed audio formats use true CBR.
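Here is a simplified model of a bit reservoir, with made-up frame demands and a hypothetical cap. Each frame has a fixed budget. Bits that an easy frame leaves unused are banked (up to the cap), and a later, harder frame may spend them. The real mp3 bitstream implements this differently in detail, but the effect is the same: the rate is only constant on average.

```python
def allocate_with_reservoir(demands, frame_budget, reservoir_cap):
    """Bits actually spent per frame when unused bits can be carried forward.

    Each frame may use its own budget plus whatever earlier frames left in the
    reservoir; leftovers are banked again, but never more than reservoir_cap.
    """
    reservoir = 0
    spent = []
    for demand in demands:
        available = frame_budget + reservoir
        used = min(demand, available)
        reservoir = min(reservoir_cap, available - used)
        spent.append(used)
    return spent

demands = [100, 100, 300, 100]   # hypothetical bits each frame would like
print(allocate_with_reservoir(demands, frame_budget=200, reservoir_cap=0))    # no reservoir
print(allocate_with_reservoir(demands, frame_budget=200, reservoir_cap=150))  # with reservoir
```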

Unfortunately, though some newer mp3 encoders do support VBR, some portable hardware mp3 players can't play such mp3s. And even when their encoders and players support this mode, many people don't use it for whatever reason. (Habit? Ignorance?)

Almost every lossy audio compression codec newer than mp3 supports VBR, though some don't turn it on by default.

Just Say No To Bitrates

Testing by Fraunhofer and Thomson found that for mp3s, 256 kbps was true "CD-quality"; that is, their sound engineers could rarely tell the difference between mp3s encoded at that rate and the original CDs. These files were roughly 20% of the original filesize, but virtually indistinguishable in quality.

Since then, 128 kbps mp3s have become the "standard". Although most people with good equipment can hear the difference, they still sound good enough for the average listener. These files are roughly 10% of the original filesize. They are the source of the "1 megabyte per minute" rule of thumb that stores quote when estimating how much music you can fit on a given portable mp3 player.

People with dull ears, bad equipment, or an irrational desire to fit twice as much music on a portable player encode mp3s at 64 kbps. These files sound not much worse than FM radio, and are a mere 5% of the original filesize.

The problem with these rules of thumb is that they only work for mp3s. There are now many formats newer than mp3 which all improve on sound quality. And, as more sophisticated psychoacoustic models are developed, the difference in sound quality between one bitrate for mp3 and the same bitrate for a different format will only continue to widen.

For example, a file encoded in the Ogg Vorbis format at quality 3 typically results in an average bitrate of 112 kbps but sounds better than an mp3 at 128 kbps and often as good as an mp3 encoded at 160 kbps.

For this reason, Ogg Vorbis discourages users from trying to achieve certain bitrates when encoding and instead concentrates on sound quality. In fact, the Ogg Vorbis format encoders don't normally consider bitrate at all (the default mode of operation is VBR), instead using a "quality" rating, which ranges from -1 to 10 in increments of 0.01 or so. This quality rating is a measure of how close to the original the compressed file should sound; the encoder uses as many or as few bits as necessary to satisfy the quality requirement. Each quality setting results in a rough average bitrate for a piece of average music, but this is a by-product of how the encoder has been tuned; the encoder does not aim at any particular bitrate.

The default quality setting is 3, which should be fine for the average user since it gives sound quality better than a 128 kbps mp3 but is over 10% smaller. Someone wanting sound almost identical to a 128 kbps mp3 can usually get by with quality 2, which sounds as good but is 25% smaller.

The rest of this document assumes that you will be encoding music using the Ogg Vorbis format.

Why Ogg Vorbis?

Ogg Vorbis is a good choice because the sound quality is among the best of the newest formats out there. Recent double-blind listening tests put Ogg Vorbis among the highest quality of all the "second-generation" compressed audio codecs. This means you either save space and get the same quality, get higher quality for the same space requirements, or a combination of both (i.e. a little smaller and a little better-sounding).

Secondly, Ogg Vorbis is not only Open Source (BSD license), but is completely patent-free. This means that hardware manufacturers wanting to support Ogg Vorbis in their portable music players can do so without paying license fees, unlike most other formats. Software developers can use the Ogg Vorbis format for music/sounds in their games without having to get permission from some powerful company and without paying royalties. And because the code for the format is open, anyone is free to port the tools to other systems, add features, fix bugs and improve the code if they so desire. In fact, the BSD license even allows developers to modify the code to suit their own needs without having to publish their changes! Most other formats are heavily patented and tightly controlled.

Finally, the format was designed with several features that some of the others lack. Those familiar with id3 tags for mp3 files will be well aware of their limitations. Ogg Vorbis features a flexible tagging standard which allows complete customization of tags for a given file, including user-defined tags (like "Remixed by" or whatever you like).
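The tags Ogg Vorbis uses are called Vorbis comments. A comment block holds a vendor string followed by any number of NAME=value entries. Each entry is stored as a 32-bit little-endian length plus UTF-8 text, and field names are case-insensitive, limited to ASCII 0x20 to 0x7D excluding "=". The sketch below builds and parses just that comment block. In a real file it is wrapped in a header packet (packet type byte, the "vorbis" signature and a framing bit), which is omitted here. The function names and the vendor string are hypothetical.

```python
import struct

def _field(text):
    """A length-prefixed UTF-8 string: 32-bit little-endian length, then the bytes."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data

def build_comments(vendor, tags):
    """Hypothetical helper: serialize a vendor string and a list of (name, value) pairs."""
    out = _field(vendor) + struct.pack("<I", len(tags))
    for name, value in tags:
        if "=" in name or not all(0x20 <= ord(c) <= 0x7D for c in name):
            raise ValueError(f"invalid field name: {name!r}")
        out += _field(f"{name}={value}")
    return out

def parse_comments(data):
    """Return (vendor, [(NAME, value), ...]); names are upper-cased since case is not significant."""
    pos = 0

    def read_field():
        nonlocal pos
        (length,) = struct.unpack_from("<I", data, pos)
        text = data[pos + 4:pos + 4 + length].decode("utf-8")
        pos += 4 + length
        return text

    vendor = read_field()
    (count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    tags = []
    for _ in range(count):
        name, _, value = read_field().partition("=")
        tags.append((name.upper(), value))
    return vendor, tags

block = build_comments("hypothetical encoder", [
    ("TITLE", "Track 1"),
    ("ARTIST", "Some Band"),
    ("Remixed by", "Somebody Else"),   # user-defined field; spaces are allowed in names
])
print(parse_comments(block))
```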

Ogg Vorbis files support "bitrate peeling", which means you can produce a lower bitrate file from a higher bitrate file without re-encoding and at the same quality as if you'd encoded the file directly into the lower bitrate from the original file. No other lossy audio codec currently supports this. (N.B. Though all current files are peelable, tools to do so have yet to be written. Look for them soon.)

And Ogg Vorbis files are not limited to merely two channels of audio (left and right). They support up to 255 distinct channels, and thus are a natural fit for encoding the 6 channels of DVD audio alongside your DivX ;-) video.

Just to be clear, strictly speaking, the name "Ogg" refers to a generic container format which could hold many types of multimedia files (lossy compressed audio (Ogg Vorbis), lossless compressed audio (Ogg Flac), lossy compressed video (Ogg Tarkin), etc). "Vorbis" is the lossy compressed audio codec which is typically transported in Ogg files. "Ogg Vorbis" refers to both parts together: an Ogg format file containing audio compressed using Vorbis. And throughout this document, I've used the term "ogg" to generically mean "a file containing compressed audio in Vorbis format" since that's the file's extension, just like I've used "mp3" to mean "a file containing compressed audio in MPEG layer 3 format".

What Quality Should I Use?

To find out what quality level you should use to encode, you'll need to do some listening tests. First, get one of your CDs and use a CD ripper (like EAC) to rip a track or two into a WAV file on the hard drive. This will take up some space, as mentioned before. If you're comfortable with command-line tools, use oggenc to encode as specified below. If not, use a GUI tool like OggDrop.

Encode the test tracks using the default settings.

	oggenc Track01.wav

This will be VBR, quality 3. Listen to the encoded tracks and decide how they sound to you. Do they sound good? Any complaints? If you think they sound fine, then encode all your music using the default settings and don't even worry about bitrate, quality or anything. Your ears are the only metric that really matters.

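If the files don't sound good to you, or if they do and you're just curious how much better they could sound, then try encoding them again at quality 4. Adjust the slider in OggDrop or issue the following command at the command line. The -o option writes the result to a separate file, so the quality 3 version isn't overwritten and you can compare the two:

	oggenc -q 4 -o Track01-q4.ogg Track01.wav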

Then listen to these versions. Can you tell a difference in sound quality? If not, there's no reason not to encode at the default settings; a higher quality would only waste space.

If you can hear the difference, is the increase in quality enough to justify the increase in filesize? If so, then perhaps keep increasing the quality by 0.5 or so until you can't hear the improvement from the previous setting. Technically, the sound quality does continue to improve all the way up to quality 10, but almost no one can hear any differences after quality 7.

If you're serious about sound quality, you might want to use ABX for your listening tests. ABX is a testing methodology that allows you to determine conclusively and repeatably if you are really hearing differences in two sound files. The PC ABX home page is a good place to get started.
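ABX results are judged statistically. In each trial X is randomly A or B, so someone who hears no difference is right half the time by chance. This sketch computes how likely it is to score at least a given number of correct answers by pure guessing. The common convention of treating a value below 0.05 as evidence of a real audible difference is a choice of threshold, not part of ABX itself.

```python
from math import comb

def abx_p_value(trials, correct):
    """Chance of getting at least `correct` of `trials` ABX answers right by guessing.

    Each guess is right with probability 1/2, so the score follows a binomial distribution.
    """
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(f"12/16 correct: p = {abx_p_value(16, 12):.3f}")
print(f" 9/16 correct: p = {abx_p_value(16, 9):.3f}")
```

Scoring 12 of 16 is unlikely to be luck, while 9 of 16 is entirely consistent with guessing.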

Also, be advised that normal Ogg Vorbis files use lossy channel coupling: information shared by the left and right channels is combined to save space. This does keep the files smaller, but also means that technically the stereo image of an Ogg Vorbis file might not always be identical to the original stereo image. If this concerns you, you'll want to encode at quality 6 or higher, which is where the lossy channel coupling is turned off and all channel coupling is lossless. Most can't tell the difference, but maybe you can.
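Channel coupling can be illustrated with mid/side coding, a simpler relative of what Vorbis does. Vorbis I actually uses a "square polar" mapping, which is not shown here. The integer transform below is exactly reversible, like lossless coupling. The second function throws away small left/right differences, like lossy coupling, which is how the stereo image can drift from the original. Function names, sample values and the threshold are hypothetical.

```python
def ms_encode(left, right):
    """Integer mid/side transform that can be undone exactly."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    left, right = [], []
    for m, s in zip(mid, side):
        total = 2 * m + (s & 1)      # l + r and l - r always have the same parity
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right

def lossy_coupling(left, right, threshold):
    """Zero out left/right differences smaller than threshold (they collapse toward mono)."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid, side = (l + r) / 2, (l - r) / 2
        if abs(side) < threshold:
            side = 0.0
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r

left = [1000, -3, 250, -32768]    # hypothetical 16-bit stereo samples
right = [998, 4, -250, 32767]
assert ms_decode(*ms_encode(left, right)) == (left, right)   # lossless
print(lossy_coupling(left, right, threshold=5))
```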

On the other hand, if file size is more important to you than sound quality, try lowering the quality until the decrease in quality becomes significant enough for you. Some Ogg Vorbis users (with happily dull ears) report that they can't tell the difference between the CD and an Ogg at quality 0! These people are saving quite a bit of space and still listening to music with a sound quality acceptable to them.

If you want to listen to your music both at home (where hard drive space is plentiful) and on a portable device (where space is at a premium), encode at the higher quality. Ogg Vorbis files can always be "peeled" later to get the lower quality versions from the higher quality ones.

A few people are using Oggs for streaming over the web. For these endeavors, variable bit rate is not acceptable: although the bitrate averages out properly, bitrate spikes can exceed the available bandwidth. A CBR mode and even maximum and minimum bitrate settings exist in Ogg Vorbis. No details are given here, because the techniques used to produce such files always result in worse-sounding files than a file of the same size using the default settings. The guarantees on bandwidth come at a price.
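Why spikes matter can be shown with a simple buffer model. The model assumes a link with a fixed capacity and hypothetical per-second bitrates for a VBR stream whose average is below that capacity. Whenever a second needs more than the link carries, the excess piles up as a backlog that the listener's player must wait for or buffer. A constant-rate stream at the link speed never builds one.

```python
def worst_backlog(kbits_per_second, link_kbps):
    """Largest amount of data (kilobits) waiting to be sent at any point."""
    backlog = worst = 0
    for kbits in kbits_per_second:
        backlog = max(0, backlog + kbits - link_kbps)
        worst = max(worst, backlog)
    return worst

vbr_stream = [96, 96, 200, 180, 64, 64]   # hypothetical; averages under 128 kbps
cbr_stream = [128] * 6

print(sum(vbr_stream) / len(vbr_stream), worst_backlog(vbr_stream, 128))
print(worst_backlog(cbr_stream, 128))
```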

For the average user, if you are encoding Ogg Vorbis files using any encoding settings other than "-q n", you are getting lower quality files than you could get at a given size.

A Note About "Transcoding"

Some people have a lot of music in mp3 format, but do not have the original CDs (cough). Others have the CDs, but spent months ripping and encoding all of them into mp3 format and don't want to go through the trouble again. Such people are often tempted to take their mp3s, decompress them into WAVs, and re-encode them into Ogg Vorbis files. Some have even gone so far as to create tools to automate this process.

If you care about sound quality, you should never, ever do this. Ogg Vorbis uses similar but different techniques to remove information, and by transcoding, you lose information twice. Similar to faxing a photocopy of a fax, the "transcoded" ogg will always sound worse than even the original mp3.
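Here is a toy model of this generation loss. It assumes each codec can be stood in for by rounding to a different grid: step 3 for the "mp3" stage and step 4 for the "Ogg" stage. Real codecs discard information in far more elaborate ways, but the effect is similar in kind. Encoding the already-degraded output adds the second codec's error on top of the first's. In this example the transcoded result ends up further from the original than either the first-generation file or a direct encode.

```python
def quantize(x, step):
    """Toy lossy stage: snap a value to the nearest multiple of step."""
    return round(x / step) * step

def total_error(original, processed):
    return sum(abs(a - b) for a, b in zip(original, processed))

signal = list(range(12))                              # stand-in for audio samples
mp3_like = [quantize(x, 3) for x in signal]           # first lossy codec
ogg_direct = [quantize(x, 4) for x in signal]         # second codec, from the original
ogg_transcoded = [quantize(y, 4) for y in mp3_like]   # second codec, from the first's output

print(total_error(signal, mp3_like), total_error(signal, ogg_direct),
      total_error(signal, ogg_transcoded))
```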

Besides, for most users this isn't necessary, as almost every player that supports oggs supports mp3s as well, and your mp3 collection can peacefully co-exist with your growing ogg collection. In fact, the only compelling reasons to get rid of existing mp3s are ethical, not technical: either your mp3s are illegal copies, or you don't want to support a patented format.

copyright 2002 by Graham Mitchell
Last updated: Monday, 2002-09-22 at 18:44 CDT


Putting it to the test...

You can of course go on endlessly about technical details, but with music what ultimately matters is how it sounds. That is certainly true of something like compression, which relies largely on the characteristics and peculiarities of our hearing. It seems simple: gather a number of test subjects and have them listen to neutrally numbered pieces of music, compressed with different codecs and at different bitrates. A very informative article in the German/Dutch magazine c't of December 2002 shows that such a test nevertheless involves a great deal of effort.

The point is that the compression methods tested often score differently on different kinds of music. At lower bitrates of around 64 kb/s, the ratings can also be distributed quite differently than at 128 kb/s. A listener's affinity with a particular genre can play a role too, as can the playback medium, such as headphones or loudspeakers. What is striking, though, is that while mp3 and especially mp3PRO sometimes fared poorly, Ogg Vorbis almost always scored near the top, particularly at the lower bitrates. Curiously enough, it sometimes scored even better than the uncompressed WAV format.

The listening test was carried out by a varied panel of experts, ranging from a school pupil to a sound engineer and an opera singer. It was also taken online by more than 3,500 users, and although their results varied, the same broad picture emerged there too. It is of course a snapshot, since most codecs will continue to be developed for some time. Even so, this was an excellent test. Together with a very readable technical explanation of the various formats (13 pages in total), it made for a thorough article with an unexpected outcome. Unfortunately, back issues can no longer be ordered.
