dither - produce New Media

The vast majority of Podcast producers are not using multi- thousand dollar Neumann mics and/or highly efficient preamps in acoustically treated environments …

When recording (spoken word) audio via mic input, the noise floor is perceived as the level of ambient noise and residual preamp noise – NOT the system noise. Any such mic input will exhibit a higher perceived noise floor with a reduced SNR compared to a much more efficient DI or electronic instrument.

Consider the quantified theoretical dynamic range of 16 bit audio (96 dB). When recording with a mic in a typical environment – your system is incapable of effectively utilizing the full dynamic range of 16 bit audio due to the noted (elevated) perceived noise.

When producing Podcast audio, wide dynamics capabilities are irrelevant. In fact persistent wide dynamics in spoken word audio intended for Internet/Mobile/Podcast distribution will compromise intelligibility.

With all this in mind, what is the advantage of recording 24 bit (spoken word) Podcast audio with a theoretical dynamic range of 144 dB vs.16 bit audio? In my view there is no advantage, especially when proper down conversion techniques such as Dithering are for the most part ignored. An omission as such will compromise the sonic attributes of down converted audio derived from higher resolution source masters.

Are you striving for an efficient Podcast production workflow with excellent fidelity and adequate frequency response? 44.1 kHz (or 48 kHz) • 16 bit audio will be sufficient. Of course there will be optimization variables and requirements such as quality of gear, optimal recording levels, and ample headroom.

Notes:

– If you are producing highly dynamic episodic dramas, fine arts content, or complex narratives with music and sound effects elements – and you prefer to work with 24 bit media … by all means do so.

– When down converting from 24 bit to 16 bit in preparation for distribution, recognize the significance of Dithering.

– Be aware of MP3 codec filtering attributes, inherent frequency response limitations, artifacts, and the consequences of low bit rate encoding.

– Applying a low-pass filter to lossless audio prior to lossy encoding is recommended. Such a roll-off will effectively supply the lossy encoder with managed high frequency activity that is below the codec’s filtering threshold.

-paul.

In a professional workflow Dither will be applied to audio clips (or mixes) when reducing word length. This process will mitigate errors that occur due to the subtraction of digital audio bits. I thought I’d cover the basics.

Digital Audio

Digital Audio incorporates individual samples consisting of bits created by the process of Quantization. This is essentially the conversion of a continuous, linear range of values present in analog audio into a fixed range of discrete values. Bit Depth (a.k.a. Word Length or Resolution) represents the number of bits stored in a sample’s measure of amplitude. It indicates the extent of inherent vertical precision. Higher bit depths (or bits per sample) encompass improved vertical dynamic resolution resulting in an extended Dynamic Range.

1 bit = 6dB of Dynamic Range. Theoretically 16bit audio has a quantified Dynamic Range of 96 dB. 24 bit audio has a quantified Dynamic Range of 144 dB. However, in order to accurately assess Dynamic Range we must also recognize the amplitude of the highest spectral component of the inherent noise floor. Specifically, where it resides relative to the maximum Peak value that a system is capable of reproducing. Dynamic Range is the measurement of this ratio or range.

Signal to Noise Ratio (SNR) is the quantified range between the nominal average signal level and the average level of the noise floor. Audio with an extended Dynamic Range will exhibit a higher SNR compared to audio with a reduced Dynamic Range. In essence 24 bit audio will allow you to work with additional headroom without any increase in noise compared to 16 bit audio.

Word Length Reduction

Truncation is the removal of bits with no compensating replacement. The repositioning of samples after converting to a lower resolution creates Quantization Errors resulting in audible artifacts and distortion. Dither is technology that adds minimal perceived noise to audio before word length reduction. This noise will mitigate (mask/remove) the audibility of distortion caused by Quantization Errors. The process preserves fidelity and Dynamic Range of audio throughout bit-depth conversion and/or bit-depth reduction exporting.

There is a trade off: you are replacing bad noise with alternative “good” noise that is smoother, less audible, and much more consistent.

Noise Shaping is a supplemental option that pushes noise into frequency ranges that are less audible to humans, thus allowing greater Dither with reduced perceptual noise.

(Take a look at the Noise Shaped frequency response curve in the attached image. There is a clear visual indication of increased gain at higher frequencies).

Podcasting

So what does this all mean for the typical Podcast Producer? Is Dither just another obscure aspect of professional Audio Mastering and/or Post Production that can be safely ignored?

Consider the following variables:

If you are recording spoken word using properly configured gear in a reasonably quiet and optimized environment – there is no discernible advantage recording 24-bit audio in preparation for 16-bit encoding and delivery. In my opinion 16-bit audio from acquisition to distribution will be more than adequate.

If you elect to record 24 bit audio, and you are not properly implementing word length reduction to 16 bit, you are essentially nulling the advantages of the original higher resolution audio. In essence fidelity degradation (artifacts/distortion) will occur due to the absence of efficient error masking. This is not my opinion – it is a fact.

Remember, I’m specifically referring to spoken word audio slated for Podcast distribution. If you are tracking music, well then by all means make full use of the advantages of higher resolution audio recording.

Consider this: The stand-alone version of iZotope’s Ozone 8 Mastering Suite processes all imported audio to 32 bit word length. The manual specifically states:

“Ozone processes files at 32-bit so Dither is desirable for files being exported to values lower than 32-bit …

… When exporting to a bit depth lower than 32-bit, checking this (Dither option) box will apply high-quality dithering to the exported file. This allows you to preserve the sound quality and dynamic range of a higher bit depth, when exporting the audio file to a lower bit depth.”

Most DAWS include Dither options. In some cases it’s by way of a plugin. You may also notice Dither options included in application Preferences or Export dialogs.

Hopefully after reading this article you will understand what Dither is, it’s purpose, and whether you should consider implementing it. Please note: Dither must be applied at the very last stage of any processing chain.

-paul.

16 bit Audio

Bit Depth and Dither