🎬 Digital Audio Anatomy

Analog audio works by converting sound pressure waves into positive and negative voltages. In an analog-to-digital conversion, that voltage is measured at regular intervals and each measurement is stored as a digital value. The waveform you see represents the loudness of the sound in its height (amplitude) and the sound’s frequency (AKA “pitch”) in its width: the narrower each cycle, the higher the frequency.

Sample Rate

This is how frequently a digital sample is “taken” of your analog waveform. It’s expressed in samples per second and measured in hertz (Hz). The sample rate directly determines the frequency response of your audio capture, because the more often you sample, the better you can capture rapidly recurring high frequencies. For simplicity’s sake, think of frequency as pitch: high frequencies sound sharp and tinny, while low frequencies are growling, visceral and easily transmitted through thin dorm room walls. The sample rate must be at least double the highest frequency in the audio source for effective capture. Humans generally hear from around 20 Hz to 20 kHz, so theoretically 40 kHz would be sufficient. Some argue that capturing frequencies above human perception has advantages, but the evidence isn’t extremely compelling. For dialog at least, do not worry about recording above 48 kHz. Common sampling rates include 44.1 kHz (also called CD-quality audio), 48 kHz (a good sweet spot) and 96 kHz.
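The doubling rule above can be sketched in a few lines. This is a minimal illustration, assuming an idealized sampler with no anti-aliasing filter and a single pure tone as input; the function names `max_capturable_hz` and `alias_frequency` are hypothetical and chosen only for this example. It shows the highest frequency each common sample rate can represent, and where a tone above that limit would "fold back" to.

```python
def max_capturable_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency a pure tone appears at after sampling.

    Assumes an ideal sampler with no anti-aliasing filter (an assumption
    made for illustration; real converters filter the input first).
    """
    limit = sample_rate_hz / 2
    folded = signal_hz % sample_rate_hz
    if folded > limit:
        # Tones above half the sample rate fold back into the audible band.
        folded = sample_rate_hz - folded
    return folded


for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz sampling captures up to {max_capturable_hz(rate):.0f} Hz")

# A 20 kHz tone survives 48 kHz sampling unchanged...
print(alias_frequency(20_000, 48_000))
# ...but a 30 kHz tone is too fast for it and shows up at the wrong frequency.
print(alias_frequency(30_000, 48_000))
```

The second result is why a tone that is too high for the sample rate is not simply lost: it lands at a lower, unrelated frequency inside the captured range.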

Bit Depth

This controls the number of discrete levels your analog waveform is divided into, and how much data will be stored in every one of the above-mentioned samples. It’s sort of like breaking a $100 bill into two $50 bills or ten $10 bills. Similar to the concept of bit depth in video, the more levels captured, the more detail and subtlety can accurately be reproduced. The bit depth directly affects the dynamic range (difference between softest and loudest sounds) you can capture. With sound, the effect of bit depth is particularly apparent at the noise floor, because finer quantization means greater separation between signal and noise. Because low-level signals are more easily discernible from noise, usable dynamic range is increased. 16 bit is a common audio bit depth and is the second oft-cited spec of “CD-quality” audio. 24 bit is worth the increase in file size because of the aforementioned noise floor. If you have the option to record at 24 bit, use it and record at a lower level, giving yourself more headroom before clipping. Higher than 24 bit doesn’t hold much value, though extremely high bit depths do allow for setting trim levels in post.

In this image the bit depth is four, so we have 16 possible values (2^4).
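To make the relationship between bit depth, level count and dynamic range concrete, here is a small sketch. It assumes integer (fixed-point) samples and uses the simple ratio between the largest and smallest step, which works out to roughly 6 dB per bit; real converters fall short of these theoretical figures. The function names are hypothetical.

```python
import math


def quantization_levels(bit_depth):
    """Number of discrete values available: doubles with every extra bit."""
    return 2 ** bit_depth


def theoretical_dynamic_range_db(bit_depth):
    """Ratio of the largest to the smallest step, in decibels.

    This is a theoretical ceiling (about 6.02 dB per bit), not a
    measurement of any real recorder.
    """
    return 20 * math.log10(quantization_levels(bit_depth))


for bits in (4, 16, 24):
    levels = quantization_levels(bits)
    dynamic_range = theoretical_dynamic_range_db(bits)
    print(f"{bits:>2} bit: {levels:>10,} levels, about {dynamic_range:.1f} dB")
```

Going from 16 to 24 bits adds about 48 dB to the theoretical range, which is the headroom that makes recording at a lower level practical.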


dB (decibels) is a relative scale and has no absolute meaning without a reference. A sound isn’t simply 50 decibels, though it could be 50 decibels louder than another sound. A 1 dB level change is generally noticeable. A 6 dB change doubles the signal amplitude, while a change of roughly 10 dB is what is usually perceived as twice the volume.
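Because decibels only describe a ratio, converting between the two takes one logarithm. The sketch below assumes the usual amplitude (voltage) convention of 20 times the base-10 logarithm of the ratio; the function names are hypothetical.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Express an amplitude (voltage) ratio in decibels."""
    return 20 * math.log10(ratio)


def db_to_amplitude_ratio(db):
    """Inverse of the above: how many times larger the amplitude is."""
    return 10 ** (db / 20)


print(f"Doubling the amplitude: {amplitude_ratio_to_db(2):+.2f} dB")
print(f"Halving the amplitude:  {amplitude_ratio_to_db(0.5):+.2f} dB")
print(f"A +20 dB change is {db_to_amplitude_ratio(20):.0f}x the amplitude")
```

Note that neither function takes a single level as input: there is always a second signal, or a fixed reference, being compared against.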


Digital audio is measured in dBFS, or “decibels relative to full scale”. A digital audio signal peaks at 0 dBFS: the maximum value it can record is 0, so every other level is measured as some amount below that maximum. Digital levels are always zero or negative.

dBu (or dBv): references decibels to 0.775 V. Frequently seen on VU meters, it’s an analog scale where 0 is a reference level, not a peak level. In a common calibration, 0 VU = +4 dBu = -20 dBFS, which puts 0 dBFS at +24 dBu.

dBV: referenced to 1 volt and usually seen on “semi-pro” gear. 0 dBV = 1 volt.

VU: Analog audio is measured in “Volume Units”, where 0 VU is 1.228 volts if you really want to know. This doesn’t affect us much for the purposes of digital audio, though it helps us understand some plugin user interfaces and mastering tools in the post-production audio phase.

Level Measure

RMS (root mean square) is the average sound level, versus “peak” levels, which represent the loudest moments. A limiter or compressor lets you bring the RMS average up without clipping or distorting the peaks.
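The difference between the two measurements is easy to see on a test tone. This sketch assumes samples are floating-point values normalized so that full scale is 1.0, and it references both peak and RMS to that same full-scale value; some meters calibrate RMS differently, so treat the absolute RMS figures as illustrative. The function names are hypothetical.

```python
import math


def peak_dbfs(samples):
    """Loudest single sample, in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples):
    """Root-mean-square (average) level, in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


SAMPLE_RATE = 48_000
# One second of a 1 kHz sine wave that just touches full scale.
tone = [math.sin(2 * math.pi * 1_000 * i / SAMPLE_RATE) for i in range(SAMPLE_RATE)]
# The same tone recorded at a quarter of the amplitude.
quiet_tone = [0.25 * s for s in tone]

for name, samples in (("full scale", tone), ("quarter scale", quiet_tone)):
    print(f"{name}: peak {peak_dbfs(samples):.2f} dBFS, RMS {rms_dbfs(samples):.2f} dBFS")
```

For a sine wave the RMS level sits about 3 dB under the peak; real program material, such as dialog, usually shows a much larger gap, which is the gap a compressor or limiter narrows.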


See the pro topic “Audio Levels in Post” for a look at a better, more modern measure of loudness.

Mic Level

The level generated by most microphones: 1.5 mV–70 mV (1 millivolt is one thousandth of a volt).

Line Level

Level used for most audio mixers. Pro line level is +4 dBu (1.23 volts, 4 decibels above 0 dBu). Consumer line level is -10 dBV (again, the “V” here means relative to 1 volt, so this signal is closer to 0.32 volts, or -7.8 dBu). Keyboards and guitars are generally between mic and line level.
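The voltages quoted for pro and consumer line level follow directly from the dBu and dBV reference voltages. Here is a minimal sketch of that arithmetic, assuming the 0.775 V and 1 V references given earlier; the constant and function names are hypothetical.

```python
import math

DBU_REFERENCE_VOLTS = 0.775  # 0 dBu, as stated in the dBu definition
DBV_REFERENCE_VOLTS = 1.0    # 0 dBV


def dbu_to_volts(dbu):
    """Convert a level in dBu to volts."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20)


def dbv_to_volts(dbv):
    """Convert a level in dBV to volts."""
    return DBV_REFERENCE_VOLTS * 10 ** (dbv / 20)


def volts_to_dbu(volts):
    """Convert a voltage to dBu, so both line levels share one scale."""
    return 20 * math.log10(volts / DBU_REFERENCE_VOLTS)


pro_volts = dbu_to_volts(4)
consumer_volts = dbv_to_volts(-10)

print(f"Pro line level (+4 dBu):       {pro_volts:.3f} V")
print(f"Consumer line level (-10 dBV): {consumer_volts:.3f} V")
print(f"Consumer level on the dBu scale: {volts_to_dbu(consumer_volts):.1f} dBu")
print(f"Gap between the two: {4 - volts_to_dbu(consumer_volts):.1f} dB")
```

Putting both on the dBu scale makes the practical point clear: pro gear runs close to 12 dB hotter than consumer gear, which is why connecting one to the other without level matching gives a signal that is either too quiet or distorted.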

Speaker Level

10 volts. This is an output level designed for monitoring; it’s the very end of the chain.

Audio Containers

  • WAV: (RIFF file format) The Windows variety of linear pulse code modulated, generally uncompressed audio. A nearly universal file format for acquisition.
  • AIFF: (IFF file format) Linear pulse code modulated, generally uncompressed audio. AIFF is the Apple equivalent but both are very high quality formats.
  • XMF: (Extensible Music Format)
  • m4a/mp4: (AAC format) Advanced Audio Coding uses a modified discrete cosine transform algorithm, achieving superior quality at the same data rates as its predecessor MP3. This is a distribution format for the web. Sometimes the specific profile will be stated, as in YouTube’s upload recommendations, where AAC-LC denotes use of the “Low Complexity” profile.
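As a small illustration of how a container carries the sample rate and bit depth discussed earlier, the sketch below writes a short uncompressed WAV file into memory and reads its header back using Python's standard `wave` module. The tone, its length and the variable names are arbitrary choices for this example, and nothing is written to disk.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 48_000   # samples per second
BIT_DEPTH = 16         # bits per sample
DURATION_FRAMES = SAMPLE_RATE // 10  # a tenth of a second

# Build a 440 Hz tone at half of full scale as 16-bit little-endian PCM.
pcm = b"".join(
    struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE)))
    for i in range(DURATION_FRAMES)
)

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(1)               # mono
    wav_out.setsampwidth(BIT_DEPTH // 8)  # bytes per sample
    wav_out.setframerate(SAMPLE_RATE)
    wav_out.writeframes(pcm)

# The file starts with the RIFF/WAVE markers that identify the container.
header = buffer.getvalue()[:12]
print(header[:4], header[8:12])

buffer.seek(0)
with wave.open(buffer, "rb") as wav_in:
    read_rate = wav_in.getframerate()
    read_bits = wav_in.getsampwidth() * 8
    read_frames = wav_in.getnframes()

print(f"{read_rate} Hz, {read_bits} bit, {read_frames} frames")
```

The audio data itself is stored as plain PCM samples; the container only adds the header that tells a player how to interpret them.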

See this video and this video for a more in-depth look at these topics.
