Sound is an audible change in air pressure that propagates through the air as a longitudinal wave. The air particles therefore move back and forth along the direction in which the sound travels, and a wave of compression and expansion moves away from the sound source at about 1235 km/h in air.


The simplest form of sound is a single wave with a certain frequency, expressed in Hz. The changes in air pressure can be represented as a sinusoid, with compression above the average pressure and expansion below it. The frequency determines the pitch. Half the difference between the maximum and minimum pressure is the amplitude. Sound pressure is expressed in Pa and is usually given relative to a reference pressure of 2 × 10⁻⁵ Pa, the hearing threshold of an average person at a frequency of 1000 Hz. Because sound pressure spans such a large range, it is often plotted on a logarithmic scale and expressed in decibels (dB). Doubling the amplitude corresponds to an increase of about 6 dB, while a tenfold increase in amplitude corresponds to 20 dB. A tuning fork comes close to producing a single wave, since it has hardly any overtones in addition to the fundamental. A tone generator can also produce this wave.
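The decibel values above follow from the usual definition of a level as 20 times the base-10 logarithm of a pressure ratio. The following is a minimal sketch of that arithmetic. The function names are illustrative assumptions, and the 1 Pa example pressure is chosen only for demonstration.

```python
import math

# Reference sound pressure in Pa (hearing threshold at 1000 Hz, as stated in the text).
P_REF = 2e-5


def level_difference(ratio: float) -> float:
    """Level difference in dB for a given ratio of two pressure amplitudes."""
    return 20.0 * math.log10(ratio)


def sound_pressure_level(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to P_REF."""
    return level_difference(pressure_pa / P_REF)


print(round(level_difference(2), 2))        # doubling the amplitude: 6.02
print(round(level_difference(10), 2))       # tenfold amplitude: 20.0
print(round(sound_pressure_level(1.0), 1))  # an assumed pressure of 1 Pa: 94.0
```

At the reference pressure itself the level is 0 dB, which is why the hearing threshold sits at the bottom of the scale.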

In practice, sound almost never consists of pure sinusoidal waveforms, and even a single note of a musical instrument is a complex wave. However, it can be decomposed into a series of single waves, each with its own frequency, amplitude and phase. The wave with the lowest frequency is the fundamental (or root), while the others are the overtones. Musical instruments often produce standing waves, because the string is clamped at both ends or the pipe has a certain length. If the frequencies of the overtones are integer multiples of that of the fundamental, they are called harmonics. The fundamental is then the first harmonic, and the overtone with a frequency five times higher is the fifth harmonic. In many cases, the fundamental determines the perceived pitch of the composite wave, while the overtones determine the timbre. The phases of the different harmonics are not the same, and this is reflected in the onset or attack of a tone. However, many natural sounds do not have fixed frequencies and harmonics.
Figure: The red complex wave f is decomposed into blue single waves, which are plotted with their amplitude at their frequency. Adding a phase spectrum to this magnitude spectrum describes the entire complex wave.

Using Fourier analysis, a complex wave can be decomposed into single components that can be plotted in a Fourier spectrum. Fourier synthesis converts this spectrum back into the complex wave. The development of the fast Fourier transform by James Cooley and John Tukey in 1965 drastically reduced the calculation time, which made the digital analysis of sound practical.
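As an illustration of analysis and synthesis, the sketch below applies a naive discrete Fourier transform to a made-up complex wave and then rebuilds the wave from its spectrum. The wave (a fundamental plus a third harmonic), the window length and all function names are assumptions chosen for the example. The naive transform needs on the order of N² operations, whereas the fast Fourier transform computes the same result in far fewer steps.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (analysis)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse transform (synthesis); returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


N = 64  # number of samples in the analysis window (assumed)

# Hypothetical complex wave: a fundamental completing 1 cycle per window
# (amplitude 1.0) plus a third harmonic (amplitude 0.5, phase shift pi/4).
wave = [
    1.0 * math.sin(2 * math.pi * 1 * t / N)
    + 0.5 * math.sin(2 * math.pi * 3 * t / N + math.pi / 4)
    for t in range(N)
]

spectrum = dft(wave)

# Magnitude spectrum: the 2/N scaling recovers the amplitude of each
# sinusoid for bins strictly between 0 and N/2.
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]

# Phase spectrum, measured relative to a cosine (a sine appears shifted by -pi/2).
phases = [cmath.phase(c) for c in spectrum[: N // 2]]

rebuilt = inverse_dft(spectrum)
max_error = max(abs(a - b) for a, b in zip(wave, rebuilt))

print(round(amplitudes[1], 6), round(amplitudes[3], 6))  # 1.0 0.5
print(max_error < 1e-9)  # True: synthesis reproduces the original wave
```

The magnitude and phase lists together carry the same information as the waveform, which is why the inverse transform can reconstruct it.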

Sound usually varies over time and cannot be captured in a single Fourier spectrum. The change over time can instead be plotted in a spectrogram, which represents a series of Fourier spectra.
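The idea of a spectrogram as a series of spectra can be sketched by cutting a signal into short frames and computing a magnitude spectrum for each one. The example below is a simplified illustration, not a production method: the sample rate, frame size, test signal and function names are all assumptions, the frames do not overlap, and no window function is applied.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the lower half of a naive discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def spectrogram(samples, frame_size, hop):
    """One magnitude spectrum per frame; frames start every `hop` samples."""
    return [
        magnitude_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]


SAMPLE_RATE = 800  # samples per second (assumed, kept small for the example)
FRAME = 32         # samples per frame, giving a resolution of 25 Hz per bin

# Hypothetical signal: a 100 Hz tone that switches to 200 Hz halfway through.
signal = [
    math.sin(2 * math.pi * (100 if t < 128 else 200) * t / SAMPLE_RATE)
    for t in range(256)
]

frames = spectrogram(signal, FRAME, FRAME)

# Strongest frequency in each frame, converted from bin index to Hz.
peaks = [
    max(range(len(spec)), key=spec.__getitem__) * SAMPLE_RATE / FRAME
    for spec in frames
]
print(peaks)  # [100.0, 100.0, 100.0, 100.0, 200.0, 200.0, 200.0, 200.0]
```

A single spectrum of the whole signal would show both frequencies but not when each occurs. The sequence of per-frame spectra makes the change over time visible.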

In addition to the physical aspects of sound, humans themselves play an important role in how sound is perceived and how different sounds can be distinguished from each other, even though they arrive at the ears as a single composite wave. The relationship between physical sound and its perception is the subject of psychoacoustics. The anatomy of the ear is of great importance here. In the outer ear, sounds between 1.5 and 7 kHz are amplified, while high frequencies are attenuated. The degree of attenuation depends on the position of the sound source, with sounds from higher positions being more muffled, which contributes to directional hearing. The sensitivity of the middle ear is greatest around 1 kHz, and the combination of outer and middle ear gives maximum sensitivity for sounds between 1 and 3 kHz. In the inner ear, the cochlea contains the basilar membrane, which, with its specific shape and thousands of hair cells, detects frequencies between approximately 20 and 20,000 Hz. A specific frequency produces a maximum response of the hair cells at a specific place along this membrane. The ear is thus able to perceive the different components of a complex wave. Georg Ohm even stated in Ohm's acoustic law that the ear can distinguish all components, which is why the ear is sometimes called a Fourier analyzer. However, this is not entirely true. For example, the ear responds linearly only at very low and very high frequencies; the frequencies in between are amplified as described above. In addition, the membrane cannot distinguish two closely spaced frequencies. Nevertheless, the ability to distinguish between different sounds in a cacophony is remarkable, which makes the comparison with Fourier analysis meaningful. Only phase is poorly distinguished by the ear, and then only in exceptional circumstances.

The fact that hearing is not linear has consequences for how sound is experienced. A sound of equal sound pressure is perceived as quieter at very low and very high frequencies. This perceived strength is expressed as loudness. How sound is experienced depends, besides loudness, on pitch, timbre, duration, texture and direction. As mentioned, the perceived pitch usually depends on the fundamental and the timbre on the overtones. The duration is determined by how long a tone is held. The texture depends on the complexity, ranging from monophonic sounds with a single tone to polyphonic sounds with many tones. The fact that not all components of a sound are perceived equally is exploited in lossy compression formats such as MP3.