Audio “Hello, World!”

Introduction

In the previous post, we tested our development environment by writing a trivial application which displayed the text “Hello, World!”, followed by a newline, in the terminal.

Now that we’ve confirmed that we can create and execute code, let’s see about getting some sound: we’ll generate a square wave tone at 440Hz as a test. Also known as Concert A, 440Hz is a general tuning standard for musical pitch.

A plot of three cycles of our square wave.

In order to do that, we’ll break it into two steps:

  1. Generate the audio data for the 440Hz tone
  2. Deliver the audio data to the audio hardware so it ends up as sound

The Code

Python Version

Generating the audio data for the 440Hz tone

The following is some Python code to generate the audio data.

import struct
import sys

# The audio hardware's sampling rate in Hz (samples/sec.)
sample_rate=44100

# The square wave frequency in Hz (cycles/sec.)
frequency=440

# The duration of the square wave tone in ms.
duration=1000

# The amplitude of the square wave represented as a fraction of
# the maximum output volume.
amplitude=0.3

# Calculate the total number of samples required to produce a tone
# of the duration specified (integer division keeps this an int).
samples = sample_rate * duration // 1000

# Calculate the number of samples in each half of the tone's cycle.
tone_midpoint = sample_rate // frequency // 2

# Calculate the maximum (negative) sample value.
sample = int(-(1 << (16 - 1)) * amplitude)

# Iterate over the range of samples we've calculated are required.
for i in range(0, samples):
    # Each time the iterator value reaches a half-cycle, change the
    # sample's sign, from the positive to the negative domain and
    # vice-versa.
    if i % tone_midpoint == 0:
        sample = -sample

    # Output the sample value to stdout as a little-endian 16bit integer
    # (sys.stdout.buffer is Python 3's binary view of stdout).
    sys.stdout.buffer.write(struct.pack('<h', sample))

Save the code as audio-hello-world.py or download a copy from the GitHub repository.

(NOTE: Although the GitHub version is packaged slightly differently from the code above, the algorithm is identical.)

The code can be run with:

python audio-hello-world.py

When you run it, don’t be alarmed by the apparent gibberish that it outputs; what you’re seeing is the raw PCM data that will be sent to the audio hardware. You can clear the screen with the following command:

clear

Delivering the audio data to the audio hardware

From the command line we can use an external program to deliver the data to our hardware by using a Unix pipe, in which the output of our program will be fed as the input to the audio player utility.

The command we’ll be piping the data to is aplay, a command-line audio file player for the ALSA sound card driver. In later posts we’ll be developing our own copy of this utility, and a companion utility to save raw data to sound files. If this utility isn’t available on your system, you’ll need to find an audio player/editor which can import raw PCM data.

Since our output data is in a “raw” format, we’ll have to inform the player about which format the data that is being piped to it is in. This is done using command line switches:

python audio-hello-world.py | aplay --file-type raw --rate=44100 --channels=1 --format=S16

If all goes well, you should hear one second of a reedy sounding tone.
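
If aplay isn't available, one alternative is to wrap the raw samples in a WAV header so that almost any audio player can open them. The following is only a sketch, assuming Python 3 and its standard wave module; the helper names square_wave_pcm and write_wav are made up for this illustration and are not part of the GitHub code.

```python
import struct
import wave


def square_wave_pcm(sample_rate=44100, frequency=440, duration=1000, amplitude=0.3):
    """Return the tone as little-endian signed 16 bit PCM bytes."""
    samples = sample_rate * duration // 1000
    tone_midpoint = sample_rate // frequency // 2
    sample = int(-(1 << (16 - 1)) * amplitude)
    data = bytearray()
    for i in range(samples):
        # Flip the sign at the start of every half-cycle.
        if i % tone_midpoint == 0:
            sample = -sample
        data += struct.pack('<h', sample)
    return bytes(data)


def write_wav(path, pcm, sample_rate=44100, channels=1):
    """Wrap raw 16 bit PCM bytes in a WAV header (hypothetical helper)."""
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```

Calling write_wav('audio-hello-world.wav', square_wave_pcm()) should produce a one second file holding the same samples that were piped to aplay above.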

C Version

The C version is very similar to the Python version, with the only major difference being the way it outputs the little-endian values (line 43).

#include <stdio.h>
#include <stdlib.h>

int gen_square_wave(int sample_rate, int frequency, int duration, float amplitude)
{
    /*
        Given the input parameters generates square waves and outputs them
        to stdout.

        Parameters:

        - `sample_rate`: The audio hardware's sampling rate in Hz
            (samples/sec.)
        - `frequency`: The square wave frequency in Hz (cycles/sec.)
        - `duration`: The duration of the square wave tone in ms.
        - `amplitude`: The amplitude of the square wave represented as a
            fraction of the maximum output volume.

    */

    // Calculate the total number of samples required to produce a tone of
    // the duration specified.
    int samples = sample_rate * duration / 1000;

    // Calculate the number of samples in each half of the tone's cycle.
    int tone_midpoint = sample_rate / frequency / 2;

    // Calculate the maximum (negative) sample value.
    int sample = -(1 << (16 - 1)) * amplitude;

    // Iterate over the range of samples we've calculated are required.
    int i;
    for(i=0; i < samples; i++)
    {
        // Each time the iterator value reaches a half-cycle, change the
        // sample's sign, from the positive to the negative domain and
        // vice-versa.
        if(i % tone_midpoint == 0)
            sample = -sample;

        // Output the sample value to stdout as a little-endian 16bit
        // integer
        printf("%c%c", sample & 0xff, (sample >> 8) & 0xff);
    }

    return 0;
}

int main(int argc, char *argv[])
{
    gen_square_wave(44100, 440, 1000, 0.3);
    return 0;
}

Save this version as audio-hello-world.c or download a copy from GitHub.

The code can be compiled and run with:

gcc -o audio-hello-world audio-hello-world.c
./audio-hello-world | aplay --file-type raw --rate=44100 --channels=1 --format=S16

The output should be the same as the Python version: one second of a reedy sounding tone.

(Courtesy Wikipedia)

Background

In order to understand how the code works, we need a little background on Pulse Code Modulation, or PCM. PCM is the format modern sound hardware uses to encode analog audio signals into digital form, and to decode them back again.

An example of 4-bit pulse code modulation showing quantization and sampling of a signal (red).

Encoding an analog signal is done by measuring the amplitude of the sound’s waveform at regular intervals. These measurements are called samples, and the size of each one is expressed in bits per sample. The number of intervals in one second is the sampling rate, which is measured in samples per second but is often quoted in Hz. The hardware which performs this transformation is called an Analog to Digital Converter, or ADC.
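
To make the idea of sampling concrete, here is a rough sketch of what an ADC does, written in Python. It is an illustration only: the function name sample_and_quantize and the choice of a 1Hz sine wave sampled 16 times a second at 4 bits are assumptions made for this example, not anything the hardware or the code in this post prescribes.

```python
import math


def sample_and_quantize(signal, sample_rate, duration, bits):
    """Measure `signal` (a function of time in seconds returning a value
    between -1.0 and 1.0) at regular intervals, rounding each
    measurement to one of 2**bits integer levels."""
    levels = 1 << bits
    count = int(sample_rate * duration)
    quantized = []
    for n in range(count):
        t = n / sample_rate  # The time at which this sample is taken.
        value = signal(t)
        # Map -1.0..1.0 onto 0..levels-1 and round to the nearest level.
        quantized.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return quantized


# One cycle of a 1Hz sine wave, sampled 16 times per second at 4 bits.
codes = sample_and_quantize(lambda t: math.sin(2 * math.pi * t), 16, 1.0, 4)
```

Each entry in codes is a whole number from 0 to 15, which is all that 4 bits can represent; the rounding is where the "stairstepping" comes from.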

To decode the digital signal, this operation is performed in reverse, and a smoothing filter is added to remove the “stairstepping” inherent in the process. The hardware circuitry to do this is called a Digital to Analog Converter, or DAC.

The amount of data dedicated to recording each sample determines its resolution, while the number of samples recorded per second is the granularity. These two values combined yield the fidelity (disregarding the hardware and any noise or artifacts which it may introduce.) Higher sampling rates and more bits per sample increase the fidelity, while lower values decrease it.

Finally there are channels. The information discussed thus far is enough to encode and decode a monaural waveform, but not stereo or other multi-channel audio. The audio hardware has an ADC/DAC for every input/output channel, ganged to operate at the same sampling rate. Each one generates or consumes a sample. The combination of all of the samples for each channel, per interval, is called a frame. In stereo recordings the sample for the first frame’s left channel is first, followed by the right, then the second frame’s left channel, then right, and continues this way until the end of data. In four channel data, the front left and right samples are first, followed by the rear pair. Organizing the data in this format rather than all of the data for one channel, followed by all of the data for another is called interleaving.
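
Interleaving is easy to see in a few lines of Python. The sketch below assumes signed 16 bit little-endian samples, as used elsewhere in this post; the helper names interleave and bytes_per_second are hypothetical.

```python
import struct


def interleave(left, right):
    """Pack two equal-length lists of 16 bit samples into interleaved
    little-endian frames: L0 R0 L1 R1 ..."""
    frames = bytearray()
    for left_sample, right_sample in zip(left, right):
        # One frame holds one sample for every channel.
        frames += struct.pack('<hh', left_sample, right_sample)
    return bytes(frames)


def bytes_per_second(sample_rate, bits_per_sample, channels):
    """Raw PCM data rate: one frame per interval, one sample per channel."""
    return sample_rate * (bits_per_sample // 8) * channels


stereo = interleave([1, 2, 3], [-1, -2, -3])
print(stereo.hex())                     # 0100ffff0200feff0300fdff
print(bytes_per_second(44100, 16, 2))   # 176400
```

The same arithmetic gives 88,200 bytes for the one second, single channel tone generated in this post.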

Code Analysis

With the background covered, we can proceed to dissect the code.

NOTE: The lines referred to indicate the line numbers in the original version, not in the snippets below.

Python Version

Lines 1-2 import some modules we use in the code:
import struct
import sys
Lines 4-15 set some constant values:
  • sample_rate is the system’s audio playback frequency and should be set to the same value provided to aplay via the --rate=44100 command line option.
  • frequency is the tone frequency we wish to produce.
  • duration is the length of time we want the tone to play.
  • amplitude is a fractional value of the maximum possible amplitude for our tone to play at.
# The audio hardware's sampling rate in Hz (samples/sec.)
sample_rate=44100

# The square wave frequency in Hz (cycles/sec.)
frequency=440

# The duration of the square wave tone in ms.
duration=1000

# The amplitude of the square wave represented as a fraction of
# the maximum output volume.
amplitude=0.3
Lines 17-25 calculate values based on the values provided:
  • samples calculates the total number of samples to be generated. Since the sample rate is in samples/sec., and duration is in ms., we multiply the two together and divide by 1,000.
  • tone_midpoint is used to determine when one half of the tone’s cycle has passed. The sample rate is in samples/sec., and the tone frequency is in cycles/sec., so the quotient of the two is in samples/cycle, and we further divide this by 2 to get a half-cycle.
  • sample is a calculation which returns the value of our sample to output. The values we output are signed 16 bit integers, so when the value 1 is bit shifted 15 columns to the left it yields 32,768, the magnitude of the most negative value that can be stored in a container of that size (the largest positive value is 32,767). This value is then multiplied by the amplitude value to reduce the volume; the amplitude must stay below 1.0 so that the sample still fits once it has been flipped into the positive domain. The value is negated due to a quirk in the way we calculate when to “flip” the sample into the opposite domain: the loop flips the sample on its very first pass, so a negative starting value makes the first half-cycle positive. The tone would sound the same if the negation was left off, but the square wave would begin its cycle in the negative domain; a plot of it would show it to be inverted (or out of phase by 180°).
# Calculate the total number of samples required to produce a tone
# of the duration specified (integer division keeps this an int).
samples = sample_rate * duration // 1000

# Calculate the number of samples in each half of the tone's cycle.
tone_midpoint = sample_rate // frequency // 2

# Calculate the maximum (negative) sample value.
sample = int(-(1 << (16 - 1)) * amplitude)
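
Working the three calculations through with the constants from the post gives concrete numbers. This is a small check written for Python 3, using integer division so the results stay whole; the actual_frequency variable is an addition for this illustration. It shows a side effect of rounding the half-cycle down to a whole number of samples.

```python
sample_rate = 44100
frequency = 440
duration = 1000
amplitude = 0.3

samples = sample_rate * duration // 1000
tone_midpoint = sample_rate // frequency // 2
sample = int(-(1 << (16 - 1)) * amplitude)

# Because tone_midpoint is rounded down to a whole number of samples,
# the tone actually produced is slightly sharp of the requested pitch.
actual_frequency = sample_rate / (2 * tone_midpoint)

print(samples, tone_midpoint, sample, actual_frequency)
# 44100 50 -9830 441.0
```

A half-cycle of 50 samples means a full cycle of 100, so the tone comes out at 441Hz rather than exactly 440Hz; the difference is small, and fine for a first test.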
Lines 27-36 perform the output loop:
  • i is our iterator value. We use this value in calculations to determine where in the tone’s cycle the current sample is.
  • i % tone_midpoint will equal 0 at the start, and the mid-point of our tone’s cycle, repeating each time a full cycle is complete. When it has reached this point, we flip the sample’s domain.
  • struct.pack('<h', sample) uses the Python struct module to convert the sample’s value to a binary format. The '<h' string tells the struct.pack() function we want 16 bit values, and to order the bytes in the little-endian format. Our C version performs this operation explicitly, and so reveals the inner workings of this operation.
  • sys.stdout.buffer.write() writes raw bytes to the standard output. Unlike Python’s print function, it doesn’t encode its argument as text or append a newline to the output (either of which would garble the PCM data.)
# Iterate over the range of samples we've calculated are required.
for i in range(0, samples):
    # Each time the iterator value reaches a half-cycle, change the
    # sample's sign, from the positive to the negative domain and
    # vice-versa.
    if i % tone_midpoint == 0:
        sample = -sample

    # Output the sample value to stdout as a little-endian 16bit integer
    # (sys.stdout.buffer is Python 3's binary view of stdout).
    sys.stdout.buffer.write(struct.pack('<h', sample))
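
To see what struct.pack() actually produces, it can help to try it in an interactive session. The snippet below is a small illustration using the sample value 9830 (0x2666), which is what the amplitude of 0.3 works out to; the variable names are only for this example.

```python
import struct

little = struct.pack('<h', 9830)     # Little-endian: low byte first.
big = struct.pack('>h', 9830)        # Big-endian: high byte first.
negative = struct.pack('<h', -9830)  # Two's complement, little-endian.

print(little.hex(), big.hex(), negative.hex())
# 6626 2666 9ad9

# Values outside the signed 16 bit range are rejected.
try:
    struct.pack('<h', 32768)
except struct.error:
    print("32768 does not fit in a signed 16 bit integer")
```

The last case is why the amplitude has to stay below 1.0: a full-scale sample of -32,768 would become 32,768 when flipped, which is one more than the format can hold.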

C Version

The C version is so similar to the Python version we’ll only analyse the one major difference, line 43:

  • printf("%c%c", char, char) prints two characters (8 bits each) in a row.
  • sample & 0xff yields only the lowest 8 bits in an integer
  • (sample >> 8) & 0xff shifts an integer to the right by 8 bits, then yields only the lowest 8 bits in an integer
printf("%c%c", sample & 0xff, (sample >> 8) & 0xff);
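
The same masking and shifting can be tried out in Python to confirm that it yields the bytes struct.pack() produces. This is a sketch for illustration; split_le16 is a hypothetical helper, not part of the post's code.

```python
import struct


def split_le16(sample):
    """Split a signed 16 bit value into its low and high bytes,
    mirroring the masks used in the C version."""
    low = sample & 0xff          # Keep only the lowest 8 bits.
    high = (sample >> 8) & 0xff  # Shift the next 8 bits down, then mask.
    return bytes([low, high])


print(split_le16(-9830) == struct.pack('<h', -9830))  # True
```

Python's integers aren't a fixed width, but masking with 0xff still yields the two's complement byte values, so the comparison holds for negative samples as well.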

NOTE: The printf function is notoriously slow; in most circumstances the following code is preferable:

fputc(sample & 0xff, stdout); // Low byte
fputc((sample >> 8) & 0xff, stdout); // High byte

In Conclusion

We’ve now seen how easy it is to generate a sound from scratch by using Unix pipes and outputting a series of characters—without comments, just 13 lines of Python code to generate a tone.

We used a square wave in our example in order to stay away from the small amount of trigonometry involved in generating sine waves. In the next post, we’ll generate them and some other wave forms.

Posted in Introductory

Getting Started

Welcome

Welcome to ToneGenerated, a blog devoted to audio programming. See the About page for an overview of its scope and purpose.

On the blog, we’re developing code on GNU/Linux, using ANSI C with the GNU C Compiler (GCC) to create binary executables, the Python programming language for scripting and prototyping, and some other command line tools; however, most projects have few if any outside dependencies, and should be readily ported to other operating systems, compilers or programming languages.

The Code

Hello World! in C

The traditional test of the C build environment is helloworld.c:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello, World!\n");
    return(0);
}

When saved as helloworld.c, the code can be compiled with:

gcc -o helloworld helloworld.c

Assuming everything goes well, you’ll have just built an executable called helloworld.

After being run with:

./helloworld

The terminal should greet us with:

Hello, World!

Hello World! in Python

We’ll test Python scripting in the same way we did with C, by having it output “Hello, World!”:

print("Hello, World!")

This code can be run in an interactive Python session. In a terminal run:

python

Either type or cut & paste the code above to test Python.

Alternatively the code can be saved to a file, such as helloworld.py and run with:

python helloworld.py

When run from a file, we can add a Shebang interpreter directive to the top of the file, then mark the file as executable, to tell the terminal shell which interpreter to use:

#!/usr/bin/env python
print("Hello, World!")

Save the file as helloworld.py and run:

chmod +x helloworld.py

The chmod command only needs to be run once, when the file is created. Now we can run the file in the same manner we did with our C version:

./helloworld.py

In Conclusion

Congratulations, you’ve just made the computer output a bunch of characters, which doesn’t seem like much, but in the next post I’ll show you why it is.

Posted in Introductory
