1 February 2021 / DDSP, TB303

TB303 modeling

The Roland TB-303

The Roland TB-303 synthesizer was released in 1982 as a device meant to mimic bass sounds and sequences. Upon release it was a failure: not many units were sold, and its price went down. A few years later, the 303 started to be used as a bass sequencer/synthesizer again, but in the context of a musical genre that it wasn’t designed for: Acid House. Over the years and across many releases, it gave the genre its unique timbral signature. Original Roland units now sell for thousands of dollars, and many hardware and software emulators have been released.

This piece is from “Acid Tracks” by Phuture, considered the first Acid House record (1987). A 303 was used to perform the bassline.

D.J. Pierre, one of Phuture’s composers, gives some thoughts about the 303 and showcases its iconic sound and sequencer capabilities.

Differentiable Digital Signal Processing

Google Magenta’s DDSP library was released in 2019. It uses neural networks to learn the parameters of a digital signal processing pipeline, conditioned on user input (including the audio itself), to (re)create audio signals. Examples of models trained using DDSP include common classical instruments such as violin and trumpet.

D-303

I want to learn a DDSP model of the TB303. For this, I created 30 minutes of patterns using the ABL3 TB303 emulator.

Here are some audio snippets of the dataset:

I set up DDSP on a computer cluster and trained a model on the recordings, split into 4-second snippets. The model reached the desired loss of ~5 at 400K steps (average loss of 4.94 over the last 1K steps).
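As a rough illustration of the splitting step, here is a minimal sketch that cuts a recording into non-overlapping 4-second windows. The 16 kHz sample rate and the helper name `snippet_bounds` are assumptions made for this example, and DDSP's own data preparation may window the audio differently (for instance with overlapping windows).

```python
def snippet_bounds(num_samples, sample_rate=16000, snippet_seconds=4.0):
    """Return (start, end) sample indices of non-overlapping snippets.

    sample_rate=16000 is an assumed value, not taken from the post.
    A trailing remainder shorter than one snippet is dropped.
    """
    size = int(snippet_seconds * sample_rate)
    if size <= 0:
        raise ValueError("snippet length must be positive")
    return [(start, start + size)
            for start in range(0, num_samples - size + 1, size)]


# 30 minutes of audio at the assumed sample rate.
total_samples = 30 * 60 * 16000
bounds = snippet_bounds(total_samples)
print(len(bounds))  # number of 4-second snippets
```

Under these assumptions, 30 minutes of audio gives 450 snippets of 4 seconds each.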

DDSP is conditioned on user signals such as pitch and loudness. Therefore, the generated audio can be modified through the following features:

NDT: Note detection
AQP: Adjustment for quiet parts (gate)
ATT: Force pitch to nearest note/Autotune
MPS: Manual Pitch Shift
LDS: Loudness Shift (output compensation)
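To make three of these controls concrete, here is a minimal sketch of how ATT, MPS and LDS could act on the pitch and loudness curves before they reach the model. It assumes that ATT is a blend amount between 0 and 1, that MPS is measured in octaves, and that LDS is measured in dB. The function `adjust_conditioning` is hypothetical, and the nearest-semitone rounding is a simplification rather than DDSP's actual autotune. NDT and AQP are left out.

```python
import math


def hz_to_midi(f_hz):
    """Convert a frequency in Hz (must be > 0) to a fractional MIDI note."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def midi_to_hz(midi):
    """Convert a (fractional) MIDI note number to a frequency in Hz."""
    return 440.0 * 2.0 ** ((midi - 69.0) / 12.0)


def adjust_conditioning(f0_hz, loudness_db, att=0.0, mps=0, lds=0.0):
    """Apply autotune (att), octave shift (mps) and loudness shift (lds).

    att: 0 keeps the original pitch, 1 snaps to the nearest semitone
         (assumed meaning of ATT).
    mps: pitch shift in octaves (assumed unit of MPS).
    lds: loudness offset in dB (assumed unit of LDS).
    """
    new_f0 = []
    for f in f0_hz:
        midi = hz_to_midi(f)
        # Move the pitch part of the way towards the nearest semitone.
        midi += att * (round(midi) - midi)
        new_f0.append(midi_to_hz(midi) * 2.0 ** mps)
    new_loudness = [level + lds for level in loudness_db]
    return new_f0, new_loudness


# A slightly sharp A4, fully autotuned and moved one octave down.
f0, loudness = adjust_conditioning([450.0], [-30.0], att=1.0, mps=-1)
print(f0, loudness)
```

With these assumed units, a setting such as (ATT, MPS, LDS) = (0.5, -1, 0) would pull each pitch halfway towards the nearest note, drop it by one octave, and leave the loudness unchanged.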

Timbre transfer

Here are the results of timbre transfer using a few targets and the model created. The generation is conditioned with the aforementioned features, given in the form (NDT, AQP, ATT, MPS, LDS).

Original voice

When using a voice to do timbre transfer, the model is able to follow the voice pitches while retaining some characteristics of the 303.

Transfer (1, 30, 0.5, -1, 0)

With the conditioning modification the octave is correct. The pitch is jumpy, but the original voice is not neatly tuned either.

Bass sounds

Original pulsing bass

Transfer

Square bass different notes

Transfer

New, simpler dataset of TB303

I made a second, much simpler dataset with the goal of training a better model. The hypothesis is that longer, steadier notes will make the modeling work better.

Notes with the previous model jumped too much. My hunch is that this happened because the same pitch and loudness appeared with different timbres in the training data.

Files in the new dataset consist of one note per second, ranging from C0 to D#5.
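As a quick sanity check on the size of that range, the sketch below enumerates the notes from C0 to D#5. It assumes the common convention where C4 is MIDI note 60 and A4 is 440 Hz, and that each pass plays every semitone exactly once. The helper names are made up for this example.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]


def note_to_midi(name):
    """Convert a name such as 'D#5' to a MIDI number (C4 = 60 assumed).

    Only single-digit, non-negative octaves are handled.
    """
    pitch, octave = name[:-1], int(name[-1])
    return 12 * (octave + 1) + NOTE_NAMES.index(pitch)


def midi_to_hz(midi):
    """Convert a MIDI note number to Hz, with A4 = 440 Hz assumed."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def note_range(low, high):
    """Return all MIDI notes from low to high, inclusive."""
    return list(range(note_to_midi(low), note_to_midi(high) + 1))


notes = note_range("C0", "D#5")
seconds_per_pass = len(notes) * 1  # one note per second
print(len(notes), seconds_per_pass)
print(round(midi_to_hz(notes[0]), 2), round(midi_to_hz(notes[-1]), 2))
```

Under these assumptions the range holds 64 notes, so one pass over it lasts 64 seconds.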

The set of notes was recorded with different settings for cutoff frequency (F0) and resonance (Res), both given in percent.

F0 (%)	Res (%)
0	0
25	0
50	0
75	0
100	0
0	25
0	50
…	…
75	100
100	100

In total, there are 21 minutes and 20 seconds of data. The following is a set of examples for varying f0 and res = 100%.

With this new, simpler dataset, the model is also much simpler: there aren’t any of the interesting rhythmic aspects that were part of the previous model.

As examples, here are four snippets of timbre transfer using the new model.

Flute melody

Sax

Sax (same example, but choosing one octave lower)

Voice

It is interesting to note that, since all notes were recorded one second apart, I have the impression that the model is learning some kind of detuned 1-second delay, with up to four delays. I guess the four delays come from the fact that, during training, the data is split into four-second snippets, so each snippet contains four notes.

Modelling voice

To check whether the delay is some kind of artifact of my implementation, I trained a model on an a cappella recording of “On & On” by Erykah Badu.

I then tested the model by doing timbre transfer with the same audio targets used for the TB303, resulting in:

Voice

Voice octave high

Voice octave low

Flute

Sax

Sine

“Freq sweep from Badu retrain” is a single frequency sweep from 100 to 110 Hz over 60 seconds using the retrained Badu model. I don’t hear voice-like sounds during the sweep, but there are interesting vibrato-like modulations.

Freq sweep 1

I’m still not quite certain if I’m actually listening to the Badu retrain or the sax model. From the code, it should be the voice, but I can’t hear it.

In freq-sweep-by-hand-2-from-badu-retrain.wav I created a linear frequency sweep from 0 to 1000 Hz, repeated a number of times.

Freq sweep 2
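For reference, the two pitch curves described above could be generated by hand roughly as follows. This is only a sketch: the frame rate of 250 frames per second is an assumption, and the duration and repeat count used for the second sweep are placeholder values, since the post does not state them.

```python
def linear_sweep(start_hz, end_hz, seconds, frame_rate=250, repeats=1):
    """Return a per-frame f0 curve rising linearly from start to end.

    frame_rate=250 is an assumed value. The single sweep is
    concatenated `repeats` times, restarting at start_hz each time.
    """
    n = int(round(seconds * frame_rate))
    if n < 2:
        raise ValueError("need at least two frames per sweep")
    one_pass = [start_hz + (end_hz - start_hz) * i / (n - 1)
                for i in range(n)]
    return one_pass * repeats


# Sweep 1: 100 to 110 Hz over 60 seconds.
sweep_1 = linear_sweep(100.0, 110.0, seconds=60)

# Sweep 2: 0 to 1000 Hz, repeated. The 2-second duration and
# 5 repeats are hypothetical placeholders.
sweep_2 = linear_sweep(0.0, 1000.0, seconds=2, repeats=5)

print(len(sweep_1), sweep_1[0], sweep_1[-1])
print(len(sweep_2), sweep_2[0], sweep_2[-1])
```

A curve like this would be passed to the model as its pitch input, together with a loudness curve of the same length.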
