
TB303 modeling

The Roland TB-303

The Roland TB-303 synthesizer was released in 1982 as a device meant to mimic bass sounds and sequences. It was a commercial failure on release: few units were sold and its price dropped. A few years later, the 303 came back into use as a bass sequencer/synthesizer, but in the context of a musical genre it wasn’t designed for: Acid House. Many years and many releases later, it has given the genre its unique timbral signature, original Roland units now sell for thousands of dollars, and many hardware and software emulators have been released.

This piece is from “Acid Tracks” by Phuture, considered the first Acid House record (1987). A 303 was used to perform the bassline.

D.J. Pierre, one of Phuture’s composers, gives some thoughts about the 303 and showcases its iconic sound and sequencer capabilities.

Differentiable Digital Signal Processing

Google Magenta’s DDSP library was released in 2019. It uses neural networks to learn the parameters of a digital signal processing pipeline based on user input (including the audio itself) to (re)create audio signals. Examples of models trained using DDSP include common classical instruments such as violin and trumpet.
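To make the idea of a network-controlled signal processing pipeline more concrete, here is a minimal sketch of an additive harmonic oscillator, the kind of synthesizer whose controls (fundamental frequency, amplitude, per-harmonic weights) a network could predict. It is plain Python with no dependencies. The function name, sample rate, and parameter layout are my own assumptions for illustration, not DDSP's API.

```python
import math

def harmonic_synth(f0_hz, amplitude, harmonic_weights, sample_rate=16000):
    """Render one sample per entry of f0_hz as a sum of harmonics.

    f0_hz and amplitude are per-sample control curves (lists of floats).
    harmonic_weights[k] scales harmonic k+1; weights are normalized to sum to 1.
    All names are illustrative, not taken from the DDSP library.
    """
    total = sum(harmonic_weights)
    weights = [w / total for w in harmonic_weights]
    nyquist = sample_rate / 2.0
    phase = 0.0
    out = []
    for f0, amp in zip(f0_hz, amplitude):
        # Integrate the instantaneous frequency to obtain the running phase.
        phase += 2.0 * math.pi * f0 / sample_rate
        sample = 0.0
        for k, w in enumerate(weights, start=1):
            if k * f0 < nyquist:  # drop harmonics that would alias
                sample += w * math.sin(k * phase)
        out.append(amp * sample)
    return out

# A quarter second of a 110 Hz tone with a sawtooth-like 1/k harmonic rolloff.
n = 4000
audio = harmonic_synth([110.0] * n, [0.5] * n, [1.0 / k for k in range(1, 9)])
```

In a learned model the three control inputs would change over time and come from the network rather than being constants as in this example.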

D-303

I want to learn a DDSP model of the TB303. For this, I created 30 minutes of patterns using the ABL3 TB303 emulator.

Here are some audio snippets of the dataset:


I set up DDSP on a computer cluster and trained a model on the recordings, split into 4-second snippets. The model reached the target loss of ~5 at 400K steps (average loss of 4.94 over the last 1K steps).
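As a rough illustration of the snippet splitting, here is a small sketch in plain Python. The 16 kHz sample rate, the function name, and the optional hop parameter are assumptions. The text above does not say whether the snippets overlapped, so the sketch defaults to non-overlapping windows and lets you try overlapping ones.

```python
def split_into_snippets(samples, sample_rate=16000, snippet_seconds=4.0, hop_seconds=None):
    """Cut a mono recording into fixed-length snippets.

    With hop_seconds=None the snippets do not overlap. A trailing
    remainder shorter than one snippet is dropped.
    """
    snippet_len = int(round(snippet_seconds * sample_rate))
    hop_len = snippet_len if hop_seconds is None else int(round(hop_seconds * sample_rate))
    starts = range(0, len(samples) - snippet_len + 1, hop_len)
    return [samples[s:s + snippet_len] for s in starts]

# Ten seconds of silence at 16 kHz.
recording = [0.0] * (10 * 16000)
snippets = split_into_snippets(recording)                       # non-overlapping
overlapping = split_into_snippets(recording, hop_seconds=1.0)   # 1-second hop
```

With these settings the ten-second recording yields two non-overlapping snippets, or seven snippets with a one-second hop.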

DDSP is conditioned on user signals such as pitch and loudness, so the generated audio can be modified through the following features:

  • NDT: Note detection
  • AQP: Adjustment for quiet parts (gate)
  • ATT: Force pitch to nearest note/Autotune
  • MPS: Manual Pitch Shift
  • LDS: Loudness Shift (output compensation)
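Three of these adjustments (ATT, MPS, LDS) are simple transformations of the pitch and loudness curves, and can be sketched as follows. This is my reading of the controls, not code from DDSP: I assume ATT is a 0 to 1 blend toward the nearest equal-tempered note, MPS is a shift in octaves, and LDS is an offset in dB. The function names are hypothetical.

```python
import math

def hz_to_midi(f_hz):
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)

def midi_to_hz(note):
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)

def adjust_conditioning(f0_hz, loudness_db, att=0.0, mps=0.0, lds=0.0):
    """Apply autotune (ATT), pitch shift (MPS) and loudness shift (LDS).

    Assumptions: att in [0, 1] blends each pitch toward the nearest
    semitone, mps is in octaves, lds is in dB. Requires f0 > 0.
    """
    new_f0 = []
    for f in f0_hz:
        midi = hz_to_midi(f)
        midi += att * (round(midi) - midi)             # pull toward nearest semitone
        new_f0.append(midi_to_hz(midi) * 2.0 ** mps)   # shift by mps octaves
    new_loudness = [l + lds for l in loudness_db]
    return new_f0, new_loudness

# Full autotune, one octave down, 3 dB louder.
f0, loud = adjust_conditioning([450.0, 220.0], [-30.0, -25.0], att=1.0, mps=-1.0, lds=3.0)
```

Here the slightly sharp 450 Hz input snaps to A4 (440 Hz) and lands on 220 Hz after the octave shift, while 220 Hz becomes 110 Hz.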

Timbre transfer

Here are the results of timbre transfer using a few targets and the model created. The generation is conditioned with the aforementioned features, given in the form (NDT, AQP, ATT, MPS, LDS).

Original voice

When using a voice to do timbre transfer, the model is able to follow the voice pitches while retaining some characteristics of the 303.


Transfer (1, 30, 0.5, -1, 0)


With the conditioning modification the octave is correct. The pitch is jumpy but the original voice is also not neatly tuned.

Bass sounds

Original pulsing bass


Transfer


Square bass different notes


Transfer

New, simpler dataset of TB303

I made a second, much simpler dataset with the goal of training a better model. The hypothesis is that longer, steadier notes will make the modeling work better.

Notes generated by the previous model jumped around too much. My hunch is that this happened because the same pitch and loudness appeared with different timbres in the training data.

Files in the new dataset consist of one note per second, ranging from C0 to D#5.
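As a quick sanity check on these numbers, the note range and durations can be worked out in a few lines. The sketch assumes each file plays every semitone from C0 to D#5 exactly once, at exactly one second per note, and uses the total dataset length of 21 minutes and 20 seconds reported further down. The implied file count is only an inference under those assumptions.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def note_index(name, octave):
    """Semitone index counted from C0."""
    return 12 * octave + NOTE_NAMES.index(name)

notes_per_file = note_index("D#", 5) - note_index("C", 0) + 1  # inclusive range
seconds_per_file = notes_per_file * 1                          # one note per second
total_seconds = 21 * 60 + 20                                   # reported dataset length
implied_files = total_seconds / seconds_per_file

print(notes_per_file, seconds_per_file, implied_files)  # prints: 64 64 20.0
```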

The set of notes was recorded with different settings for cutoff frequency and resonance.

F0 (cutoff, %)   Res (resonance, %)
0                0
25               0
50               0
75               0
100              0
0                25
0                50
75               100
100              100

In total, there are 21 minutes and 20 seconds of data. The following is a set of examples for varying cutoff (f0) with res = 100%.

With this new, simpler dataset, the model is also much simpler: none of the interesting rhythmic aspects of the previous model remain.

As examples, here are four snippets of timbre transfer using the new model.


Flute melody


Sax


Sax (same example, but choosing one octave lower)


Voice


Interestingly, since all notes were recorded one second apart, I have the impression that the model is learning some kind of detuned 1 s delay, with up to four repeats. I guess the number four comes from the fact that, during training, the data is split into four-second snippets, which leaves room for a total of four delays.
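One way this impression could be checked numerically (I have not done this; it is only a sketch) is to compute the autocorrelation of the output's loudness envelope and look for peaks at lags of 1, 2, 3 and 4 seconds. The envelope frame rate and the synthetic test envelope below are assumptions made for the example.

```python
def normalized_autocorr(envelope, lag):
    """Autocorrelation of a mean-removed envelope at one lag, scaled to [-1, 1]."""
    mean = sum(envelope) / len(envelope)
    x = [v - mean for v in envelope]
    energy = sum(v * v for v in x)
    if energy == 0.0 or lag >= len(x):
        return 0.0
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy

frame_rate = 100  # envelope frames per second (assumed)

# Synthetic envelope: a burst at 0.5 s with a weaker copy one second later.
env = [0.0] * (4 * frame_rate)
for i in range(10):
    env[50 + i] = 1.0
    env[150 + i] = 0.5

one_second = normalized_autocorr(env, 1 * frame_rate)  # lag of exactly 1 s
off_grid = normalized_autocorr(env, 70)                # lag of 0.7 s, for comparison
```

On the synthetic envelope the value at the 1 s lag is clearly larger than at the 0.7 s lag. A similar pattern on real model output would support the delay hypothesis, although a melody with notes one second apart would produce the same peaks, so the input would need to be something without that rhythm.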

Modelling voice

To check whether the delay is some kind of artifact of my implementation, I trained a model on an a cappella recording of “On & On” by Erykah Badu.

I then tested the model by doing timbre transfer with the same audio targets used for the TB303, resulting in:


Voice


Voice octave high


Voice octave low


Flute


Sax


Sine

“Freq sweep from Badu retrain” is a single frequency sweep from 100 to 110 Hz over 60 seconds, generated with the retrained Badu model. I don’t hear voice-like sounds during the sweep, but there are interesting vibrato-like modulations.
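For reference, a sweep like this can be described as a pitch curve plus, for listening comparison, a plain sine that follows it. The sketch below is plain Python under my own assumptions (16 kHz sample rate, hypothetical function name). It is not the code used to drive the model.

```python
import math

def linear_sweep(f_start, f_end, seconds, sample_rate=16000):
    """Return (f0 curve in Hz, sine audio) for a linear frequency sweep."""
    n = int(seconds * sample_rate)
    f0 = [f_start + (f_end - f_start) * i / (n - 1) for i in range(n)]
    audio, phase = [], 0.0
    for f in f0:
        audio.append(math.sin(phase))
        phase += 2.0 * math.pi * f / sample_rate  # integrate frequency to phase
    return f0, audio

# 100 Hz to 110 Hz over 60 seconds, as in the example above.
f0, audio = linear_sweep(100.0, 110.0, seconds=60.0)
```

The phase is accumulated sample by sample because the frequency changes over time. Computing sin(2πft) directly with a time-varying f would not give the intended sweep.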


Freq sweep 1

I’m still not quite certain whether I’m actually listening to the Badu retrain or the sax model. From the code, it should be the voice, but I can’t hear it.

In freq-sweep-by-hand-2-from-badu-retrain.wav, I create a linear frequency sweep from 0 to 1000 Hz, repeated a number of times.
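A hand-made pitch curve of this shape could look like the sketch below. The sweep length, the number of repeats, and the control frame rate of 250 frames per second are hypothetical values chosen for the example, since they are not stated above.

```python
def repeated_sweep(f_start, f_end, seconds_per_sweep, repeats, frame_rate=250):
    """Concatenate `repeats` identical linear ramps into one f0 curve (in Hz)."""
    n = int(seconds_per_sweep * frame_rate)
    ramp = [f_start + (f_end - f_start) * i / (n - 1) for i in range(n)]
    return ramp * repeats

# Hypothetical settings: four 5-second sweeps at 250 control frames per second.
f0_curve = repeated_sweep(0.0, 1000.0, seconds_per_sweep=5.0, repeats=4)
```

Each ramp ends at 1000 Hz and the next one restarts at 0 Hz, so the curve has a sawtooth shape.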


Freq sweep 2