Dojikko v2

Posted on 2018-07-08 at 21:00 in Videos.

I’ve given my little robot a huge upgrade – she can now see the world properly! This video is just an introduction, and there’ll be a proper demonstration of her path-following abilities later.

[Watch in HD]

Her brain is now a Raspberry Pi instead of an Arduino, and she sees with an infrared camera (for better low-light performance) in greyscale, instead of just measuring the distance in front of her. This means she can now have a proper goal – instead of just moving towards walls and then turning, she can now drive along a path!

She uses a neural network to judge how quickly she should be driving and how to steer. Although she only sees at 128×64 resolution, this is a huge improvement! Currently, I’m still in the process of training her well (driving along paths with her recording the view and the controls that I’m giving her).

In a future video, I will also go into the details of the circuitry, including the way that the Raspberry Pi can hold its own power on and only cut it once it has finished shutting down. The only explanations of how to do this that I could find online required a ridiculous number of components and constantly leaked small amounts of power when turned off, which this way does not. Plus, this way only requires a relay, a transistor and a resistor.

Please forgive the inverted colours of the subtitles!

I only noticed this after I had subtitled the entire video, and there’s no easy way to batch-change this in the video editor. I tried using a hex editor to find/replace the colours, but to no avail… orz
I could pretend that it’s a throw-back to the time when I used the colours this way, but it was actually a mistake.

BaWaMI (revision 135)

Posted on 2018-06-19 at 19:01 in Music, Programs.

This update fixes a bunch of bugs and issues, and improves on what is saved between runs. As always, full details of changes are below, but please make sure that you check the details of which settings are now saved between runs to avoid any surprises, and because it has affected a couple of command line parameters.

You can download this new version here (7.82 MB).

Read the rest of this entry »

Gyroscope MIDI Controller

Posted on 2018-01-23 at 14:57 in Music, Programs, Videos.

I made a program to send pitch-bend messages to Bawami (my MIDI synth) based on the strongest reading out of the X/Y/Z axes of the gyroscope on the GY-87 sensor board, via an Arduino. Gently moving the sensor makes for a really natural-feeling control for vibrato, allowing really subtle (or not-so-subtle) pitch changes.

[Watch in HD]

I was able to get readings from the board to Windows at a stable speed of 400 Hz, but to avoid spamming too many MIDI messages (a problem if sending them outside the computer to some hardware synth), the pitch-bends are “only” being sent at 100 Hz. =P

The GY-87 also has X/Y/Z accelerometers, but these were way too sensitive to orientation to be convenient to use as a controller. Gravity is always pulling down on one axis, so if you tilt the sensor then it massively overwhelms the readings that you actually want (the ones caused by moving the sensor around). The best use I could get from them was tracking the maximum difference between 2 points in time and sending that as a MIDI message, which basically just made it respond to vibrations (and only made positive numbers). The gyros naturally only detect changes, so the readings centre around 0 and go negative when turning in one direction and positive in the other, ideal for vibrato.
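
For anyone curious how gyro readings could become pitch-bend messages, here is a minimal Python sketch. It is not the code from my program: the function names, the 16-bit full-scale value and the "keep every 4th reading" decimation are all assumptions for illustration. The only fixed parts are the MIDI ones: a pitch-bend message is a status byte (0xE0 plus the channel) followed by a 14-bit value split into two 7-bit bytes, with 8192 meaning "no bend".

```python
def strongest_axis(x, y, z):
    """Return the reading with the largest magnitude, keeping its sign."""
    return max((x, y, z), key=abs)


def pitch_bend_message(reading, full_scale=32768, channel=0):
    """Map a signed gyro reading to a 3-byte MIDI pitch-bend message.

    full_scale is an assumed 16-bit signed sensor range.
    """
    # Scale to the 14-bit pitch-bend range, where 8192 means "no bend".
    value = 8192 + int(reading * 8192 / full_scale)
    value = max(0, min(16383, value))
    # Status byte 0xE0 + channel, then the low 7 bits and the high 7 bits.
    return bytes([0xE0 | channel, value & 0x7F, value >> 7])


def decimate(samples, factor=4):
    """Keep every 'factor'-th sample, e.g. 400 Hz readings -> 100 Hz messages."""
    return samples[::factor]
```

Because the gyro readings centre around 0, a sensor at rest maps straight to the "no bend" value without any extra offset handling.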

BaWaMI (revision 134)

Posted on 2017-12-05 at 18:45 in Music, Programs.

This is a tiny update which simply fixes the checkbox to enable/disable responding to MIDI channel coarse/fine tuning messages, on the “MIDI params” tab of the config window, so that it actually has an effect. Previously, Bawami always responded to those messages even if the checkbox was unticked.

You can grab this fixed version here (7.80 MB).

BaWaMI (revision 133)

Posted on 2017-11-29 at 00:40 in Music, Programs.

This is a big update which fixes a bunch of bugs, especially ones related to the PC speaker, and graphical mistakes. A new internal tuning system means Bawami now supports a big range of tuning messages (their effects can combine together!), plus there are a few new instruments and tweaks to existing ones.

Some of the MIDI Tuning Standard messages are quite advanced, and you’d typically use some other scale-related software to generate the SysEx messages rather than hand-crafting them, but they mean that Bawami can now play with tuning other than equal temperament, or different scales entirely (e.g. Arabic).

You can grab this new version from here (7.80 MB), and view details of all the changed stuff in the full post, below:

Read the rest of this entry »

Testing MMSSTV with messed-up signals

Posted on 2017-10-31 at 05:11 in Random, Videos.

I applied a couple of strong vibratos to an SSTV signal (a picture encoded as one long sound) just to see what effect the unstable frequency would have when decoded using MMSSTV. Amazingly, it was still able to detect the signal and start decoding, but of course, it looks too scrambled to make out. I like how the artifacts look, though.

[Watch in HD]

I’m using Virtual Audio Cable to connect MMSSTV (encoder/decoder) with Audacity (which I used to apply the excessive vibratos), and Audio Repeater to “echo” the sound from the virtual cable to the speakers, so I can hear it live (and capture it in the video). Audio Repeater introduces about half a second of delay, though.

SSTV (slow-scan television) is a way of transmitting pictures over the air when you have very little bandwidth available (around 2.6 kHz, vs several MHz for ordinary analogue TV), sometimes used by amateur radio operators. It works by modulating the frequency of a sine wave according to the brightness of the pixels (per colour channel) row-by-row, so by applying a vibrato to the sound, the sound is pulled into and out of phase (but still stays in-phase on average). In other words, the rows are being shifted left/right (each colour channel independently). That’s why the image is rough along the vertical edges instead of being a nice straight line – sometimes, each colour channel is being pulled out of phase and dragged to the right, and sometimes it’s being pulled to the left (which causes it to wrap back to the right with inverted colours, because it’s interrupting the time slot that was dedicated to a different colour channel). Fun stuff to mess around with!
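
The brightness-to-frequency idea can be sketched in a few lines of Python. This is only a rough illustration, not how MMSSTV works internally: the samples-per-pixel and sample-rate values are made up, and real SSTV modes also define exact line timings and sync pulses that are left out here. The 1500 Hz (black) to 2300 Hz (white) range is the usual one for analogue SSTV.

```python
import math

BLACK_HZ = 1500.0  # usual SSTV frequency for black
WHITE_HZ = 2300.0  # usual SSTV frequency for white


def brightness_to_hz(level):
    """Map a 0-255 brightness level to a tone frequency."""
    return BLACK_HZ + (WHITE_HZ - BLACK_HZ) * level / 255


def encode_row(levels, samples_per_pixel=8, sample_rate=11025):
    """Encode one row of one colour channel as a frequency-modulated sine.

    samples_per_pixel and sample_rate are illustrative values only.
    """
    phase = 0.0
    out = []
    for level in levels:
        step = 2 * math.pi * brightness_to_hz(level) / sample_rate
        for _ in range(samples_per_pixel):
            out.append(math.sin(phase))
            phase += step  # accumulate phase so the wave stays continuous
    return out
```

Each pixel owns a fixed slice of time in the row, so anything that stretches or squeezes the audio in time moves pixels sideways when the decoder reads them back.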

BinToUTF8 – Public release

Posted on 2017-05-04 at 23:48 in Programs.

Because several people have asked for it, I’ve decided to release my program for converting any binary file to a valid UTF-8-encoded text file (and vice-versa). This is the program I made to be able to train the open-source neural network software “torch-rnn” on audio, even though it’s only designed to work with text, in these previous videos.

My program is a console-mode program, so it has no graphical interface, and it’s an EXE, so it’ll only run on Windows (and maybe Wine). It’s also slow, because I hadn’t had the pressure (from the idea of making it public) to optimize it until I suddenly decided to release it this evening. It comes with pseudocode and a technical description for any programmers who want to remake it to run on other OSes, though (they’re the same text files I linked to in the blog post for my first neural network video).

The download contains BinToUTF8.exe, which you can use yourself on the command prompt (run it from the command prompt without any parameters to see usage instructions). It also contains several batch files, which make it much more convenient to use – you only have to drag a binary or text file onto the batch file on Windows Explorer to automatically launch BinToUTF8.exe with the appropriate command line parameters.

A brief description is below, but make sure you read the included “info.txt” to find out what each batch file does and avoid accidentally overwriting any of your own files!

The program works by assigning a unique Unicode or ASCII character to each of the 256 possible byte values in your binary file. There are 2 modes for this:

  • Byte/Character Lookup (BCL) mode (recommended):

Characters are assigned on a “first-come, first-served” basis, meaning that bytes appearing near the beginning of the file will be assigned ASCII characters, and Chinese Unicode characters will only be used once no more ASCII characters are available. This is done to allow you to pass text from the start of the file to torch-rnn using torch-rnn’s -start_text parameter, which does not support Unicode characters. A utf8.bcl file is made when converting to text and is required when converting back to binary. This file is the lookup table for converting between bytes and Unicode characters which the program made when converting the binary file to text.

  • Non-BCL mode (default, not recommended for torch-rnn):

All bytes are converted to Chinese Unicode characters and none are converted to ASCII. This means the text file will be larger, but more importantly, you won’t be able to use any of this text with torch-rnn’s -start_text parameter. The conversion in this mode may be faster, and no utf8.bcl file is made or required.

Text files made using the BCL mode cannot be converted back to binary using the non-BCL mode, and vice-versa. To convert text back to binary correctly, you must use the same mode that you used when converting the original binary file to text.
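
For programmers who just want the gist of BCL mode, here is a minimal Python sketch of the “first-come, first-served” idea. It is not the code inside BinToUTF8.exe: the exact set of ASCII characters, the starting Chinese code point (U+4E00 here) and the function names are all assumptions, and the real utf8.bcl file format is not reproduced. The included pseudocode and technical description are the proper reference.

```python
def build_table(data, ascii_pool=None, cjk_start=0x4E00):
    """Assign one character per distinct byte value, in order of first appearance."""
    if ascii_pool is None:
        # Assumption: printable ASCII, space through tilde (95 characters).
        ascii_pool = [chr(c) for c in range(0x20, 0x7F)]
    table = {}
    for byte in data:
        if byte not in table:
            n = len(table)
            if n < len(ascii_pool):
                table[byte] = ascii_pool[n]  # early bytes get ASCII
            else:
                # Once ASCII runs out, fall back to CJK characters.
                table[byte] = chr(cjk_start + n - len(ascii_pool))
    return table


def to_text(data, table):
    """Convert bytes to text using the lookup table."""
    return "".join(table[b] for b in data)


def to_binary(text, table):
    """Convert text back to bytes; needs the same table used for to_text."""
    reverse = {ch: b for b, ch in table.items()}
    return bytes(reverse[ch] for ch in text)
```

The table plays the role that utf8.bcl plays in the real program: without the exact table that was built during conversion to text, there is no way to know which character stood for which byte.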

You can download BinToUTF8 from here (19 KB). Now, have fun!

(By the way, if training torch-rnn on audio files, you should use an 8-bit audio encoding such as 8-bit PCM, U-law or A-law, to be kind to torch-rnn.)

BaWaMI (revision 132)

Posted on 2017-04-23 at 23:02 in Music, Programs.

This biggest update ever to my MIDI software synth contains dozens of bug- and crash-fixes, improvements to live MIDI input, and a big new feature for instruments called “multi-osc” (explained below), which many instruments now take advantage of! It’s stable when clicking “Apply” to restart the sound system, which often caused crashes in the past, and there are a couple of new features to do with overriding controls. Also, one particular system file (included since a long time ago) is now correctly checked / set up when Bawami starts, which may fix Bawami not being able to start for some people. All these improvements mean that Bawami has grown to version 0.7!

The new “multi-osc” feature for instrument files allows one note to trigger more than one sound channel, massively improving the sound of some instruments. This opens the door to having a proper Fifths instrument, octave basses, octave-stacked strings, detuned Honky Tonk, better organs and more! Of course, I updated lots of instruments to take advantage of this, and added new GS instruments whose sounds simply weren’t possible to generate before. Multi-osc is enabled by default, but can be disabled if you want to keep CPU usage as low as possible (if you really hate the new sound, you can replace all instrument files with those from the previous version, or have fun editing them yourself!).

You can grab this shiny new version from here (7.79 MB), and view the full post to see exactly what’s changed, below:

Read the rest of this entry »

Neural Network Tries to Generate English Speech (RNN/LSTM)

Posted on 2016-12-24 at 20:56 in Programs, Videos.

By popular demand, I threw my own voice into a neural network (3 times) and got it to recreate what it had learned along the way!

[Watch in HD]

This is 3 different recurrent neural networks (LSTM type) trying to find patterns in raw audio and reproduce them as well as they can. The networks are quite small considering the complexity of the data. I recorded 3 different vocal sessions as training data for the network, trying to get more impressive results out of the network each time. The audio is 8-bit and a low sample rate because sound files get very big very quickly, making the training of the network take a very long time. Well over 300 hours of training in total went into the experiments with my voice that led to this video.

The graphs are created from log files made during training, and show the progress that it was making leading up to immediately before the audio that you hear at every point in the video. Their scrolling speeds up at points where I only show a short sample of the sound, because I wanted to dedicate more time to the more impressive parts. I included a lot of information in the video itself where it’s relevant (and at the end), especially details about each of the 3 neural networks at the beginning of each of the 3 sections, so please be sure to check that if you’d like more details.

I’m less happy with the results this time around than in my last RNN+voice video, because I’ve experimented much less with my own voice than I have with higher-pitched voices from various games and haven’t found the ideal combination of settings yet. That’s because I don’t really want to hear the sound of my own voice, but so many people commented on my old video that they wanted to hear a neural network trained on a male English voice, so here we are now! Also, learning from a low-pitched voice is not as easy as with a high-pitched voice, for reasons explained in the first part of the video (basically, the most fundamental patterns are longer with a low-pitched voice).

The neural network software is the open-source “torch-rnn”, although that is only designed to learn from plain text. Frankly, I’m still amazed at what a good job it does of learning from raw audio, with many overlapping patterns over longer timeframes than text. I made a program (explained here, and available for download here) that substitutes raw bytes in any file (e.g. audio) for valid UTF-8 text characters and torch-rnn happily learned from it. My program also substituted torch-rnn’s generated text back into raw bytes to get audio again. I do not understand the mathematics and low-level algorithms that make a neural network work, and I cannot program my own, so please check the code and .md files at torch-rnn’s GitHub page for details. Also, torch-rnn is actually a more-efficient fork of an earlier program called char-rnn, whose project page also has a lot of useful information.

I will probably soon release the program that I wrote to create the line graphs from CSV files. It can make images up to 16383 pixels wide/tall with customisable colours, from CSV files with hundreds of thousands of lines, in a few seconds. All free software I could find failed hideously at this (e.g. OpenOffice Calc took over a minute to refresh the screen with only a fraction of that many lines, during which time it stopped responding; the lines overlapped in an ugly way that meant you couldn’t even see the average value; and “exporting” graphs is limited to pressing Print Screen, so you’re limited to the width of your screen… really?).

Noob Pancakes

Posted on 2016-09-25 at 20:56 in Random, Videos.

This isn’t going to become a thing on this channel – I was just hungry and wanted to record it… I think I should stick to computer stuff. If I hadn’t put any effort into editing this, I would’ve put this on my other channel.

[Watch in HD]
