
Text to Song

I’m still working on making artificial yet authentic 17th-century vocal music, so I thought I’d provide some MP3 samples to listen to, and give capsule reviews of two of the software packages I am using: Flinger and Tracktion.

If you haven’t read my previous post about this project, check it out first and come back. Otherwise, some of this won’t make a heck of a lot of sense.

The program I am writing, Organum, is written in Perl. It outputs algorithmically constructed 17th-century Latin hymns in the ABCPlus format, which can be converted to 4-part vocal sheet music (or piano/organ music) and to MIDI, using utilities available at the ABCPlus website. I’ve mentioned this excellent and free notation software in a previous post.

Last Saturday, I began producing audio files in which the computer “sings” the Kircherian hymns in Latin. Here is the very first result, “Ave Maris,” which is riddled with errors (mispronunciations and pitch problems), but nonetheless sent a chill up my spine the first time I heard it.

Ave Maris (MP3)

The lyrics being sung are:

Ave, Maris stella,
Déi mater alma,
atque semper virgo,
félix caeli porta.

Here’s a somewhat more polished example, using Kircher’s tables that produce music in the “Florid Style”:

Veni Creator Spiritus (MP3)

The lyrics being sung (and/or mangled) are:

Veni, creator spiritus,
Mentes tuorum visita,
Imple superna gratia
Quae tu creasti pectora.

I produced the vocal recordings using an obscure program called Flinger, which was developed by the late Dr. Mike Macon.

I learned about Flinger by perusing Princeton’s excellent VOCE website, which contains a history of speech synthesis, with a special focus on singing synthesis – a somewhat neglected topic, academically, because it doesn’t produce oodles of cash the way non-musical text-to-speech technologies do. VOCE is also where I learned about Joseph Faber’s 1845 artificial talking head, shown above, and other fabulous and eerie Promethean inventions. Needless to say, Father Kircher’s name comes up in this curious history. Once you learn his name, you’ll discover that his fingers are everywhere…

Speech and singing synthesis, like computer graphics of the human face, is extremely difficult to do well, because we have a very large amount of neural circuitry devoted to listening to and watching humans. Nonetheless, I enjoy the uncanny valley effect that singing synthesis produces. It’s eerie, but good-eerie.

Flinger (the name is derived from “Festival Singer”) is a singing synthesis program based on the flexible Festival speech synthesis engine from the University of Edinburgh. Flinger uses Festival to “sing” a MIDI file which contains information about which notes to sing, and which syllables to enunciate. Festival is controlled by a Scheme (Lisp) engine and has a flexible and broad set of parameters that can be tweaked. Flinger can output the singing directly to the speakers on the computer, or to a WAV file. Flinger can only perform one “voice” at a time.

Because Dr. Macon is no longer available to shepherd the Flinger project, the project is languishing at the Oregon Health & Science University (OGI). The software is not well supported, and a bit persnickety. It is obvious from looking at the activity (or lack thereof) on the website that Flinger is in danger of becoming “abandonware” (if it isn’t already). Hopefully, one day the good folks at OGI will realize that it is more beneficial to them to release the source code than to keep Flinger proprietary.

I had to do some work to get my MIDI files into a format that Flinger could handle. The format produced by the abc2midi utility I was using wouldn’t work, so I had to generate the MIDI files myself, in Perl. I accomplished this by reverse engineering the demo MIDI file that accompanies Flinger, and then writing code to produce files in the same format. I found the CPAN Perl module Perl::MIDI particularly helpful. It allows you to easily produce a raw dump of a MIDI file and then produce files which mimic the format of the dumped file.
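For readers who want to see what “generating the MIDI files myself” can involve, here is a minimal sketch. It is written in Python rather than the Perl that Organum uses, and it is not Organum’s actual code. It assumes that each syllable travels in a standard MIDI lyric meta-event placed just before its note; the exact layout Flinger expects is not described in this post, so treat that as a guess. The names `vlq`, `lyric_track` and `midi_file` are hypothetical, and the pitches are made up for illustration, not taken from the hymn.

```python
import struct


def vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def lyric_track(notes, ticks_per_beat=480):
    """Build one track chunk from (syllable, midi_pitch, beats) tuples."""
    data = bytearray()
    for syllable, pitch, beats in notes:
        text = syllable.encode("latin-1")
        # Lyric meta-event (FF 05 <length> <text>) at delta time 0.
        data += vlq(0) + b"\xFF\x05" + vlq(len(text)) + text
        # Note-on, channel 0, velocity 96, at delta time 0.
        data += vlq(0) + bytes([0x90, pitch, 96])
        # Note-off once the note's duration has elapsed.
        data += vlq(int(beats * ticks_per_beat)) + bytes([0x80, pitch, 0])
    data += vlq(0) + b"\xFF\x2F\x00"  # end-of-track meta-event
    return b"MTrk" + struct.pack(">I", len(data)) + bytes(data)


def midi_file(notes, ticks_per_beat=480):
    """Wrap a single track in a format-0 Standard MIDI File header."""
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + lyric_track(notes, ticks_per_beat)


# Illustrative pitches only; one file per vocal part.
soprano = [("A", 67, 1), ("ve", 69, 1), ("Ma", 71, 1), ("ris", 69, 1)]
smf = midi_file(soprano)

if __name__ == "__main__":
    with open("soprano.mid", "wb") as handle:
        handle.write(smf)
```

Dumping a known-good file and comparing it byte for byte with what a generator like this emits is the same reverse-engineering loop described above.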

To make a choir, I have to produce four MIDI files (one for each vocal part) and render each vocal line individually in Flinger (using slightly different vocal settings for each voice), producing four different WAV files (or more, if I want more singers in the choir). I then mix the results together in an audio editor and add reverb (LOTS of reverb; it hides a multitude of sins). There are two ways I commonly accomplish this type of task. For quick jobs, I will often use the excellent free audio editor, Audacity.
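I do the mixing in an audio editor, but the core of the step is simple enough to sketch in code. The following Python is only an illustration, not part of my workflow: it assumes every rendered voice is a 16-bit mono WAV at the same sample rate (I have not verified that this is what Flinger writes), averages the voices sample by sample, and leaves reverb to the editor. The function names `read_mono16`, `mix` and `write_mono16` are hypothetical.

```python
import array
import sys
import wave


def read_mono16(path):
    """Return (sample_rate, samples) for a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit mono audio")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        rate = wav.getframerate()
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rate, samples


def mix(voices):
    """Average the voices sample by sample; shorter voices end in silence."""
    length = max(len(voice) for voice in voices)
    mixed = array.array("h", [0]) * length
    for i in range(length):
        total = sum(voice[i] for voice in voices if i < len(voice))
        # Dividing by the number of voices keeps the sum within 16 bits.
        mixed[i] = int(total / len(voices))
    return mixed


def write_mono16(path, rate, samples):
    """Write samples as a 16-bit mono WAV file."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(data.tobytes())
```

To use it, call `read_mono16` on each of the four rendered files, pass the four sample arrays to `mix`, and hand the result to `write_mono16`. Averaging makes the mix quieter than any single voice, so a real mix would normally be followed by a gain adjustment.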

However, if I know I’m going to be spending some time massaging the audio, then I will use my favorite all-in-one home-studio software, Tracktion. Tracktion isn’t free, but it is incredibly full-featured for the price ($150) and, in my opinion, has the best user interface of any piece of music software with equivalent functionality (e.g. Cubase, Cakewalk, etc.). Here’s a screenshot showing the mix of my four synthetic vocal lines:

I originally bought Tracktion a couple of years ago, when it was being sold by its creator, Julian (Jules) Storer, at Raw Material Software.

Tracktion is a model of my conviction that the best software is written by one or two extremely talented designer/programmers, rather than large teams of middling programmers. Jules had a very clear idea when he started writing Tracktion of how he wanted it to work, and what he believed was broken in other music software packages. He intentionally violated a number of design conventions which are ubiquitous in music software (such as using tons of windows for different functions, and mimicking the user interfaces of historically popular audio interfaces). Instead, Jules asked the question, “If you forget the last 100 years of music studio history, what is actually the most efficient and logical way to construct music on a computer?” Jules then set about answering that question by making some great music software.

Tracktion has since been picked up by the smart folks at Mackie, and it is up to version 2.0. It costs a bit more than it used to, but it is still dirt cheap (at $150) considering what you get (and what you don’t get). I worry that when it gets up to 6 or 7.0, it will start to lose some of its former elegance, due to feature creep, but right now, it is still a thing of beauty.

One Response to “Text to Song”

  1. semiquaver Says:

    Wow. This is a terrific project – straight out of Borges.

    Please, if you have a mailing list, I would *love* to stay informed of your progress!

    I’m a composer, early music fan, computer musician, clavichord player, Tracktion lover etc etc. I have written a couple of traditional operas and some music for Winnie the Pooh played on a couple of Tracy Chapman records and this and that. Text setting is my preoccupation. I’ve written an interactive piece using Max that displays lyrics as a melody is played on a keyboard : text and melody come together in the audience member’s mind… But I have longed for a decent computer vocalisation program… (I tried ages ago to use Ircam’s Chant but found it difficult to control….)

    Also love your sunset project – great stuff. Cheers! – Michael Webster