Business, Numbers, Money, People
The long strange relationship between electronic music and business machines.
As a budding music nerd, I got pulled into the world of electronic music after discovering Scream Tracker in high school. I had been recording my various bands with a 4-track but wanted to try some more experimental directions. My summer job at the time was helping organize some logistical workbooks for the neurology department where my dad worked. It’s funny to look back at Scream Tracker and 90s-era Microsoft Excel next to each other. For me, making computer music has always felt like doing clerical data entry into a spreadsheet.
Having grown up in the shadow of the business machine, electronic music has always had this bizarre bureaucratic dressing. Ironically, the dominant mode of transmitting creative ideas into computer music is the keyboard, and it’s not clear from context to which homonym I’m referring. Do I mean the one with letters or the one with white and black keys? Why not both?
Look at drum machines. We took our most visceral instruments—percussion you physically pummel to summon massive grooves—and put them in a box that looks like an answering machine.
And yet these boxes can move a club in ways a drummer never could.
Mechanical business machines could conjure the most intense human emotions. Our spreadsheet samplers and a single drum loop created an alien genre that still gets people moving today.
But it always seemed like these music tools were a technological crutch and that the real future of music would be bringing the sonic versatility of electronic music back to the old physical reality of banging on things to get people to dance. Wouldn’t people soon tire of going to shows to watch people turn knobs?
Ostensibly, I was admitted to graduate school to work on AI for music. I guess we didn’t call it AI at the time, because AI was a dirty word in the year 2000 (it still should be a dirty word, damn it!). My lab had developed what my advisor, Neil Gershenfeld, called a “Digital Stradivarius.” Neil figured that as long as you could sample the sonic timbre of a genuine Stradivarius, you could use machine learning to synthesize the nuances of playing a real one. Bernd Schoner’s PhD used expectation maximization and some clever hardware to make a pretty nice-sounding digital cello. He called it a “marching cello” as it was far less bulky than its analog counterpart:
But as software improved, I quickly became disenchanted with musical hardware. More and more devices could be replaced by your laptop. By the early 2000s, I could, on the same computer, record and mix a rock band, make upbeat techno, and create live ambient music. Though die-hards hold onto their room-sized modular synthesizers, we crossed the laptop singularity decades ago.
Fast-forward to 2024, and the dominant tool for electronic music production and beyond is the software Ableton Live. I have been a committed Live user since the beta tests in 2001. Though it initially just played looping samples, it now has impressive synthesis engines, support for programming your own plug-ins, and a built-in search engine for managing your library of sounds.
Twenty-five years later, we don’t have a better marching cello, but we have an amazing Stradivarius sample pack. It turned out that samples were all you need. You didn’t need machine learning. And instead of a novel device for bowing and nuanced articulation, you still input commands like you did in 1999. Enter the data with a keyboard and mouse into a spreadsheet:
But if the future of music is the same as the future of the spreadsheet, what does that say about the future of AI and music? Maybe that means we should look at which areas LLMs are supposedly going to revolutionize and think about how to directly map those applications onto computer music.
As you, my dear readers, know, I don’t think LLMs are going to walk, talk, see, write, reproduce themselves, or be conscious of their existence. But I also don’t think they are useless. In particular, the most compelling and impressive application of large language models thus far has been code generation. The sorts of projects graduate students can build with GPT-4 are beyond impressive. What would the analog be in music? As I wrote on Monday and hint at today, the most challenging part of modern digital audio is making sense of the infinite collection of synthesizer presets and sample packs.
At its most abstract level, could we create a prompt system for synthesizers? People can make incredible music with the graphical programming language MaxMSP. Can we simplify creating MaxMSP patches with GPT-4? MaxMSP is just code, after all! From what I can tell, this isn’t that far away. GPT-4 knows what pd is (Pure Data, the open-source counterpart of MaxMSP), and it can sort of help you get started building patches. I asked it to create a pd patch of a simple additive synthesizer, and it gave me a long spiel and “a simplified textual diagram of what the patch might look like.”
Tightening this up seems pretty straightforward, doesn't it?
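To make the ask concrete, here is a minimal sketch in plain Python of what that additive patch is supposed to compute. The partial count, fundamental, and envelope are arbitrary choices of mine for illustration, not anything GPT-4 produced.

```python
# A minimal additive synthesizer sketch: sum a few harmonics of a fundamental,
# apply a simple envelope, and write the result to a wav file.
import numpy as np
from scipy.io import wavfile

SR = 44100           # sample rate in Hz
DUR = 2.0            # seconds
FUNDAMENTAL = 220.0  # A3

t = np.linspace(0, DUR, int(SR * DUR), endpoint=False)

# Additive synthesis: harmonics at integer multiples, amplitude falling as 1/k.
partials = [(k * FUNDAMENTAL, 1.0 / k) for k in range(1, 9)]
signal = sum(amp * np.sin(2 * np.pi * freq * t) for freq, amp in partials)

# Short attack and exponential decay so the tone doesn't click.
envelope = np.minimum(t / 0.05, 1.0) * np.exp(-1.5 * t)
signal *= envelope
signal /= np.max(np.abs(signal))

wavfile.write("additive_tone.wav", SR, (signal * 32767).astype(np.int16))
```

The whole thing is a handful of oscillator objects and a multiply; a pd patch that does the same is maybe a dozen boxes, which is exactly why an LLM should be able to generate it reliably.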
Moving from code to information retrieval, could we build better search tools for the vast collection of sounds we now have access to? Ableton Live ships with gigabytes of presets, and finding the one you have in mind can take hours. Can we build a chatbot that helps us navigate the infinite sea of sounds? As it exists now, the search capabilities in software like Ableton Live only index the keywords in the names of the presets. What if they could get access to more metadata? Perhaps search could then return a list of instruments and plugins associated with a natural language prompt. More ambitiously, what about sound search? Could we take a sample of music and suggest sample packs that might get you close to emulating the style? Bringing richer data to this search could open up endless creative possibilities.
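For flavor, here is a toy sketch of what keyword-free preset search could look like: embed whatever metadata we can get for each preset and rank by similarity to a natural-language prompt. The preset entries and metadata fields below are invented, and the embedding model is just one readily available open-source option, not anything Ableton actually ships.

```python
# Toy natural-language preset search over made-up metadata.
import numpy as np
from sentence_transformers import SentenceTransformer

presets = [
    {"name": "Warm Pad",   "device": "Wavetable", "tags": "slow attack, lush, ambient"},
    {"name": "Acid Lead",  "device": "Operator",  "tags": "squelchy, resonant, 303-style"},
    {"name": "Dub Chords", "device": "Sampler",   "tags": "skanking stabs, spring reverb"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [f'{p["name"]} ({p["device"]}): {p["tags"]}' for p in presets]
corpus_emb = model.encode(corpus, normalize_embeddings=True)

def search(prompt, k=2):
    """Return the k presets whose metadata best matches the prompt."""
    q = model.encode([prompt], normalize_embeddings=True)
    scores = corpus_emb @ q[0]  # cosine similarity, since embeddings are normalized
    return [presets[i] for i in np.argsort(-scores)[:k]]

print(search("a washy background synth for an ambient track"))
```

Swap the made-up tags for real device parameters, macro names, and even audio embeddings of the preset itself, and the same ranking loop starts to look like the sound search I'm asking for.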
Another idea that’s probably within our reach is converting existing songs into playable instruments. Ableton Live already has an “audio to midi” converter that takes sounds and converts the melody and rhythm into primitive computer music code. Could better AI tooling take a recording and produce a sampler instrument that approximates the input’s nuance? Maybe we could upload a band’s recording, extract the melodies and rhythms of the different players, and give the producer continuous control parameters to manipulate. Where are we with source separation technology?
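As a sanity check on how far off-the-shelf tools already get you, here is a rough sketch that uses librosa to pull a crude pitch-and-onset track out of a recording, which is the kind of raw data an “audio to midi” converter starts from. The file name is a placeholder, and real multi-instrument material would need a source separation pass first.

```python
# Crude monophonic pitch extraction: frame-level f0 plus onset times,
# which together approximate a MIDI-style note list.
import librosa

y, sr = librosa.load("bass_stem.wav", sr=None, mono=True)

# Frame-by-frame fundamental frequency estimates (NaN where unvoiced).
f0, voiced, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C1"), fmax=librosa.note_to_hz("C5"), sr=sr
)

# Onsets give rough note boundaries; pair each with the pitch at that frame.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="frames")
notes = [
    (float(librosa.frames_to_time(i, sr=sr)), float(librosa.hz_to_midi(f0[i])))
    for i in onsets
    if i < len(f0) and voiced[i]
]
print(notes)  # (time in seconds, MIDI note number) pairs, ready to drive a sampler
```

The gap between this and a playable instrument is everything the note list throws away: articulation, timbre, dynamics, the way a player leans on the beat. That nuance is where the interesting AI work would be.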
The issue with my wish list is that music remains a low-margin business. Listening, playing, and dancing to music are fundamental to our lives, but you’re not going to get a big VC check to innovate on them. So a last question: Can you build the sorts of projects I’m asking about using cheap, open-source models? My guess is you can! And someone really should.