Riffusion began as a side project by Hayk Martiros and Seth Forsgren that let users create music by generating pictures of audio rather than audio itself. It launched over a year ago. Although that sounds counterintuitive, my colleague Devin Coldewey explained how it works here.
Riffusion garnered Forsgren and Martiros a lot of attention despite their method’s drawbacks. Considering the intrigue (and controversy) surrounding AI-generated music technology, this is not entirely unexpected. According to Forsgren, millions of individuals have used Riffusion, and the site has been mentioned in research papers from major tech firms, including Meta, Google, and TikTok’s parent company ByteDance.
Investors appear to have taken notice as well. This year, Forsgren and Martiros commercialized Riffusion, which has closed a $4 million seed round led by Greycroft with participation from South Park Commons and Sky9 and is now advised by the musical group The Chainsmokers.
The company is also releasing a new, free-to-use app, also called Riffusion. An updated version of last year’s project, it lets users describe lyrics and a musical style to create “riffs” that can be shared with friends or the public.
“[The new Riffusion] empowers anyone to create original music via short, shareable audio clips,” Forsgren said in an email interview with TechCrunch. “Users only need to provide the words and the musical genre, and our algorithm quickly creates riffs that include singing and unique artwork. Riffs are a new method of expression and communication that significantly lowers the barrier to music creation, from motivating artists to greeting your mom with ‘Good morning!’”
Martiros and Forsgren became friends as undergraduates at Princeton and have been playing music together in a side project for the past ten years. Before Riffusion, Forsgren built two venture-backed technology businesses, Hardline and Yodel, while Martiros joined drone firm Skydio as one of its first employees.
Forsgren says that the potential for generative AI tools to foster human connection through creativity motivated him and Martiros to develop Riffusion.
“The pandemic gave us all a lot more time at home, and led me to learn how to play the piano,” Forsgren added. “Music has a powerful ability to bring us together when we’re feeling alone. Riffusion seeks to use generative AI, a field that is young and undergoing rapid change, to develop a joyful new instrument that will enable everyone to actively produce music throughout their lives.”
The six-person Riffusion team, which includes Forsgren and Martiros, developed an audio model from scratch to power the improved Riffusion. Like the model behind the original Riffusion, the new model works on spectrograms, visual representations of audio that display the amplitude of various frequencies over time.
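To make the idea of a spectrogram concrete, here is a minimal sketch in Python using NumPy. It is not Riffusion’s code, and the function name, frame size, and hop size are assumptions chosen for illustration. It slices a signal into short overlapping frames and measures how much of each frequency is present in each frame, which produces the grid of amplitudes over time described above.

```python
import numpy as np

def spectrogram(signal, frame_size=1024, hop=256):
    """Return a magnitude spectrogram shaped (frequency bins, time frames).

    Hypothetical helper for illustration; frame_size and hop are arbitrary choices.
    """
    window = np.hanning(frame_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # The FFT of each frame gives the amplitude of each frequency in that time slice.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# One second of a 440 Hz sine tone, sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)
# The brightest row of the "image" corresponds to the tone's frequency.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 1024
```

Rendered as an image, each column of `spec` is a moment in time and each row is a frequency, so a steady tone appears as a single bright horizontal line near 440 Hz.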
Forsgren and Martiros created spectrograms of musical recordings and tagged the resulting images with descriptive labels, such as “jazz piano,” “blues guitar,” and so on. Feeding the model this collection “taught” it what specific sounds “look like” and how it might recreate or combine them given a text prompt (for example, “lo-fi beat for the holidays,” “mambo but from Kenya,” “a folksy blues song from the Mississippi Delta,” etc.).
According to Forsgren, users can steer the model toward particular outputs by describing musical elements in natural language or by recording their own voices. “We believe the product will enable audio engineers and music producers to explore fresh concepts and find inspiration in new ways.”
Below is a sample created with Riffusion’s voice-recording feature and the prompt “punk rock anthem, male vocals, energetic guitar and drums”:
Homemade tracks that use generative AI to conjure familiar sounds that can pass as authentic, or at least close enough, have been gaining popularity. The company that represents Travis Scott was outraged when a Discord community focused on generative audio recently released an entire album using an AI-generated version of his voice.
Citing intellectual property concerns, music labels have been quick to flag AI-generated tracks to streaming partners like Spotify and SoundCloud. In most cases, they have prevailed. However, it is still unclear whether “deepfake” music infringes on the copyright of musicians, record companies, and other rights holders.
Forsgren was quick to point out that the updated Riffusion can’t imitate well-known songs or artists, since it wasn’t trained to do so.
“The product isn’t built to produce deepfakes and doesn’t recognize famous artist names in its prompts,” he said. Instead, the app lets users create personalized messages and intriguing hooks. It’s not unusual for a song you write to get stuck in your head and have you singing along all day.
There isn’t a definite monetization plan in place yet. For the time being, according to Forsgren and Martiros, their main priorities are expanding the Riffusion team and building complementary new generative AI technologies.
However, Forsgren also hinted at collaborating more closely with musicians like The Chainsmokers to see how the technology might be incorporated into their production methods.
“Generative music is still in its infancy. Exciting tools in the space include models like Google’s MusicLM, Facebook’s MusicGen, and Stability’s Stable Audio,” Forsgren noted. But Riffusion stands out as one of the first to give consumers a fun and user-friendly website through which they can create music with lyrics.