Finding a Pop Song's Features in Latent Space

Production notes · Image of Itself (自身の画像) feat. Lem V4Bi · June 2026

I spent June composing and producing a song called Image of Itself. I decided it's worth writing about my process before I actually release the song itself.

My songwriting usually begins with the usual: thinking up a fitting chord progression, writing a melody over it, adding harmonies. Where I diverge from the norm is in my sound design and arrangement style. Everything in my songs is unapologetically synthetic. All the instruments, including the vocal, are computer-generated. I tune the virtual singer, an entity based on my voice, by hand, making for a vocaloid-like performance. After programming all the notes of the lead synths, I draw the MIDI pitch-bend curves in my DAW with my digital tablet.

For this song in particular, whose lyrics revolve around a non-human entity's pursuit of a ground truth, a meaning, a value, I decided to go with a form of embodiment in my practice beyond my usual area of expertise. Instead of micromanaging the outputs, I went with chaos. I generated sound effects on the Lyra-8, an analogue synthesizer whose entire signal chain feeds back on itself as one plays it, making its behavior impossible to predict in advance. Besides using a voice rendered via a machine-learning-based vocoder, I took it upon myself to find the one-shot percussion samples and loops within the latent space of Stable Audio 3.0, an AI model trained on Creative Commons audio. The composition also includes synth strings, a track programmed entirely by Claude Opus 4.8. Typing prompts and auditioning results proved at once more difficult and less time-consuming than programming sounds in a synthesizer. It was satisfying precisely because it meant giving up some control in places where that felt aligned with the overall conceptual goal.

The Japanese translation of the original English text was also made with AI. The model often resorted to ungrammatical structures that even I, a beginner in Japanese, noticed were off. I used a dictionary quite a lot, never fully knowing whether the lyrics were what I thought they were. That uncertainty is part of the appeal. I am trying to become someone who isn't zeroed in on the details, who can find value in uncertainty rather than be paralyzed by it: someone who can look forward to a tomorrow.

AI is a technology that won't be uninvented. As a neurodivergent person, I approximate a sort of “normal” against which I often feel broken, and in a way, so does an AI. Using these tools to make this song, whose subject is itself a non-human entity searching for meaning, is my way of eschewing that normal and embracing my identity. My concern is not with AI itself but with who owns it: boycotting, the way artists typically deal with it today, will only ensure that the companies that make these models ultimately win. We must organize as artists to create open-weight models. I hope we manage, some day, to remove corporate ownership from the equation.

Accompanying this text is an image generated by BLENDFORM, a small browser application written by AI on my instruction (vibecoded completely hands-off). It takes an audio file and produces from it a deterministic visual — the same file always yields the same image. The audio's frequency spectrum and loudness envelope drive the structure; gradient layers are composited using bitwise operations — XOR, NOR, NAND. The result is not a waveform or a spectrogram but something stranger: a fixed image that belongs to the song, derived entirely from what the song is. In this sense, Image of Itself is, literally, an image of itself — generated by a non-human process, about a non-human process, using a tool made by a non-human process.
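
To make the idea concrete, here is a minimal Python sketch of that kind of pipeline. It is not BLENDFORM's actual code: the function names, the 16×16 size, and the way loudness and spectrum offset the two gradients are all assumptions made for illustration. What it shares with the description above is the shape of the process: an envelope and a spectrum are measured from the audio, they shape two gradient layers, and the layers are combined with XOR, NOR, and NAND, so the same samples always produce the same pixels.

```python
import math

def loudness_envelope(samples, n_blocks):
    """RMS loudness of n_blocks consecutive slices, scaled to 0..255.
    Assumes samples are floats in the range -1.0..1.0."""
    size = max(1, len(samples) // n_blocks)
    env = []
    for i in range(n_blocks):
        block = samples[i * size:(i + 1) * size] or [0.0]
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        env.append(min(255, int(rms * 255)))
    return env

def spectrum(samples, n_bins):
    """Magnitudes of DFT bins 1..n_bins (naive DFT), scaled to 0..255.
    Assumes a non-empty sample list."""
    n = len(samples)
    mags = []
    for k in range(1, n_bins + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    peak = max(mags) or 1.0
    return [int(255 * m / peak) for m in mags]

def render(samples, width=16, height=16):
    """Hypothetical renderer: returns rows of (r, g, b) tuples, each 0..255."""
    env = loudness_envelope(samples, width)
    spec = spectrum(samples, height)
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Layer A: horizontal gradient shifted by loudness over time.
            a = (env[x] + x * 255 // max(1, width - 1)) & 0xFF
            # Layer B: vertical gradient shifted by spectral energy.
            b = (spec[y] + y * 255 // max(1, height - 1)) & 0xFF
            red = a ^ b                # XOR
            green = ~(a | b) & 0xFF    # NOR
            blue = ~(a & b) & 0xFF     # NAND
            row.append((red, green, blue))
        image.append(row)
    return image

# Usage: a short synthetic tone stands in for a decoded audio file.
tone = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
image = render(tone)
print(len(image), "rows of", len(image[0]), "pixels")
```

Nothing in the sketch uses randomness or the clock, which is what makes the output deterministic. A real tool would decode an actual audio file and work at a much higher resolution, most likely with a fast Fourier transform rather than the naive one used here.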

Drop in your own audio and watch it become a one-of-a-kind image:

Try BLENDFORM →