Bilabials and Velars: The Phonetics of Beatboxing

A few weeks ago, I watched a video, featured by Jason Kottke, of Butterscotch, a beatboxing great and the first female beatboxing world champion. Beatboxing has long fascinated me, possibly dating back to when I first heard Matisyahu beatbox on his original Live at Stubb’s album. Butterscotch breaks down her own beatbox style, and it’s an absolute treat to watch.

Like a lot of SLPs (speech-language pathologists), I tend to think in terms of phonetics, and I found myself fascinated by Butterscotch’s clear descriptions of her own beatbox style. Here’s how I would describe each level from a phonetics perspective.

Level 1: Bass Drum

Butterscotch describes this as the “boots” aspect of her “boots and cats” warm-up. This is most often a bilabial sound, [b], and she uses variable amounts of plosive pressure to produce it: the louder/harder the bass drum line, the stronger the plosive. What’s most interesting, though, is when she references “forcing the air” to produce that sound. The [b] phoneme is a voiced stop, and voicing requires glottic closure (meaning that the vocal cords are adducted). Watching Butterscotch perform this slowly made me realize she pairs a glottal stop with the plosive, which allows her to control her airflow. This contributes to the greater rush of air, and the “break” between beats means that she immediately adducts her vocal cords. The longer she retains that glottic closure, the greater the buildup in pressure, and the more powerful the bass drum plosive.

Level 2: Snare

The snare sound is the “cats”, in this case the [k] phoneme. [k] is a voiceless velar stop, in which the back of the tongue contacts the velum. Given its voiceless nature, the vocal cords are abducted, allowing airflow for either inspiration or expiration (breathing in or breathing out). This will be important later. When producing this snare sound, Butterscotch keeps her jaw in a closed position while the lips remain relaxed and open. That differs from the typical English [k], which relaxes the jaw to open the mouth wider, creating a more resonant sound.

Level 3: Hi Hat

The hi hat uses a [t] phoneme, with the tongue tip contacting the palate at the alveolar ridge. She retains the same jaw position noted with level 2’s snare, which is a slight variation of typical [t] production. Again, typical [t] production has a more relaxed jaw position to allow for greater resonance. I suspect there are two reasons for this: (1) keeping the articulators closer together means there’s less distance for the tongue to travel between sounds, and (2) having a less open oral cavity means there’s less physical space for air to escape, which may help in a small way to further control airflow, and thus keep beatboxing for longer. The end of the hi hat (the “sprinkler” effect Butterscotch references) takes the stop [t] and shifts it into a fricative [s] to release air and signify the end of that beat. The closed hi hat is a clear [t], and the open hi hat is the stop-to-fricative [ts].

Level 4: Basic Beat

The basic rhythm Butterscotch notes (bass, hi hat, snare, hi hat, bass) is one familiar to every SLP: it’s a variation of our classic “p-t-k” (puh-tuh-kuh) that we like to use to assess motor coordination, with a hint of voicing at the beginning.

What’s also interesting is to watch Butterscotch’s breathing patterns while doing basic rhythms. On a bass-snare-hi hat pattern, there’s an exhale for the bass, a sharp inhale for the snare, and another quick exhale for the hi hat. Producing the [k] snare on an inhale changes the airflow: the sound waves are pulled backwards into the mouth, which changes the resonance and makes it sound weaker to the listener’s ear. You’ll notice a clear difference when the [k] snare is produced on an exhale, which is normal for speech production. I suspect that almost all air exchange for beatboxers happens within the superior lobes of the lungs, so there are times when they need to pause and take deep breaths, as there’s only so much reserve for the alveoli in the inferior lobes. Because of the frequent glottic closure, airflow is further limited, and the switching between inspiration and expiration is considerably faster than it is at rest or during normal speech.
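For readers who think in code, here is a minimal sketch of that breathing pattern, assuming the three sounds as I’ve described them so far. The class name, field names, and the airflow_switches helper are my own hypothetical labels for illustration, not anything Butterscotch or the video uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeatSound:
    """One percussion sound, described with the features discussed above."""
    name: str
    symbol: str    # rough phonetic label, e.g. "b"
    place: str     # place of articulation
    voiced: bool
    airflow: str   # "exhale" or "inhale" (assumed from watching the video)


# Assumed values, taken from the descriptions in levels 1 through 4.
BASS = BeatSound("bass drum", "b", "bilabial", True, "exhale")
SNARE = BeatSound("snare", "k", "velar", False, "inhale")
HI_HAT = BeatSound("hi hat", "t", "alveolar", False, "exhale")


def airflow_switches(pattern):
    """Count how many times the airflow direction flips within a pattern."""
    return sum(
        1 for prev, curr in zip(pattern, pattern[1:])
        if prev.airflow != curr.airflow
    )


pattern = [BASS, SNARE, HI_HAT]
for sound in pattern:
    print(f"{sound.name}: [{sound.symbol}] on an {sound.airflow}")
print("airflow switches:", airflow_switches(pattern))
```

The count of switches is a rough stand-in for how often the breath has to reverse direction in a short pattern, which is the part that is so different from normal speech.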

Level 5: Changing the Snare

At this level, Butterscotch offers four new phonemes as alternatives to the [k] snare: [p, f, sh, s]. She uses the blends of [p-sh] and [p-s] in different ways, which is a subtle shift of tongue movement to change the airflow from an affricate (a blend of a stop and a fricative) to a full fricative. Affricates usually start with the tongue in the same position as the fricative that immediately follows (think “t-sh” for the ‘ch’ in chair), but in this case, she uses a voiceless bilabial [p] immediately followed by the postalveolar [sh]. The differences here all point to how close her lips and jaw are, which allows for quicker movement of the tongue between positions. Butterscotch also points out here what I noted from the beat above: she uses changes in breathing patterns to control airflow, which lets her beatbox longer before needing a deeper breath.
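The contrast between a typical affricate and these snare blends can be sketched in a few lines, under a deliberately coarse assumption: each sound is tagged only by manner and by the broad articulator that makes it (lips, tongue front, tongue back). The table and the describe_blend function are hypothetical, written just for this illustration.

```python
# Coarse feature table; the "articulator" classes are a simplification
# assumed for this sketch (labial = lips, coronal = tongue front,
# dorsal = tongue back).
SOUNDS = {
    "p":  {"manner": "stop",      "articulator": "labial"},
    "t":  {"manner": "stop",      "articulator": "coronal"},
    "k":  {"manner": "stop",      "articulator": "dorsal"},
    "f":  {"manner": "fricative", "articulator": "labial"},
    "s":  {"manner": "fricative", "articulator": "coronal"},
    "sh": {"manner": "fricative", "articulator": "coronal"},
}


def describe_blend(stop, fricative):
    """Say whether a stop + fricative pair shares one articulator."""
    first, second = SOUNDS[stop], SOUNDS[fricative]
    if first["manner"] != "stop" or second["manner"] != "fricative":
        raise ValueError("expected a stop followed by a fricative")
    if first["articulator"] == second["articulator"]:
        return "affricate-like (same articulator)"
    return "stop + fricative across articulators"


for stop, fricative in [("t", "sh"), ("p", "sh"), ("p", "s")]:
    print(f"[{stop}-{fricative}]: {describe_blend(stop, fricative)}")
```

Under these assumed labels, the “t-sh” of chair comes out as one articulator doing both jobs, while [p-sh] and [p-s] hand off from the lips to the tongue, which is the difference described above.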

Level 6: Two Sounds at Once

Here’s where things start to get interesting. Butterscotch points to creating two sounds at the same time, acting as two sounds on the same beat. SLP nerds likely hear the very quick initiation of the [b] plosive just before the release of air for the [t] hi hat, largely because otherwise no air would be flowing to initiate either of the two sounds. The two sounds on that first beat require greater expiration of air, so the louder [k] snare Butterscotch uses immediately after them allows for extra inspiration of air before moving on. It makes a lot of sense, and it’s fascinating: not only does it make for an interesting beat, it’s fun to watch.

At this point, Butterscotch is ready to move beyond percussion style sounds and go deeper.

Level 7: Adding a Bassline

Butterscotch demonstrates a bilabial lip placement, relaxes both the superior and inferior orbicularis oris (the upper and lower lip), and then forces air through them, effectively creating a bilabial fricative. If you watch closely at the first part of the demo, Butterscotch shows greater contraction of the lower lip, and the bulk of vibration happens on the upper lip. When starting a beat, she seems to start with the plosive before the fricative, and pairs this effectively with a back vowel (probably the “o”, or [oʊ] for the IPA nerds among us). The pitch change is likely happening at the level of the vocal folds, where both the length and the intensity level of closure can affect the pitch. Remember when I mentioned glottic closure above as a way to prevent airflow before starting a beat? That’s where this shows up in force, literally.

Level 8: Humming

Humming adds a really interesting layer to this. The act of humming itself is a natural nasal sound. The soft palate, or velum, is relaxed, allowing airflow into the nasal passages. Humming requires glottic closure in order to vibrate the vocal folds, and those vibrations resonate up the oropharynx and, because the lips are closed, the air then has to travel into the nasopharynx to be released. When Butterscotch adds percussive beats on top of the hum, if there truly is nasal airflow, that would mean that her velum isn’t fully contacting the pharyngeal wall, and there would be a combination of nasal and oral airflow. Obviously, a video like this won’t allow us to visualize that, so we’ll have to make a couple of assumptions here: a combination of oral and nasal airflow would (1) reduce the loudness of the beats while (2) also reducing the loudness of the hum itself. This is because air would be traveling in two directions, so there would be less pressure for both, and thus, less loudness and resonance. Given that the hum sounds pretty consistent, I think it’s safe to guess that Butterscotch is able to relax her velum to allow for nasal airflow voluntarily, which is indeed a very challenging thing to do given that velar movement is largely automatic. Super cool.

Level 9: Adding Lyrics

I love it that the first word Butterscotch uses is banana. I also love that the words she pulls out all make use of the same tongue placement for all the beats themselves. Banana, beatbox, and pop tart all set her up for success on rhythm. The [k] for crunchy taco is less natural, probably more because of the need for a more open mouth to get a good [k] in speech than because of the [kr] blend. Also, given how she favors the [k] snare on an inhale, it makes it even tougher to elicit speech sounds that need to be produced on an exhale.

Level 10: Sound Effects and Instruments

I love the creativity of these sound effects. Lip bass, as described above, is a bilabial fricative. Butterscotch also demonstrates what she calls tongue bass, which seems to be a kind of reverse velar fricative: to my ear, it sounds like an approximation of the [k] tongue placement, vibrated by an inhale rather than an exhale (the lips round for this, which seems to provide a way to draw the air in). The brow bass (I’m not sure I’m spelling that right) is a quick bass beat similar to the bass drum but cut off quickly by closing the lips; the “slizzer” roll seems to be the reverse of the tongue bass, so it’s that velar fricative sound on an exhale. Trumpet is a hum with partially open lips, so there are definitely some nasal sounds. Scratching is a bit tougher to identify, but seems to involve very quick glottal stops in succession along with changes in the oral structures to create variety. The instruments like the violin seem to be variants of hums, with perhaps a much tighter closure of the vocal folds, along with higher pitch, to prevent too much vibration and thus reduce the resonance we hear.

Level 11: Intricate Beats

This is basically our motor speech “puh-tuh-kuh” task on overdrive, and it is glorious. Her motor cortex is basically on fire.

Level 12: Adding a Real Instrument

Beatboxing meets jazz singing, yes please. The slizzer roll seems to make an appearance at the end, and this time it sounds like a controlled tongue flap.

Level 13: Live Looping

A combination of all of the above, using a loop station, instruments, and beatboxing style to create a one-of-a-kind song every time it’s done. I love the range of influences evident in Butterscotch’s music; it’s one of my favorite things.