Karaoke, using Bash Scripts


Today is Christmas. As a programmer, I still kind of feel like coding.

Lying in bed, I started thinking about Karaoke. Because of COVID-19, there's no place to go for Karaoke. I'm from China, where we Karaoke all the time. However, the mainstream Karaoke app from China was also banned in the U.S. lately, due to copyright issues. Karaoke isn't really the most modern technology. I was thinking – with the plethora of tools we have today, can I hack something up by myself?

I spent roughly one hour on some bash scripting. Now I have a CLI Karaoke machine on my Ubuntu.

The CLI User Interface, for Karaoke…

First, a high-level demo of the script. Typing the command below (with a YouTube URL) pops up an mplayer window.

karaoke.sh 'https://www.youtube.com/watch?v=FUorCLHAi5Y&ab_channel=bingjiehuang'

We may then sing in front of the microphone. Hit ENTER when complete. Then, after a couple of seconds of post-processing, we get an MP4 video. The video blends our voice into the background music. In case we are not happy with the volume/delay in the audio, we can tweak some parameters and re-generate the mp4 file.

Admittedly, a CLI is not the perfect UI for Karaoke. But for a programmer, it is good enough: I can search for a link on YouTube, and then manage all my "artwork" locally :D


First, the main script, karaoke.sh


#!/bin/bash
set -e

if [[ $# -ne 1 ]]; then
  cat <<EOF
Usage:
  karaoke.sh video_file.mp4
or
  karaoke.sh https://youtube_url

It will start playing that video file, record the audio,
and generate a new video file that overlays the two audio tracks.
EOF
  exit 1
fi

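# A URL is downloaded first; anything else is treated as a local video file.
if [[ $1 =~ ^http.* ]]; then
  rm -f kareoke.mp4
  youtube-dl "$1" -o kareoke.mp4 -f 18 --no-continue
  video_path="kareoke.mp4"
else
  video_path="$1"
fi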

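music_basename="$(basename "$video_path")"
output_basename="$(dirname "$video_path")/${music_basename%.*}"
# Output file names (the suffixes here are an assumed naming convention).
output_mp3="${output_basename}.vocal.mp3"
output_mp4="${output_basename}.mixed.mp4"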


fifo_file="$(mktemp -u)"
mkfifo "$fifo_file"

echo "Start playing $video_path. Hit ENTER to end recording." >&2
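mplayer "$video_path" &
mplayer_proc=$!

# Record raw audio into the FIFO while lame encodes it to mp3 on the fly.
arecord -f cd -t raw >"$fifo_file" &
record_proc=$!
lame -r - "$output_mp3" <"$fifo_file" &
lame_proc=$!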

read -r -p "Press enter to finish recording."
kill -SIGINT "$record_proc"
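# mplayer may already have exited if the song ended; do not let set -e abort here.
kill -SIGINT "$mplayer_proc" 2>/dev/null || true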
wait "$lame_proc"
rm "$fifo_file"
echo "Recording finished" >&2

mix_audios.sh "$video_path" "$output_mp3" "$output_mp4"
echo "Mixture video: $output_mp4. If unsatisfied, tune it by running: mix_audios.sh $video_path $output_mp3 $output_mp4" >&2
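
The recording step in the script above streams raw audio from arecord to lame through a named pipe, so no large intermediate file is written. Here is a minimal sketch of that pattern in isolation. It assumes `printf` and `tr` as stand-ins for arecord and lame, and `pipe_demo` is a hypothetical name used only for illustration.

```shell
#!/bin/bash
# Sketch of the named-pipe pattern: a producer and a consumer connected by a FIFO.
# printf and tr are stand-ins (assumptions) for arecord and lame.
pipe_demo() {
  local fifo_file output_file consumer_pid
  fifo_file="$(mktemp -u)"   # reserve a unique path, as the script does
  output_file="$(mktemp)"
  mkfifo "$fifo_file"

  printf 'la la la\n' >"$fifo_file" &              # producer (arecord's role)
  tr 'a-z' 'A-Z' <"$fifo_file" >"$output_file" &   # consumer (lame's role)
  consumer_pid=$!

  # The consumer sees end-of-file once the producer closes its end of the pipe.
  wait "$consumer_pid"
  rm "$fifo_file"
  cat "$output_file"
  rm "$output_file"
}
```

Calling `pipe_demo` prints the upper-cased text. In karaoke.sh the same idea is why the script waits for lame after interrupting arecord: the encoder only finishes once the recorder has closed the pipe.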

Then the audio merging utility, mix_audios.sh


#!/bin/bash
set -e

if [[ $# -ne 3 ]]; then
  cat <<EOF
Usage:
  mix_audios.sh bgm.mp4 audio.mp3 mixed.mp4

  This will overlay the recorded audio onto the background mp4,
  resulting in a mixed.mp4 file.
EOF
  exit 1
fi

temp_dir="$(mktemp -d)"

ffmpeg -i "$1" "$temp_dir/bgm.mp3"
sox -v 0.2 "$temp_dir/bgm.mp3" "$temp_dir/bgm_quieter.mp3"
sox "$2" "$temp_dir/vocal.mp3" reverb trim 0.2
ffmpeg -y -i "$temp_dir/bgm_quieter.mp3" -i "$temp_dir/vocal.mp3" -filter_complex amix=inputs=2:duration=longest "$temp_dir/mixed.mp3"
ffmpeg -y -i "$1" -i "$temp_dir/mixed.mp3" -c:v copy -map 0:v:0 -map 1:a:0 "$3"

rm -Rf "$temp_dir"

Make sure to chmod +x these two scripts and put them in one of the directories listed in $PATH.
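
As a rough sketch of that setup step, the helper below copies scripts into a directory, marks them executable, and adds the directory to PATH for the current shell. The name `install_scripts` and the `$HOME/bin` location in the usage line are assumptions for illustration; any directory already on your PATH works just as well.

```shell
#!/bin/bash
# Sketch: copy scripts into a directory, mark them executable,
# and make sure that directory is on PATH for the current shell.
install_scripts() {
  local target_dir="$1"
  shift
  mkdir -p "$target_dir"

  local script
  for script in "$@"; do
    cp "$script" "$target_dir/"
    chmod +x "$target_dir/$(basename "$script")"
  done

  # Prepend the directory to PATH only if it is not listed yet.
  case ":$PATH:" in
    *":$target_dir:"*) ;;
    *) PATH="$target_dir:$PATH" ;;
  esac
}
```

A typical call would be `install_scripts "$HOME/bin" karaoke.sh mix_audios.sh`. The PATH change only lasts for the current shell session; to keep it, add the directory to PATH in your shell startup file.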


The core of the program is to manage two processes:

  1. Start an mplayer process that plays the YouTube video

  2. At the same time start recording.
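
The two steps above, together with stopping both processes on ENTER, can be sketched without any audio hardware. This is an illustration under stated assumptions: `sleep` stands in for mplayer and arecord, and `run_until_enter` is a hypothetical name. It also uses plain `kill` (SIGTERM) rather than the SIGINT that karaoke.sh sends, because a background `sleep` started from a script ignores SIGINT.

```shell
#!/bin/bash
# Sketch: run two background jobs and stop both when the user presses ENTER.
# "sleep" is a stand-in (assumption) for mplayer and arecord.
run_until_enter() {
  sleep 30 >/dev/null &    # stand-in for: mplayer "$video_path" &
  local player_pid=$!      # $! holds the PID of the most recent background job
  sleep 30 >/dev/null &    # stand-in for: arecord ... &
  local recorder_pid=$!

  read -r || true          # block until ENTER (or end of input)

  # Plain kill sends SIGTERM; karaoke.sh sends SIGINT to the real tools instead.
  kill "$player_pid" "$recorder_pid" 2>/dev/null || true
  wait "$player_pid" "$recorder_pid" 2>/dev/null || true
  echo "stopped $player_pid $recorder_pid"
}
```

Running `run_until_enter` and pressing ENTER prints the two PIDs that were stopped. The important detail is capturing `$!` immediately after each `&`, since the next background job overwrites it.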

When the user hits ENTER, we kill the two processes and start the audio post-processing. The program is pretty self-explanatory. It requires several common Linux tools (available via apt-get):

  1. ffmpeg: video/audio merging

  2. lame: mp3/raw audio conversion

  3. sox: audio post-processing, to adjust delay and volume and to add reverb

  4. mplayer: playing the video

  5. youtube-dl: downloading the YouTube video to local disk. The official Ubuntu version might be outdated – you'll probably want to pip install one locally.

  6. arecord (from the alsa-utils package): recording from the microphone
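
Before the first run, it can help to check that every tool is actually installed. Here is a small sketch using the shell built-in `command -v`; the function name `missing_tools` is hypothetical. The usage line also checks arecord, which karaoke.sh calls to record from the microphone.

```shell
#!/bin/bash
# Print the name of every listed command that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools ffmpeg lame sox mplayer youtube-dl arecord` prints nothing when everything is available, and otherwise lists what still needs installing.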

I honestly don't think I'm using the best tools and command-line options for this script, but they seem to work well together. One option I personally found quite cool is the reverb effect in sox, which adds the Karaoke sound effect to the audio. Only after enabling it does the result start to get a Karaoke feeling, as if I'm singing in a studio. Two samples are attached as a simple comparison.

  1. Before reverb: mp3

  2. After reverb: mp3
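
In mix_audios.sh the background volume (0.2), the vocal trim offset (0.2 seconds) and the reverb effect are all hard-coded. As a sketch of how they could be made tunable, the hypothetical helper below only prints the two sox commands for a given set of values, so it can be tried without sox installed. The function name, its argument order, and the file names in the printed commands are assumptions for illustration.

```shell
#!/bin/bash
# Dry run: print the sox commands for a given volume, trim offset and reverb choice.
# Arguments (all optional): background volume, vocal trim in seconds, "yes"/"no" for reverb.
sox_commands() {
  local bgm_volume="${1:-0.2}"
  local vocal_trim="${2:-0.2}"
  local use_reverb="${3:-yes}"

  # trim drops the first seconds of the vocal track to compensate for recording delay.
  local effects="trim $vocal_trim"
  if [[ "$use_reverb" == "yes" ]]; then
    effects="reverb $effects"
  fi

  echo "sox -v $bgm_volume bgm.mp3 bgm_quieter.mp3"
  echo "sox vocal_raw.mp3 vocal.mp3 $effects"
}
```

With no arguments it prints the same two sox invocations the script uses. Passing `no` as the third argument gives the "before reverb" variant, which makes it easy to compare the two versions.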