Today is Christmas. As a programmer, I still kind of feel like coding.
Lying in bed, I started thinking about Karaoke. Because of COVID-19, there's no place to go for Karaoke. I'm from China, where we Karaoke all the time. However, the mainstream Karaoke app from China also got banned lately in the U.S., due to copyright issues. Karaoke isn't really the most modern technology. I was thinking – with the plethora of tools we have today, can I hack something up by myself?
I spent roughly one hour on some bash scripting. Now I have a CLI Karaoke machine on my Ubuntu box.
The CLI User Interface, for Karaoke…
First, a high-level demo of the script. Typing the command below (with a YouTube URL) pops up an mplayer window.
karaoke.sh 'https://www.youtube.com/watch?v=FUorCLHAi5Y&ab_channel=bingjiehuang'
We may then sing in front of the microphone. Hit ENTER when complete. Then, after a couple of seconds of post-processing, we get an MP4 video. The video blends our voice into the background music. In case we are not happy with the volume/delay in the audio, we can tweak some parameters and re-generate the mp4 file.
Admittedly, a CLI is not the perfect UI for Karaoke. But for a programmer, it is good enough: I can search for a link on YouTube, and then manage all my "artwork" locally :D
Code
First, the main script, karaoke.sh
#!/bin/bash
set -e
if [[ $# -ne 1 ]]; then
    cat <<EOF
Usage:
    karaoke.sh video_file.mp4
or
    karaoke.sh https://youtube_url
It will start playing that video file, record the audio at the same time,
and generate a new video file that overlays the two audio tracks.
EOF
    exit 1
fi
video_path="$1"
if [[ $1 =~ ^http.* ]]; then
    rm -f karaoke.mp4
    youtube-dl "$1" -o karaoke.mp4 -f 18 --no-continue
    video_path="$PWD/karaoke.mp4"
fi
music_basename="$(basename "$video_path")"
output_basename="$(dirname "$video_path")/${music_basename%.*}"
output_mp3="$output_basename.mp3"
output_mp4="${output_basename}_mixed.mp4"
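# Stream raw audio from the recorder to the encoder through a named pipe.
fifo_file="$(mktemp -u)"
mkfifo "$fifo_file"
echo "Start playing $video_path. Hit ENTER to end recording." >&2
# Play the video in the background.
mplayer "$video_path" &
mplayer_proc=$!
# Record CD-quality raw audio from the microphone into the pipe.
arecord -f cd -t raw >"$fifo_file" &
record_proc=$!
# Encode the raw audio from the pipe to MP3 as it arrives.
lame -r - "$output_mp3" <"$fifo_file" &
lame_proc=$!
read -r -p "Press enter to finish recording."
# Stop recording and playback, then wait for lame to finish encoding.
kill -SIGINT "$record_proc"
kill -SIGINT "$mplayer_proc"
wait "$lame_proc"
rm "$fifo_file"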
echo "Recording finished" >&2
mix_audios.sh "$video_path" "$output_mp3" "$output_mp4"
echo "Mixed video: $output_mp4. If unsatisfied, tune it by running: mix_audios.sh $video_path $output_mp3 $output_mp4" >&2
Then the audio merging utilities: mix_audios.sh
#!/bin/bash
set -e
if [[ $# -ne 3 ]]; then
    cat <<EOF
Usage:
    mix_audios.sh bgm.mp4 audio.mp3 mixed.mp4
This will overlay the recorded audio onto the background mp4,
resulting in a mixed.mp4 file.
EOF
    exit 1
fi
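temp_dir="$(mktemp -d)"
# Extract the background music from the video.
ffmpeg -i "$1" "$temp_dir/bgm.mp3"
# Scale the background music amplitude by 0.2 so the voice stands out.
sox -v 0.2 "$temp_dir/bgm.mp3" "$temp_dir/bgm_quieter.mp3"
# Add reverb to the vocal and skip its first 0.2 seconds to adjust the delay.
sox "$2" "$temp_dir/vocal.mp3" reverb trim 0.2
# Mix the two audio tracks, keeping the length of the longer one.
ffmpeg -y -i "$temp_dir/bgm_quieter.mp3" -i "$temp_dir/vocal.mp3" -filter_complex amix=inputs=2:duration=longest "$temp_dir/mixed.mp3"
# Pair the original video stream (copied as-is) with the mixed audio.
ffmpeg -y -i "$1" -i "$temp_dir/mixed.mp3" -c:v copy -map 0:v:0 -map 1:a:0 "$3"
rm -Rf "$temp_dir"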
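The volume and delay that can be tweaked are the two values hard-coded in the sox calls (-v 0.2 and trim 0.2). One way to tune them without editing the script each time would be to read them from environment variables. This is only a sketch: BGM_VOLUME, VOCAL_TRIM and print_sox_commands are hypothetical names that are not part of the original script, and the helper just prints the two sox commands so the values can be checked before anything is run.

```shell
#!/bin/bash
# Sketch: build the two sox commands from mix_audios.sh with tunable values.
# BGM_VOLUME and VOCAL_TRIM are hypothetical names, not part of the original.

print_sox_commands() {
    local temp_dir="$1" vocal="$2"
    # Defaults mirror the values hard-coded in mix_audios.sh.
    local bgm_volume="${BGM_VOLUME:-0.2}"
    local vocal_trim="${VOCAL_TRIM:-0.2}"

    # Scale the amplitude of the background music.
    echo "sox -v $bgm_volume $temp_dir/bgm.mp3 $temp_dir/bgm_quieter.mp3"
    # Add reverb, then skip the first $vocal_trim seconds of the vocal.
    echo "sox $vocal $temp_dir/vocal.mp3 reverb trim $vocal_trim"
}
```

For example, BGM_VOLUME=0.4 VOCAL_TRIM=0.35 print_sox_commands /tmp/work my_song.mp3 would show a louder background track and a larger delay correction.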
Make sure to chmod +x these two scripts and put them into one of the directories listed in $PATH.
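As a rough sketch of that installation step, the helper below copies scripts into a target directory and marks them executable. The function name install_scripts is hypothetical, and the example destination $HOME/bin is an assumption: it only helps if that directory is already listed in $PATH.

```shell
#!/bin/bash
# Sketch: copy scripts into a directory and make them executable.
# install_scripts is a hypothetical helper, not part of the original post.

install_scripts() {
    local dest="$1"
    shift
    mkdir -p "$dest"
    local script
    for script in "$@"; do
        cp "$script" "$dest/"
        chmod +x "$dest/$(basename "$script")"
    done
}

# Example (assumes $HOME/bin is on $PATH):
#   install_scripts "$HOME/bin" karaoke.sh mix_audios.sh
```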
Explanations
The core of the program is to manage two processes:
- Start an mplayer process that plays the YouTube video.
- At the same time, start recording.
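The pattern of running two background jobs and stopping them on ENTER can be sketched in isolation. This is a minimal illustration rather than the script itself: sleep stands in for both mplayer and arecord, the function name run_until_enter is hypothetical, and plain kill (SIGTERM) is used here instead of the SIGINT the real script sends.

```shell
#!/bin/bash
# Sketch: run two background jobs, wait for ENTER, then stop both.
# `sleep 60` is a stand-in for the player and the recorder.

run_until_enter() {
    sleep 60 &            # stand-in for: mplayer "$video_path"
    local player_pid=$!
    sleep 60 &            # stand-in for: arecord -f cd -t raw
    local recorder_pid=$!

    # Block until the user presses ENTER (or stdin reaches EOF).
    read -r _ || true

    # Stop both jobs; ignore errors if one has already exited.
    kill "$recorder_pid" "$player_pid" 2>/dev/null || true
    # Reap them so no zombie processes are left behind.
    wait "$recorder_pid" "$player_pid" 2>/dev/null || true
    echo "stopped"
}
```

Capturing $! right after each & is what makes it possible to signal exactly those two jobs later.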
When the user hits enter, we kill the two processes and start the audio post-processing. The program is pretty self-explanatory. It requires several common Linux tools (available via apt-get):
- ffmpeg: video/audio merging
- lame: mp3/raw audio conversion
- sox: post-processing for audio, to adjust delay and volume
- mplayer: playing the video
- youtube-dl: downloading the YouTube video to local disk. The official Ubuntu version might be outdated – you'll probably want to pip install one locally.
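Before the first run, it may help to check that all of these tools are actually installed. The sketch below uses the shell built-in command -v for that; missing_tools is a hypothetical helper name, and arecord is included in the example because the main script calls it even though it is not in the list above.

```shell
#!/bin/bash
# Sketch: print the name of every given tool that cannot be found on $PATH.
# missing_tools is a hypothetical helper, not part of the original scripts.

missing_tools() {
    local tool
    for tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example:
#   missing_tools ffmpeg lame sox mplayer youtube-dl arecord
```

An empty output means everything was found; otherwise each missing tool is printed on its own line.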
I honestly don't think I'm using the best tools and command-line options for this script, but they seem to work well together. One option I personally found quite cool is the reverb option in sox, which adds the Karaoke sound effect to the audio. Only after enabling it did the result start to get a Karaoke feeling, as if I'm singing in a studio. Two samples are attached as a simple comparison.
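A similar comparison could be reproduced locally by rendering the same recording once without and once with the effect. The sketch below only prints the two sox commands. The function name reverb_comparison_cmds and the _dry/_reverb output names are hypothetical, and the optional second argument is passed as the reverberance percentage, the first parameter of sox's reverb effect (50 by default).

```shell
#!/bin/bash
# Sketch: print sox commands that render a dry and a reverberated version
# of one recording for side-by-side listening.
# reverb_comparison_cmds and the output names are hypothetical.

reverb_comparison_cmds() {
    local input="$1"
    local base="${input%.*}"
    # Optional reverberance percentage; 50 is sox's own default.
    local reverberance="${2:-50}"

    echo "sox $input ${base}_dry.mp3"
    echo "sox $input ${base}_reverb.mp3 reverb $reverberance"
}
```

Running the printed commands on a short take and listening to both files should make the difference easy to hear.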