Saturday, January 8, 2011

Rhythmbox playlist editing with the magic of command-line diff

As a near full-time Linux desktop user, I haven't found anything I like better for music than GNOME's Rhythmbox. The UI is a little funky sometimes, but all I really want from a music player is the ability to find all the songs in my library and make playlists.

One of the problems I run into sometimes relates to my fanatical music ripping. I extract all the audio from my CDs in both FLAC, for high quality, and MP3 formats. I only put the FLAC version into the main music library; the MP3 copies are strictly for copying over to my portable player (Sansa Fuze, which also works fine with Linux without any special software). Every now and then I make the mistake of adding the directory that contains the MP3 files to my music library, and then I'm screwed. There are now two copies of every song, and weeding them out is a giant mess.

When I did this again recently, I decided to just wipe my whole library out and start over. I added most of the same songs back in again. One ugly surprise though: all of my playlists were deleted! Now that I know what not to do here, I wanted to share that info.

One of the things I like about Rhythmbox is that all its metadata is stored in simple XML files, so I've recovered from errors like this before. Depending on what version you're running, below your home directory should be .local/share/rhythmbox/playlists.xml or its older variation, .gnome2/rhythmbox/playlists.xml
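Here is a small sketch for finding out which of those two locations is in use on a given machine. The function name find_playlists is made up for this example; the two paths are the ones mentioned above.

```shell
# Print the first playlists.xml found in the two locations mentioned above.
# find_playlists is a hypothetical helper name, not part of Rhythmbox.
find_playlists() {
    for candidate in \
        "$HOME/.local/share/rhythmbox/playlists.xml" \
        "$HOME/.gnome2/rhythmbox/playlists.xml"
    do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no playlists.xml found" >&2
    return 1
}
```

Running find_playlists prints the path of whichever file exists, checking the newer location first.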

Since I'm paranoid, I made a backup of this file and the music library before I touched anything, so I had the original playlist file with all the songs for reference. Apparently what happens here is that when you exit Rhythmbox, it removes any file in a playlist that isn't in the library anymore. So the procedure I had to go through went like this:

  1. Restore the original big playlist file
  2. Add the directories I think it was missing to the library, then exit
  3. Compare the original playlist file with the new one, to see what files are missing.
  4. Repeat until no files are missing.

In command line form, that looked like this:
cp $HOME/backup/playlists.xml $HOME/.local/share/rhythmbox/playlists.xml 
rhythmbox
kdiff3 /home/gsmith/personal/music/playlists.xml /home/gsmith/.local/share/rhythmbox/playlists.xml &
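If a graphical diff is more than you need, the "what is still missing" check in step 3 can be roughly approximated with standard tools. This is only a sketch: it assumes each playlist entry sits on its own line as a <location>...</location> element, and the helper names list_locations and missing_entries are invented for this example.

```shell
# Extract the text of every single-line <location>...</location> element,
# sorted and de-duplicated so the lists can be compared with comm.
list_locations() {
    sed -n 's|.*<location>\(.*\)</location>.*|\1|p' "$1" | sort -u
}

# Print entries that are in the original playlist file ($1)
# but no longer in the current one ($2).
missing_entries() {
    old_list=$(mktemp) && new_list=$(mktemp) || return 1
    list_locations "$1" > "$old_list"
    list_locations "$2" > "$new_list"
    # comm -23 keeps lines that appear only in the first (sorted) input.
    comm -23 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}
```

Calling missing_entries with the backup file first and the live file second prints one line per entry that Rhythmbox dropped. When it prints nothing, the loop is done. Note that it ignores which playlist an entry belongs to, so it is a coarser check than a full diff.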

After a few rounds of that, I got to where the difference between the original and new playlists was down to only three files. The common thing about these files is that they had punctuation characters in their names: comma and ampersand, AKA ",&". For reasons I haven't fully figured out, when I added those files back to the library, their names were saved with slightly different escaping. So they were in the music library, but didn't match the playlist entries exactly, and thus were deleted at every exit.
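To spot entries that might hit this problem, something like the following lists every playlist line containing a comma or ampersand. The escaped spellings it searches for (%2C, %26, and the XML entity &amp;) are guesses at how those characters could be encoded, not something I confirmed from the Rhythmbox source, and punctuated_entries is a made-up name.

```shell
# Print playlist lines whose <location> contains a comma or ampersand,
# whether literal, XML-escaped (&amp;), or percent-escaped (%2C, %26).
# The set of escaped forms is an assumption; extend it as needed.
punctuated_entries() {
    grep -i -E '<location>.*(,|&|%2C|%26)' "$1"
}
```

Running this against both the backup and the live file and comparing the output shows how the same song is spelled in each.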

To fix this, I manually copied those files from the library back into the playlist again. Then I exited the program and figured out what order they used to be in like this:
diff -c $HOME/backup/playlists.xml $HOME/.local/share/rhythmbox/playlists.xml

This form of context diff makes it straightforward to see which lines the songs originally appeared on. I tweaked the file using a regular editor (vi worked fine) until the differences were all adjacent lines, so that the new file names were directly replacing the original ones, confirming the edits with that same diff again. Save that, and finally my original playlists are back in the order I liked them in.

Now that I realize how easy it is to lose playlist entries, I've added playlists.xml to the list of files I keep under version control. There's one last twist here to be aware of. Normally, the way I do that is to put the file into my personal git directory, then symbolically link the original location to it. The version of Rhythmbox I have here does not respect this at all. When exiting and saving, it silently overwrote the configuration file with

This "wipe out everything I don't like when exiting" behavior from Rhythmbox is rather immature, given it's a program that could be running with a music library mounted over intermittent network storage. And not following symlinks is just absurd. But, at least with plain text, readable XML files, I can use standard UNIX tools to fix all those limitations. It's still far less stupid to recover from than what happens, say, when you screw up your iTunes library after running into the same sort of limitations, like bad behavior with intermittent network mounts. Binary configuration files suck.
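Since the symlink trick doesn't survive a save, one possible workaround is to leave a regular file where Rhythmbox expects it and copy that into the version-controlled directory after each session. The sketch below does only the copy step. The name snapshot_playlists is made up, and both paths are passed in by the caller.

```shell
# Copy the live playlist file ($1) over the version-controlled copy ($2),
# but only when the contents differ, so unchanged sessions leave no churn.
# snapshot_playlists is a hypothetical helper name.
snapshot_playlists() {
    live=$1
    repo_copy=$2
    if [ -f "$repo_copy" ] && cmp -s "$live" "$repo_copy"; then
        return 0    # nothing changed since the last snapshot
    fi
    cp -p "$live" "$repo_copy"
}
```

After exiting Rhythmbox, running snapshot_playlists with the live file and the copy in the git directory as arguments, followed by a normal git commit there, keeps a history of the playlists without depending on how the program treats symlinks.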