Day Of Week Bias In Phish Setlists

30 Apr 2020 -

Why isn’t it Friday today? Well it’s never Friday when that song is played, and it’s always Tuesday when Tuesday is played. But, are there other interesting patterns in Phish song selection for different days of the week? I decided to make my first mini project in the Phish Stats journey this one because I thought it would be complicated enough to test my skills and straightforward enough to build a little confidence. I don’t think it will appeal to fans as much as some other projects I’m thinking about, but I’m glad I started with it.

Just the Jams

Equate my life with Sand

Set 1

I’m using the Phish.in database mainly because it has song duration in it (which I don’t need for this project but will for future projects) and is easier to access than the phish.net one. I knew there were some quirks in the data because it is scraped from AUD files and not one central setlist database. But, looking at the distinct track titles, it looks pretty minimal and not worth spending time on cleaning, yet.

I assumed there were some changes in the distribution of shows across a week throughout Phishs’ career so I looked into that. The main thing I saw was that Phish stopped playing on Mondays and Thursdays in 3.0. Not completely, but before 2009, about 10% of shows were on Mondays and after it has only been 4%, and for Thursdays the frequency dropped from 15% to 7%. This difference was mostly moved to the weekends where there was an increased percentage of shows for Friday - Saturday. These differences seem significant enough to find a solution for. When I did an initial look at overall day of week bias, I saw that some of the most frequent biased songs were those that were either new for 3.0 or hadn’t been played much in 3.0. So, likely that change in show frequency contributes a bit of noise to the data. When I refine the data, I’ll try to create a day of week show frequency for each song based on debut date to try to account for this change in frequency.

I also made an arbitrary decision to only use songs that have been played more than 100 times[1]. I figure it will be more interesting for others to see songs that are more common and reducing to songs with small sample sizes would create more chaos in the data that may make it feel less interesting. Plus, this is a band that has over 100 songs that they’ve played 100 times, how cool is that?

Setbreak

One thing I noticed after going through my first mock up of the data was that it stopped at the Mexico 2019 run, thus about 35 of the most recent shows were missing. Probably not enough to make a huge difference in the data but some of the margins are pretty tiny so it’s worth going through the process again to get updated numbers. Plus, I think this will push me to try to make the process a little cleaner than last time. Also, TMWSIY and HYHU came up a few times and I realized that this was because it was counted twice for every show. Thus, I’ll need to account for that in the next round.

I’m sure I could figure out how to do the math calculations in sql that I’ll need; but, it feels like there are enough steps that it would get cumbersome there. So, I decided to get the raw numbers I need from sql and use google sheets to do the calculations. I’ve done so many of these types of calculations in google sheets before, so it’ll definitely be faster.

Set 2

I wanted to create a way to compare each song’s day of week distribution against the day of week distribution for all shows during the history of that specific song, and not against the day of week distribution of all shows Phish ever played. First thing I did was figure out that the show_ids weren’t in chronological order. I need a new id for each show which is chronological order for this project to create show distributions based on debut date, and also for several future projects. It turned out that the easiest thing was to create a new table with a new column that was just the row number after the shows table was sorted by date. This gives me the debut date for every song in a table.

My goal was to create a table that showed the totals of shows played on each day of the week after the debut date for each song. I was hoping this would erase some of the era bias that I saw in the first round. (Only for the 3.0 songs.) But, I couldn’t figure out how to do it and decided on a simpler solution.[2] I decided to compare the 1.0 and 3.0 songs against the day of week show distribution from just that era instead of over Phish’s whole career. 3.0 songs were pretty obvious, but I ended up adding 46 days and Sand to the list because they were played so much more often in 3.0 than any other period. Any songs where 90% of their plays were in 1.0 got the 1.0 song tag.

The only other data cleanup I did was removing The Man Who Stepped Into Yesterday from the data set since it was played over 100 times but was counted twice in every setlist (almost), so really it was played less than 100 times. I could have done the same with Hold Your Head Up but that has over 200 tracks and just doubling every number shouldn’t really have an impact on this type of distribution.

Encore

There are a lot of messy track names with either missing capitalization or segues of several songs making it so they don’t get counted. While these differences are likely significant due to the tiny margins between some of the ranks, I’m not going to prioritize going through them all and breaking them up. And, I’m sure, at this point, I’d make some mistakes in updating the database. It also seems like there shouldn’t be any specific reason why this would happen on one day of the week more than others so maybe it all evens out. If the data ever gets cleaned up in the future, by me or someone else, I can go back and update all the lists pretty easily.

Another potentially interesting project could be to look at how song lengths are distributed across different weekdays. Do we never miss a Sunday show because they play extra Back on the Trains and fewer Mike’s Grooves or is it because it is more likely to have a monster jam or two?

Notes

[1] It would be pretty easy to recreate this data for songs played less than 100 times. I could do that if there was enough interest.
[2] If it’s possible to do this kind of thing in sql, which I’m guessing it is, I’d love some tips on how to do it.

Mystified and Mishapen

What do you think? Do you have any guesses? I could only really guess one song that I thought might show up more on a certain day when I started. But it was cool to put the answers into context after seeing how it all shook out. I hope you enjoy doing that same.

Top songs by day of the week

Free! Stolen, stolen, stolen
By p-value By Total By p-value By Total
Sunday Back on the Train (.001) BoTT + Tweezer Reprise (13) Sand (.008) Mike’s Song (-19)
Monday Fire (.0006) Hold Your Head Up and Poor Heart (11) Light (.016) Harry Hood (-11)
Tuesday Weekapaug Groove (.022) Weekapaug Groove (15) The Landlady (.005) Tweezer (-17)
Wednesday Sample in a Jar (.010) Sample in a Jar (14) Dinner and a Movie (.002) Dinner and a Movie (-11)
Thursday Uncle Pen (.0008) Uncle Pen (17) Light(< .00001) Slave to the Traffic Light (-14)
Friday Sand (.0004) Sand (16) Tweezer Reprise (.033) Tweezer Reprise (-15)
Saturday Lawn Boy(.001) Harry Hood (25) I Didn’t Know (.003) I Didn’t Know (-18)

A note about what the p-value means - So imagine that the band is so focused on their setlists being random that they get together every Monday morning and pick out all the songs they want to play for the week. Then they throw them all in a hat for the shows of the week and keep picking out songs during each show to decide what to play next. Now, let’s say they have always done that. And there are 1000 other universes where they do the same. The p-value shows how many of those 1000 universes will have 61 (or more) Lawn Boys. And, since the universe is a donut, we don’t actually have to assume. There are at least 1000 other universes where Phish is doing just this. \

Most and least biased songs in order

Most Bias Most Consistent
1. Sand 1. Scent of a Mule
2. Light 2. Rocky Top
3. Backwards Down the Number Line 3. Rift
4. 46 Days 4. Guelah Papyrus
5. Tweezer Reprise 5. NICU
6. Carini 6. Reba
7. Fee 7. Big Black Furry Creatures from Mars
8. Back on the Train 8. The Moma Dance
9. Dinner and a Movie 9. Sparkle
10. Lawn Boy 10. Runaway Jim

Setlist Notebook

Do you have a friend that always calls the same song on Saturday? “Bro, it’s Saturday, we’re totally getting a 2001 today.” Does it annoy you that just because they played it at like three Saturday shows you went to together doesn’t mean there is a pattern? Well, you can look it up in the spreadsheet here.

Secret Set

Bonus question: What’s the most played song that has never been played on each given day of the week?

Sunday: Bike (25 times played)
Monday: Halfway to the Moon (43)
Tuesday: Life on Mars? (23)
Wednesday: Donna Lee (20)
Thursday: Golden Age (53)
Friday: Terrapin (27)
Saturday: Bittersweet Motel (18)