Billboard Hot 100 Analytics: Using Data to Understand The Shift in Popular Music in The Last 60 Years

What’s the most common thing you hear from “older” people about modern popular music? The general theme is: “Your music is too loud and lacks content”. They talk about the “old” days with the meaningful songs, the soulful artistes, the deep bass guitars that could move you to tears. When they say that, they are comparing this:

Downtown by Petula Clark, 1965

To this:

Stir Fry by Migos, 2018

 

There’s a clear difference, obviously. However, that would be using one data point to draw a general conclusion (which humans are very good at). I, being a millennial and a Data Scientist, found this an interesting topic to poke at. Has what makes music “great” really changed that much? Have the sound, the lyrics and the “message” changed? And if they have changed, how exactly have they changed?

Using Billboard’s Hot 100 charts from 1950 – 2015 and Spotify’s API, we want to take a closer look at how much popular music has changed in the past six decades and find out what really distinguishes the music of today from the rest.

My Approach

For this post, I define “great music” as music that made it onto Billboard’s Hot 100. I got the data from a generous GitHub user, Kevin Schaich. The data contains a lot of interesting features like Sentiment, Gunning fog index (which estimates the number of years of formal education needed to understand a text at first reading), Number of words, Number of repetitive words/phrases etc.
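To make the Gunning fog index a bit more concrete, here is a minimal Python sketch of the standard formula: 0.4 × (average words per sentence + percentage of “complex” words, i.e. words with three or more syllables). The syllable counter is a rough vowel-group heuristic of my own, and the function names are made up for illustration. I don’t know exactly how the dataset computed its values, so treat this as an approximation rather than the method behind the data.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of consecutive vowels (a heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text):
    """Gunning fog index: 0.4 * (words per sentence + percent of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are taken to be those with three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)


simple_score = gunning_fog("I love you. You love me.")
fancy_score = gunning_fog("Opulence is beautiful.")
```

Song lyrics rarely come with sentence punctuation, so a real pipeline would need some rule for where a “sentence” ends (for example, one per line).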

In addition, Spotify has an interesting API endpoint called get_audio_features. The endpoint allows you to get song features like loudness, Instrumentalness (how much instruments are used), energy, liveness (the presence of a live audience), Speechiness, song duration etc. This brings the total song features to about 30 for Billboard’s Hot 100 between 1950 and 2015.

All these features are explained here and here and I will also explain some as we progress in the post.

Initially, I set out to use Python for this project and I did. Kinda. I had my first iteration of data collection all done with Python’s pandas and a Python package called spotipy.

Along the line, however, I reviewed my methodology and found a more interesting dataset. For this, I went back to R specifically because of the tidyr::gather() function (it’s so annoying pivoting data in pandas jeez).

Here’s the code in R and Python. The two differ in most ways but share a function called get_audio_features. My final dataset can be found here.

The amount of time I spent on data gathering is in sharp contrast with my other projects because, unlike my other projects, someone took the time to put a ready-to-use dataset together. This is a major reason why I share all the data I gather: hopefully, someone out there won’t have to spend six weeks gathering data.

Let’s begin!

1.   In the past sixty years, we have had only two major changes in music

Using a technique called clustering, we can group artistes by how similar their song features are.

Using this approach, we have two clusters of artistes – The String Lovers and The Poetics. The reason we chose these weird names lies in the two song features that define these clusters best: Instrumentalness and Speechiness.

Instrumentalness predicts whether a track contains no vocals on a scale of 0 to 1. “Ooh” and “aah” sounds are treated as instrumentals as well. The closer the value is to 1, the more likely there is no vocal content (e.g. a soundtrack) and the closer it is to zero, the more vocal it is (e.g. rap or spoken word).

Speechiness detects the presence of spoken words in a track.

  • The String Lovers score high on Instrumentalness but low Speechiness. This means that artistes in this period tend to favor instruments as opposed to speech.
  • The Poetics are the direct opposite. They score pretty high in Speechiness but very low on Instrumentalness.
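To illustrate the idea of clustering on these two features, here is a minimal k-means sketch in plain Python. The post doesn’t depend on this exact algorithm, so k-means is my assumption here, and the artiste names and feature values are invented purely for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, max_iter=100):
    """Plain k-means; returns (labels, centroids)."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # nothing moved, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Made-up (instrumentalness, speechiness) averages for hypothetical artistes.
artistes = {
    "Artiste A": (0.62, 0.03),
    "Artiste B": (0.55, 0.04),
    "Artiste C": (0.70, 0.05),
    "Artiste D": (0.02, 0.28),
    "Artiste E": (0.01, 0.35),
    "Artiste F": (0.04, 0.22),
}
labels, centroids = kmeans(list(artistes.values()), k=2)
```

With real data you would normally scale the features first and use all of them, not just these two.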

Figure 1

The other interesting thing about these clusters is when they appear on the Billboard Hot 100.

  • Most String Lovers appeared on Billboard before the 1990s.
  • Most Poetics appeared on Billboard after the 1990s.

Figure 2

  • The 90s itself seemed to be a pivotal time in music as we see with the ~50-50 split between String Lovers and Poetics. This meant that artistes were split between going with this new type of music or sticking to the existing sound.

2.   The use of instruments dropped mostly because rock bands became less popular

Between the late ’60s and the early 2000s, bands were so popular that there were as many bands as solo artistes.

Before the 2000s, the more bands there were in a year, the higher the average Instrumentalness in that year.
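For anyone who wants to check a relationship like this themselves, here is a small Python sketch: summarise songs per year, then compute the Pearson correlation between the number of band entries and the average Instrumentalness. The tuple layout and function names are my own assumptions, not the structure of the actual dataset.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def yearly_summary(songs):
    """songs: iterable of (year, is_band, instrumentalness) tuples (assumed layout)."""
    band_entries = defaultdict(int)
    instrumentalness = defaultdict(list)
    for year, is_band, value in songs:
        band_entries[year] += int(is_band)  # counts charting songs by bands
        instrumentalness[year].append(value)
    years = sorted(instrumentalness)
    counts = [band_entries[y] for y in years]
    averages = [sum(instrumentalness[y]) / len(instrumentalness[y]) for y in years]
    return years, counts, averages


# Tiny made-up sample, only to show the shape of the input.
sample = [
    (1970, True, 0.4), (1970, True, 0.2),
    (1971, False, 0.1), (1971, True, 0.1),
    (1972, False, 0.0),
]
years, counts, averages = yearly_summary(sample)
r = pearson(counts, averages)
```

A value of r close to 1 means the two series rise and fall together; a value near 0 means the number of bands tells you little about Instrumentalness.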

Figure 3

However, after the 90s, the number of bands had little or no effect on the use of instruments.

Figure 4

Except for two outliers, the number of bands had virtually no effect on the use of instruments. This is interesting because, as I mentioned earlier, bands were still popular in the early 2000s.

So, what happened?

I’m sure you guessed it. The TYPE of bands changed.

Figure 5

Before the 90s, about 60% of bands were rock bands – the type that typically has one lead singer and a bunch of instrumentalists.

However, from the 2000s to the present day, the percentage of rock bands dropped significantly, making way for a new brand of bands generally made up of ALL singers: Pop bands. Think Destiny’s Child, Pussycat Dolls, Fifth Harmony, One Direction – you name it!

3.   We might also owe the emergence of Poetics to the rise of Hip-Hop

Apart from the increase in Speechiness and use of words, Poetics use twice as many complex words (e.g. Jay-Z saying opulence instead of wealth) as String Lovers and use words with more syllables. One genre immediately comes to mind when we think of word-bending artistes: Hip-Hop.

Figure 6

Seeing as Hip-Hop tops all other genres in word-related features, it comes as no surprise that Hip-Hop gained mainstream popularity in the 90s – corresponding to the rise of The Poetics.

Figure 6b

4.   While the style of music has changed a lot over time, popular songs for the past sixty years have been mostly about loving women

To arrive at this, I used an algorithm called topic modeling. As the name implies, the algorithm searches for topics in a given text.

In our case, the text is the lyrics of Billboard songs.

Let’s see how these topics change over the decades:

Figure 7

This is absolutely amazing!

Like the features of songs, song lyrics also fall clearly into two buckets: Topic 1 captures the ’50s to the ’80s, Topic 2 captures the decades after the ’90s, and the ’90s act as a transition period!

This means that the sound and “message” of songs changed at pretty much the same rate.

So, what are these topics?

Figure 8

The topics are almost the same thing! Top songs have disproportionately been, for the past sixty years, “Yeah, I love my baby”.

There’s also something interesting going on here. A major difference between the two topics is that before the 90s, songs might have had a more “direct” approach: a top word is “gonna”, as in “I’m gonna love you”. After the 90s, the approach seems a bit more indirect, like asking for permission, with “wanna” replacing “gonna”. “Wanna” could also depict a more futuristic, imaginative approach to loving women.
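If you want a quick sanity check of the “gonna” versus “wanna” contrast without running a full topic model, a plain word count per era already goes a long way. To be clear, the Python sketch below is NOT topic modeling, just a much simpler frequency comparison; the stopword list and the lyric lines are made up for illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; real lists are far longer).
STOPWORDS = {"i", "i'm", "you", "the", "a", "to", "my", "me", "and"}


def top_words(lyrics, n=3):
    """Return the n most frequent non-stopword tokens across a list of lyric lines."""
    counts = Counter()
    for line in lyrics:
        tokens = re.findall(r"[a-z']+", line.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


# Invented lines standing in for each era's lyrics.
before_90s = ["I'm gonna love you", "gonna love my baby", "gonna hold you"]
after_90s = ["I wanna love you", "wanna love my baby", "wanna hold you"]

early_top = top_words(before_90s, n=2)
late_top = top_words(after_90s, n=2)
```

Running the same function on the real lyrics, grouped by decade, would show whether the word swap holds up outside the topic model.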

5.   The more “quiet” genres ceased to exist in the Poetic Era

This sort of confirms that we tend to prefer louder music now than before.

Figure 9

The five most “quiet” genres are – Jazz, Swing, Folk, Blues and Disco.

These genres also ceased to exist as popular music in the Poetic Era, except Jazz, which seemed to survive through one artiste (Norah Jones).

Figure 10

What does all this mean?

In summary:

  • The 90s was an extremely important time in music.
  • The decline of rock bands and the rise of Hip-Hop played a major role in steering music to where it is today.
  • Love is a popular theme across songs for the past six decades but the approach to love might differ across the different eras of music.
  • Yes, modern artistes may be louder but it’s BECAUSE we have content :).
  • Bonus Point: Michael Jackson, despite being most popular in the 80s, is a Poetic! He was ahead of his time!

Fun Stuff and Things to Keep in Mind

  • I took a different (and more fun) approach to showcasing the data for this project. I built a dashboard using HTML, CSS, JavaScript and Chart.js! The app is not (yet) optimized for mobile, so it’s best to use it on a laptop.

Here’s the link: http://bit.ly/music-dashboard

    • The dashboard has two tabs. The first one, “Artist Dashboard”, shows you the average song features for individual artistes.
      Figure 11
    • The second tab, “Comparison Dashboard”, allows you to compare song features for up to three artistes and looks like the screenshot below.
      Figure 12
    • You can share the results on Twitter or Facebook using the icons at the top right.
    • Just in case you forget what the features mean, hover over the title and you’ll get a little tool-tip explaining it 🙂
  • The Poetic era (as I like to call it) is an ongoing era, so some of these insights might change if we had 2016 to 2018 data (especially with the rise of trap music). However, I don’t expect the effect to be large.
  • It would be interesting to measure how “politically-aware” a song is. I will probably post the outcome of that on Twitter.
  • As usual, I am constrained by data collection methods of the generous GitHub user, Spotify’s algorithm and how Billboard arrives at the Hot 100.

Hope you had as much fun reading this as I had creating this 🙂


10 thoughts on “Billboard Hot 100 Analytics: Using Data to Understand The Shift in Popular Music in The Last 60 Years”

  1. David King says:

    Agree, interesting paper. FYI – There are a few other earlier papers and discussions dealing with related topics that provide additional ways to look at similar shifts. First is Mauch et al., “The evolution of popular music: USA 1960–2010”, based on similar Billboard data for 17K songs in conjunction with audio data (http://rsos.royalsocietypublishing.org/content/2/5/150081). Second is a much shorter blog entry, “The Evolution of Pop Lyrics and a Tale of Two LDA’s” by James Thompson, that looks at the lyrics of the same 17K songs (see http://myinspirationinformation.com/visualisation/d3-js/the-evolution-of-pop-lyrics/). Third, a while back I wrote a 3-part series (dataffiti.com) on the “Analysis of Rap Lyrics” using the weekly Billboard (BB) 15 Hot Rap Songs from 1980-2015, supplemented with lyrics from ChartLyrics.com and Genius.com. Finally, for those interested in larger data sets for analysis of popular music, there are various renditions of the LabROSA/Echo Nest Million Song Dataset.


  2. Myles says:

    “The String Lovers score high on Instrumentalness but low Speechiness. This means that artistes in this period tend to favor instruments as opposed to speech.”

    Um… singing?

    You make it sound as though it didn’t exist, when, for the Hot 100 from 1950 – 1990, it was the dominant factor, other than a very few exceptions.

    I’m also very curious as to what “rock” was in 1950.

