Billboard Hot 100 Analytics: Using Data to Understand The Shift in Popular Music in The Last 60 Years

What’s the most common thing you hear from “older” people about modern popular music? The general theme is: “Your music is too loud and lacks content”. They talk about the “old” days with the meaningful songs, the soulful artistes, the deep bass guitars that could move you to tears. When they say that, they are comparing this:

Downtown by Petula Clark, 1965

To this:

Stir Fry by Migos, 2018

 

There’s a clear difference, obviously. However, that would be drawing a general conclusion from a single data point (which humans are very good at). I, being a millennial and a Data Scientist, found this an interesting topic to poke at. Has what makes music “great” really changed that much? Have the sound, the lyrics and the “message” changed? And if they have changed, how exactly have they changed?

Using Billboard’s Hot 100 charts from 1950 – 2015 and Spotify’s API, we want to take a closer look at how much popular music has changed in the past six decades and find out what really distinguishes the music of today from the rest.

My Approach

For this post, I define “great music” as music that made it into Billboard’s Hot 100. I got the data from a generous GitHub user, Kevin Schaich. The data contains a lot of interesting features like Sentiment, Gunning fog index (which estimates the number of years of formal education needed to understand a text at first reading), Number of words, Number of repetitive words/phrases etc.
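To make the Gunning fog index less abstract, here is a minimal sketch of the standard formula: 0.4 × (average words per sentence + percentage of complex words), where a complex word has three or more syllables. The syllable counter is a rough vowel-group heuristic I am assuming for illustration, and the sketch ignores the usual refinements (proper nouns, common suffixes), so it is not necessarily how the dataset's values were computed.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels (a heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Gunning fog index: 0.4 * (words per sentence + 100 * share of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" here simply means three or more estimated syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

Song lyrics rarely have clean sentence punctuation, so in practice you would have to decide what counts as a "sentence" (a line, a verse) before applying this.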

In addition, Spotify has an interesting API endpoint called get_audio_features. The endpoint allows you to get song features like loudness, Instrumentalness (how much instruments are used), energy, liveness (the presence of a live audience), Speechiness, song duration etc. This brings the total song features to about 30 for Billboard’s Hot 100 between 1950 and 2015.
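Here is a hedged sketch of how the audio features might be pulled in bulk with Python. It assumes a client object exposing an audio_features(list_of_track_ids) method that returns one dictionary per track (or None when no analysis is available), which is how spotipy's Spotify.audio_features behaves as far as I know. The helper names fetch_audio_features and chunked are mine, and the batch size of 100 reflects the per-request limit on track IDs for this endpoint.

```python
from typing import Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_audio_features(client, track_ids: list, batch_size: int = 100) -> dict:
    """Return {track_id: features_dict}, requesting the IDs in batches."""
    features = {}
    for batch in chunked(track_ids, batch_size):
        for track_id, item in zip(batch, client.audio_features(batch)):
            if item is not None:  # skip tracks that come back without features
                features[track_id] = item
    return features


# Possible usage with spotipy (needs API credentials, so it is not run here):
#   import spotipy
#   from spotipy.oauth2 import SpotifyClientCredentials
#   sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
#   features = fetch_audio_features(sp, my_track_ids)
```

Passing the client in as an argument keeps the batching logic easy to test with a stand-in object.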

All these features are explained here and here and I will also explain some as we progress in the post.

Initially, I set out to use Python for this project and I did. Kinda. I had my first iteration of data collection all done with Python’s pandas and a Python package called spotipy.

Along the line, however, I reviewed my methodology and found a more interesting dataset. For this, I went back to R specifically because of the tidyr::gather() function (it’s so annoying pivoting data in pandas jeez).

Here’s the code in R and Python. The two versions differ in most ways, except that both rely on a function called get_audio_features. My final dataset can be found here.

The amount of time I spent on data gathering is in sharp contrast with my other projects because, unlike my other projects, someone took the time to put a ready-to-use dataset together. This is a major reason why I share all the data I gather. Hopefully, someone out there won’t have to spend six weeks trying to gather data.

Let’s begin!

1.   In the past sixty years, we have had only two major changes in music

Using a technique called clustering, we can group artistes with similar music based on their song features.

Using this approach, we have two clusters of artistes – The String Lovers and The Poetics. The reason we chose these weird names lies in the two song features that define these clusters best: Instrumentalness and Speechiness.

Instrumentalness predicts whether a track contains no vocals on a scale of 0 to 1. “Ooh” and “aah” sounds are treated as instrumentals as well. The closer the value is to 1, the more likely there is no vocal content (e.g. a soundtrack) and the closer it is to zero, the more vocal it is (e.g. rap or spoken word).

Speechiness detects the presence of spoken words in a track.

  • The String Lovers score high on Instrumentalness but low on Speechiness. This means that artistes in this cluster tend to favor instruments as opposed to speech.
  • The Poetics are the direct opposite. They score pretty high in Speechiness but very low on Instrumentalness.
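To give a feel for how clustering separates the two groups, here is a small k-means sketch in plain Python with k = 2. The post does not say which clustering algorithm was used, so k-means is my assumption, and the artistes and their (Instrumentalness, Speechiness) values below are made up purely for illustration.

```python
import math
import random


def kmeans(points, k=2, iterations=50, seed=0):
    """Plain k-means: returns (labels, centroids) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Made-up (Instrumentalness, Speechiness) averages for six hypothetical artistes.
artistes = {
    "Artiste A": (0.60, 0.03),
    "Artiste B": (0.55, 0.04),
    "Artiste C": (0.70, 0.05),
    "Artiste D": (0.02, 0.30),
    "Artiste E": (0.01, 0.25),
    "Artiste F": (0.03, 0.35),
}
labels, centroids = kmeans(list(artistes.values()), k=2)
for name, label in zip(artistes, labels):
    print(name, "-> cluster", label)
```

The cluster numbers themselves are arbitrary; what matters is which artistes end up together. With about 30 real features you would normally scale them first so that no single feature dominates the distance.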

Figure 1

The other interesting thing about these clusters is when they appear on the Billboard Hot 100.

  • Most String Lovers appeared on Billboard before the 1990s.
  • Most Poetics appeared on Billboard after the 1990s.

Figure 2

  • The 90s itself seemed to be a pivotal time in music as we see with the ~50-50 split between String Lovers and Poetics. This meant that artistes were split between going with this new type of music or sticking to the existing sound.

2.   The use of instruments dropped mostly because rock bands became less popular

Between the late ’60s and the early 2000s, bands were so popular that there were as many bands as solo artistes.

Before the 2000s, the more bands there were in a year, the higher the average Instrumentalness in that year.

Figure 3

However, after the 90s, the number of bands had little or no effect on the use of instruments.

Figure 4

Except for the two outliers, the number of bands had virtually no effect on the use of instruments. This is interesting because, as I mentioned earlier, bands were still popular in the early 2000s.
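One simple way to put a number on "the more bands, the higher the Instrumentalness" is a Pearson correlation computed separately for each era. The sketch below is only that: the yearly rows are made-up placeholders, not values from the dataset, and the post does not state which statistic sits behind the figures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder rows: (year, number of bands, average Instrumentalness).
yearly = [
    (1970, 30, 0.10), (1975, 35, 0.12), (1980, 40, 0.14),
    (1985, 45, 0.15), (1990, 38, 0.13),
    (2000, 40, 0.02), (2005, 30, 0.03), (2010, 35, 0.03), (2015, 25, 0.02),
]

before = [row for row in yearly if row[0] < 2000]
after = [row for row in yearly if row[0] >= 2000]

r_before = pearson([r[1] for r in before], [r[2] for r in before])
r_after = pearson([r[1] for r in after], [r[2] for r in after])
print(f"before 2000: r = {r_before:.2f}, from 2000: r = {r_after:.2f}")
```

A value close to 1 means the two series rise and fall together; a value close to 0 means the number of bands tells you little about Instrumentalness.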

So, what happened?

I’m sure you guessed it. The TYPE of bands changed.

Figure 5

Before the 90s, about 60% of bands were rock bands, the type that typically has one lead singer and a bunch of instrumentalists.

However, from the 2000s to the present day, the percentage of rock bands dropped significantly, making way for a new brand of bands generally made up of ALL singers: Pop bands. Think Destiny’s Child, Pussycat Dolls, Fifth Harmony, One Direction, you name it!

3.   We might also owe the emergence of Poetics to the rise of Hip-Hop

Apart from the increase in Speechiness and use of words, Poetics use twice as many complex words (e.g. Jay-Z saying opulence instead of wealth) as String Lovers and use words with more syllables. One genre immediately comes to mind when we think of word-bending artistes: Hip-Hop.

Figure 6

Seeing as Hip-Hop tops all other genres in word-related features, it comes as no surprise that Hip-Hop gained mainstream popularity in the 90s – corresponding to the rise of The Poetics.

Figure 6b

4.   While the style of music has changed a lot over time, popular songs for the past sixty years have been mostly about loving women

To arrive at this, I used a technique called topic modeling. As the name implies, it searches for topics (groups of words that tend to appear together) in a given text.

In our case, the text is the lyrics of Billboard songs.

Let’s see how these topics change over the decades:

Figure 7

This is absolutely amazing!

Like the features of songs, song lyrics also fall clearly into two buckets with Topic 1 capturing ’50s to ’80s, Topic 2 capturing the decades after the ’90s and the ’90s as a transition period!

This means that the sound and “message” of songs changed at pretty much the same rate.

So, what are these topics?

Figure 8

The topics are almost the same thing! For the past sixty years, top songs have overwhelmingly been saying, “Yeah, I love my baby”.

There’s also something interesting going on here. A major difference between both topics is that before the 90s, songs might have had a more “direct” approach: you can see that a major term is “gonna”, e.g. “I’m gonna love you”. After the 90s, the approach seems a bit more indirect, like asking for permission, hence “gonna” being replaced with “wanna”. “Wanna” could also depict a more futuristic, imaginative approach to loving women.
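The “gonna” versus “wanna” observation can be sanity-checked with plain word counting, which is much simpler than topic modeling and is not what produced the figures above. Here is a minimal sketch; the lyric snippets are invented for illustration and the helper name term_rate is mine.

```python
import re
from collections import Counter


def term_rate(lyrics_list, term):
    """Occurrences of `term` per 1,000 words across a list of lyric strings."""
    counts = Counter()
    for lyrics in lyrics_list:
        counts.update(re.findall(r"[a-z']+", lyrics.lower()))
    total = sum(counts.values())
    return 1000 * counts[term] / total if total else 0.0


# Invented snippets standing in for real lyrics from each era.
before_90s = ["I'm gonna love you baby", "Gonna hold you tight"]
after_90s = ["I wanna love you baby", "Yeah I wanna know"]

for era, lyrics in [("before the 90s", before_90s), ("after the 90s", after_90s)]:
    print(era, {t: round(term_rate(lyrics, t), 1) for t in ("gonna", "wanna")})
```

Normalizing per 1,000 words matters here because later songs tend to have more words overall, so raw counts alone would be misleading.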

5.   The more “quiet” genres ceased to exist in the Poetic Era

This sort of confirms that we tend to prefer louder music now than before.

Figure 9

The five most “quiet” genres are – Jazz, Swing, Folk, Blues and Disco.

These genres also ceased to exist as popular music in the Poetic Era, except Jazz, which seems to have survived through a single artiste (Norah Jones).
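Ranking genres by quietness boils down to averaging Spotify's loudness value per genre and sorting. Loudness is reported in decibels, typically between -60 and 0, so a more negative number means a quieter track. The sketch below assumes the data is available as (genre, loudness) pairs; the helper name and the example values in the usage comment are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def quietest_genres(tracks, n=5):
    """Return the `n` genres with the lowest average loudness (in dB)."""
    by_genre = defaultdict(list)
    for genre, loudness_db in tracks:
        by_genre[genre].append(loudness_db)
    averages = {genre: mean(values) for genre, values in by_genre.items()}
    # Sort ascending: the most negative averages are the quietest genres.
    return sorted(averages, key=averages.get)[:n]


# Example call with made-up values:
#   quietest_genres([("Jazz", -14.0), ("Rock", -7.0), ("Folk", -12.5)], n=2)
```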

Figure 10

What does this all mean?

In summary:

  • The 90s was an extremely important time in music.
  • The decline of rock bands and the rise of Hip-Hop played a major role in steering music to where it is today.
  • Love is a popular theme across songs for the past six decades but the approach to love might differ across the different eras of music.
  • Yes, modern artistes may be louder but it’s BECAUSE we have content :).
  • Bonus Point: Michael Jackson, despite being most popular in the 80s, is a Poetic! He was ahead of his time!

Fun Stuff and Things to Keep in Mind

  • I took a different (and more fun) approach to showcasing the data for this project. I built a dashboard using HTML, CSS, js and chart.js! The app is not (yet) optimized for mobile so, it’s best to use it on a laptop.

Here’s the link: http://bit.ly/music-dashboard

    • The dashboard has two tabs. The first one, “Artist Dashboard”, shows you the average song features for individual artistes.

      Figure 11

    • The second tab, “Comparison Dashboard”, allows you to compare song features for up to three artistes and looks like the screenshot below.

      Figure 12

    • You can share the results on Twitter or Facebook using the icons at the top right.
    • Just in case you forget what the features mean, hover over the title and you’ll get a little tool-tip explaining it 🙂
  • The Poetic era (as I like to call it) is an ongoing era, so some of these insights might change if we had 2016 to 2018 data (especially with the rise of trap music). However, I don’t expect the effect to be large.
  • It would be interesting to measure how “politically-aware” a song is. I will probably post the outcome of that on Twitter.
  • As usual, I am constrained by data collection methods of the generous GitHub user, Spotify’s algorithm and how Billboard arrives at the Hot 100.

Hope you had as much fun reading this as I had creating this 🙂

My Journey Into Data Science

Quite a number of people have asked me about my switch from Chemical Engineering to Data Science. How did I do it? When did I do it? Why did I do it? I felt today (January 6, 2018) was a fitting day to answer these questions as it marks the third year since I enrolled in my first programming course. I hope sharing my story will give some insight into what I did to become a Data Scientist and encourage budding “anythings” everywhere to pursue their passion fiercely.

My first exposure to Data Science was from a book that had nothing to do with Data Science

In March 2014, I stumbled on a book called The Power of Habit: Why We Do What We Do in Life and Business by Charles Duhigg. In a section of the book called The Habits of Organizations, Charles wrote about a large retail chain that used data on what a female customer bought to predict the likelihood that she was pregnant. To put it lightly, my mind was blown and I had to find out more.

I searched everywhere for what this sorcery was called. After a few months and with the help of my friends, I stumbled on something very similar to what I read in The Power of Habit. It was called Business Analytics.

This discovery came at a tipping point for me because, at the time, I was in my final year of college and had just finished an internship with an Oil & Gas company. My experience there made me wary of taking up Chemical Engineering as a career because I felt like it just wasn’t for me. This realization also made me open to new challenges and to pivoting career-wise. Business Analytics seemed to fit right into that.

I created my first Data Science learning path from an answer on Quora

By 2014, I had graduated and begun my National Youth Service Corps. During my NYSC, I stumbled on Quora from a Twitter recommendation and I loved it.

In case you are wondering, IDEALLY, NYSC is a one-year mandatory program in Nigeria where you are deployed to a state you aren’t affiliated with to serve in some capacity as either a government worker, teacher or anything else really.

On Quora, I found out that Business Analytics had many names and one was Data Science. I also found a very helpful answer which I recommend to this day for anyone looking to start out as a Data Scientist: How can I become a Data Scientist?

This answer helped shape my first ever learning path for Data Science in January 2015 (Forgive my terrible handwriting).

Written January 2015. Other courses on the left side of the page are The Analytics Edge and Google Analytics

I completed 15 MOOCs on Data Science within a year

I primarily learnt Data Science through online courses. I never used a book (I tried). All the courses were free (because I didn’t care for a certificate) and where they were not free like Coursera, I got 100% Financial Aid.

I kissed a lot of frogs when it came to online courses, so if you are looking for a loose guide on how to get started in Data Science, I’ll save you the stress and focus only on the courses that were worthwhile.

1. Learnt Programming

This was the very first thing on my learning path and the scariest of them all. It was scary because I didn’t have a Computer Science background and the only time I was exposed to programming in college, I absolutely hated it. However, this time I felt I had all the time in the world and nothing to lose so I enrolled in Codecademy’s Learn Python course.

The course was so hard and a lot of it did not make sense to me. I could spend as much as two weeks trying to get a while loop to work and I had no idea what file I/O meant but by sheer brute force, I completed the course.

This was the first time I completed an online course after numerous attempts to do so previously. That gave me some confidence to keep on learning.

2. Learnt core Data Science

A lot of people ask me why I choose to use R over Python. It was by sheer coincidence that my first exposure to Data Science was in R from a course called The Analytics Edge from MIT on edX.

The ten-week course uses a case study approach to teach different parts of Data Science from Machine Learning to Visualization to Optimization using R. It was very demanding and very rewarding. The amazing experience I had on this course is what makes me lean a bit more to R than Python. The course gave me a great foundation and I still refer to my notes from 2015 sometimes.

3. Other helpful courses

Another course I loved, which I took towards the end of 2015, was Data Visualization and Communication with Tableau from Duke University on Coursera. It’s a five-week course that gives a great foundation on the use of Tableau. The instructor is amazing and the best I’ve been exposed to so far.

The next on my list would be Managing Big Data with MySQL from Duke University on Coursera. It’s a four-week course with the same amazing instructor as the Tableau course and teaches both MySQL and Teradata.

Others worth mentioning are: Introduction to Big Data with Apache Spark (a four-course series) from UC Berkeley on edX and Excel for Data Analysis and Visualization from Microsoft on edX.

How I started my blog — where the real learning started

If you read a lot of Quora answers or articles on how to become a better Software Engineer/Data Scientist/Designer and the like, you’d see a recurring piece of advice: Do personal projects to deepen your skill set. I had tried to do that a few times in 2015 but I wasn’t able to do anything reasonable because, frankly, I was not ready.

By 2016, I had slowed down on online courses because 90% of the courses had the same content and assumed you’re a beginner so it became a bit repetitive. By this time, I felt I was ready to start doing personal projects using a blog. The writing part was not an issue because I used to write in High School. My issue, however, was around consistency and creativity. Was I creative enough to put together interesting projects and could I do it consistently? You never know until you try, right? And that’s how I started my blog The Art and Science of Data in June 2016. My learning grew exponentially working on the content for my blog.

I wrote my first two posts within a month and then went on a year-long hiatus

My first post was Predicting The English Premier League Standings which I posted in September 2016 and then What Twitter Feels about Network Providers in Nigeria which was posted in October 2016. The number of positive responses absolutely floored me. I got about 1,500 views and numerous responses on both posts and for the first time, I felt confident in my skills.

This experience taught me that creativity is not some talent that you either have or don’t. Creativity is born of experience and confidence in your skills, because the more you know, the more possibilities you see.

Then I went on a year-long hiatus on my blog. This happened for many reasons.

  1. I had tried to write a blog post in December 2016 that was a hot mess. I cleaned it up later and used it for my Women in Machine Learning and Data Science Workshop called The ABC-XYZ of Data Science.
  2. After that, I had what I’ll call “The Data Scientist’s block”. I literally had no ideas and could not think up anything useful or interesting.
  3. My approach to my blog is a bit different from most data science blogs because mine involves a lot of research and iterations. It also makes my publishing cycle much longer than others.
  4. Work was grueling and adulting was catching up with me so I became a couch potato.

I finally had an idea in June 2017 on billionaires and with the help of my friends, I published A Data Driven Guide to Becoming a Consistent Billionaire in October 2017 (yes, it took me four months to put it together).

Within three days of publishing, it had 30,000 views. It was everywhere. A sizable number of sites plagiarized the post and I didn’t care. My work was good enough to be plagiarized!

My Little Victories So Far

Apart from the 40,000 views I’ve gotten so far on my post A Data Driven Guide to Becoming a Consistent Billionaire, 2017 was an interesting year for me. For the first time, the work I had put in over the past three years was being validated.

  1. I won a United Nations Data Visualization Contest with my Tableau visualization on “Visualizing Malaria: The Killer Disease Killing Africa” which looked something like this.

2. I got invited to speak at Stanford’s Women in Data Science Conference taking place in Nigeria on the exact same topic as this post.

3. I have numerous collaborations lined up for 2018 both in Nigeria and abroad.

4. I facilitated a workshop at The Women in Machine Learning and Data Science in November 2017.

Truthfully, I’m a bit surprised that I got this far. I remember writing in my notepad “Rosebud, you will never be good enough for this” but here I am. I still have a lot of learning to do but I am also grateful for where I am today.

My Advice for You

I’m no expert, nor am I John Maxwell, who gives nuggets of self-help advice, but here are a few things that have really helped me.

  1. Don’t be afraid to let go of something that’s not working out. It took me till 2016 to fully let go of my Oil & Gas dreams even though I knew I was not passionate about it.
  2. Don’t be afraid to be called crazy. I cannot count the number of times people subtly and not-so-subtly told me I was crazy for leaving Chemical Engineering especially when Data Science was relatively new in Nigeria. It used to get to me but now I smile and say to myself “When I blow, you’ll understand”.
  3. Read. Read. Read. The books that opened up this field to me had nothing to do with Data Science. Reading expands your realm of possibilities.
  4. Love to learn. Have learning goals every year and stick to a medium (books/audio/video/classroom) that works best for you.
  5. Always, always put your best foot forward. Let the work that you put out there be the very best work it could be. It would speak for you. 99% of the opportunities I have gotten today came, in part, because of my blog.
  6. Most importantly, you are not an island. Have a tight-knit support system that would tell you the truth even when it hurts. You’d be better for it.

Good luck 🙂

I want to especially thank my amazing support system and all the people that got me here. They are too numerous to mention but I love you guys so much. I want to especially thank Tobi, Didun and Miracle for the support, the tough love, the brutal feedback and telling me where exactly to put an apostrophe. You have been there from day 1. You know all my struggles. You saw me at the very beginning and still believed I could do it. Thank you for making me a better Data Scientist and a better person. I wouldn’t trade you for the world.