

This concluded the second and final part of the data gathering process. For a detailed description of the audio feature variables, please check the codebook.

I next queried the audio features for all songs. Spotify offers an API which is accessible to registered developers. Again, a developer offered a template to query the audio features. Spotify only allows a limited number of requests, and each request returns audio features for only a limited number of songs. By creating chunks of 100 songs and sending each chunk as one request, it is possible to get more audio features at once. The challenge is then to untangle the doubly nested list which is returned (see script “3_audio_features_query.ipynb”).
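The chunking and untangling steps of the audio feature query can be illustrated with a small sketch. This is not the code from “3_audio_features_query.ipynb”; the function names `chunked` and `flatten` are hypothetical, and the sketch assumes that each request returns a list of per-song dictionaries (with `None` for songs without features), so that collecting all responses yields a list of lists.

```python
from typing import Iterator, List, Optional, Sequence


def chunked(ids: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def flatten(responses: List[List[Optional[dict]]]) -> List[dict]:
    """Turn a list of per-request result lists into one flat list.

    Assumption: missing songs show up as None and are skipped.
    """
    return [
        features
        for batch in responses
        for features in batch
        if features is not None
    ]


if __name__ == "__main__":
    song_ids = [f"id{i}" for i in range(250)]  # placeholder IDs
    batches = list(chunked(song_ids))
    print([len(b) for b in batches])  # sizes of the three chunks

    # Stand-in for the real API call: one fake result list per chunk.
    fake_responses = [[{"id": song_id} for song_id in b] for b in batches]
    print(len(flatten(fake_responses)))
```

With chunks of 100, the number of requests drops by roughly a factor of 100 compared with querying one song at a time, which is what keeps the query within the request limit.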
#United states spotify charts code#
This concluded the first part of the data gathering process.

[Table: head of the charts data frame]

(Update: unfortunately, Spotify seems to have revoked permission to query the source code of its webpage, or at least to do so through the “requests” library.)

This was the most time-consuming part of the project. I scraped the charts, i.e., the top 200 songs for 70 countries, from Spotify’s website with a modified version of a script found on GitHub. The modifications included (i) changing the time interval from daily to weekly charts, which substantially reduced the dataset size; (ii) scraping the songs’ URLs, as they included the song IDs which were necessary to query the audio features; and (iii) adding a “memory” feature such that the scraping didn’t have to start from scratch whenever it was disrupted (e.g., due to connection issues). The scraping output comprised 70 CSV files, which had to be merged and cleaned (for more details, please check script “2_import_merge_charts.R”).
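Two of the scraper modifications, extracting the song ID from a song URL and the “memory” feature, can be sketched as follows. This is a minimal illustration and not the modified GitHub script: all function names and the JSON checkpoint file are hypothetical, and the sketch assumes song URLs of the form `https://open.spotify.com/track/<id>`.

```python
import json
from pathlib import Path
from typing import List, Optional, Set
from urllib.parse import urlparse


def track_id_from_url(url: str) -> Optional[str]:
    """Return the segment after "track" in the URL path, or None otherwise."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[-2] == "track":
        return parts[-1]
    return None


def load_done(checkpoint: Path) -> Set[str]:
    """Read the set of already scraped chart keys (empty on a first run)."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def mark_done(checkpoint: Path, done: Set[str], key: str) -> None:
    """Record one finished chart and write the checkpoint to disk right away."""
    done.add(key)
    checkpoint.write_text(json.dumps(sorted(done)))


def pending(all_keys: List[str], done: Set[str]) -> List[str]:
    """Keep only the charts that still need to be scraped, in original order."""
    return [key for key in all_keys if key not in done]
```

A scraping loop would call `pending(...)` once at start-up and `mark_done(...)` after each chart has been saved, so that a restart after a connection issue skips everything that was already downloaded.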
