In [1]:
import os
import glob
import pandas as pd
import numpy as np
import librosa
import random
import time 
import pickle
import queue
import threading

import scipy.io.wavfile as wav
from scipy.fftpack import fft
from scipy import signal
from scipy.spatial.distance import squareform

from librosa.display import specshow, waveplot

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

import IPython.display as ipd

import matplotlib
import matplotlib.pyplot as plt

np.random.seed(42)

%matplotlib inline
In [13]:
from keras.models import Sequential
from keras.layers import Dense, MaxPooling2D, Conv2D, Flatten, Dropout, Input, BatchNormalization, CuDNNLSTM
from keras.models import Model, load_model
from keras.callbacks import Callback, EarlyStopping
from keras.metrics import top_k_categorical_accuracy
C:\python36\envs\machine_learning\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

Introduction

As we did in the previous notebook on Speech Classification with the Audio MNIST dataset, we will now try to classify audio files from the FMA dataset. The downloaded dataset contains 8,000 audio samples of 30s each (this is the small version; there is a medium one of 25 GB with 25k clips and a large one close to 1 TB with many more). As this is a first trial, the small dataset will be enough, and it has the advantage of being balanced across 8 classes. Let's start by cleaning up the metadata that ships with the dataset.

Cleanup Datasets

To start, we will check the content of all metadata files, but our current objective is only to extract the class of each audio file. We will use the other metadata in a future notebook.

genres

In [2]:
genres = pd.read_csv("fma_metadata/genres.csv", index_col=0)
genres.head()
Out[2]:
#tracks parent title top_level
genre_id
1 8693 38 Avant-Garde 38
2 5271 0 International 2
3 1752 0 Blues 3
4 4126 0 Jazz 4
5 4106 0 Classical 5

So nearly every genre has a parent class, because a music genre can be decomposed into multiple sub-genres. For our dataset, we will focus only on top-level genres to keep a balanced dataset.
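The genre-to-top-level mapping can be turned into a plain lookup dict, as done later to build the labels. A minimal sketch with a toy frame mirroring the `genres.csv` rows shown above:

```python
import pandas as pd

# Toy frame mimicking genres.csv (rows taken from the head() above)
genres = pd.DataFrame(
    {"#tracks": [8693, 4126], "parent": [38, 0],
     "title": ["Avant-Garde", "Jazz"], "top_level": [38, 4]},
    index=pd.Index([1, 4], name="genre_id"),
)

# genre_id -> top-level genre_id
converter = genres["top_level"].to_dict()
print(converter[1])  # 38: Avant-Garde rolls up to its top-level parent
print(converter[4])  # 4: Jazz is already a top-level genre
```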

artist

In [3]:
artists = pd.read_csv("fma_metadata/raw_artists.csv", index_col=0)
artists.head()
Out[3]:
artist_active_year_begin artist_active_year_end artist_associated_labels artist_bio artist_comments artist_contact artist_date_created artist_donation_url artist_favorites artist_flattr_name ... artist_location artist_longitude artist_members artist_name artist_paypal_name artist_related_projects artist_url artist_website artist_wikipedia_page tags
artist_id
1 2006.0 NaN NaN <p>A Way Of Life, A Collective of Hip-Hop from... 0 Brown Bum aka Choke 11/26/2008 01:42:32 AM NaN 9 NaN ... New Jersey -74.405661 Sajje Morocco,Brownbum,ZawidaGod,Custodian of ... AWOL NaN The list of past projects is 2 long but every1... http://freemusicarchive.org/music/AWOL/ http://www.AzillionRecords.blogspot.com NaN ['awol']
10 NaN NaN Mistletone, Marriage Records <p>"Lucky Dragons" means any recorded or perfo... 3 Lukey Dargons 11/26/2008 01:43:35 AM http://glaciersofnice.com/shop/ 111 NaN ... Los Angeles, CA -118.243685 Luke Fischbeck\nSarah Rara Lucky Dragons NaN NaN http://freemusicarchive.org/music/Lucky_Dragons/ http://hawksandsparrows.org/ NaN ['lucky dragons']
100 2004.0 NaN Captcha Records (HBSP-2X), Pickled Egg (Europe) <p><span style="font-family:Verdana, Geneva, A... 1 Chris Kalis 11/26/2008 02:05:22 AM NaN 8 NaN ... Chicago, IL -87.629798 Chris Kalis, Harry Brenner, Scott McGaughey, B... Chandeliers NaN Killer Whales, \nMichael Columbia\nMandate\nMr... http://freemusicarchive.org/music/Chandeliers/ thechandeliers.com NaN ['chandeliers']
1000 NaN NaN NaN <p><a href="http://marzipanmarzipan.com">Marzi... 0 NaN 12/04/2008 09:24:35 AM NaN 0 NaN ... NaN 12.567380 NaN Marzipan Marzipan NaN NaN http://freemusicarchive.org/music/Marzipan_Mar... https://soundcloud.com/marzipanmarzipan NaN []
10000 NaN NaN NaN <p><span style="font-family:'Times New Roman',... 0 NaN 1/21/2011 02:11:31 PM NaN 1 NaN ... NaN NaN Jack Hertz\nPHOBoS\nBlue Hell Jack Hertz, PHOBoS, Blue Hell NaN NaN http://freemusicarchive.org/music/Jack_Hertz_P... http://surrism.phonoethics.com/surrism-phonoet... NaN ['jack hertz phobos blue hell']

5 rows × 24 columns

This frame contains various information about artists. It may be used in more complex work later, but for now we will ignore it. For example, an artist/album is likely to always contain the same type of song.

albums

In [4]:
albums = pd.read_csv("fma_metadata/raw_albums.csv", index_col=0)
albums.head()
Out[4]:
album_comments album_date_created album_date_released album_engineer album_favorites album_handle album_image_file album_images album_information album_listens album_producer album_title album_tracks album_type album_url artist_name artist_url tags
album_id
1 0 11/26/2008 01:44:45 AM 1/05/2009 NaN 4 AWOL_-_A_Way_Of_Life https://freemusicarchive.org/file/images/album... [{'image_id': '1955', 'image_file': 'https://f... <p></p> 6073 NaN AWOL - A Way Of Life 7 Album http://freemusicarchive.org/music/AWOL/AWOL_-_... AWOL http://freemusicarchive.org/music/AWOL/ []
100 0 11/26/2008 01:55:44 AM 1/09/2009 NaN 0 On_Opaque_Things https://freemusicarchive.org/file/images/album... [{'image_id': '4403', 'image_file': 'https://f... NaN 5613 NaN On Opaque Things 4 Album http://freemusicarchive.org/music/Bird_Names/O... Bird Names http://freemusicarchive.org/music/Bird_Names/ []
1000 0 12/04/2008 09:28:49 AM 10/26/2008 NaN 0 DMBQ_Live_at_2008_Record_Fair_on_WFMU_Record_F... https://freemusicarchive.org/file/images/album... [{'image_id': '31997', 'image_file': 'https://... <p>http://blog.wfmu.org/freeform/2008/10/what-... 1092 NaN DMBQ Live at 2008 Record Fair on WFMU Record F... 4 Live Performance http://freemusicarchive.org/music/DMBQ/DMBQ_Li... DMBQ http://freemusicarchive.org/music/DMBQ/ []
10000 0 9/05/2011 04:42:57 PM NaN NaN 0 Live_at_CKUT_on_Montreal_Sessions_1434 https://freemusicarchive.org/file/images/album... [{'image_id': '12266', 'image_file': 'https://... <p>Live Set on the Montreal Session February 2... 1001 NaN Live at CKUT on Montreal Sessions 1 Radio Program http://freemusicarchive.org/music/Sundrips/Liv... Sundrips http://freemusicarchive.org/music/Sundrips/ []
10001 0 9/06/2011 12:02:58 AM 1/01/2006 NaN 0 Grounds_Dream_Cosmic_Love https://freemusicarchive.org/file/images/album... [{'image_id': '24091', 'image_file': 'https://... <p>Recorded in Linnavuori, Finland, 2005 (with... 504 NaN Ground's Dream Cosmic Love 1 Album http://freemusicarchive.org/music/Uton/Grounds... Uton http://freemusicarchive.org/music/Uton/ []

The same applies to this dataset about albums: we will ignore it for now.

echonest

In [5]:
echonest = pd.read_csv("fma_metadata/echonest.csv", header=[0, 1], skipinitialspace=True, index_col=0)
echonest.head()
Out[5]:
audio_features metadata ... temporal_features
track_id acousticness danceability energy instrumentalness liveness speechiness tempo valence album_date album_name ... 214 215 216 217 218 219 220 221 222 223
2 0.416675 0.675894 0.634476 0.010628 0.177647 0.159310 165.922 0.576661 NaN NaN ... -1.992303 6.805694 0.233070 0.192880 0.027455 0.06408 3.67696 3.61288 13.316690 262.929749
3 0.374408 0.528643 0.817461 0.001851 0.105880 0.461818 126.957 0.269240 NaN NaN ... -1.582331 8.889308 0.258464 0.220905 0.081368 0.06413 6.08277 6.01864 16.673548 325.581085
5 0.043567 0.745566 0.701470 0.000697 0.373143 0.124595 100.260 0.621661 NaN NaN ... -2.288358 11.527109 0.256821 0.237820 0.060122 0.06014 5.92649 5.86635 16.013849 356.755737
10 0.951670 0.658179 0.924525 0.965427 0.115474 0.032985 111.562 0.963590 2008-03-11 Constant Hitmaker ... -3.662988 21.508228 0.283352 0.267070 0.125704 0.08082 8.41401 8.33319 21.317064 483.403809
134 0.452217 0.513238 0.560410 0.019443 0.096567 0.525519 114.290 0.894072 NaN NaN ... -1.452696 2.356398 0.234686 0.199550 0.149332 0.06440 11.26707 11.20267 26.454180 751.147705

5 rows × 249 columns

This dataset contains several pieces of information about the songs themselves. We may use it later in addition to the study we will do on the raw audio.

In [6]:
# Collect the level-0 column groups to drop, then drop them in one call
cols_to_drop = [col for col in echonest.columns
                if col[0] in ("metadata", "ranks", "social_features")]
echonest.drop(cols_to_drop, axis=1, inplace=True)

With this filtering, we keep only the data about the audio itself and drop everything about the author, rankings and so on.
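The level-0 filtering can be illustrated on a toy MultiIndex frame (the column names below are illustrative, not the full echonest schema):

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([
    ("audio_features", "tempo"),
    ("metadata", "album_name"),
    ("social_features", "artist_hotttnesss"),
])
df = pd.DataFrame([[165.9, "x", 0.5]], columns=cols)

# Keep only level-0 groups that describe the audio itself
keep = [c for c in df.columns
        if c[0] not in ("metadata", "ranks", "social_features")]
df = df[keep]
print(df.columns.tolist())  # [('audio_features', 'tempo')]
```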

tracks

In [7]:
tracks = pd.read_csv("fma_metadata/tracks.csv", header=[0, 1], skipinitialspace=True, index_col=0)
tracks.head()
Out[7]:
album ... track
comments date_created date_released engineer favorites id information listens producer tags ... information interest language_code license listens lyricist number publisher tags title
track_id
2 0 2008-11-26 01:44:45 2009-01-05 00:00:00 NaN 4 1 <p></p> 6073 NaN [] ... NaN 4656 en Attribution-NonCommercial-ShareAlike 3.0 Inter... 1293 NaN 3 NaN [] Food
3 0 2008-11-26 01:44:45 2009-01-05 00:00:00 NaN 4 1 <p></p> 6073 NaN [] ... NaN 1470 en Attribution-NonCommercial-ShareAlike 3.0 Inter... 514 NaN 4 NaN [] Electric Ave
5 0 2008-11-26 01:44:45 2009-01-05 00:00:00 NaN 4 1 <p></p> 6073 NaN [] ... NaN 1933 en Attribution-NonCommercial-ShareAlike 3.0 Inter... 1151 NaN 6 NaN [] This World
10 0 2008-11-26 01:45:08 2008-02-06 00:00:00 NaN 4 6 NaN 47632 NaN [] ... NaN 54881 en Attribution-NonCommercial-NoDerivatives (aka M... 50135 NaN 1 NaN [] Freeway
20 0 2008-11-26 01:45:05 2009-01-06 00:00:00 NaN 2 4 <p> "spiritual songs" from Nicky Cook</p> 2710 NaN [] ... NaN 978 en Attribution-NonCommercial-NoDerivatives (aka M... 361 NaN 3 NaN [] Spiritual Level

5 rows × 52 columns

Same point here: we don't need this information yet. It relates to the complete track, whereas we want to build a classifier that predicts the class of a 30s audio sample.

features

In [8]:
features = pd.read_csv("fma_metadata/features.csv", header=[0, 1, 2], skipinitialspace=True, index_col=0)
features.head()
Out[8]:
feature chroma_cens ... tonnetz zcr
statistics kurtosis ... std kurtosis max mean median min skew std
number 01 02 03 04 05 06 07 08 09 10 ... 04 05 06 01 01 01 01 01 01 01
track_id
2 7.180653 5.230309 0.249321 1.347620 1.482478 0.531371 1.481593 2.691455 0.866868 1.341231 ... 0.054125 0.012226 0.012111 5.758890 0.459473 0.085629 0.071289 0.000000 2.089872 0.061448
3 1.888963 0.760539 0.345297 2.295201 1.654031 0.067592 1.366848 1.054094 0.108103 0.619185 ... 0.063831 0.014212 0.017740 2.824694 0.466309 0.084578 0.063965 0.000000 1.716724 0.069330
5 0.527563 -0.077654 -0.279610 0.685883 1.937570 0.880839 -0.923192 -0.927232 0.666617 1.038546 ... 0.040730 0.012691 0.014759 6.808415 0.375000 0.053114 0.041504 0.000000 2.193303 0.044861
10 3.702245 -0.291193 2.196742 -0.234449 1.367364 0.998411 1.770694 1.604566 0.521217 1.982386 ... 0.074358 0.017952 0.013921 21.434212 0.452148 0.077515 0.071777 0.000000 3.542325 0.040800
20 -0.193837 -0.198527 0.201546 0.258556 0.775204 0.084794 -0.289294 -0.816410 0.043851 -0.804761 ... 0.095003 0.022492 0.021355 16.669037 0.469727 0.047225 0.040039 0.000977 3.189831 0.030993

5 rows × 518 columns

Similarly to echonest, this dataset contains a lot of data about each sample, which could be useful for a more complex model.

Pre-processing those datasets for later

Even if we won't use them now, we can at least filter those datasets to keep only the data related to our audio samples and store lightweight versions.

After testing, 6 corrupted audio files were deleted manually, so we have 7,994 audio files left.

In [9]:
list_index = []

for audio_path in glob.glob("fma_small/*/*.mp3"):
    id_ = os.path.basename(audio_path)[:-4]
    list_index.append(int(id_))

print("We have {} audio clips".format(len(list_index)))
We have 7994 audio clips

Now let's filter our datasets and save them.

In [10]:
tracks = tracks[tracks.index.isin(list_index)]

genres.to_csv("preprocessed_meta/genres.csv")
tracks.to_csv("preprocessed_meta/tracks.csv")
echonest[echonest.index.isin(list_index)].to_csv("preprocessed_meta/echonest.csv")
features[features.index.isin(list_index)].to_csv("preprocessed_meta/features.csv")

unique_album = np.unique(tracks[('album', 'id')].values)
unique_artist = np.unique(tracks[('artist', 'id')].values)

artists[artists.index.isin(unique_artist)].to_csv("preprocessed_meta/artists.csv")
albums[albums.index.isin(unique_album)].to_csv("preprocessed_meta/albums.csv")

Creating label dataset

In [11]:
def clean_classes(x):
    if "," in x:
        genres = x.split(",")
        new_genres = set(str(converter[int(c)]) for c in genres)
        return int(new_genres.pop())
    else:
        return converter[int(x)]

converter = genres[["top_level"]].to_dict()["top_level"]
y = tracks[('track', 'genres')].str.replace("[^0-9,]", "")
y = y.apply(clean_classes).to_frame()
y.columns = ['genres']
y = y.join(genres["title"], on = ["genres"])
y.head()
Out[11]:
genres title
track_id
2 21 Hip-Hop
5 21 Hip-Hop
10 10 Pop
140 17 Folk
141 17 Folk
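The `clean_classes` logic can be checked on its own with a hypothetical converter dict (the sub-genre ids below are made up for illustration):

```python
# Hypothetical sub-genre -> top-level mapping
converter = {21: 21, 83: 21, 10: 10}

def clean_classes(x):
    # Multi-genre tracks like "21,83" are reduced via their top-level ids;
    # if they all share one top level, the set has a single element
    if "," in x:
        top = set(str(converter[int(c)]) for c in x.split(","))
        return int(top.pop())
    return converter[int(x)]

print(clean_classes("21,83"))  # 21
print(clean_classes("10"))     # 10
```

Note that when a track's genres map to several different top levels, `set.pop()` returns an arbitrary one, so the resulting label is not deterministic in that case.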
In [12]:
y["title"].value_counts()
Out[12]:
Folk             1000
Pop              1000
Instrumental     1000
International    1000
Electronic        999
Experimental      999
Rock              999
Hip-Hop           997
Name: title, dtype: int64

As stated in the dataset description, our dataset is balanced with 1,000 songs per class over 8 classes. Due to the deletion of the 6 corrupted files, a few classes have slightly fewer songs, but this won't be a problem. We can save this output dataframe and clear our memory.

In [13]:
y.to_csv("preprocessed_meta/classes.csv")
In [14]:
del tracks
del albums
del artists
del echonest
del features
del y

This filtering reduces the metadata from 1.36 GB to 84 MB, as we keep only the rows for our 8k samples, but as already mentioned, we won't use this information yet.

Pre-processing Audio

Now that we have cleaned our datasets, we can focus on the audio clips. The first attempt will be a CNN on the FFT of each song, so let's do the pre-processing. For visualization, we will always use the same audio clip.

In [1]:
base_song_path = "fma_small/000/000002.mp3"

Preparing FFT data

With default settings, the STFT creates a large (1025, 2582) float32 matrix. If we save it for every song, we will need close to 79 GB. To reduce this, the parameters have been adjusted: the overlap between windows is removed, and a bit of precision is lost by storing in float16. These parameters yield a 1025 × 646 matrix, for a total size of 9.85 GB.
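The storage estimates above follow from straightforward arithmetic (assuming 7,994 clips and magnitude matrices stored as float32 vs float16):

```python
n_clips = 7994

# Default STFT: (1025, 2582) values stored as float32 (4 bytes each)
default_gib = 1025 * 2582 * 4 * n_clips / 2**30
# Reduced: (1025, 646) values stored as float16 (2 bytes each)
reduced_gib = 1025 * 646 * 2 * n_clips / 2**30

print(round(default_gib, 1))  # 78.8 GiB
print(round(reduced_gib, 2))  # 9.86 GiB
```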

In [7]:
y, sr = librosa.load(base_song_path, sr=None, mono = True)

if sr != 44100:
    y = librosa.resample(y, sr, 44100)
    sr = 44100

D = librosa.stft(y, 
                 n_fft = 2048, 
                 hop_length = 2048, # no overlap; 2048 samples / 44.1 kHz ≈ 46 ms
                 win_length = 2048,
                 window = signal.tukey(2048),
                ) 
Xdb = librosa.amplitude_to_db(abs(D))

print("Every sample will have a matrix of {}".format(Xdb.shape))

plt.figure(figsize=(20, 12))
librosa.display.specshow(Xdb, sr=sr, x_axis='time', y_axis='hz')
plt.ylabel("Freq")
plt.xlabel("Time")
plt.title("FFT of the song", fontsize=15)
plt.show()

plt.figure(figsize=(20, 12))
librosa.display.waveplot(y, sr=sr)
plt.ylabel("Amplitude")
plt.xlabel("Time")
plt.title("Waveplot of the Song", fontsize=15)
plt.show()
Every sample will have a matrix of (1025, 646)
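The (1025, 646) shape follows directly from the STFT parameters (a sketch assuming an exactly 30s clip at 44.1 kHz; librosa's default centering adds one frame):

```python
sr, n_fft, hop = 44100, 2048, 2048
n_samples = int(30.0 * sr)       # 1,323,000 samples in a 30s clip

n_bins = n_fft // 2 + 1          # 1025 frequency bins (one-sided spectrum)
n_frames = 1 + n_samples // hop  # 646 frames with center=True padding

print((n_bins, n_frames))  # (1025, 646)
```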