Introduction

Based on the result using meta data on the small dataset, it seems that it's the best approach we have so far. This dataset was quite simple with 8 classes and 7994 observations. To avoid overfitting, we applied PCA and got 90+% of accuracy.

In this notebook, the objectif will be to do it on the complete dataset. There will be more complexe things because classes will be unbalanced and some audio may have multiple main classes. Based on the result, we may train additionnal model to sub classify audio.

In [16]:
import pandas as pd
import numpy as np
import itertools

from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import shuffle
from sklearn.metrics import accuracy_score

import matplotlib.pyplot as plt

pd.set_option('max_info_columns', 999)
pd.options.display.max_rows = 200

%matplotlib inline
In [17]:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Dropout
from keras.models import Model
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects
from keras.layers import Activation
from keras.utils.np_utils import to_categorical
C:\python36\envs\machine_learning\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
In [1]:
def clean_classes(x):
    if len(x) == 0:
        return "-1"
    elif "," in x:
        list_genre = x.split(",")
        new_genres = set(naming[converter[int(c)]] for c in list_genre)
        return ",".join(list(new_genres))
    else:
        return naming[converter[int(x)]]

Preparation of data

Loading Features

In [11]:
features = pd.read_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/features.csv", header=[0, 1, 2], skipinitialspace=True, index_col=0)
features.head()
Out[11]:
feature chroma_cens ... tonnetz zcr
statistics kurtosis ... std kurtosis max mean median min skew std
number 01 02 03 04 05 06 07 08 09 10 ... 04 05 06 01 01 01 01 01 01 01
track_id
2 7.180653 5.230309 0.249321 1.347620 1.482478 0.531371 1.481593 2.691455 0.866868 1.341231 ... 0.054125 0.012226 0.012111 5.758890 0.459473 0.085629 0.071289 0.000000 2.089872 0.061448
3 1.888963 0.760539 0.345297 2.295201 1.654031 0.067592 1.366848 1.054094 0.108103 0.619185 ... 0.063831 0.014212 0.017740 2.824694 0.466309 0.084578 0.063965 0.000000 1.716724 0.069330
5 0.527563 -0.077654 -0.279610 0.685883 1.937570 0.880839 -0.923192 -0.927232 0.666617 1.038546 ... 0.040730 0.012691 0.014759 6.808415 0.375000 0.053114 0.041504 0.000000 2.193303 0.044861
10 3.702245 -0.291193 2.196742 -0.234449 1.367364 0.998411 1.770694 1.604566 0.521217 1.982386 ... 0.074358 0.017952 0.013921 21.434212 0.452148 0.077515 0.071777 0.000000 3.542325 0.040800
20 -0.193837 -0.198527 0.201546 0.258556 0.775204 0.084794 -0.289294 -0.816410 0.043851 -0.804761 ... 0.095003 0.022492 0.021355 16.669037 0.469727 0.047225 0.040039 0.000977 3.189831 0.030993

5 rows × 518 columns

In [12]:
features.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 106574 entries, 2 to 155320
Data columns (total 518 columns):
(chroma_cens, kurtosis, 01)           106574 non-null float64
(chroma_cens, kurtosis, 02)           106574 non-null float64
(chroma_cens, kurtosis, 03)           106574 non-null float64
(chroma_cens, kurtosis, 04)           106574 non-null float64
(chroma_cens, kurtosis, 05)           106574 non-null float64
(chroma_cens, kurtosis, 06)           106574 non-null float64
(chroma_cens, kurtosis, 07)           106574 non-null float64
(chroma_cens, kurtosis, 08)           106574 non-null float64
(chroma_cens, kurtosis, 09)           106574 non-null float64
(chroma_cens, kurtosis, 10)           106574 non-null float64
(chroma_cens, kurtosis, 11)           106574 non-null float64
(chroma_cens, kurtosis, 12)           106574 non-null float64
(chroma_cens, max, 01)                106574 non-null float64
(chroma_cens, max, 02)                106574 non-null float64
(chroma_cens, max, 03)                106574 non-null float64
(chroma_cens, max, 04)                106574 non-null float64
(chroma_cens, max, 05)                106574 non-null float64
(chroma_cens, max, 06)                106574 non-null float64
(chroma_cens, max, 07)                106574 non-null float64
(chroma_cens, max, 08)                106574 non-null float64
(chroma_cens, max, 09)                106574 non-null float64
(chroma_cens, max, 10)                106574 non-null float64
(chroma_cens, max, 11)                106574 non-null float64
(chroma_cens, max, 12)                106574 non-null float64
(chroma_cens, mean, 01)               106574 non-null float64
(chroma_cens, mean, 02)               106574 non-null float64
(chroma_cens, mean, 03)               106574 non-null float64
(chroma_cens, mean, 04)               106574 non-null float64
(chroma_cens, mean, 05)               106574 non-null float64
(chroma_cens, mean, 06)               106574 non-null float64
(chroma_cens, mean, 07)               106574 non-null float64
(chroma_cens, mean, 08)               106574 non-null float64
(chroma_cens, mean, 09)               106574 non-null float64
(chroma_cens, mean, 10)               106574 non-null float64
(chroma_cens, mean, 11)               106574 non-null float64
(chroma_cens, mean, 12)               106574 non-null float64
(chroma_cens, median, 01)             106574 non-null float64
(chroma_cens, median, 02)             106574 non-null float64
(chroma_cens, median, 03)             106574 non-null float64
(chroma_cens, median, 04)             106574 non-null float64
(chroma_cens, median, 05)             106574 non-null float64
(chroma_cens, median, 06)             106574 non-null float64
(chroma_cens, median, 07)             106574 non-null float64
(chroma_cens, median, 08)             106574 non-null float64
(chroma_cens, median, 09)             106574 non-null float64
(chroma_cens, median, 10)             106574 non-null float64
(chroma_cens, median, 11)             106574 non-null float64
(chroma_cens, median, 12)             106574 non-null float64
(chroma_cens, min, 01)                106574 non-null float64
(chroma_cens, min, 02)                106574 non-null float64
(chroma_cens, min, 03)                106574 non-null float64
(chroma_cens, min, 04)                106574 non-null float64
(chroma_cens, min, 05)                106574 non-null float64
(chroma_cens, min, 06)                106574 non-null float64
(chroma_cens, min, 07)                106574 non-null float64
(chroma_cens, min, 08)                106574 non-null float64
(chroma_cens, min, 09)                106574 non-null float64
(chroma_cens, min, 10)                106574 non-null float64
(chroma_cens, min, 11)                106574 non-null float64
(chroma_cens, min, 12)                106574 non-null float64
(chroma_cens, skew, 01)               106574 non-null float64
(chroma_cens, skew, 02)               106574 non-null float64
(chroma_cens, skew, 03)               106574 non-null float64
(chroma_cens, skew, 04)               106574 non-null float64
(chroma_cens, skew, 05)               106574 non-null float64
(chroma_cens, skew, 06)               106574 non-null float64
(chroma_cens, skew, 07)               106574 non-null float64
(chroma_cens, skew, 08)               106574 non-null float64
(chroma_cens, skew, 09)               106574 non-null float64
(chroma_cens, skew, 10)               106574 non-null float64
(chroma_cens, skew, 11)               106574 non-null float64
(chroma_cens, skew, 12)               106574 non-null float64
(chroma_cens, std, 01)                106574 non-null float64
(chroma_cens, std, 02)                106574 non-null float64
(chroma_cens, std, 03)                106574 non-null float64
(chroma_cens, std, 04)                106574 non-null float64
(chroma_cens, std, 05)                106574 non-null float64
(chroma_cens, std, 06)                106574 non-null float64
(chroma_cens, std, 07)                106574 non-null float64
(chroma_cens, std, 08)                106574 non-null float64
(chroma_cens, std, 09)                106574 non-null float64
(chroma_cens, std, 10)                106574 non-null float64
(chroma_cens, std, 11)                106574 non-null float64
(chroma_cens, std, 12)                106574 non-null float64
(chroma_cqt, kurtosis, 01)            106574 non-null float64
(chroma_cqt, kurtosis, 02)            106574 non-null float64
(chroma_cqt, kurtosis, 03)            106574 non-null float64
(chroma_cqt, kurtosis, 04)            106574 non-null float64
(chroma_cqt, kurtosis, 05)            106574 non-null float64
(chroma_cqt, kurtosis, 06)            106574 non-null float64
(chroma_cqt, kurtosis, 07)            106574 non-null float64
(chroma_cqt, kurtosis, 08)            106574 non-null float64
(chroma_cqt, kurtosis, 09)            106574 non-null float64
(chroma_cqt, kurtosis, 10)            106574 non-null float64
(chroma_cqt, kurtosis, 11)            106574 non-null float64
(chroma_cqt, kurtosis, 12)            106574 non-null float64
(chroma_cqt, max, 01)                 106574 non-null float64
(chroma_cqt, max, 02)                 106574 non-null float64
(chroma_cqt, max, 03)                 106574 non-null float64
(chroma_cqt, max, 04)                 106574 non-null float64
(chroma_cqt, max, 05)                 106574 non-null float64
(chroma_cqt, max, 06)                 106574 non-null float64
(chroma_cqt, max, 07)                 106574 non-null float64
(chroma_cqt, max, 08)                 106574 non-null float64
(chroma_cqt, max, 09)                 106574 non-null float64
(chroma_cqt, max, 10)                 106574 non-null float64
(chroma_cqt, max, 11)                 106574 non-null float64
(chroma_cqt, max, 12)                 106574 non-null float64
(chroma_cqt, mean, 01)                106574 non-null float64
(chroma_cqt, mean, 02)                106574 non-null float64
(chroma_cqt, mean, 03)                106574 non-null float64
(chroma_cqt, mean, 04)                106574 non-null float64
(chroma_cqt, mean, 05)                106574 non-null float64
(chroma_cqt, mean, 06)                106574 non-null float64
(chroma_cqt, mean, 07)                106574 non-null float64
(chroma_cqt, mean, 08)                106574 non-null float64
(chroma_cqt, mean, 09)                106574 non-null float64
(chroma_cqt, mean, 10)                106574 non-null float64
(chroma_cqt, mean, 11)                106574 non-null float64
(chroma_cqt, mean, 12)                106574 non-null float64
(chroma_cqt, median, 01)              106574 non-null float64
(chroma_cqt, median, 02)              106574 non-null float64
(chroma_cqt, median, 03)              106574 non-null float64
(chroma_cqt, median, 04)              106574 non-null float64
(chroma_cqt, median, 05)              106574 non-null float64
(chroma_cqt, median, 06)              106574 non-null float64
(chroma_cqt, median, 07)              106574 non-null float64
(chroma_cqt, median, 08)              106574 non-null float64
(chroma_cqt, median, 09)              106574 non-null float64
(chroma_cqt, median, 10)              106574 non-null float64
(chroma_cqt, median, 11)              106574 non-null float64
(chroma_cqt, median, 12)              106574 non-null float64
(chroma_cqt, min, 01)                 106574 non-null float64
(chroma_cqt, min, 02)                 106574 non-null float64
(chroma_cqt, min, 03)                 106574 non-null float64
(chroma_cqt, min, 04)                 106574 non-null float64
(chroma_cqt, min, 05)                 106574 non-null float64
(chroma_cqt, min, 06)                 106574 non-null float64
(chroma_cqt, min, 07)                 106574 non-null float64
(chroma_cqt, min, 08)                 106574 non-null float64
(chroma_cqt, min, 09)                 106574 non-null float64
(chroma_cqt, min, 10)                 106574 non-null float64
(chroma_cqt, min, 11)                 106574 non-null float64
(chroma_cqt, min, 12)                 106574 non-null float64
(chroma_cqt, skew, 01)                106574 non-null float64
(chroma_cqt, skew, 02)                106574 non-null float64
(chroma_cqt, skew, 03)                106574 non-null float64
(chroma_cqt, skew, 04)                106574 non-null float64
(chroma_cqt, skew, 05)                106574 non-null float64
(chroma_cqt, skew, 06)                106574 non-null float64
(chroma_cqt, skew, 07)                106574 non-null float64
(chroma_cqt, skew, 08)                106574 non-null float64
(chroma_cqt, skew, 09)                106574 non-null float64
(chroma_cqt, skew, 10)                106574 non-null float64
(chroma_cqt, skew, 11)                106574 non-null float64
(chroma_cqt, skew, 12)                106574 non-null float64
(chroma_cqt, std, 01)                 106574 non-null float64
(chroma_cqt, std, 02)                 106574 non-null float64
(chroma_cqt, std, 03)                 106574 non-null float64
(chroma_cqt, std, 04)                 106574 non-null float64
(chroma_cqt, std, 05)                 106574 non-null float64
(chroma_cqt, std, 06)                 106574 non-null float64
(chroma_cqt, std, 07)                 106574 non-null float64
(chroma_cqt, std, 08)                 106574 non-null float64
(chroma_cqt, std, 09)                 106574 non-null float64
(chroma_cqt, std, 10)                 106574 non-null float64
(chroma_cqt, std, 11)                 106574 non-null float64
(chroma_cqt, std, 12)                 106574 non-null float64
(chroma_stft, kurtosis, 01)           106574 non-null float64
(chroma_stft, kurtosis, 02)           106574 non-null float64
(chroma_stft, kurtosis, 03)           106574 non-null float64
(chroma_stft, kurtosis, 04)           106574 non-null float64
(chroma_stft, kurtosis, 05)           106574 non-null float64
(chroma_stft, kurtosis, 06)           106574 non-null float64
(chroma_stft, kurtosis, 07)           106574 non-null float64
(chroma_stft, kurtosis, 08)           106574 non-null float64
(chroma_stft, kurtosis, 09)           106574 non-null float64
(chroma_stft, kurtosis, 10)           106574 non-null float64
(chroma_stft, kurtosis, 11)           106574 non-null float64
(chroma_stft, kurtosis, 12)           106574 non-null float64
(chroma_stft, max, 01)                106574 non-null float64
(chroma_stft, max, 02)                106574 non-null float64
(chroma_stft, max, 03)                106574 non-null float64
(chroma_stft, max, 04)                106574 non-null float64
(chroma_stft, max, 05)                106574 non-null float64
(chroma_stft, max, 06)                106574 non-null float64
(chroma_stft, max, 07)                106574 non-null float64
(chroma_stft, max, 08)                106574 non-null float64
(chroma_stft, max, 09)                106574 non-null float64
(chroma_stft, max, 10)                106574 non-null float64
(chroma_stft, max, 11)                106574 non-null float64
(chroma_stft, max, 12)                106574 non-null float64
(chroma_stft, mean, 01)               106574 non-null float64
(chroma_stft, mean, 02)               106574 non-null float64
(chroma_stft, mean, 03)               106574 non-null float64
(chroma_stft, mean, 04)               106574 non-null float64
(chroma_stft, mean, 05)               106574 non-null float64
(chroma_stft, mean, 06)               106574 non-null float64
(chroma_stft, mean, 07)               106574 non-null float64
(chroma_stft, mean, 08)               106574 non-null float64
(chroma_stft, mean, 09)               106574 non-null float64
(chroma_stft, mean, 10)               106574 non-null float64
(chroma_stft, mean, 11)               106574 non-null float64
(chroma_stft, mean, 12)               106574 non-null float64
(chroma_stft, median, 01)             106574 non-null float64
(chroma_stft, median, 02)             106574 non-null float64
(chroma_stft, median, 03)             106574 non-null float64
(chroma_stft, median, 04)             106574 non-null float64
(chroma_stft, median, 05)             106574 non-null float64
(chroma_stft, median, 06)             106574 non-null float64
(chroma_stft, median, 07)             106574 non-null float64
(chroma_stft, median, 08)             106574 non-null float64
(chroma_stft, median, 09)             106574 non-null float64
(chroma_stft, median, 10)             106574 non-null float64
(chroma_stft, median, 11)             106574 non-null float64
(chroma_stft, median, 12)             106574 non-null float64
(chroma_stft, min, 01)                106574 non-null float64
(chroma_stft, min, 02)                106574 non-null float64
(chroma_stft, min, 03)                106574 non-null float64
(chroma_stft, min, 04)                106574 non-null float64
(chroma_stft, min, 05)                106574 non-null float64
(chroma_stft, min, 06)                106574 non-null float64
(chroma_stft, min, 07)                106574 non-null float64
(chroma_stft, min, 08)                106574 non-null float64
(chroma_stft, min, 09)                106574 non-null float64
(chroma_stft, min, 10)                106574 non-null float64
(chroma_stft, min, 11)                106574 non-null float64
(chroma_stft, min, 12)                106574 non-null float64
(chroma_stft, skew, 01)               106574 non-null float64
(chroma_stft, skew, 02)               106574 non-null float64
(chroma_stft, skew, 03)               106574 non-null float64
(chroma_stft, skew, 04)               106574 non-null float64
(chroma_stft, skew, 05)               106574 non-null float64
(chroma_stft, skew, 06)               106574 non-null float64
(chroma_stft, skew, 07)               106574 non-null float64
(chroma_stft, skew, 08)               106574 non-null float64
(chroma_stft, skew, 09)               106574 non-null float64
(chroma_stft, skew, 10)               106574 non-null float64
(chroma_stft, skew, 11)               106574 non-null float64
(chroma_stft, skew, 12)               106574 non-null float64
(chroma_stft, std, 01)                106574 non-null float64
(chroma_stft, std, 02)                106574 non-null float64
(chroma_stft, std, 03)                106574 non-null float64
(chroma_stft, std, 04)                106574 non-null float64
(chroma_stft, std, 05)                106574 non-null float64
(chroma_stft, std, 06)                106574 non-null float64
(chroma_stft, std, 07)                106574 non-null float64
(chroma_stft, std, 08)                106574 non-null float64
(chroma_stft, std, 09)                106574 non-null float64
(chroma_stft, std, 10)                106574 non-null float64
(chroma_stft, std, 11)                106574 non-null float64
(chroma_stft, std, 12)                106574 non-null float64
(mfcc, kurtosis, 01)                  106574 non-null float64
(mfcc, kurtosis, 02)                  106574 non-null float64
(mfcc, kurtosis, 03)                  106574 non-null float64
(mfcc, kurtosis, 04)                  106574 non-null float64
(mfcc, kurtosis, 05)                  106574 non-null float64
(mfcc, kurtosis, 06)                  106574 non-null float64
(mfcc, kurtosis, 07)                  106574 non-null float64
(mfcc, kurtosis, 08)                  106574 non-null float64
(mfcc, kurtosis, 09)                  106574 non-null float64
(mfcc, kurtosis, 10)                  106574 non-null float64
(mfcc, kurtosis, 11)                  106574 non-null float64
(mfcc, kurtosis, 12)                  106574 non-null float64
(mfcc, kurtosis, 13)                  106574 non-null float64
(mfcc, kurtosis, 14)                  106574 non-null float64
(mfcc, kurtosis, 15)                  106574 non-null float64
(mfcc, kurtosis, 16)                  106574 non-null float64
(mfcc, kurtosis, 17)                  106574 non-null float64
(mfcc, kurtosis, 18)                  106574 non-null float64
(mfcc, kurtosis, 19)                  106574 non-null float64
(mfcc, kurtosis, 20)                  106574 non-null float64
(mfcc, max, 01)                       106574 non-null float64
(mfcc, max, 02)                       106574 non-null float64
(mfcc, max, 03)                       106574 non-null float64
(mfcc, max, 04)                       106574 non-null float64
(mfcc, max, 05)                       106574 non-null float64
(mfcc, max, 06)                       106574 non-null float64
(mfcc, max, 07)                       106574 non-null float64
(mfcc, max, 08)                       106574 non-null float64
(mfcc, max, 09)                       106574 non-null float64
(mfcc, max, 10)                       106574 non-null float64
(mfcc, max, 11)                       106574 non-null float64
(mfcc, max, 12)                       106574 non-null float64
(mfcc, max, 13)                       106574 non-null float64
(mfcc, max, 14)                       106574 non-null float64
(mfcc, max, 15)                       106574 non-null float64
(mfcc, max, 16)                       106574 non-null float64
(mfcc, max, 17)                       106574 non-null float64
(mfcc, max, 18)                       106574 non-null float64
(mfcc, max, 19)                       106574 non-null float64
(mfcc, max, 20)                       106574 non-null float64
(mfcc, mean, 01)                      106574 non-null float64
(mfcc, mean, 02)                      106574 non-null float64
(mfcc, mean, 03)                      106574 non-null float64
(mfcc, mean, 04)                      106574 non-null float64
(mfcc, mean, 05)                      106574 non-null float64
(mfcc, mean, 06)                      106574 non-null float64
(mfcc, mean, 07)                      106574 non-null float64
(mfcc, mean, 08)                      106574 non-null float64
(mfcc, mean, 09)                      106574 non-null float64
(mfcc, mean, 10)                      106574 non-null float64
(mfcc, mean, 11)                      106574 non-null float64
(mfcc, mean, 12)                      106574 non-null float64
(mfcc, mean, 13)                      106574 non-null float64
(mfcc, mean, 14)                      106574 non-null float64
(mfcc, mean, 15)                      106574 non-null float64
(mfcc, mean, 16)                      106574 non-null float64
(mfcc, mean, 17)                      106574 non-null float64
(mfcc, mean, 18)                      106574 non-null float64
(mfcc, mean, 19)                      106574 non-null float64
(mfcc, mean, 20)                      106574 non-null float64
(mfcc, median, 01)                    106574 non-null float64
(mfcc, median, 02)                    106574 non-null float64
(mfcc, median, 03)                    106574 non-null float64
(mfcc, median, 04)                    106574 non-null float64
(mfcc, median, 05)                    106574 non-null float64
(mfcc, median, 06)                    106574 non-null float64
(mfcc, median, 07)                    106574 non-null float64
(mfcc, median, 08)                    106574 non-null float64
(mfcc, median, 09)                    106574 non-null float64
(mfcc, median, 10)                    106574 non-null float64
(mfcc, median, 11)                    106574 non-null float64
(mfcc, median, 12)                    106574 non-null float64
(mfcc, median, 13)                    106574 non-null float64
(mfcc, median, 14)                    106574 non-null float64
(mfcc, median, 15)                    106574 non-null float64
(mfcc, median, 16)                    106574 non-null float64
(mfcc, median, 17)                    106574 non-null float64
(mfcc, median, 18)                    106574 non-null float64
(mfcc, median, 19)                    106574 non-null float64
(mfcc, median, 20)                    106574 non-null float64
(mfcc, min, 01)                       106574 non-null float64
(mfcc, min, 02)                       106574 non-null float64
(mfcc, min, 03)                       106574 non-null float64
(mfcc, min, 04)                       106574 non-null float64
(mfcc, min, 05)                       106574 non-null float64
(mfcc, min, 06)                       106574 non-null float64
(mfcc, min, 07)                       106574 non-null float64
(mfcc, min, 08)                       106574 non-null float64
(mfcc, min, 09)                       106574 non-null float64
(mfcc, min, 10)                       106574 non-null float64
(mfcc, min, 11)                       106574 non-null float64
(mfcc, min, 12)                       106574 non-null float64
(mfcc, min, 13)                       106574 non-null float64
(mfcc, min, 14)                       106574 non-null float64
(mfcc, min, 15)                       106574 non-null float64
(mfcc, min, 16)                       106574 non-null float64
(mfcc, min, 17)                       106574 non-null float64
(mfcc, min, 18)                       106574 non-null float64
(mfcc, min, 19)                       106574 non-null float64
(mfcc, min, 20)                       106574 non-null float64
(mfcc, skew, 01)                      106574 non-null float64
(mfcc, skew, 02)                      106574 non-null float64
(mfcc, skew, 03)                      106574 non-null float64
(mfcc, skew, 04)                      106574 non-null float64
(mfcc, skew, 05)                      106574 non-null float64
(mfcc, skew, 06)                      106574 non-null float64
(mfcc, skew, 07)                      106574 non-null float64
(mfcc, skew, 08)                      106574 non-null float64
(mfcc, skew, 09)                      106574 non-null float64
(mfcc, skew, 10)                      106574 non-null float64
(mfcc, skew, 11)                      106574 non-null float64
(mfcc, skew, 12)                      106574 non-null float64
(mfcc, skew, 13)                      106574 non-null float64
(mfcc, skew, 14)                      106574 non-null float64
(mfcc, skew, 15)                      106574 non-null float64
(mfcc, skew, 16)                      106574 non-null float64
(mfcc, skew, 17)                      106574 non-null float64
(mfcc, skew, 18)                      106574 non-null float64
(mfcc, skew, 19)                      106574 non-null float64
(mfcc, skew, 20)                      106574 non-null float64
(mfcc, std, 01)                       106574 non-null float64
(mfcc, std, 02)                       106574 non-null float64
(mfcc, std, 03)                       106574 non-null float64
(mfcc, std, 04)                       106574 non-null float64
(mfcc, std, 05)                       106574 non-null float64
(mfcc, std, 06)                       106574 non-null float64
(mfcc, std, 07)                       106574 non-null float64
(mfcc, std, 08)                       106574 non-null float64
(mfcc, std, 09)                       106574 non-null float64
(mfcc, std, 10)                       106574 non-null float64
(mfcc, std, 11)                       106574 non-null float64
(mfcc, std, 12)                       106574 non-null float64
(mfcc, std, 13)                       106574 non-null float64
(mfcc, std, 14)                       106574 non-null float64
(mfcc, std, 15)                       106574 non-null float64
(mfcc, std, 16)                       106574 non-null float64
(mfcc, std, 17)                       106574 non-null float64
(mfcc, std, 18)                       106574 non-null float64
(mfcc, std, 19)                       106574 non-null float64
(mfcc, std, 20)                       106574 non-null float64
(rmse, kurtosis, 01)                  106574 non-null float64
(rmse, max, 01)                       106574 non-null float64
(rmse, mean, 01)                      106574 non-null float64
(rmse, median, 01)                    106574 non-null float64
(rmse, min, 01)                       106574 non-null float64
(rmse, skew, 01)                      106574 non-null float64
(rmse, std, 01)                       106574 non-null float64
(spectral_bandwidth, kurtosis, 01)    106574 non-null float64
(spectral_bandwidth, max, 01)         106574 non-null float64
(spectral_bandwidth, mean, 01)        106574 non-null float64
(spectral_bandwidth, median, 01)      106574 non-null float64
(spectral_bandwidth, min, 01)         106574 non-null float64
(spectral_bandwidth, skew, 01)        106574 non-null float64
(spectral_bandwidth, std, 01)         106574 non-null float64
(spectral_centroid, kurtosis, 01)     106574 non-null float64
(spectral_centroid, max, 01)          106574 non-null float64
(spectral_centroid, mean, 01)         106574 non-null float64
(spectral_centroid, median, 01)       106574 non-null float64
(spectral_centroid, min, 01)          106574 non-null float64
(spectral_centroid, skew, 01)         106574 non-null float64
(spectral_centroid, std, 01)          106574 non-null float64
(spectral_contrast, kurtosis, 01)     106574 non-null float64
(spectral_contrast, kurtosis, 02)     106574 non-null float64
(spectral_contrast, kurtosis, 03)     106574 non-null float64
(spectral_contrast, kurtosis, 04)     106574 non-null float64
(spectral_contrast, kurtosis, 05)     106574 non-null float64
(spectral_contrast, kurtosis, 06)     106574 non-null float64
(spectral_contrast, kurtosis, 07)     106574 non-null float64
(spectral_contrast, max, 01)          106574 non-null float64
(spectral_contrast, max, 02)          106574 non-null float64
(spectral_contrast, max, 03)          106574 non-null float64
(spectral_contrast, max, 04)          106574 non-null float64
(spectral_contrast, max, 05)          106574 non-null float64
(spectral_contrast, max, 06)          106574 non-null float64
(spectral_contrast, max, 07)          106574 non-null float64
(spectral_contrast, mean, 01)         106574 non-null float64
(spectral_contrast, mean, 02)         106574 non-null float64
(spectral_contrast, mean, 03)         106574 non-null float64
(spectral_contrast, mean, 04)         106574 non-null float64
(spectral_contrast, mean, 05)         106574 non-null float64
(spectral_contrast, mean, 06)         106574 non-null float64
(spectral_contrast, mean, 07)         106574 non-null float64
(spectral_contrast, median, 01)       106574 non-null float64
(spectral_contrast, median, 02)       106574 non-null float64
(spectral_contrast, median, 03)       106574 non-null float64
(spectral_contrast, median, 04)       106574 non-null float64
(spectral_contrast, median, 05)       106574 non-null float64
(spectral_contrast, median, 06)       106574 non-null float64
(spectral_contrast, median, 07)       106574 non-null float64
(spectral_contrast, min, 01)          106574 non-null float64
(spectral_contrast, min, 02)          106574 non-null float64
(spectral_contrast, min, 03)          106574 non-null float64
(spectral_contrast, min, 04)          106574 non-null float64
(spectral_contrast, min, 05)          106574 non-null float64
(spectral_contrast, min, 06)          106574 non-null float64
(spectral_contrast, min, 07)          106574 non-null float64
(spectral_contrast, skew, 01)         106574 non-null float64
(spectral_contrast, skew, 02)         106574 non-null float64
(spectral_contrast, skew, 03)         106574 non-null float64
(spectral_contrast, skew, 04)         106574 non-null float64
(spectral_contrast, skew, 05)         106574 non-null float64
(spectral_contrast, skew, 06)         106574 non-null float64
(spectral_contrast, skew, 07)         106574 non-null float64
(spectral_contrast, std, 01)          106574 non-null float64
(spectral_contrast, std, 02)          106574 non-null float64
(spectral_contrast, std, 03)          106574 non-null float64
(spectral_contrast, std, 04)          106574 non-null float64
(spectral_contrast, std, 05)          106574 non-null float64
(spectral_contrast, std, 06)          106574 non-null float64
(spectral_contrast, std, 07)          106574 non-null float64
(spectral_rolloff, kurtosis, 01)      106574 non-null float64
(spectral_rolloff, max, 01)           106574 non-null float64
(spectral_rolloff, mean, 01)          106574 non-null float64
(spectral_rolloff, median, 01)        106574 non-null float64
(spectral_rolloff, min, 01)           106574 non-null float64
(spectral_rolloff, skew, 01)          106574 non-null float64
(spectral_rolloff, std, 01)           106574 non-null float64
(tonnetz, kurtosis, 01)               106574 non-null float64
(tonnetz, kurtosis, 02)               106574 non-null float64
(tonnetz, kurtosis, 03)               106574 non-null float64
(tonnetz, kurtosis, 04)               106574 non-null float64
(tonnetz, kurtosis, 05)               106574 non-null float64
(tonnetz, kurtosis, 06)               106574 non-null float64
(tonnetz, max, 01)                    106574 non-null float64
(tonnetz, max, 02)                    106574 non-null float64
(tonnetz, max, 03)                    106574 non-null float64
(tonnetz, max, 04)                    106574 non-null float64
(tonnetz, max, 05)                    106574 non-null float64
(tonnetz, max, 06)                    106574 non-null float64
(tonnetz, mean, 01)                   106574 non-null float64
(tonnetz, mean, 02)                   106574 non-null float64
(tonnetz, mean, 03)                   106574 non-null float64
(tonnetz, mean, 04)                   106574 non-null float64
(tonnetz, mean, 05)                   106574 non-null float64
(tonnetz, mean, 06)                   106574 non-null float64
(tonnetz, median, 01)                 106574 non-null float64
(tonnetz, median, 02)                 106574 non-null float64
(tonnetz, median, 03)                 106574 non-null float64
(tonnetz, median, 04)                 106574 non-null float64
(tonnetz, median, 05)                 106574 non-null float64
(tonnetz, median, 06)                 106574 non-null float64
(tonnetz, min, 01)                    106574 non-null float64
(tonnetz, min, 02)                    106574 non-null float64
(tonnetz, min, 03)                    106574 non-null float64
(tonnetz, min, 04)                    106574 non-null float64
(tonnetz, min, 05)                    106574 non-null float64
(tonnetz, min, 06)                    106574 non-null float64
(tonnetz, skew, 01)                   106574 non-null float64
(tonnetz, skew, 02)                   106574 non-null float64
(tonnetz, skew, 03)                   106574 non-null float64
(tonnetz, skew, 04)                   106574 non-null float64
(tonnetz, skew, 05)                   106574 non-null float64
(tonnetz, skew, 06)                   106574 non-null float64
(tonnetz, std, 01)                    106574 non-null float64
(tonnetz, std, 02)                    106574 non-null float64
(tonnetz, std, 03)                    106574 non-null float64
(tonnetz, std, 04)                    106574 non-null float64
(tonnetz, std, 05)                    106574 non-null float64
(tonnetz, std, 06)                    106574 non-null float64
(zcr, kurtosis, 01)                   106574 non-null float64
(zcr, max, 01)                        106574 non-null float64
(zcr, mean, 01)                       106574 non-null float64
(zcr, median, 01)                     106574 non-null float64
(zcr, min, 01)                        106574 non-null float64
(zcr, skew, 01)                       106574 non-null float64
(zcr, std, 01)                        106574 non-null float64
dtypes: float64(518)
memory usage: 422.0 MB

Loading genres

In [21]:
genres = pd.read_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/genres.csv", index_col=0)
genres.head()
Out[21]:
#tracks parent title top_level
genre_id
1 8693 38 Avant-Garde 38
2 5271 0 International 2
3 1752 0 Blues 3
4 4126 0 Jazz 4
5 4106 0 Classical 5

Loading genres

In [14]:
tracks = pd.read_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/tracks.csv", 
                     header=[0, 1], 
                     skipinitialspace=True, 
                     index_col=0)
tracks = tracks[[("track", "genres_all"), ("track", "genres"), ("track", "genre_top")]]
tracks.columns = [col[1] for col in tracks.columns]
tracks.head()
Out[14]:
genres_all genres genre_top
track_id
2 [21] [21] Hip-Hop
3 [21] [21] Hip-Hop
5 [21] [21] Hip-Hop
10 [10] [10] Pop
20 [17, 10, 76, 103] [76, 103] NaN

First, we can see that in the full dataset we have a lot of missing top genres. This is due to 2 things :

  • The audio doesn't have any genres
  • The audio have multiple main genres

So now we will remove sample without genres and get all main genres for all other sample.

In [15]:
tracks[tracks["genre_top"].isnull()]
Out[15]:
genres_all genres genre_top
track_id
20 [17, 10, 76, 103] [76, 103] NaN
26 [17, 10, 76, 103] [76, 103] NaN
30 [17, 10, 76, 103] [76, 103] NaN
46 [17, 10, 76, 103] [76, 103] NaN
48 [17, 10, 76, 103] [76, 103] NaN
185 [10, 27, 12, 76] [27, 76] NaN
236 [2, 92, 15] [15, 92] NaN
246 [10, 12, 76] [12, 76] NaN
248 [10, 12, 76] [12, 76] NaN
250 [10, 12, 76] [12, 76] NaN
251 [10, 12, 76] [12, 76] NaN
253 [10, 12, 76] [12, 76] NaN
254 [10, 12, 76] [12, 76] NaN
418 [1, 18, 1235, 38] [1, 18] NaN
420 [1, 18, 1235, 38] [1, 18] NaN
422 [1, 18, 1235, 38] [1, 18] NaN
423 [1, 18, 1235, 38] [1, 18] NaN
440 [4, 38, 15] [4, 15, 38] NaN
441 [4, 38, 15] [4, 15, 38] NaN
442 [4, 38, 15] [4, 15, 38] NaN
443 [4, 38, 15] [4, 15, 38] NaN
444 [4, 38, 15] [4, 15, 38] NaN
445 [4, 38, 15] [4, 15, 38] NaN
446 [4, 38, 15] [4, 15, 38] NaN
447 [4, 38, 15] [4, 15, 38] NaN
448 [4, 38, 15] [4, 15, 38] NaN
449 [4, 38, 15] [4, 15, 38] NaN
450 [4, 38, 15] [4, 15, 38] NaN
451 [4, 38, 15] [4, 15, 38] NaN
452 [4, 38, 15] [4, 15, 38] NaN
461 [3, 4, 37] [3, 37] NaN
462 [3, 4, 37] [3, 37] NaN
463 [3, 4, 37] [3, 37] NaN
464 [3, 4, 37] [3, 37] NaN
465 [3, 4, 37] [3, 37] NaN
554 [17, 10, 76, 103] [76, 103] NaN
555 [17, 10, 76, 103] [76, 103] NaN
556 [17, 10, 76, 103] [76, 103] NaN
557 [17, 10, 76, 103] [76, 103] NaN
558 [17, 10, 76, 103] [76, 103] NaN
559 [17, 10, 76, 103] [76, 103] NaN
560 [17, 10, 76, 103] [76, 103] NaN
561 [17, 10, 76, 103] [76, 103] NaN
613 [] [] NaN
871 [1235, 33, 107, 17] [33, 107] NaN
872 [1235, 33, 107, 17] [33, 107] NaN
873 [1235, 33, 107, 17] [33, 107] NaN
900 [250, 1235, 38] [250, 1235] NaN
901 [250, 1235, 38] [250, 1235] NaN
902 [250, 1235, 38] [250, 1235] NaN
903 [250, 1235, 38] [250, 1235] NaN
1076 [38, 22, 15] [15, 22] NaN
1213 [] [] NaN
1215 [184, 456, 38, 15] [38, 184, 456] NaN
1216 [] [] NaN
1217 [] [] NaN
1219 [184, 456, 38, 15] [38, 184, 456] NaN
1220 [38, 15] [15, 38] NaN
1222 [38, 15] [15, 38] NaN
1223 [184, 456, 38, 15] [38, 184, 456] NaN
1225 [184, 456, 38, 15] [38, 184, 456] NaN
1226 [38, 15] [15, 38] NaN
1227 [38, 15] [15, 38] NaN
1228 [184, 456, 38, 15] [38, 184, 456] NaN
1229 [38, 15] [15, 38] NaN
1230 [184, 456, 38, 15] [38, 184, 456] NaN
1231 [184, 456, 38, 15] [38, 184, 456] NaN
1232 [184, 456, 38, 15] [38, 184, 456] NaN
1233 [184, 456, 38, 15] [38, 184, 456] NaN
1234 [184, 456, 38, 15] [38, 184, 456] NaN
1235 [184, 456, 38, 15] [38, 184, 456] NaN
1236 [184, 456, 38, 15] [38, 184, 456] NaN
1237 [184, 456, 38, 15] [38, 184, 456] NaN
1238 [38, 15] [15, 38] NaN
1349 [74, 4, 38] [38, 74] NaN
1352 [74, 4, 38] [38, 74] NaN
1353 [74, 4, 38] [38, 74] NaN
1384 [8, 2] [2, 8] NaN
1429 [32, 38, 15] [15, 32] NaN
1574 [25, 10, 12] [10, 12, 25] NaN
1575 [25, 10, 12] [10, 12, 25] NaN
1865 [5, 15] [5, 15] NaN
1867 [5, 15] [5, 15] NaN
1869 [5, 15] [5, 15] NaN
1871 [5, 15] [5, 15] NaN
1873 [5, 15] [5, 15] NaN
1892 [17, 58, 12, 33] [17, 33, 58] NaN
2010 [2, 4, 118] [4, 118] NaN
2624 [42, 47, 38, 15] [15, 38, 42, 47] NaN
3256 [66, 12, 15] [15, 66] NaN
3257 [66, 12, 15] [15, 66] NaN
3258 [66, 12, 15] [15, 66] NaN
3259 [66, 12, 15] [15, 66] NaN
3260 [66, 12, 15] [15, 66] NaN
3261 [66, 12, 15] [15, 66] NaN
3262 [66, 12, 15] [15, 66] NaN
3276 [] [] NaN
3277 [10, 27, 12] [10, 27] NaN
3278 [10, 27, 12] [10, 27] NaN
3279 [10, 27, 12] [10, 27] NaN
... ... ... ...
155089 [10, 1235, 76, 15] [15, 76, 1235] NaN
155090 [10, 1235, 76, 15] [15, 76, 1235] NaN
155091 [10, 1235, 76, 15] [15, 76, 1235] NaN
155092 [10, 1235, 76, 15] [15, 76, 1235] NaN
155093 [10, 1235, 76, 15] [15, 76, 1235] NaN
155094 [10, 1235, 76, 15] [15, 76, 1235] NaN
155095 [10, 1235, 76, 15] [15, 76, 1235] NaN
155096 [10, 1235, 76, 15] [15, 76, 1235] NaN
155097 [10, 1235, 76, 15] [15, 76, 1235] NaN
155098 [10, 1235, 76, 15] [15, 76, 1235] NaN
155106 [2, 1235, 4, 15] [2, 4, 15, 1235] NaN
155109 [5, 38, 456, 107, 18, 1235] [5, 18, 107, 456, 1235] NaN
155110 [38, 107, 47, 15, 18, 1235, 286] [18, 47, 107, 286] NaN
155111 [38, 456, 236, 15, 18, 1235, 286] [15, 18, 236, 286, 456] NaN
155112 [38, 456, 12, 15, 18, 1235, 26] [15, 18, 26, 456, 1235] NaN
155113 [236, 15, 18, 1235, 286] [15, 18, 236, 286, 1235] NaN
155114 [18, 1235, 5, 15] [5, 15, 18, 1235] NaN
155115 [5, 38, 456, 107, 18, 1235, 659] [5, 18, 107, 456, 659] NaN
155116 [5, 107, 15, 18, 1235, 659, 286] [15, 18, 107, 286, 659] NaN
155119 [1235, 444, 5] [444, 1235] NaN
155121 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155122 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155123 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155124 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155125 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155126 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155127 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155128 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155129 [3, 103, 10, 12, 17] [3, 10, 12, 103] NaN
155162 [17, 18, 1235, 38] [17, 18, 38] NaN
155163 [17, 18, 1235, 38] [17, 18, 38] NaN
155164 [17, 18, 1235, 38] [17, 18, 38] NaN
155165 [17, 18, 1235, 38] [17, 18, 38] NaN
155166 [17, 18, 1235, 38] [17, 18, 38] NaN
155167 [17, 18, 1235, 38] [17, 18, 38] NaN
155168 [17, 18, 1235, 38] [17, 18, 38] NaN
155169 [17, 18, 1235, 38] [17, 18, 38] NaN
155170 [17, 18, 1235, 38] [17, 18, 38] NaN
155171 [17, 18, 1235, 38] [17, 18, 38] NaN
155172 [17, 18, 1235, 38] [17, 18, 38] NaN
155173 [17, 18, 1235, 38] [17, 18, 38] NaN
155174 [17, 18, 1235, 38] [17, 18, 38] NaN
155175 [17, 18, 1235, 38] [17, 18, 38] NaN
155176 [32, 3, 12, 38] [3, 12, 32] NaN
155178 [32, 3, 12, 38] [3, 12, 32] NaN
155179 [32, 3, 12, 38] [3, 12, 32] NaN
155180 [32, 3, 12, 38] [3, 12, 32] NaN
155181 [32, 3, 12, 38] [3, 12, 32] NaN
155189 [] [] NaN
155190 [] [] NaN
155191 [] [] NaN
155192 [] [] NaN
155193 [] [] NaN
155194 [] [] NaN
155195 [] [] NaN
155204 [17, 1235, 5] [5, 17, 1235] NaN
155205 [17, 1235, 5] [5, 17, 1235] NaN
155206 [17, 1235, 5] [5, 17, 1235] NaN
155214 [31, 12, 70, 15] [15, 31, 70] NaN
155215 [31, 12, 70, 15] [15, 31, 70] NaN
155216 [31, 12, 70, 15] [15, 31, 70] NaN
155217 [31, 12, 70, 15] [15, 31, 70] NaN
155218 [31, 12, 70, 15] [15, 31, 70] NaN
155219 [31, 12, 70, 15] [15, 31, 70] NaN
155220 [31, 12, 70, 15] [15, 31, 70] NaN
155221 [183, 38, 15] [15, 38, 183] NaN
155222 [183, 38, 15] [15, 38, 183] NaN
155223 [183, 38, 15] [15, 38, 183] NaN
155224 [183, 38, 15] [15, 38, 183] NaN
155225 [183, 38, 15] [15, 38, 183] NaN
155226 [183, 38, 15] [15, 38, 183] NaN
155227 [183, 38, 15] [15, 38, 183] NaN
155228 [183, 38, 15] [15, 38, 183] NaN
155229 [183, 38, 15] [15, 38, 183] NaN
155230 [183, 38, 15] [15, 38, 183] NaN
155231 [183, 38, 15] [15, 38, 183] NaN
155232 [183, 38, 15] [15, 38, 183] NaN
155233 [183, 38, 15] [15, 38, 183] NaN
155234 [1235, 41, 107, 38] [38, 41, 107] NaN
155245 [42, 107, 1235, 15] [42, 107] NaN
155246 [17, 1235, 9, 63] [17, 63, 1235] NaN
155248 [297, 12, 15, 240, 27] [15, 27, 240] NaN
155249 [10, 14] [10, 14] NaN
155259 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155260 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155261 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155262 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155263 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155264 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155265 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155266 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155267 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155268 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155269 [42, 107, 15, 1235, 183] [42, 107, 183] NaN
155275 [32, 38, 15] [15, 32, 38] NaN
155276 [32, 38, 15] [15, 32, 38] NaN
155277 [32, 38, 15] [15, 32, 38] NaN
155278 [42, 107, 1235, 15] [42, 107] NaN
155288 [] [] NaN
155320 [169, 10, 12, 9] [10, 12, 169] NaN

56976 rows × 3 columns

In [16]:
converter = genres[["top_level"]].to_dict()["top_level"]
naming = genres[["title"]].to_dict()["title"]

tracks["clean_class"] = tracks['genres_all'].str.replace("[^0-9,]", "").apply(clean_classes)
tracks = tracks[tracks["clean_class"] != "-1"]
In [17]:
tracks.head()
Out[17]:
genres_all genres genre_top clean_class
track_id
2 [21] [21] Hip-Hop Hip-Hop
3 [21] [21] Hip-Hop Hip-Hop
5 [21] [21] Hip-Hop Hip-Hop
10 [10] [10] Pop Pop
20 [17, 10, 76, 103] [76, 103] NaN Folk,Pop
In [18]:
tracks["clean_class"].value_counts()
Out[18]:
Rock                                                                                         14182
Experimental                                                                                 10608
Electronic                                                                                    9372
Electronic,Experimental                                                                       6851
Hip-Hop                                                                                       3552
Rock,Experimental                                                                             3118
Rock,Pop                                                                                      3024
Folk                                                                                          2803
Instrumental,Electronic                                                                       2694
Pop                                                                                           2332
Instrumental,Experimental                                                                     2198
Hip-Hop,Electronic                                                                            2114
Instrumental                                                                                  2079
Instrumental,Electronic,Experimental                                                          2046
Rock,Electronic                                                                               1794
Folk,Rock                                                                                     1559
Rock,Electronic,Experimental                                                                  1415
International                                                                                 1389
Classical                                                                                     1230
Folk,Experimental                                                                             1177
Jazz,Experimental                                                                             1121
Folk,Pop                                                                                      1116
Electronic,Pop                                                                                 874
Spoken,Experimental                                                                            692
Rock,Pop,Experimental                                                                          615
Folk,Rock,Pop                                                                                  613
Pop,Experimental                                                                               610
Rock,Instrumental                                                                              602
Jazz                                                                                           571
Classical,Experimental                                                                         566
Old-Time / Historic                                                                            554
International,Folk                                                                             532
Folk,Rock,Experimental                                                                         512
Electronic,Pop,Experimental                                                                    483
Pop,Electronic                                                                                 471
Electronic,Instrumental,Pop                                                                    445
International,Electronic                                                                       442
Hip-Hop,Electronic,Experimental                                                                439
Electronic,Rock,Pop                                                                            433
Spoken                                                                                         423
Folk,Country                                                                                   392
Instrumental,Classical                                                                         386
International,Experimental                                                                     360
Country,Rock                                                                                   358
International,Rock                                                                             347
Classical,Instrumental,Experimental                                                            322
Folk,Instrumental,Experimental                                                                 318
Folk,Instrumental                                                                              264
Instrumental,Pop                                                                               261
Rock,Electronic,Instrumental                                                                   247
Electronic,Jazz,Experimental                                                                   245
Soul-RnB,Electronic                                                                            234
Hip-Hop,Experimental                                                                           233
Classical,Folk,Instrumental,Experimental                                                       226
Blues,Rock                                                                                     223
Folk,Electronic,Experimental                                                                   220
Blues,Country,Folk                                                                             218
Instrumental,Rock,Experimental                                                                 207
Soul-RnB,Hip-Hop                                                                               203
Rock,Jazz,Experimental                                                                         200
Country                                                                                        194
Instrumental,Electronic,Classical                                                              187
Soul-RnB                                                                                       175
Experimental,Classical                                                                         169
Rock,Instrumental,Electronic                                                                   162
Folk,Blues                                                                                     154
International,Folk,Experimental                                                                153
International,Rock,Experimental                                                                140
Folk,Electronic                                                                                139
Folk,Blues,Rock,Pop                                                                            134
Folk,Country,Rock                                                                              127
Classical,Jazz,Experimental                                                                    124
Rock,Instrumental,Pop                                                                          123
Folk,Blues,Experimental                                                                        121
Electronic,Classical                                                                           121
Electronic,Hip-Hop,Pop                                                                         119
Hip-Hop,Electronic,Instrumental                                                                119
Instrumental,Hip-Hop                                                                           117
Hip-Hop,Instrumental                                                                           115
Rock,Instrumental,Experimental                                                                 115
International,Hip-Hop,Electronic                                                               114
Blues                                                                                          110
Hip-Hop,Pop                                                                                    106
International,Folk,Rock                                                                        105
Folk,Rock,Electronic                                                                           105
Rock,Jazz                                                                                      104
Rock,Hip-Hop                                                                                   104
Folk,Blues,Rock                                                                                103
International,Hip-Hop                                                                           98
International,Jazz                                                                              98
International,Pop                                                                               95
International,Electronic,Experimental                                                           93
Electronic,Instrumental,Pop,Experimental                                                        93
Folk,Electronic,Pop                                                                             92
Electronic,Instrumental,Hip-Hop                                                                 92
Instrumental,Jazz                                                                               89
Soul-RnB,Hip-Hop,Electronic                                                                     89
International,Old-Time / Historic                                                               89
Rock,Spoken,Experimental                                                                        88
International,Rock,Electronic                                                                   85
                                                                                             ...  
Folk,Country,Rock,Pop                                                                            2
Electronic,Easy Listening,Classical,Hip-Hop,Experimental                                         2
International,Rock,Spoken,Experimental                                                           2
Soul-RnB,Easy Listening,Pop                                                                      2
International,Jazz,Classical                                                                     2
Hip-Hop,Spoken,Experimental                                                                      2
International,Spoken                                                                             2
Soul-RnB,Blues,Rock                                                                              2
Country,Instrumental,Electronic                                                                  2
Electronic,Folk,Rock,Instrumental,Classical,Jazz,Experimental                                    2
International,Easy Listening,Hip-Hop                                                             2
International,Easy Listening,Electronic                                                          2
International,Instrumental,Rock                                                                  2
Folk,Jazz,Classical                                                                              2
Blues,Rock,Electronic                                                                            2
Soul-RnB,Electronic,Spoken,Jazz                                                                  2
Easy Listening,Rock,Pop                                                                          2
Easy Listening,Pop,Experimental                                                                  1
Folk,Electronic,Instrumental,Jazz                                                                1
Easy Listening,Spoken,Experimental                                                               1
International,Experimental,Classical                                                             1
Blues,Country,Folk,Instrumental                                                                  1
Hip-Hop,Rock,Electronic,Experimental                                                             1
Blues,Soul-RnB,Rock,Experimental                                                                 1
International,Easy Listening,Instrumental,Jazz                                                   1
Electronic,International,Folk,Easy Listening,Rock,Pop,Jazz,Experimental                          1
Folk,Instrumental,Electronic,Classical                                                           1
Blues,Electronic,Jazz                                                                            1
Easy Listening,Hip-Hop                                                                           1
Soul-RnB,Jazz,Pop                                                                                1
International,Country,Electronic                                                                 1
Pop,Rock,Jazz,Experimental                                                                       1
Folk,Rock,Electronic,Soul-RnB                                                                    1
Blues,Instrumental,Electronic                                                                    1
International,Folk,Easy Listening,Instrumental,Classical,Old-Time / Historic,Experimental        1
Folk,Easy Listening,Blues                                                                        1
Folk,Old-Time / Historic,Blues,Soul-RnB                                                          1
Spoken,Classical                                                                                 1
Blues,Electronic,Experimental                                                                    1
Old-Time / Historic,Instrumental,Electronic,Experimental                                         1
Instrumental,Spoken,Electronic,Experimental                                                      1
Blues,Electronic,Folk,Spoken                                                                     1
International,Instrumental,Experimental,Classical                                                1
International,Blues,Rock,Folk                                                                    1
Country,Spoken,Experimental                                                                      1
Folk,Soul-RnB,Pop                                                                                1
Rock,Instrumental,Spoken                                                                         1
Classical,Rock,Jazz,Experimental                                                                 1
Experimental,Spoken,Classical                                                                    1
International,Country,Rock                                                                       1
International,Soul-RnB,Experimental                                                              1
Easy Listening,Instrumental,Jazz,Classical                                                       1
Soul-RnB,Rock,Hip-Hop,Instrumental                                                               1
Electronic,Folk,Instrumental,Rock,Classical,Experimental                                         1
Spoken,Electronic,Experimental                                                                   1
Soul-RnB,Rock,Hip-Hop                                                                            1
Rock,Hip-Hop,Pop                                                                                 1
Soul-RnB,Old-Time / Historic,Rock                                                                1
Hip-Hop,Spoken,Instrumental                                                                      1
Electronic,Rock,Hip-Hop,Experimental                                                             1
Folk,Easy Listening,Rock                                                                         1
Country,Jazz,Pop                                                                                 1
International,Folk,Hip-Hop                                                                       1
Blues,Electronic,Soul-RnB,Jazz                                                                   1
Spoken,Instrumental,Electronic,Experimental                                                      1
International,Folk,Classical                                                                     1
Folk,Instrumental,Classical,Spoken,Experimental                                                  1
Electronic,Instrumental,Jazz,Experimental                                                        1
Pop,Spoken,Classical                                                                             1
International,Easy Listening,Instrumental                                                        1
International,Blues,Electronic                                                                   1
Blues,Country,Folk,Experimental                                                                  1
Instrumental,Rock,Pop,Experimental                                                               1
Old-Time / Historic,Instrumental                                                                 1
Blues,Easy Listening,Instrumental,Jazz                                                           1
International,Rock,Jazz,Instrumental                                                             1
Blues,Electronic,Classical                                                                       1
Soul-RnB,Hip-Hop,Pop                                                                             1
International,Instrumental,Spoken,Experimental                                                   1
Easy Listening,Jazz,Classical                                                                    1
Soul-RnB,Country                                                                                 1
Folk,Country,Blues                                                                               1
Electronic,Easy Listening,Country,Rock,Hip-Hop                                                   1
International,Soul-RnB,Rock,Pop                                                                  1
Folk,Jazz                                                                                        1
International,Folk,Instrumental,Electronic                                                       1
Folk,Easy Listening,Jazz                                                                         1
Rock,Instrumental,Hip-Hop,Experimental                                                           1
Folk,Country,Jazz,Classical                                                                      1
Blues,Jazz,Classical                                                                             1
Folk,Easy Listening,Electronic                                                                   1
Pop,Easy Listening,Jazz                                                                          1
Blues,Rock,Pop,Experimental                                                                      1
International,Country,Instrumental,Jazz                                                          1
Pop,Jazz,Electronic                                                                              1
Soul-RnB,Easy Listening                                                                          1
Folk,Rock,Instrumental,Classical,Experimental                                                    1
Blues,Old-Time / Historic,Soul-RnB,Spoken                                                        1
International,Spoken,Experimental                                                                1
Blues,Electronic,Instrumental,Jazz                                                               1
Name: clean_class, Length: 514, dtype: int64

Now all remaining tracks have at least 1 genre. The problem is that around 45% of songs have multiple genres like Pop + Rock. We cannot directly label them as is but we can use a BoW.

In [25]:
tracks_OHE = tracks["clean_class"].str.get_dummies(sep=',')
In [26]:
y = tracks_OHE.values
In [34]:
print("We now have {} observations and {} genres".format(*y.shape))
print("{:.02f}% of observation have more than 1 genre".format((y.sum(axis=1) > 1).mean()*100))
print("In average, a song has {:.02f} genres".format(y.sum(axis=1).mean()))
print("The balance of genre is :\n")
print(tracks_OHE.sum(axis=0))
We now have 104343 observations and 16 genres
52.47% of observation have more than 1 genre
In average, a song has 1.70 genres
The balance of genre is :

Blues                   1752
Classical               4106
Country                 1987
Easy Listening           730
Electronic             34413
Experimental           38154
Folk                   12706
Hip-Hop                 8389
Instrumental           14938
International           5271
Jazz                    4126
Old-Time / Historic      868
Pop                    13845
Rock                   32923
Soul-RnB                1499
Spoken                  1876
dtype: int64

The next step will require lot of memory so the best is to save this dataset, clean the memroy and restart from here

In [35]:
tracks_OHE.to_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/classes.csv")

Balancing the dataset

Dataset preparation

What we saw previously is that the dataset is very unbalanced. Unfortunately, there is no way to balance a BoW yet. So we will use a trick to do so. The idea is to duplicate features of audio having multiple genres, then keep only 1 genres for each. Then we can balance the dataset using classic solution of oversampling. A simpler visualisation is

To avoid memory issue, we will duplicate first only the genre dataset then join the feature based on the audio id.

In [36]:
classes = pd.read_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/classes.csv")
In [37]:
to_merge = classes.columns[1:]
id_ = classes.columns[0]
In [38]:
temp = pd.melt(classes, id_vars = id_, value_vars = to_merge)
temp = temp[temp.value == 1]
temp = temp.drop("value", axis=1)
temp = temp.set_index("track_id")
In [40]:
temp.head()
Out[40]:
variable
track_id
461 Blues
462 Blues
463 Blues
464 Blues
465 Blues

This is it, now we have 1.7 x 103k observation as we have an average of 1.7 genres per audio. The balance is still the same as shown previously. We can join features now and then balance the dataset

In [30]:
features = pd.read_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/features.csv", header=[0, 1, 2], skipinitialspace=True, index_col=0)
In [42]:
temp.columns = [("y", "y", "y")]  # required to join 2 dataset of same number of level
In [43]:
features = features.join(temp)
C:\python36\envs\machine_learning\lib\site-packages\pandas\core\reshape\merge.py:551: UserWarning: merging between different levels can give an unintended result (3 levels on the left, 1 on the right)
  warnings.warn(msg, UserWarning)
In [44]:
features.head()
Out[44]:
feature chroma_cens ... tonnetz zcr y
statistics kurtosis ... std kurtosis max mean median min skew std y
number 01 02 03 04 05 06 07 08 09 10 ... 05 06 01 01 01 01 01 01 01 y
track_id
2 7.180653 5.230309 0.249321 1.347620 1.482478 0.531371 1.481593 2.691455 0.866868 1.341231 ... 0.012226 0.012111 5.758890 0.459473 0.085629 0.071289 0.000000 2.089872 0.061448 Hip-Hop
3 1.888963 0.760539 0.345297 2.295201 1.654031 0.067592 1.366848 1.054094 0.108103 0.619185 ... 0.014212 0.017740 2.824694 0.466309 0.084578 0.063965 0.000000 1.716724 0.069330 Hip-Hop
5 0.527563 -0.077654 -0.279610 0.685883 1.937570 0.880839 -0.923192 -0.927232 0.666617 1.038546 ... 0.012691 0.014759 6.808415 0.375000 0.053114 0.041504 0.000000 2.193303 0.044861 Hip-Hop
10 3.702245 -0.291193 2.196742 -0.234449 1.367364 0.998411 1.770694 1.604566 0.521217 1.982386 ... 0.017952 0.013921 21.434212 0.452148 0.077515 0.071777 0.000000 3.542325 0.040800 Pop
20 -0.193837 -0.198527 0.201546 0.258556 0.775204 0.084794 -0.289294 -0.816410 0.043851 -0.804761 ... 0.022492 0.021355 16.669037 0.469727 0.047225 0.040039 0.000977 3.189831 0.030993 Folk

5 rows × 519 columns

Due to the join, we have audio without class, we can remove them (maybe a innerjoin would have been more clever...)

In [45]:
features = features[~features[("y", "y", "y")].isnull()]

Balancing

Now we have our dataset ready for the balance. What we can do upfront is to save both matrices (not dataset) and clean again the memory.

In [47]:
X = features.drop(("y", "y", "y"), axis=1).values
y = features[("y", "y", "y")].values
In [48]:
np.save("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/preprocessed_meta/new_X.npy", X)
np.save("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/preprocessed_meta/new_y.npy", y)
In [2]:
X = np.load("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/preprocessed_meta/new_X.npy")
y = np.load("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/preprocessed_meta/new_y.npy")
In [3]:
from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN 
ros = RandomOverSampler(random_state=0)
X, y = ros.fit_sample(X, y)

Now our dataset should be balanced. Let'(s apply a Label Bisarizer and count each class

In [4]:
enc = LabelBinarizer()
y = enc.fit_transform(y.reshape(-1, 1))
In [5]:
y.sum(axis=0)
Out[5]:
array([38154, 38154, 38154, 38154, 38154, 38154, 38154, 38154, 38154,
       38154, 38154, 38154, 38154, 38154, 38154, 38154])
In [6]:
y.shape
Out[6]:
(610464, 16)

We properly have the same number of classes and the same number of observations. We can now apply the same pre-processing as the previous notebbok but with 610464 observations, there is no need to apply a PCA to reduce dimensions

Data preparation

In [7]:
X = MinMaxScaler().fit_transform(X)

An important thing to do also is to shuffle the dataset as it was sorted by class du to the melt and the RandomOverSampler. After that, the training will be done using a random split on the training set. But we will keep a validation set out.

In [8]:
X, y = shuffle(X, y, random_state=0)
In [9]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.15, random_state=42)
In [10]:
del X
del y

Model

In [12]:
K.clear_session()

model = Sequential()
model.add(Dense(256, input_dim=X_train.shape[1], activation='elu'))
model.add(Dropout(0.4))
model.add(Dense(64, activation='elu'))
model.add(Dropout(0.2))
model.add(Dense(16, activation='softmax'))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               132864    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                16448     
_________________________________________________________________
dropout_2 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 16)                1040      
=================================================================
Total params: 150,352
Trainable params: 150,352
Non-trainable params: 0
_________________________________________________________________
In [13]:
model.compile(loss='binary_crossentropy', 
              optimizer='Nadam', 
              metrics=["binary_crossentropy", "accuracy"])
In [14]:
history = model.fit(X_train, y_train, 
                      epochs=5, 
                      batch_size=1000, 
                      verbose=1, 
                      validation_split=0.2)
Train on 415115 samples, validate on 103779 samples
Epoch 1/5
415115/415115 [==============================] - 8s 20us/step - loss: 0.2142 - binary_crossentropy: 0.2142 - acc: 0.9391 - val_loss: 0.2018 - val_binary_crossentropy: 0.2018 - val_acc: 0.9401
Epoch 2/5
415115/415115 [==============================] - 5s 11us/step - loss: 0.1996 - binary_crossentropy: 0.1996 - acc: 0.9403 - val_loss: 0.1939 - val_binary_crossentropy: 0.1939 - val_acc: 0.9410
Epoch 3/5
415115/415115 [==============================] - 5s 12us/step - loss: 0.1945 - binary_crossentropy: 0.1945 - acc: 0.9408 - val_loss: 0.2054 - val_binary_crossentropy: 0.2054 - val_acc: 0.9386
Epoch 4/5
415115/415115 [==============================] - 5s 12us/step - loss: 0.1916 - binary_crossentropy: 0.1916 - acc: 0.9411 - val_loss: 0.1947 - val_binary_crossentropy: 0.1947 - val_acc: 0.9403
Epoch 5/5
415115/415115 [==============================] - 5s 11us/step - loss: 0.1887 - binary_crossentropy: 0.1887 - acc: 0.9414 - val_loss: 0.1878 - val_binary_crossentropy: 0.1878 - val_acc: 0.9413

Result

Stability is reach quite fast. we can now evaluate it on the validation set.

In [15]:
y_pred = model.predict(X_val, batch_size=500)
In [17]:
# just to clear memory

K.clear_session()
del X_train
del y_train
In [30]:
y_pred[10, :]
Out[30]:
array([0.0094475 , 0.00596818, 0.0035816 , 0.03833479, 0.19119968,
       0.03631313, 0.01211707, 0.24684896, 0.04468754, 0.17373377,
       0.01787476, 0.00033049, 0.04899017, 0.0214127 , 0.12618962,
       0.02297008], dtype=float32)

First if we look at prediction, we can see that there is no clear prediction, that starts bad... Let's look at the Confusion matrix

In [32]:
y_ohe = (y_pred == y_pred.max(axis=1)[:,None]).astype(int)
In [45]:
y_pred_classe = enc.inverse_transform(y_ohe)
y_val_classe = enc.inverse_transform(y_val)
In [55]:
def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    
cnf_matrix = confusion_matrix(y_val_classe, y_pred_classe)
np.set_printoptions(precision=2)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,20))
plot_confusion_matrix(cnf_matrix, classes=enc.classes_)
plt.show()
Confusion matrix, without normalization

Strange for a prediction of 94%. Let's look at the accuracy on validation set manually

In [53]:
accuracy_score(y_val, y_ohe)
Out[53]:
0.30250081904553894

Analysis

Suprise... only 30%. Why the result is so low? My guess is due to the oversampling method. I was expecting the model to minimize the loss and prediction for example 50%/50% as 1 observation can be either one or the other. but with the oversampling, we introduce a lot of noise as a small variation of a Pop Rock song can be Rock only for example. This is clearly not a good solution.

Before to Finish, let's just try the model on unbalanced dataset. maybe the result will be better

In [6]:
features = pd.read_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/features.csv", 
                       header=[0, 1, 2], skipinitialspace=True, index_col=0)
classes = pd.read_csv("F:/Nicolas/DNUPycharmProjects/machine_learning/audio/FMA/fma_metadata/classes.csv", index_col=0)
In [9]:
features = features[features.index.isin(classes.index)]
In [10]:
X = features.values
y = classes.values
In [13]:
X = MinMaxScaler().fit_transform(X)
In [14]:
X, y = shuffle(X, y, random_state=0)
In [15]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.15, random_state=42)
In [18]:
K.clear_session()

model = Sequential()
model.add(Dense(256, input_dim=X_train.shape[1], activation='elu'))
model.add(Dropout(0.4))
model.add(Dense(64, activation='elu'))
model.add(Dropout(0.2))
model.add(Dense(16, activation='softmax'))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               132864    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 64)                16448     
_________________________________________________________________
dropout_2 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 16)                1040      
=================================================================
Total params: 150,352
Trainable params: 150,352
Non-trainable params: 0
_________________________________________________________________
In [19]:
model.compile(loss='binary_crossentropy', 
              optimizer='Nadam', 
              metrics=["binary_crossentropy", "accuracy"])
In [21]:
history = model.fit(X_train, y_train, 
                      epochs=25, 
                      batch_size=1000, 
                      verbose=1, 
                      validation_split=0.2)
Train on 70952 samples, validate on 17739 samples
Epoch 1/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2626 - binary_crossentropy: 0.2626 - acc: 0.8973 - val_loss: 0.2527 - val_binary_crossentropy: 0.2527 - val_acc: 0.8965
Epoch 2/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2594 - binary_crossentropy: 0.2594 - acc: 0.8980 - val_loss: 0.2523 - val_binary_crossentropy: 0.2523 - val_acc: 0.8996
Epoch 3/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2584 - binary_crossentropy: 0.2584 - acc: 0.8982 - val_loss: 0.2487 - val_binary_crossentropy: 0.2487 - val_acc: 0.9000
Epoch 4/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2561 - binary_crossentropy: 0.2561 - acc: 0.8989 - val_loss: 0.2489 - val_binary_crossentropy: 0.2489 - val_acc: 0.8991
Epoch 5/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2543 - binary_crossentropy: 0.2543 - acc: 0.8994 - val_loss: 0.2495 - val_binary_crossentropy: 0.2495 - val_acc: 0.8993
Epoch 6/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2525 - binary_crossentropy: 0.2525 - acc: 0.8997 - val_loss: 0.2453 - val_binary_crossentropy: 0.2453 - val_acc: 0.9019
Epoch 7/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2524 - binary_crossentropy: 0.2524 - acc: 0.9000 - val_loss: 0.2511 - val_binary_crossentropy: 0.2511 - val_acc: 0.9015
Epoch 8/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2504 - binary_crossentropy: 0.2504 - acc: 0.9006 - val_loss: 0.2424 - val_binary_crossentropy: 0.2424 - val_acc: 0.9022
Epoch 9/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2492 - binary_crossentropy: 0.2492 - acc: 0.9008 - val_loss: 0.2415 - val_binary_crossentropy: 0.2415 - val_acc: 0.9010
Epoch 10/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2486 - binary_crossentropy: 0.2486 - acc: 0.9010 - val_loss: 0.2489 - val_binary_crossentropy: 0.2489 - val_acc: 0.9002
Epoch 11/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2481 - binary_crossentropy: 0.2481 - acc: 0.9010 - val_loss: 0.2386 - val_binary_crossentropy: 0.2386 - val_acc: 0.9010
Epoch 12/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2465 - binary_crossentropy: 0.2465 - acc: 0.9014 - val_loss: 0.2396 - val_binary_crossentropy: 0.2396 - val_acc: 0.9002
Epoch 13/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2462 - binary_crossentropy: 0.2462 - acc: 0.9017 - val_loss: 0.2381 - val_binary_crossentropy: 0.2381 - val_acc: 0.9032
Epoch 14/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2451 - binary_crossentropy: 0.2451 - acc: 0.9019 - val_loss: 0.2427 - val_binary_crossentropy: 0.2427 - val_acc: 0.9006
Epoch 15/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2446 - binary_crossentropy: 0.2446 - acc: 0.9022 - val_loss: 0.2363 - val_binary_crossentropy: 0.2363 - val_acc: 0.9024
Epoch 16/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2440 - binary_crossentropy: 0.2440 - acc: 0.9022 - val_loss: 0.2394 - val_binary_crossentropy: 0.2394 - val_acc: 0.9045
Epoch 17/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2435 - binary_crossentropy: 0.2435 - acc: 0.9024 - val_loss: 0.2436 - val_binary_crossentropy: 0.2436 - val_acc: 0.9010
Epoch 18/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2428 - binary_crossentropy: 0.2428 - acc: 0.9027 - val_loss: 0.2373 - val_binary_crossentropy: 0.2373 - val_acc: 0.9036
Epoch 19/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2421 - binary_crossentropy: 0.2421 - acc: 0.9030 - val_loss: 0.2395 - val_binary_crossentropy: 0.2395 - val_acc: 0.9027
Epoch 20/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2415 - binary_crossentropy: 0.2415 - acc: 0.9029 - val_loss: 0.2398 - val_binary_crossentropy: 0.2398 - val_acc: 0.9035
Epoch 21/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2415 - binary_crossentropy: 0.2415 - acc: 0.9031 - val_loss: 0.2328 - val_binary_crossentropy: 0.2328 - val_acc: 0.9037
Epoch 22/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2403 - binary_crossentropy: 0.2403 - acc: 0.9036 - val_loss: 0.2382 - val_binary_crossentropy: 0.2382 - val_acc: 0.9045
Epoch 23/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2400 - binary_crossentropy: 0.2400 - acc: 0.9036 - val_loss: 0.2355 - val_binary_crossentropy: 0.2355 - val_acc: 0.9038
Epoch 24/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2398 - binary_crossentropy: 0.2398 - acc: 0.9037 - val_loss: 0.2355 - val_binary_crossentropy: 0.2355 - val_acc: 0.9030
Epoch 25/25
70952/70952 [==============================] - 1s 11us/step - loss: 0.2399 - binary_crossentropy: 0.2399 - acc: 0.9036 - val_loss: 0.2370 - val_binary_crossentropy: 0.2370 - val_acc: 0.9028
In [22]:
y_pred = model.predict(X_val, batch_size=500)
In [23]:
K.clear_session()
del X_train
del y_train
In [24]:
y_pred[10, :]
Out[24]:
array([1.6993685e-02, 4.4145552e-03, 3.2401435e-02, 2.9894384e-03,
       7.4777277e-03, 6.7571200e-02, 6.5861233e-02, 1.1246969e-03,
       1.0620688e-02, 3.3953063e-02, 4.9277544e-02, 7.3500560e-05,
       5.6307834e-02, 6.4064819e-01, 8.4297983e-03, 1.8553571e-03],
      dtype=float32)
In [25]:
y_ohe = (y_pred == y_pred.max(axis=1)[:,None]).astype(int)
In [26]:
accuracy_score(y_val, y_ohe)
Out[26]:
0.2862892921032456

Only 28% ... still low but we should keep in mind that we may predict multiclass so we should consider as true all class for example above 0.3

In [44]:
y_ohe2 = (y_pred>0.32).astype(int)
accuracy_score(y_val, y_ohe2)
Out[44]:
0.22961921799131102
In [48]:
x_axis = np.linspace(0.1, 0.5, 20)
y_axis = []
for x in x_axis:
    y_ohe2 = (y_pred>x).astype(int)
    y = accuracy_score(y_val, y_ohe2)
    y_axis.append(y)

plt.figure(figsize=(20,12))
plt.plot(x_axis, y_axis)
plt.show()

We can see that the best accuracy is reached at 0.23 threshold but it's only 26%... We cannot do the confusion matrix as we are multilabel but here we can see that the result is not very good too. The good point is that we are sure there is no overfitting due to Oversampling or duplicate records. Nevertheless the result is not good too. One improvement we can do is to look at WaveNet model to extract features in a latent space and maybe the result will be the best as it uses the state of arts in term of model