As such, it is desirable to split the dataset into train and test sets in a way
that preserves the same proportions of examples in each class as observed in the
original dataset.
from sklearn.model_selection import train_test_split
# Create training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3,
random_state=42,stratify=y)