If your targets (the Yi's) are one-hot encoded, use categorical_crossentropy. Examples for a 3-class classification problem: [1,0,0], [0,1,0], [0,0,1]
But if your Yi's are integers, use sparse_categorical_crossentropy. Examples for the above 3-class classification problem: [0], [1], [2] (Keras expects zero-based class indices, so the labels run from 0 to num_classes - 1)
# example (imports assume the Keras bundled with TensorFlow 2.x)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

model = Sequential([
    Dense(16, input_shape=(1,), activation='relu'),  # relu returns max(0, x)
    Dense(32, activation='relu'),
    Dense(2, activation='softmax'),  # softmax turns the 2 outputs into class probabilities that sum to 1
])
# sparse_categorical_crossentropy expects integer labels (here 0 or 1)
model.compile(Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
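
The two losses compute the same quantity and differ only in the label format they accept. Here is a minimal pure-Python sketch of that idea, not the Keras implementation: the helper functions and the `probs` values are made up for illustration, and the integer label is assumed to be zero-based.

```python
import math

def categorical_crossentropy(one_hot, probs):
    # Sum of -y_k * log(p_k) over the classes; only the true class contributes.
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs):
    # The integer label indexes the predicted probability directly.
    return -math.log(probs[label])

def to_one_hot(label, num_classes):
    # Convert a zero-based integer label into a one-hot vector.
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

probs = [0.7, 0.2, 0.1]  # hypothetical softmax output for one sample
label = 1                # integer label for the same sample

dense = categorical_crossentropy(to_one_hot(label, 3), probs)
sparse = sparse_categorical_crossentropy(label, probs)
print(math.isclose(dense, sparse))
```

So the choice between the two is about convenience and memory: integer labels avoid building a one-hot matrix, which matters when the number of classes is large.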