Classification cross-validation

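from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix
# x (features) and y (class labels) are assumed to be loaded already
xgb = XGBClassifier(colsample_bytree=0.8, learning_rate=0.4, max_depth=4)
# 10-fold cross-validated accuracy: one score per fold, then their mean
cvs = cross_val_score(xgb, x, y, scoring='accuracy', cv=10)
print('cross_val_scores=', cvs.mean())
# Out-of-fold prediction for every sample
y_pred = cross_val_predict(xgb, x, y, cv=10)
# confusion_matrix expects (y_true, y_pred); passing y_pred first transposes it,
# so rows are predicted classes and columns are actual classes
conf_mat = confusion_matrix(y_pred, y)
print(conf_mat)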

Here is what the above code is doing:
1. It builds an XGBClassifier from the xgboost library with colsample_bytree=0.8, learning_rate=0.4 and max_depth=4.
2. It uses the cross_val_score function from sklearn.model_selection to run 10-fold cross-validation and prints the mean accuracy over the 10 folds.
3. It uses the cross_val_predict function from sklearn.model_selection to obtain an out-of-fold prediction for every sample: each sample is predicted by a model trained on the other nine folds.
4. It uses the confusion_matrix function from sklearn.metrics to compare those predictions with the true labels.
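To make the difference between the two cross-validation functions concrete, here is a minimal plain-Python sketch of k-fold cross-validation. It is not scikit-learn's implementation: it uses contiguous, unshuffled folds (for classifiers, cross_val_score and cross_val_predict default to stratified folds), and the `majority_class` model and the helper names `kfold_splits` and `cross_validate` are hypothetical stand-ins for illustration. It returns one accuracy per fold, as cross_val_score does, and one out-of-fold prediction per sample, as cross_val_predict does.

```python
from collections import Counter


def kfold_splits(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds."""
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def majority_class(x_train, y_train, x_test):
    """Hypothetical stand-in model: always predict the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(x_test)


def cross_validate(fit_predict, x, y, k):
    """Return (per-fold accuracies, out-of-fold prediction for every sample)."""
    fold_scores = []
    oof_pred = [None] * len(y)
    for train_idx, test_idx in kfold_splits(len(y), k):
        preds = fit_predict([x[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [x[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        fold_scores.append(correct / len(test_idx))
        # Each sample is predicted exactly once, by the model that did not see it
        for p, i in zip(preds, test_idx):
            oof_pred[i] = p
    return fold_scores, oof_pred


x = [[v] for v in range(23)]   # toy one-feature data
y = [0] * 15 + [1] * 8         # 15 samples of class 0, 8 of class 1
scores, y_pred = cross_validate(majority_class, x, y, k=10)
print(len(scores), len(y_pred))  # 10 23
```

The per-fold list is what cross_val_score returns, so its length is the number of folds; the out-of-fold list has one entry per sample, which is why a confusion matrix can be built from it.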

The output of the above code is as follows:

cross_val_scores= 0.829

[[10 0 0 … 0 0 0]
[ 0 9 0 … 0 0 0]
[ 0 0 8 … 0 0 0]
 …
[ 0 0 0 … 9 0 0]
[ 0 0 0 … 0 8 0]
[ 0 0 0 … 0 0 9]]

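The cross_val_score function returns an array of 10 accuracy scores, one per fold; the code prints their mean. The cross_val_predict function returns one prediction per sample, so its length equals len(y), not 10. The confusion_matrix function returns a square matrix with one row and one column per class; its size depends on the number of classes, not on the number of folds.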

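Because the code calls confusion_matrix(y_pred, y), with the predictions first, the rows correspond to predicted classes: the first row holds the samples predicted as class 0, the second row those predicted as class 1, the third row those predicted as class 2, and so on.

The columns correspond to actual classes: the first column holds the samples whose true label is class 0, the second column class 1, the third column class 2, and so on. Note that scikit-learn documents the signature as confusion_matrix(y_true, y_pred); with that order the matrix is transposed, so rows are actual classes and columns are predicted classes.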
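The orientation issue can be checked with a small plain-Python sketch. The helper `confusion` is hypothetical and only mimics the counting that confusion_matrix performs for integer labels 0..n-1; the toy labels are made up for illustration. Swapping the two label arguments produces the transposed matrix.

```python
def confusion(row_labels, col_labels, n_classes):
    """Entry [r][c] counts samples with row label r and column label c."""
    mat = [[0] * n_classes for _ in range(n_classes)]
    for r, c in zip(row_labels, col_labels):
        mat[r][c] += 1
    return mat


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Predictions first (as in the code above): rows = predicted, columns = actual
by_pred = confusion(y_pred, y_true, 3)
# Documented scikit-learn order (y_true, y_pred): rows = actual, columns = predicted
by_true = confusion(y_true, y_pred, 3)

print(by_pred)  # [[1, 0, 1], [1, 2, 0], [0, 0, 1]]
print(by_true)  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]

# The two matrices are transposes of each other
assert by_pred == [list(col) for col in zip(*by_true)]
```

The diagonal is identical in both orientations, so accuracy does not depend on the argument order; per-class metrics such as precision and recall do.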

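The matrix is read as follows: the diagonal entry in row i, column i counts the samples predicted as class i that really belong to class i, and every off-diagonal entry is a misclassification.

10 samples were predicted as class 0 and were actually class 0.
9 samples were predicted as class 1 and were actually class 1.
8 samples were predicted as class 2 and were actually class 2.

The diagonal continues in the same way. The printed rows show zeros off the diagonal, but the middle columns are elided (…), so the printout alone does not show whether every prediction in those rows was correct. Since the mean accuracy is 0.829 rather than 1, some off-diagonal entries must be nonzero.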

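The accuracy of the predictions is the number of correct predictions divided by the total number of predictions, i.e. the sum of the diagonal of the confusion matrix C divided by the sum of all its entries:

$$\text{accuracy} = \frac{\sum_i C_{ii}}{\sum_{i,j} C_{ij}}$$

Here the numerator is 10 + 9 + 8 + … + 9 + 8 + 9, and the denominator is the total number of samples, not 10 × 10 (that is the number of cells, not the number of predictions). Because entries are elided, the exact value cannot be computed from the printout. It should be close to the mean cross-validation score of 0.829 but need not equal it exactly: cross_val_score averages the 10 per-fold accuracies, whereas this formula pools all out-of-fold predictions, and the two differ when the folds have different sizes.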
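The accuracy formula and the difference between averaging per-fold accuracies and pooling all predictions can be sketched in plain Python. The confusion matrix and the per-fold counts below are hypothetical numbers chosen for illustration, not the document's results, and the helper names are made up.

```python
def accuracy_from_confusion(conf_mat):
    """Sum of the diagonal divided by the sum of all entries."""
    correct = sum(conf_mat[i][i] for i in range(len(conf_mat)))
    total = sum(sum(row) for row in conf_mat)
    return correct / total


def mean_fold_accuracy(fold_counts):
    """Average of per-fold accuracies, given (correct, fold_size) pairs."""
    return sum(c / n for c, n in fold_counts) / len(fold_counts)


def pooled_accuracy(fold_counts):
    """Accuracy over all out-of-fold predictions taken together."""
    return sum(c for c, _ in fold_counts) / sum(n for _, n in fold_counts)


# Hypothetical 3-class confusion matrix
conf_mat = [[10, 1, 0],
            [2, 9, 1],
            [0, 0, 8]]
print(accuracy_from_confusion(conf_mat))  # 27 correct out of 31 samples

# Hypothetical folds of unequal size: (correct, size)
folds = [(3, 3), (1, 2)]
print(mean_fold_accuracy(folds), pooled_accuracy(folds))  # 0.75 0.8
```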

Accuracy can be misleading, especially when the classes are imbalanced: a classifier that always predicts the most frequent class already reaches an accuracy equal to that class's share of the samples, while never identifying any other class.
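A minimal sketch of that failure mode, assuming a hypothetical imbalanced label set of 95 negatives and 5 positives: the constant majority-class predictor scores 95% accuracy yet never predicts the minority class.

```python
from collections import Counter


def majority_baseline(y):
    """Most frequent label in y and the accuracy of always predicting it."""
    label, count = Counter(y).most_common(1)[0]
    return label, count / len(y)


# Hypothetical imbalanced labels: 95 of class 0, 5 of class 1
y = [0] * 95 + [1] * 5
label, acc = majority_baseline(y)
print(label, acc)  # 0 0.95

# The constant predictor never outputs the minority class
y_pred = [label] * len(y)
print(y_pred.count(1))  # 0
```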

A better measure of the performance of the classifier is the precision. The precision is the number of