pandas: stratified resampling by column values

# Stratified split of a dataset by column
from sklearn.model_selection import train_test_split

# df is an existing DataFrame with a 'country' column; keep 20% of its rows
df_sample, df_drop_it = train_test_split(df, train_size=0.2, stratify=df['country'])

Here is what the above code is doing:
1. We’re using the train_test_split function from the sklearn.model_selection module.
2. We’re splitting the df dataframe into two new dataframes. The first one, df_sample, will contain 20% of the rows of the original dataframe. The second one, df_drop_it, will contain the other 80%.
3. We’re using the stratify parameter to stratify the split on the country column. This means each country makes up approximately the same share of rows in df_sample and in df_drop_it as it does in df. It does not mean the countries are evenly balanced against each other.
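
To check that the steps above behave as described, here is a minimal sketch on toy data. The dataframe contents (the country codes, the 50/30/20 ratio and the value column) are made up for illustration, and random_state is added only so the result is reproducible; neither comes from the original snippet.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows with countries in a 50/30/20 ratio.
df = pd.DataFrame({
    "country": ["US"] * 50 + ["FR"] * 30 + ["JP"] * 20,
    "value": range(100),
})

# Same call as above; random_state is an addition for reproducibility.
df_sample, df_drop_it = train_test_split(
    df, train_size=0.2, stratify=df["country"], random_state=0
)

# Row counts per country in the 20% sample and in the remaining 80%.
sample_counts = df_sample["country"].value_counts().to_dict()
rest_counts = df_drop_it["country"].value_counts().to_dict()
print(len(df_sample), sample_counts)
print(len(df_drop_it), rest_counts)
```

With this ratio the sample should hold 10 US, 6 FR and 4 JP rows, which is the same 50/30/20 mix as the full dataframe. Note that stratify needs at least two rows per country, otherwise train_test_split raises a ValueError. If you would rather stay inside pandas, df.groupby('country').sample(frac=0.2) (pandas 1.1 or later) draws 20% from each country directly.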
