torch split classes stratified

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(1000, 2)               # toy data: 1000 samples, 2 features
y = np.random.randint(0, 10, size=1000)    # 10 classes

# stratify=y keeps the class proportions the same in both splits
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, stratify=y)

# Inspect the per-class counts in each split
np.unique(y_train, return_counts=True)
np.unique(y_val, return_counts=True)

# Wrap the training split in a Dataset (a custom subclass, sketched below) and a DataLoader
train_dataset = Dataset(X_train, y_train, ...)
train_loader = DataLoader(train_dataset, ...)

Here is what the above code is doing:
1. Generate random data
2. Split the data into train and validation sets
3. Create a Dataset object for the training set
4. Create a DataLoader object for the training set

The Dataset object is a wrapper around the training data. It’s a class that you define yourself by subclassing torch.utils.data.Dataset, and it must implement two methods:

__len__(self)
__getitem__(self, idx)

The __len__ method should return the length of the dataset, and the __getitem__ method should return the data point at index idx.
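
Here is a minimal sketch of such a class, using the NumPy arrays from the split above (the class name MyDataset and the float/long conversions are illustrative choices, not fixed by the API):

import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, X, y):
        # Convert the NumPy arrays to tensors once, up front
        self.X = torch.from_numpy(X).float()
        self.y = torch.from_numpy(y).long()

    def __len__(self):
        # Number of samples in the dataset
        return len(self.X)

    def __getitem__(self, idx):
        # Return the (input, target) pair at index idx
        return self.X[idx], self.y[idx]

train_dataset = MyDataset(X_train, y_train)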

The DataLoader object is a wrapper around the Dataset object. It’s a class that you don’t have to define yourself (it ships with torch.utils.data), and it handles batching and iteration via the iterator protocol:

__iter__(self)
__next__(self)

The DataLoader’s __iter__ method returns an iterator over the dataset, and that iterator’s __next__ method returns the next batch.
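
To make the protocol concrete, this is roughly what a manual iteration looks like (batch_size=32 and shuffle=True are illustrative arguments; train_dataset is the Dataset built above):

from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
it = iter(train_loader)        # calls train_loader.__iter__()
inputs, targets = next(it)     # calls the iterator's __next__(), returning one batch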

The DataLoader object is iterable, so you can loop over it directly:

for batch in train_loader:
    # Do something with the batch
    pass

The batch is a tuple of the form (inputs, targets).
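
For example, you can unpack the tuple directly in the loop header (the shapes below assume the 2-feature toy data and the long targets from the sketches above):

for inputs, targets in train_loader:
    # inputs: (batch_size, 2) float tensor, targets: (batch_size,) long tensor
    print(inputs.shape, targets.shape)
    break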
