import numpy as np
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

X = np.random.randn(1000, 2)
y = np.random.randint(0, 10, size=1000)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, stratify=y)

# Check that stratification preserved the class balance
np.unique(y_train, return_counts=True)
np.unique(y_val, return_counts=True)

train_dataset = Dataset(X_train, y_train, ...)  # Dataset: a user-defined subclass
train_loader = DataLoader(train_dataset, ...)
Here is what the above code is doing:
1. Generate random data
2. Split the data into train and validation sets
3. Create a Dataset object for the training set
4. Create a DataLoader object for the training set
The Dataset object is a wrapper around the training data. It’s a class that you can define yourself, and it must implement two methods: __len__ and __getitem__.
The __len__ method should return the number of samples in the dataset, and the __getitem__ method should return the data point at index idx.
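As a concrete sketch of the two required methods, here is a minimal Dataset subclass that wraps NumPy arrays like X_train and y_train (the class name ArrayDataset and the tensor dtypes are my own choices, not from the code above):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class ArrayDataset(Dataset):
    """Wraps paired NumPy feature and label arrays as a PyTorch Dataset."""
    def __init__(self, X, y):
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.long)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.X)

    def __getitem__(self, idx):
        # Return the (input, target) pair at index idx
        return self.X[idx], self.y[idx]

X = np.random.randn(100, 2)
y = np.random.randint(0, 10, size=100)
ds = ArrayDataset(X, y)
print(len(ds))         # 100
print(ds[0][0].shape)  # torch.Size([2])
```

Indexing the dataset with ds[idx] calls __getitem__, so a DataLoader can pull individual samples out and collate them into batches.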
The DataLoader object is a wrapper around the Dataset object. It’s a class that you don’t have to define yourself; it ships with PyTorch and handles batching, shuffling, and iteration for you.
Its __iter__ method returns an iterator over the dataset, and that iterator’s __next__ method returns the next batch.
Because the DataLoader object is iterable, you can loop over it directly:
for batch in train_loader:
# Do something with the batch
The batch is a tuple of the form (inputs, targets), where each element is a tensor whose first dimension is the batch size.