from PIL import Image
import torch
from torchvision import transforms

# `model` is assumed to be loaded already and switched to eval mode with model.eval()
input_image = Image.open("img/danbooru_resnet1.png")  # load an image of your choice
preprocess = transforms.Compose([
    transforms.Resize(360),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.7137, 0.6628, 0.6519], std=[0.2970, 0.3017, 0.2979]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)  # create a mini-batch as expected by the model

if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)

# The output has unnormalized scores. To get probabilities, you can run a sigmoid on it.
probs = torch.sigmoid(output[0])  # Tensor of shape 6000, with confidence scores over Danbooru's top 6000 tags

# Second part of the code: set a threshold value and use it
thresh = 0.2  # example value (an assumption); tune it for your use case
tmp = probs[probs > thresh]  # confidence scores above the threshold
inds = probs.argsort(descending=True)  # tag indices sorted by descending confidence
Here is what the above code is doing:
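1. We preprocess the image (resize, convert to a tensor, normalize), add a batch dimension, and run it through the model inside torch.no_grad() to get one unnormalized score per tag.
2. We apply the sigmoid function to those scores to get an independent confidence score between 0 and 1 for each of the 6000 tags.
3. We set a threshold value and keep only the confidence scores above it (tmp).
4. We sort the tag indices in descending order of confidence (inds), so the first len(tmp) indices are exactly the tags above the threshold.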
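
The thresholding and sorting steps above can be combined into a small helper that returns tag names with their scores. This is a minimal sketch rather than the original author's code: the function name tags_above_threshold, the four toy tag names, and the toy logits are hypothetical stand-ins for the real model output and Danbooru's 6000-tag list.

```python
import torch

def tags_above_threshold(logits, class_names, thresh=0.2):
    """Return (tag, probability) pairs with probability > thresh, highest first."""
    probs = torch.sigmoid(logits)           # independent per-tag probabilities
    keep = int((probs > thresh).sum())      # how many tags clear the threshold
    inds = probs.argsort(descending=True)   # tag indices, most confident first
    return [(class_names[int(i)], float(probs[i])) for i in inds[:keep]]

# Toy stand-ins (hypothetical): 4 made-up tags instead of the real 6000
toy_names = ["tag_a", "tag_b", "tag_c", "tag_d"]
toy_logits = torch.tensor([2.0, -3.0, 0.5, -0.5])

result = tags_above_threshold(toy_logits, toy_names, thresh=0.5)
for name, p in result:
    print(f"{name}: {p:.4f}")
# prints:
# tag_a: 0.8808
# tag_c: 0.6225
```

With the real model, you would pass output[0] as logits and the list of tag names as class_names. Because the sorted indices put the highest scores first, slicing off the first `keep` entries yields the same tags as the boolean mask, already ordered by confidence.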