
Recently, PyTorch has gained a lot of popularity because it is easy to use and easy to learn. Andrej Karpathy, Senior Director of AI at Tesla, said the following in his tweet.

Jokes aside, PyTorch is very transparent and can help researchers and data scientists achieve high productivity and reliable results.
This article is part of the following series:
PyTorch for beginners |
---|
PyTorch for Beginners: The Basics |
PyTorch for Beginners: Image Classification with Pre-trained Models |
Image classification with transfer learning in PyTorch |
PyTorch Model Inference using ONNX and Caffe2 |
PyTorch for Beginners: Semantic Segmentation with Torchvision |
Object Detection |
Instance Segmentation |
In this post, we discuss image classification in PyTorch. We use a subset of the CalTech256 dataset to classify images of 10 animals. We walk through the steps of preparing the dataset, augmenting the data, and then building the classifier. We use transfer learning to reuse low-level image features such as edges and textures. These are learned by a pre-trained model, ResNet50, and we then train our classifier to learn the higher-level details in our dataset images, such as eyes and legs. ResNet50 has already been trained on ImageNet with millions of images.
Don't worry about functions and code. The post contains code snippets to make it easier to study and understand. In addition, the complete code is available as a Python notebook (subscribe and download for free). Before we dive into the article, here is a video of PyTorch image classification to keep you motivated. As the video shows, you can build your own "zoo sorter"!
While we have tried to make the post self-contained, we still encourage readers to familiarize themselves with the Basics of PyTorch before proceeding.
Dataset Preparation
The CalTech256 dataset contains 30,607 images categorized into 256 labeled classes, along with an additional "clutter" class.
Training on the entire dataset would take hours. Therefore, we work on a subset of the dataset that contains 10 animals: bear, chimpanzee, giraffe, gorilla, llama, ostrich, porcupine, skunk, triceratops, and zebra. This way, we can experiment faster. The same code can also be used to train on the entire dataset.
The number of images in these folders ranges from 81 (for skunk) to 212 (for gorilla). We use the first 60 images in each of these categories for training. The next 10 images are for validation, and the rest are for testing in our experiments below.
In total, we have 600 training images, 100 validation images, 409 test images, and 10 classes of animals.
If you want to repeat the experiments, follow the steps below:
- Download the CalTech256 dataset.
- Create three directories named train, valid, and test.
- Create 10 subdirectories inside each of the train, valid, and test directories. The subdirectories must be named bear, chimpanzee, giraffe, gorilla, llama, ostrich, porcupine, skunk, triceratops, and zebra.
- Move the first 60 bear images in the CalTech256 dataset into the train/bear directory. Repeat this step for each animal.
- Move the next 10 bear images in the CalTech256 dataset into the valid/bear directory. Repeat this step for each animal.
- Copy the rest of the bear images (that is, those that are not in the train or valid folders) into the test/bear directory. Repeat this step for each animal.
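The manual steps above could also be scripted. The following is a minimal sketch, not the code from the notebook: the function name `split_class` and all paths are hypothetical, "first 60 images" is assumed to mean the first 60 files in sorted filename order, and files are copied rather than moved so the downloaded dataset stays untouched.

```python
import shutil
from pathlib import Path


def split_class(class_images_dir, output_root, class_name, n_train=60, n_valid=10):
    """Copy one animal's images into train/valid/test subfolders.

    The first n_train files (sorted by name) go to train, the next
    n_valid to valid, and everything that remains goes to test.
    """
    images = sorted(p for p in Path(class_images_dir).iterdir() if p.is_file())
    splits = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],
    }
    counts = {}
    for split_name, files in splits.items():
        target = Path(output_root) / split_name / class_name
        target.mkdir(parents=True, exist_ok=True)
        for image_path in files:
            shutil.copy2(image_path, target / image_path.name)
        counts[split_name] = len(files)
    return counts
```

Calling `split_class("path/to/caltech256/bear-folder", "data", "bear")` once per animal would produce the `data/train/bear`, `data/valid/bear`, and `data/test/bear` folders; the source path here is a placeholder for wherever the downloaded class folder lives.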
Data Augmentation
The images in the available training set can be modified in many ways to incorporate more variation into the training process. This way, the trained model generalizes better and works well on different kinds of test data. Also, the input data can come in various sizes. The images must be normalized to a fixed size and format before batches of data are used together for training.
First, each input image goes through a number of transformations. We introduce variation by adding some randomness to the transformations. In each epoch, a single set of transformations is applied to each image. When we train for multiple epochs, the model sees more variations of the input images, with a new random variation of the transformation in each epoch. This results in data augmentation, and the model then tries to generalize more.
Below is an example of transformed versions of a triceratops image.

Let's go through the transformations we used for our data augmentation.
The RandomResizedCrop transformation crops the input image to a random size (covering 0.8 to 1.0 of the original image area, with a random aspect ratio in the default range of 0.75 to 1.33). The crop is then resized to 256×256.
RandomRotation rotates the image by a random angle in the range of -15 to 15 degrees.
RandomHorizontalFlip randomly flips the image horizontally with a default probability of 50%.
CenterCrop crops a 224×224 image from the center.
ToTensor converts the PIL image, whose values are in the range 0-255, to a floating-point tensor and scales the values to the range 0-1 by dividing by 255.
Normalize takes a 3-channel tensor and normalizes each channel by the input mean and standard deviation for that channel. The mean and standard deviation are given as 3-element vectors. Each channel in the tensor is normalized as T = (T - mean) / (standard deviation).
All the above transformations are chained together using Compose.
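To make the ToTensor and Normalize arithmetic concrete, here is a small worked sketch in plain Python for a single RGB pixel. It only mirrors the formula T = (T - mean) / (standard deviation) and does not call torchvision; the helper name `to_tensor_then_normalize` and the sample pixel are made up for illustration, while the mean and standard deviation are the ImageNet values used in this post.

```python
# ImageNet per-channel statistics, as used in the transforms of this post
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def to_tensor_then_normalize(rgb_pixel):
    """Scale 0-255 values to 0-1, then apply (T - mean) / std per channel."""
    scaled = [value / 255.0 for value in rgb_pixel]
    return [(t - m) / s for t, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


# A made-up pixel: full red, medium green, no blue
normalized = to_tensor_then_normalize((255, 128, 0))
print([round(v, 3) for v in normalized])
```

After normalization the values are no longer confined to 0-1: bright channels come out positive and dark channels negative, centered around the ImageNet mean.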
# Applying transforms to the data
from torchvision import transforms

image_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
}
Please note that for the validation and test data we do not apply the RandomResizedCrop, RandomRotation, and RandomHorizontalFlip transformations. Instead, we resize the images so that the shorter side is 256 pixels and crop the central 224×224 region for use with the pre-trained model. Finally, the image is converted into a tensor and normalized by the mean and standard deviation of all images in ImageNet.
Data Loading
Next, we will see how to use the transformations defined above and load the data to be used in training.
# Load the data
from torchvision import datasets
from torch.utils.data import DataLoader

# Set train, valid and test directory paths
train_directory = 'train'
valid_directory = 'valid'
test_directory = 'test'

# Batch size
bs = 32

# Number of classes
num_classes = 10

# Load data from folders
data = {
    'train': datasets.ImageFolder(root=train_directory, transform=image_transforms['train']),
    'valid': datasets.ImageFolder(root=valid_directory, transform=image_transforms['valid']),
    'test': datasets.ImageFolder(root=test_directory, transform=image_transforms['test'])
}

# Size of data, used for calculating average loss and accuracy
train_data_size = len(data['train'])
valid_data_size = len(data['valid'])
test_data_size = len(data['test'])

# Create iterators for the data loaded using the DataLoader module
train_data_loader = DataLoader(data['train'], batch_size=bs, shuffle=True)
valid_data_loader = DataLoader(data['valid'], batch_size=bs, shuffle=True)
test_data_loader = DataLoader(data['test'], batch_size=bs, shuffle=True)

# Print the training, validation and test set sizes
print(train_data_size, valid_data_size, test_data_size)
First, we define the training, validation, and test data directories and the batch size. Then we load the data using the DataLoader. Note that the image transformations discussed earlier are applied to the data as it is loaded by the DataLoader. The order of the data is also shuffled. The torchvision.transforms package and the DataLoader are core PyTorch features that make the data augmentation and loading processes much easier.
Transfer Learning
It is very difficult and time-consuming to collect images belonging to a domain of interest and train a classifier from scratch. Instead, we use a pre-trained model as a base and modify its final layers to classify the images according to our desired classes. This helps us get good results even with a small dataset, since the basic image features have already been learned by the pre-trained model from a much larger dataset such as ImageNet.

As we can see in the image above, the inner layers remain the same as in the pre-trained model and only the final layers are modified to suit our number of classes. In this work, we use the pre-trained ResNet50 model.
# Load pretrained ResNet50 model
resnet50 = models.resnet50(pretrained=True)
Canziani et al. list several pre-trained models used for various practical applications and analyze the accuracy achieved and the inference time required for each model. ResNet50 offers a good trade-off between accuracy and inference time. When a model is loaded in PyTorch, all of its parameters have the requires_grad field set to True by default. This means that operations involving these parameters are recorded in the computation graph used for backpropagation during training, which increases the memory requirement. Since most of the parameters in our pre-trained model are already trained, we set their requires_grad field to False.
# Freeze the model parameters
for param in resnet50.parameters():
    param.requires_grad = False
We then replace the last layer of the ResNet50 model with a small set of sequential layers. The inputs to the last fully connected layer of ResNet50 are fed to a linear layer. It has 256 outputs, which are then fed into ReLU and dropout layers. This is followed by a 256×10 linear layer with 10 outputs corresponding to the 10 classes in our CalTech subset.
# Modify the final layer of the ResNet50 model for transfer learning
import torch.nn as nn

fc_inputs = resnet50.fc.in_features

resnet50.fc = nn.Sequential(
    nn.Linear(fc_inputs, 256),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(256, 10),
    nn.LogSoftmax(dim=1)  # For use with NLLLoss()
)
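As a rough check on how small the new head is, the number of parameters it adds can be worked out by hand: a fully connected layer with `in` inputs and `out` outputs has `in * out` weights plus `out` biases, while ReLU, Dropout, and LogSoftmax have no parameters. The sketch below is plain arithmetic rather than a PyTorch call; the value 2048 for `fc_inputs` is an assumption about torchvision's ResNet50 and should be checked against `resnet50.fc.in_features`.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# Assumption: resnet50.fc.in_features is 2048 for torchvision's ResNet50
fc_inputs = 2048

# Linear(fc_inputs, 256) followed by Linear(256, 10)
head_params = linear_param_count(fc_inputs, 256) + linear_param_count(256, 10)
print(head_params)
```

These are the only parameters that receive gradient updates once the rest of the network is frozen.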
Since we will be training on a GPU, let's prepare the model for the GPU.
# Convert model to be used on GPU
resnet50 = resnet50.to('cuda:0')
Next, we define the loss function and the optimizer to use for training. PyTorch offers a variety of loss functions. We use the negative log-likelihood loss because it is useful for classifying multiple classes. PyTorch also supports several optimizers. We use the Adam optimizer. Adam is one of the most popular optimizers because it can adapt the learning rate for each parameter individually.
# Define the optimizer and loss function
loss_func = nn.NLLLoss()
optimizer = optim.Adam(resnet50.parameters())
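To see why a LogSoftmax output layer pairs with the negative log-likelihood loss, the computation can be sketched in plain Python. This is an illustration of the math, assuming the loss is averaged over the batch; the helper names `log_softmax` and `nll_loss` and the example scores are made up and are not the PyTorch implementations.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def nll_loss(log_probs_batch, targets):
    """Mean of the negative log-probability assigned to the correct class."""
    picked = [-row[target] for row, target in zip(log_probs_batch, targets)]
    return sum(picked) / len(picked)


# Made-up raw scores for two images and three classes
scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]

log_probs = [log_softmax(row) for row in scores]
loss = nll_loss(log_probs, labels)
print(round(loss, 4))
```

The loss approaches zero as the model assigns probability close to 1 to the correct class, and a model that spreads probability evenly over 10 classes scores ln(10), roughly 2.30.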
Training
The complete training code is in the Python notebook, but we cover the main concepts here. Training is carried out for a fixed number of epochs, with each image processed once per epoch. The training data loader loads the data in batches. In our case, we specified a batch size of 32. This means that each batch can contain at most 32 images.
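As a quick sanity check of what "at most 32 images" means for our 600 training images, the batch layout for one epoch can be computed directly. This small sketch assumes the data loader keeps the final, smaller batch instead of dropping it; `batch_sizes` is a hypothetical helper written for this illustration.

```python
def batch_sizes(num_images, batch_size):
    """Sizes of the batches seen in one epoch, keeping the last partial batch."""
    full_batches, remainder = divmod(num_images, batch_size)
    return [batch_size] * full_batches + ([remainder] if remainder else [])


sizes = batch_sizes(600, 32)
print(len(sizes), sizes[-1])  # 19 batches, the last one holds 24 images
```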
For each batch, the input images are passed through the model, a step known as the forward pass, to get the outputs. Then the provided loss_criterion, or cost function, is used to compute the loss from the ground-truth labels and the computed outputs. The gradients of the loss with respect to the trainable parameters are computed using the backward function. Note that in transfer learning we only need to compute gradients for a small set of parameters belonging to the few newly added layers at the end of the model. A summary function call on the model can reveal the actual number of parameters and the number of trainable parameters. The benefit of this approach is that we now only need to train about a tenth of the total number of model parameters.

Gradient computation is done using autograd and backpropagation, differentiating through the graph using the chain rule. PyTorch accumulates all the gradients in the backward pass. Therefore, it is important to set them to zero at the beginning of the training loop. This is done with the optimizer's zero_grad function. Finally, after the gradients are computed in the backward pass, the parameters are updated with the optimizer's step function.
The total loss and accuracy are computed for each batch and then averaged over all batches to obtain the loss and accuracy values for the whole epoch.
import time
import torch

for epoch in range(epochs):
    epoch_start = time.time()
    print("Epoch: {}/{}".format(epoch+1, epochs))

    # Set to training mode
    model.train()

    # Loss and accuracy within the epoch
    train_loss = 0.0
    train_acc = 0.0
    valid_loss = 0.0
    valid_acc = 0.0

    for i, (inputs, labels) in enumerate(train_data_loader):
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Clear existing gradients
        optimizer.zero_grad()

        # Forward pass - compute outputs on input data using the model
        outputs = model(inputs)

        # Compute loss
        loss = loss_criterion(outputs, labels)

        # Backpropagate the gradients
        loss.backward()

        # Update the parameters
        optimizer.step()

        # Compute the total loss for the batch and add it to train_loss
        train_loss += loss.item() * inputs.size(0)

        # Compute the accuracy
        ret, predictions = torch.max(outputs.data, 1)
        correct_counts = predictions.eq(labels.data.view_as(predictions))

        # Convert correct_counts to float and then compute the mean
        acc = torch.mean(correct_counts.type(torch.FloatTensor))

        # Compute total accuracy in the whole batch and add it to train_acc
        train_acc += acc.item() * inputs.size(0)

        print("Batch number: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}".format(i, loss.item(), acc.item()))
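The training loop multiplies each batch's mean loss by the batch size before accumulating, and the running total is later divided by the dataset size. The sketch below, in plain Python with made-up numbers, shows why: when the last batch is smaller than the others, a plain average of the per-batch means gives that batch too much weight. `epoch_average` is a hypothetical helper used only for this illustration.

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample average, weighting each batch mean by its batch size."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


# Made-up per-batch mean losses; the last batch is smaller than the others
batch_means = [0.50, 0.40, 0.10]
batch_sizes = [32, 32, 8]

weighted = epoch_average(batch_means, batch_sizes)
naive = sum(batch_means) / len(batch_means)
print(round(weighted, 4), round(naive, 4))
```

The weighted value is the true per-image average for the epoch, which is why the loop accumulates sums rather than averaging the batch means directly.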
Validation
If training is carried out for too many epochs, the model tends to overfit the data, resulting in poor performance on new test data. It is important to keep a separate validation set so that we can stop training at the right point and avoid overfitting. Validation is performed in each epoch, right after the training loop. Since we do not need any gradient computation during validation, it is done inside a torch.no_grad() block.
For each validation batch, the inputs and labels are moved to the GPU (if CUDA is available, otherwise to the CPU). The inputs go through the forward pass, followed by the loss and accuracy computations for the batch and, at the end of the loop, for the whole epoch.
    # Validation - no gradient tracking needed
    # (this block continues inside the epoch loop shown above)
    with torch.no_grad():

        # Set to evaluation mode
        model.eval()

        # Validation loop
        for j, (inputs, labels) in enumerate(valid_data_loader):
            inputs = inputs.to(device)
            labels = labels.to(device)

            # Forward pass - compute outputs on input data using the model
            outputs = model(inputs)

            # Compute loss
            loss = loss_criterion(outputs, labels)

            # Compute the total loss for the batch and add it to valid_loss
            valid_loss += loss.item() * inputs.size(0)

            # Calculate validation accuracy
            ret, predictions = torch.max(outputs.data, 1)
            correct_counts = predictions.eq(labels.data.view_as(predictions))

            # Convert correct_counts to float and then compute the mean
            acc = torch.mean(correct_counts.type(torch.FloatTensor))

            # Compute total accuracy in the whole batch and add it to valid_acc
            valid_acc += acc.item() * inputs.size(0)

            print("Validation batch number: {:03d}, Validation: Loss: {:.4f}, Accuracy: {:.4f}".format(j, loss.item(), acc.item()))

    # Find average training loss and training accuracy
    avg_train_loss = train_loss/train_data_size
    avg_train_acc = train_acc/float(train_data_size)

    # Find average validation loss and validation accuracy
    avg_valid_loss = valid_loss/valid_data_size
    avg_valid_acc = valid_acc/float(valid_data_size)

    epoch_end = time.time()

    print("Epoch: {:03d}, Training: Loss: {:.4f}, Accuracy: {:.4f}%, \n\t\tValidation: Loss: {:.4f}, Accuracy: {:.4f}%, Time: {:.4f}s".format(epoch, avg_train_loss, avg_train_acc*100, avg_valid_loss, avg_valid_acc*100, epoch_end-epoch_start))


As we can see in the graphs above, both the validation and training losses settle down quickly for this dataset. The accuracy also rises very quickly into the 0.9 range. As the number of epochs increases, the training loss keeps decreasing, which leads to overfitting, while the validation results do not improve significantly. So we choose the model from the epoch with the highest validation accuracy and a low loss. It is better to stop early to avoid overfitting the training data. In our case, we chose epoch #8, which had a validation accuracy of 96%.
The early stopping process can also be automated. We can stop as soon as the loss is below a threshold and the validation accuracy has not improved for a given number of epochs.
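One way such automated early stopping might look is sketched below. This is an illustrative helper rather than code from the notebook: the class name `EarlyStopping`, the parameters `patience` and `loss_threshold`, their values, and the history numbers are all assumptions chosen for the example.

```python
class EarlyStopping:
    """Stop once the training loss is below a threshold and the validation
    accuracy has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=3, loss_threshold=0.4):
        self.patience = patience
        self.loss_threshold = loss_threshold
        self.best_acc = float("-inf")
        self.best_epoch = None
        self.stale_epochs = 0

    def step(self, epoch, train_loss, valid_acc):
        """Record one epoch and return True when training should stop."""
        if valid_acc > self.best_acc:
            self.best_acc = valid_acc
            self.best_epoch = epoch
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return train_loss < self.loss_threshold and self.stale_epochs >= self.patience


# Made-up (training loss, validation accuracy) values per epoch
history = [(0.90, 0.80), (0.50, 0.90), (0.30, 0.96),
           (0.20, 0.95), (0.15, 0.96), (0.10, 0.955)]

stopper = EarlyStopping(patience=3, loss_threshold=0.4)
stopped_at = None
for epoch, (train_loss, valid_acc) in enumerate(history):
    if stopper.step(epoch, train_loss, valid_acc):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch)
```

In a real training loop, the model weights would typically be saved whenever `best_epoch` is updated, so that the best checkpoint can be restored after stopping.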
Inference
Once we have the model, we can run inference on individual test images or on the whole test dataset to obtain the test accuracy. Computing the test set accuracy is similar to the validation code, except that it is carried out on the test dataset. We have included the computeTestSetAccuracy function in the Python notebook for this purpose. Below, we discuss how to find the output class for a given test image.
An input image first goes through all the transformations used for the validation/test data. The resulting tensor is then converted to a four-dimensional tensor and passed through the model, which outputs the log probabilities for the different classes. Taking the exponential of the model outputs gives us the class probabilities. We then choose the class with the highest probability as our output class.
import torch
import matplotlib.pyplot as plt
from PIL import Image

def predict(model, test_image_name):
    transform = image_transforms['test']

    test_image = Image.open(test_image_name)
    plt.imshow(test_image)

    test_image_tensor = transform(test_image)

    if torch.cuda.is_available():
        test_image_tensor = test_image_tensor.view(1, 3, 224, 224).cuda()
    else:
        test_image_tensor = test_image_tensor.view(1, 3, 224, 224)

    with torch.no_grad():
        model.eval()
        # Model outputs log probabilities
        out = model(test_image_tensor)
        ps = torch.exp(out)
        topk, topclass = ps.topk(1, dim=1)
        # idx_to_class maps a class index to its name (defined elsewhere)
        print("Output class: ", idx_to_class[topclass.cpu().numpy()[0][0]])
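The prediction function relies on an idx_to_class lookup that is not defined in the snippet. The sketch below shows, in plain Python, one way such a mapping could be built and how the log probabilities are turned into ranked class predictions. It assumes class indices follow the alphabetical order of the folder names, which is how ImageFolder assigns them; the four class names, the probabilities, and the helper `top_k` are made up for illustration.

```python
import math

# Hypothetical subset of class folders; indices follow sorted folder names
class_names = ["zebra", "bear", "gorilla", "llama"]
class_to_idx = {name: index for index, name in enumerate(sorted(class_names))}
idx_to_class = {index: name for name, index in class_to_idx.items()}


def top_k(log_probs, k=2):
    """Exponentiate log probabilities and return the k most likely classes."""
    probs = [math.exp(lp) for lp in log_probs]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(idx_to_class[i], probs[i]) for i in ranked[:k]]


# Made-up model output for one image (log of 0.05, 0.70, 0.20, 0.05)
log_probs = [math.log(0.05), math.log(0.70), math.log(0.20), math.log(0.05)]
predictions = top_k(log_probs)
print(predictions)
```

With torchvision, a similar lookup could be obtained by inverting the `class_to_idx` attribute of the ImageFolder dataset used for training.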
On a test set of 409 images, an accuracy of 92.4% was achieved.
Below are some of the classification results on new test data that were not used in training or validation. The top predicted classes for each image, along with their probability values, are shown in the upper right corner. As we see below, the class predicted with the highest probability is usually the correct one. Also note that the class with the second-highest probability is usually the animal closest to the actual class among the remaining 9 classes.










We just saw how to use a model pre-trained on the 1000 classes of ImageNet to very effectively classify images belonging to the 10 classes we are interested in.
We showed the classification results on a small dataset. In a future post, we will apply the same transfer learning approach to harder datasets to solve harder real-world problems. Stay tuned!
Acknowledgments
I would like to thank our intern Kushashwa Ravi Shrimali for writing the code for this post.