COCO Dataset - use Faster RCNN + MobileNet to conduct Object Detection (2024)

Task 2
2. Randomly Select Ten Images from the Dataset: Pick 10 images randomly from this dataset.

Let’s begin by importing the necessary libraries.
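A plausible set, given the code that follows (pycocotools, torch/torchvision, OpenCV, and matplotlib are assumed to be installed):

import os
import random
import cv2
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from PIL import Image
from pycocotools.coco import COCO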

COCO provides an API to access datasets. By providing it with a JSON file, we can easily retrieve the necessary information such as images, labels, bounding boxes, and more.
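For example, assuming the val2017 split sits under a local cocoRoot directory (the paths here are placeholders; adjust them to your setup):

cocoRoot = "./COCO"    # assumed dataset root
dataType = "val2017"   # assumed split
annFile = os.path.join(cocoRoot, "annotations", f"instances_{dataType}.json")
coco = COCO(annFile)   # parses the JSON and indexes images, categories, and annotations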

To ensure familiarity with the COCO-provided API, here’s an exercise focusing on the following:

import os
import cv2
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Create a function that, given an image ID, plots the image with bounding boxes and labels
def plot_image_with_annotations(coco, cocoRoot, dataType, imgId, ax=None):
    # Get image information
    imgInfo = coco.loadImgs(imgId)[0]
    # Build the image path for visualization
    imPath = os.path.join(cocoRoot, dataType, imgInfo['file_name'])
    # Read the image
    im = cv2.imread(imPath)
    # Convert color space: OpenCV reads BGR, but matplotlib expects RGB
    im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

    # Find the ids of all annotations (bounding boxes) for the image
    annIds = coco.getAnnIds(imgIds=imgInfo['id'])
    # Load the full annotations: bounding box coordinates, category ids, and so on
    anns = coco.loadAnns(annIds)
    all_labels = set()

    # Extract the bounding box coordinates and labels
    for ann in anns:
        # A COCO bbox is the (x, y) of the top-left corner plus the width and height
        x, y, w, h = ann['bbox']

        # Get the label text: load the category name by category ID
        label = coco.loadCats(ann['category_id'])[0]["name"]
        all_labels.add(label)

        # Build the bounding box from the given coordinates
        rect = Rectangle((x, y), w, h, linewidth=2, edgecolor='r', facecolor='none')

        # Draw on the current figure, or on a specific subplot when ax is given
        if ax is None:
            plt.gca().add_patch(rect)
            plt.text(x, y, f'{label}', fontsize=10, color='w', backgroundcolor='r')
        else:
            ax.add_patch(rect)
            ax.text(x, y, f'{label}', fontsize=10, color='w', backgroundcolor='r')

    # Display the image with a title
    if ax is None:
        plt.imshow(im)
        plt.axis('off')
        plt.title(f'Annotations: {all_labels}', color='r')
        plt.show()
    else:
        ax.axis('off')
        ax.set_title(f'Annotations: {all_labels}', color='r', loc='center', pad=20)
        ax.imshow(im)


# Take the image at index 10 as an example
imgIds = coco.getImgIds()
imgId = imgIds[10]
# Plot the image with annotations
plot_image_with_annotations(coco, cocoRoot, dataType, imgId)

Task 3 & 5
3. Predicting bboxes using the pre-trained model Faster R-CNN: use a pre-trained version of Faster R-CNN (ResNet-50 backbone) to predict the bounding boxes
of objects on the 10 images. Only keep regions that have a score > 0.8.
5. Using another pre-trained model MobileNet: repeat the steps from above using a MobileNet backbone for the Faster R-CNN.

We need to load each image from disk given its location, and then convert the loaded image into a tensor before we can feed it to the model for prediction. So we wrote two functions:
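A minimal sketch of those two helpers; the names load_image and image_to_tensor are assumptions, not the original's, and they rely on the coco, cocoRoot, and dataType variables set up earlier:

def load_image(imgId):
    # Look up the file name via the COCO API and read the image from disk
    imgInfo = coco.loadImgs(imgId)[0]
    path = os.path.join(cocoRoot, dataType, imgInfo['file_name'])
    return Image.open(path).convert('RGB')

def image_to_tensor(image):
    # Convert a PIL image to a float tensor in [0, 1], which the detection models expect
    return torchvision.transforms.functional.to_tensor(image)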

After the pre-trained model is loaded, there is no further training to do; we only switch it to evaluation mode and run inference. The process is as follows:
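A sketch of the loading step, using torchvision's pre-trained detectors (the pretrained=True argument is the older API; newer torchvision versions use a weights argument instead):

# Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO
model_res = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# Faster R-CNN with a MobileNetV3-Large FPN backbone, pre-trained on COCO
model_mobile = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)

# Evaluation mode: no gradient updates, inference only
model_res.eval()
model_mobile.eval()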

After the model produces its predictions, we need to select the results with a score greater than 0.8. The reason is that the model predicts a great many bounding boxes, but we only want the ones it is confident about, so we filter out the low-scoring boxes. The code is as follows:
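A sketch of the filtering step and the prediction loop. The names valid_ids, valid_boxes_res, and valid_boxes_mobile are the ones the plotting code below expects, and each kept entry is a (box, label, score) triple, matching how it is unpacked there; filter_predictions and the loop itself are my reconstruction:

def filter_predictions(pred, threshold=0.8):
    # Keep only detections whose confidence score exceeds the threshold
    return [(box, label, score)
            for box, label, score in zip(pred['boxes'], pred['labels'], pred['scores'])
            if score > threshold]

# Randomly pick the 10 image ids used throughout Tasks 3 and 5 (Task 2)
valid_ids = random.sample(coco.getImgIds(), 10)

valid_boxes_res, valid_boxes_mobile = [], []
with torch.no_grad():
    for imgId in valid_ids:
        img = image_to_tensor(load_image(imgId))
        # Each model takes a list of image tensors and returns one dict per image
        valid_boxes_res.append(filter_predictions(model_res([img])[0]))
        valid_boxes_mobile.append(filter_predictions(model_mobile([img])[0]))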

The following program carries out the procedure described above: we draw the results of both models alongside the ground truth and compute the average IoU.

import os
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from PIL import Image


# Pass the results of either model into this function; it draws them and returns the average IoU
def display_annotated_results(imgId, valid_boxes, model_name, color='g', ax=None):
    # Load the image
    imgInfo = coco.loadImgs(imgId)[0]
    image_path = os.path.join(cocoRoot, dataType, imgInfo['file_name'])
    image = Image.open(image_path)

    # Get the ground-truth bounding boxes
    annIds = coco.getAnnIds(imgIds=imgInfo['id'])
    anns = coco.loadAnns(annIds)
    bbox_tlist_anns = torch.tensor([ann["bbox"] for ann in anns])  # tensor of shape [N, 4]
    # A COCO bbox is (x, y, w, h): the top-left corner (x, y) plus the width and height.
    # torchvision's box_iou expects corner format (x1, y1, x2, y2), so we compute the
    # opposite corner as (x2, y2) = (x + w, y + h)
    bbox_tlist_anns[:, 2] = bbox_tlist_anns[:, 0] + bbox_tlist_anns[:, 2]
    bbox_tlist_anns[:, 3] = bbox_tlist_anns[:, 1] + bbox_tlist_anns[:, 3]

    # From valid_boxes we only need the box part, hence (box, _, _);
    # stack turns the list of [4] boxes into one tensor of shape [M, 4]
    bbox_tlist_model = torch.stack([box for box, _, _ in valid_boxes])
    # Use box_iou to compute the IoU matrix between ground truth and predictions
    iou = torchvision.ops.box_iou(bbox_tlist_anns, bbox_tlist_model)
    # For each ground-truth box keep its best-matching prediction, then average
    avg_iou = np.mean([t.cpu().detach().numpy().max() for t in iou])

    # Labels that appear in this image
    all_labels = set()

    # Start drawing the prediction boxes
    for boxes in valid_boxes:
        # Each prediction holds a box, a label, and a confidence score
        box, label, score = boxes
        # Get the label text: load the category name by category ID
        label = coco.loadCats(label.item())[0]["name"]
        # Save the label for the title later
        all_labels.add(label)
        # The model returns the two corners (x1, y1) and (x2, y2), so convert them
        # into (x, y, w, h) form for Rectangle
        x, y, x2, y2 = box.detach().numpy()
        rect = Rectangle((x, y), x2 - x, y2 - y, linewidth=2, edgecolor=color, facecolor='none')

        # Draw on the current figure, or on a specific subplot when ax is given
        if ax is None:
            # gca gets the current axes (creating them if needed); add_patch draws the box
            plt.gca().add_patch(rect)
            # Write the label on the prediction box
            plt.text(x, y, f'{label}', fontsize=10, color='w', backgroundcolor=color)
        else:
            # add_patch adds the prediction box to the given subplot
            ax.add_patch(rect)
            # Write the label on the prediction box
            ax.text(x, y, f'{label}', fontsize=10, color='w', backgroundcolor=color)

    # Display the image with a title: the labels that appear in it and the average IoU
    if ax is None:
        plt.axis('off')
        plt.title(f'{model_name}: {all_labels} \n IoU: {avg_iou:.4f}', color=color)
        plt.imshow(image)
        plt.show()
    else:
        ax.axis('off')
        ax.set_title(f'{model_name}: {all_labels} \n IoU: {avg_iou:.4f}', color=color)
        ax.imshow(image)

    return avg_iou


res_iou = []
mobile_iou = []

# Loop over the 10 random image ids we selected above
for i in range(len(valid_ids)):
    # Create a 1x3 grid of subplots, 15x5 inches overall
    fig, axs = plt.subplots(1, 3, figsize=(15, 5))

    # Draw the ground-truth image in the center
    plot_image_with_annotations(coco, cocoRoot, dataType, valid_ids[i], ax=axs[1])

    # Draw the predictions of the two models on the left and right, and get each IoU
    i_mobile_iou = display_annotated_results(valid_ids[i], valid_boxes_mobile[i], "MobileNet", color='g', ax=axs[0])
    i_res_iou = display_annotated_results(valid_ids[i], valid_boxes_res[i], "ResNet", color='b', ax=axs[2])

    # Store the IoU of each image to assess the overall performance of the models
    mobile_iou.append(i_mobile_iou)
    res_iou.append(i_res_iou)

    # Tidy up the layout
    plt.tight_layout()

# Print the mean of each IoU list
print("ResNet: Avg.", np.mean(res_iou), "; each IoU:", res_iou)
print("MobileNet: Avg.", np.mean(mobile_iou), "; each IoU:", mobile_iou)

IoU (Intersection over Union) is a metric used to evaluate object detection algorithms. It is defined as the intersection area of the predicted box and the true box divided by their union area. The value ranges between 0 and 1, where a higher value indicates a greater overlap between the predicted and true boxes, signifying more accurate predictions.
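As a concrete illustration, here is the same computation for a single pair of boxes in corner format; this mirrors what torchvision.ops.box_iou does pairwise:

def iou(boxA, boxB):
    # Boxes given as (x1, y1, x2, y2) corners
    xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])
    inter = max(0, xB - xA) * max(0, yB - yA)  # intersection area (0 if disjoint)
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    return inter / (areaA + areaB - inter)     # intersection over union

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429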

From the example above, you might be wondering: what exactly do these two lines from display_annotated_results do?
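Repeated from the function above:

iou = torchvision.ops.box_iou(bbox_tlist_anns, bbox_tlist_model)
avg_iou = np.mean([t.cpu().detach().numpy().max() for t in iou])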

box_iou returns a matrix with one row per ground-truth box and one column per predicted box. Here, we only need the maximum IoU for each ground-truth box, i.e. its best-matching prediction, and then we average those maxima, which is exactly what the second line does.

You might be curious whether the choice among max(), mean(), and sum() affects the results. It does: taking the row-wise max scores each ground-truth box by its best-matching prediction, whereas averaging or summing over the whole matrix would also count the near-zero IoUs between unrelated boxes and drag the score down.


FAQs

What is the COCO dataset format for object detection?

COCO is one of the most popular datasets for object detection, and its annotation format, usually referred to as the “COCO format”, has also been widely adopted. The “COCO format” is a JSON structure that governs how labels and metadata are formatted for a dataset.

What is the Faster R-CNN algorithm for object detection?

Faster R-CNN is a two-stage model that is trained end-to-end. It uses a novel region proposal network (RPN) for generating region proposals, which saves time compared to traditional algorithms like Selective Search. It uses an ROI Pooling layer to extract a fixed-length feature vector from each region proposal.

Which CNN model is best for object detection?

If you are looking for fast, general-purpose object detection built on the R-CNN family, Faster R-CNN is the way to go. But if you need precise object segmentation at the pixel level, the best option is Mask R-CNN.

What is a good dataset for object detection?

The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset consisting of 328K images. The nuScenes dataset is a large-scale autonomous driving dataset.

Is YOLO trained on COCO?

The COCO dataset is widely used for training and evaluating deep learning models in object detection (such as YOLO, Faster R-CNN, and SSD), instance segmentation (such as Mask R-CNN), and keypoint detection (such as OpenPose).

Is YOLO better than Faster R-CNN?

Compared to Faster R-CNN, YOLO is a cleaner and more efficient choice for object detection, since it provides end-to-end training in a single network. Both algorithms are fairly accurate, but in some cases YOLO outperforms Faster R-CNN in accuracy, speed, and efficiency.

What are the disadvantages of Faster R-CNN?

Fast R-CNN has limitations in detection accuracy and performance in certain scenarios. It is sensitive to background clutter, and its classification performance decreases when faced with noisy proposals. Additionally, it struggles with objects that vary greatly in scale, particularly small objects.

Can R-CNN detect multiple objects?

Yes. An R-CNN object detector (for example, the one in MATLAB's Computer Vision Toolbox) returns a bounding box, a detection score, and a class label for each detection. The labels are useful when detecting multiple objects, e.g. stop, yield, or speed limit signs.

What are the layers in Faster R-CNN?

Fast R-CNN consists of a CNN (usually pre-trained on the ImageNet classification task) with its final pooling layer replaced by an “ROI pooling” layer and its final FC layer replaced by two branches: a (K + 1)-category softmax layer branch and a category-specific bounding-box regression branch.

How to train a CNN model for object detection?

  1. Train Object Detector Using R-CNN Deep Learning.
  2. Overview.
  3. Download CIFAR-10 Image Data.
  4. Create A Convolutional Neural Network (CNN)
  5. Train CNN Using CIFAR-10 Data.
  6. Validate CIFAR-10 Network Training.
  7. Load Training Data.
  8. Train R-CNN Stop Sign Detector.

What are the disadvantages of CNNs in object detection?

  • High computational requirements.
  • Needs large amount of labeled data.
  • Large memory footprint.
  • Interpretability challenges.
  • Limited effectiveness for sequential data.
  • Tend to be much slower.
  • Training takes a long time.

What is the fastest and most accurate object detection model?

SSD (Single Shot Detector)

It performs fast and accurate detections in a single pass of a deep neural network, making it highly efficient and well suited to real-time applications. Leveraging multi-scale feature maps, it can detect small and large objects simultaneously.

What is the most powerful object detection model?

YOLOv7 is one of the fastest and most accurate real-time object detection models for computer vision tasks.

What is the format for an object detection dataset?

Pascal VOC and MS COCO are the two most popular data formats for object detection. Most publicly available object detection datasets follow one of these two formats.

What is the COCO box format?

COCO is the format used by the Common Objects in Context (COCO) dataset. In COCO, a bounding box is defined by four pixel values, [x_min, y_min, width, height]: the coordinates of the top-left corner along with the width and height of the bounding box.
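For instance, a single annotation entry in the JSON looks roughly like this (the values are made up for illustration):

{
    "id": 1,
    "image_id": 42,
    "category_id": 18,
    "bbox": [73.0, 120.5, 210.0, 155.0],
    "area": 32550.0,
    "iscrowd": 0
}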

How to convert COCO JSON to YOLO format?

How To Convert COCO JSON to YOLOv8 PyTorch TXT
  1. Step 1: Create a free Roboflow public workspace. Roboflow is a universal conversion tool for computer vision annotation formats.
  2. Step 2: Upload your data into Roboflow. ...
  3. Step 3: Generate Dataset Version. ...
  4. Step 4: Export Dataset Version.
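For reference, the arithmetic behind the conversion is simple; a minimal sketch (coco_to_yolo is a hypothetical helper, and img_w/img_h are the image's pixel dimensions):

def coco_to_yolo(bbox, img_w, img_h):
    # COCO: [x_min, y_min, width, height] in pixels
    # YOLO: [x_center, y_center, width, height], each normalized to [0, 1]
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]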

What is the VOC format?

Pascal VOC (Visual Object Classes) is one of the earlier established benchmarks for object classification and detection, providing a standardized image data set for object class recognition. Its export data format is XML-based and has been widely adopted in computer vision tasks.
