CS231n_Notes_1.1: Image Classification

  • The image classification problem is the task of assigning an input image one label from a fixed set of categories. Many other seemingly distinct Computer Vision tasks (such as object detection and segmentation) can be reduced to image classification.

  • Challenges

    • Viewpoint variation: A single instance of an object can be oriented in many ways with respect to the camera.
    • Scale variation: Visual classes often exhibit variation in their size (size in the real world, not only their extent in the image).
    • Deformation: Many objects of interest are not rigid bodies and can be deformed in extreme ways.
    • Occlusion: The objects of interest can be occluded. Sometimes only a small portion of an object (as little as a few pixels) is visible.
    • Illumination conditions: The effects of illumination are drastic on the pixel level.
    • Background clutter: The objects of interest may blend into their environment, making them hard to identify.
    • Intra-class variation: The classes of interest can often be relatively broad, such as "chair". There are many different types of these objects, each with its own appearance.
  • Data-driven approach: provide the computer with many examples of each class and then develop learning algorithms that look at these examples and learn about the visual appearance of each class.

  • Image classification pipeline: Input -> Learning -> Evaluation
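
The three pipeline stages can be sketched with a toy classifier. The snippet below is only an illustration, assuming a 1-nearest-neighbor rule with L1 distance as the "learning" step; the class name `NearestNeighbor`, the helper names, and the tiny 4-pixel "images" are made up for this example.

```python
def l1_distance(a, b):
    """Sum of absolute pixel differences between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b))


class NearestNeighbor:
    """Hypothetical minimal classifier used only to illustrate the pipeline."""

    def train(self, images, labels):
        # Learning: for nearest neighbor this is just memorizing the training set.
        self.images = list(images)
        self.labels = list(labels)

    def predict(self, image):
        # Return the label of the closest training image under L1 distance.
        distances = [l1_distance(image, t) for t in self.images]
        return self.labels[distances.index(min(distances))]


def accuracy(model, images, labels):
    """Evaluation: fraction of held-out images that get the correct label."""
    correct = sum(model.predict(x) == y for x, y in zip(images, labels))
    return correct / len(labels)


# Input: toy 2x2 grayscale images flattened to 4 pixel values (made-up data).
train_x = [[0, 0, 10, 10], [250, 240, 255, 245], [5, 15, 0, 10]]
train_y = ["dark", "bright", "dark"]
test_x = [[10, 5, 5, 0], [200, 220, 210, 230]]
test_y = ["dark", "bright"]

model = NearestNeighbor()
model.train(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
print(test_accuracy)
```

The test images are never seen during `train`, which is what makes the final number a measure of generalization rather than memorization.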

  • Cross-validation: In practice, people prefer to avoid cross-validation in favor of a single validation split, since cross-validation can be computationally expensive. Typical splits use between 50% and 90% of the training data for training and the rest for validation. However, this depends on multiple factors: for example, if the number of hyperparameters is large, you may prefer a bigger validation split. If the number of examples in the validation set is small (perhaps only a few hundred or so), it is safer to use cross-validation. Typical choices in practice are 3-fold, 5-fold, or 10-fold cross-validation.
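
A minimal sketch of both options, a single validation split and k-fold cross-validation, is given below. It works on example indices only; `score_fn` is a hypothetical callback (not part of any library) that would train on one index set and return validation accuracy on the other, and the 80% default is just one value inside the 50%-90% range mentioned above.

```python
import random


def train_val_split(n_examples, train_fraction=0.8, seed=0):
    """Shuffle indices once and cut them into a training and a validation part."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    cut = int(n_examples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_splits(n_examples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs, one pair per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validate(score_fn, n_examples, k=5):
    """Average the score returned by the hypothetical score_fn over k folds."""
    scores = [score_fn(tr, va) for tr, va in k_fold_splits(n_examples, k)]
    return sum(scores) / len(scores)


# Single split: 800 indices for training, 200 for validation.
train_idx, val_idx = train_val_split(1000, train_fraction=0.8)
print(len(train_idx), len(val_idx))
```

Cross-validation trains k models instead of one, which is where the extra computational cost comes from; in exchange, every example is used for validation exactly once, so the averaged score is less noisy when data is scarce.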

  • Pixel-wise distances (such as L1 or L2 computed on raw pixel values) do not correspond at all to perceptual or semantic similarity.
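
A small made-up example illustrates the point. Below, a striped one-dimensional "image" is compared against a copy shifted by one pixel (perceptually the same pattern) and against a flat gray image (perceptually unrelated). The pixel values are invented for illustration, and L2 is used as the pixel-wise distance.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two flattened images."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


original = [0, 255, 0, 255, 0, 255, 0, 255]   # alternating stripes
shifted = original[1:] + original[:1]         # same stripes, moved by one pixel
flat_gray = [128] * len(original)             # no stripes at all

d_shifted = l2_distance(original, shifted)
d_gray = l2_distance(original, flat_gray)
print(d_shifted, d_gray)
```

Under this metric the flat gray image ends up closer to the original than the shifted copy does, even though a person would call the shifted copy the more similar of the two.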

  • If your data is very high-dimensional, consider using a dimensionality reduction technique such as PCA or even random projections.
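
As a rough sketch of the random-projection option, the snippet below multiplies each vector by a Gaussian random matrix scaled by 1/sqrt(out_dim), a standard construction that approximately preserves Euclidean distances. The dimensions (1000 down to 200), the function names, and the random test vectors are assumptions chosen for illustration; a real pipeline would more likely use a numerical library than plain lists.

```python
import math
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of scaled Gaussian entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, vector):
    """Map a high-dimensional vector to the lower-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two made-up 1000-dimensional vectors standing in for flattened images.
rng = random.Random(1)
a = [rng.random() for _ in range(1000)]
b = [rng.random() for _ in range(1000)]

R = random_projection_matrix(1000, 200)
low_a, low_b = project(R, a), project(R, b)

# Ratio of the projected distance to the original distance; close to 1 when
# the projection preserves the geometry well.
ratio = euclidean(low_a, low_b) / euclidean(a, b)
print(len(low_a), ratio)
```

Unlike PCA, this needs no fitting step on the training data: the matrix is drawn once and reused for every vector, which is why it is attractive when the data is too large to decompose.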