Original post author: Ryan
I've been spending some of my own time learning about computer vision, machine learning, and robotics. It has been very interesting, and I think I will make a post about it once a week or so.

Perceiving objects is something we humans do quite easily, but for computers it is very hard. Computers like to operate using "rules", and it's practically impossible to write a set of rules that lets a computer scan an image and determine whether it contains a person, a chair, a house, a lamp, or a pizza (or any other object). When a computer looks at a photo, it just sees pixels. It doesn't see objects, or shapes, or anything else.

In the past few years, machine learning and computer vision have taken big steps forward. Researchers have learned that humans simply can't write code that teaches a computer how to see, but they can write code that enables computers to teach themselves how to see. These programs are called "artificial neural networks", and they have some parallels with the way our brains work.

The main premise of a neural network is that the pixels of an image are filtered through several "layers" of artificial neurons, each of which processes the image in a different way. The first layer might detect edges, the second might detect shapes, the third might combine shapes into objects, and eventually the final layer determines what object is being shown. I said "might" a lot in the last sentence because humans (including those who developed the underlying mathematics and technology) don't fully understand what each layer of a trained network is actually doing. We know that neural networks do work, and we understand the basic theory behind them.

When an artificial neural network is first set up, it is very unintelligent. It only becomes "smart" after you "train" it. If you're training a neural network to do object detection, you'll need to show it thousands and thousands of example images while telling it "this is a person", "this is a chair", "this is another person", and so on.
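To make the training idea a bit more concrete, here is a minimal sketch in Python. It is a big simplification: it assumes a single artificial neuron instead of a deep, multi-layer network, and four-pixel "images" that I made up. Each example is labeled "bright" (1) or "dark" (0), the neuron starts out knowing nothing (all weights at zero), and every labeled example nudges the weights a little (plain gradient descent on logistic loss). The names `TRAINING_DATA`, `predict`, and `train` are hypothetical, not from any library. Real object detectors have many layers and millions of weights, but the basic loop of "guess, compare with the label, adjust" is the same idea.

```python
import math

# Hypothetical toy "images": 4 grayscale pixels each, 0.0 = black, 1.0 = white.
# Label 1 means "bright image", 0 means "dark image". Invented for illustration.
TRAINING_DATA = [
    ([0.9, 0.8, 1.0, 0.7], 1),
    ([0.8, 0.9, 0.6, 1.0], 1),
    ([0.1, 0.2, 0.0, 0.3], 0),
    ([0.2, 0.1, 0.3, 0.0], 0),
]


def sigmoid(z):
    """Squash any number into the range (0, 1) so it can be read as a confidence."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, pixels):
    """One artificial neuron: a weighted sum of the pixels, passed through a sigmoid."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return sigmoid(z)


def train(data, epochs=1000, learning_rate=0.5):
    """Start 'unintelligent' (all zeros) and adjust the weights after every labeled example."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in data:
            # Positive error: the guess was too high; negative: too low.
            error = predict(weights, bias, pixels) - label
            # Gradient-descent step for the logistic (cross-entropy) loss.
            weights = [w - learning_rate * error * p for w, p in zip(weights, pixels)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(TRAINING_DATA)
print(predict(weights, bias, [0.95, 0.85, 0.9, 0.8]))  # a new, unseen bright image
print(predict(weights, bias, [0.05, 0.15, 0.1, 0.2]))  # a new, unseen dark image
```

After training, the new bright image should score well above 0.5 and the new dark one well below it, even though neither appeared in the training data.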
Training a network like this can take days or even weeks, even though a fast computer can process hundreds of training images every minute. In the picture above, I was using an artificial neural network pre-trained by Google (so they did most of the hard work here). It analyzed this photo and was 77% sure that I am a person and 94% sure that Violet is a cat.
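Percentages like "77% sure" are the network's confidence scores. Many image classifiers produce them by passing the final layer's raw scores through a softmax function, which turns them into probabilities that add up to 100%. The sketch below assumes exactly that; the labels and raw scores are invented for one hypothetical region of a photo and are not the output of Google's model.

```python
import math


def softmax(scores):
    """Turn raw, unbounded class scores into probabilities that sum to 1."""
    largest = max(scores.values())  # subtract the max first for numerical stability
    exps = {label: math.exp(s - largest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical raw scores for one detected region (invented numbers).
raw_scores = {"person": 2.0, "cat": 0.5, "chair": -1.0}
probabilities = softmax(raw_scores)
best = max(probabilities, key=probabilities.get)
print(f"{best}: {probabilities[best]:.0%}")
```

A larger gap between the top score and the rest gives a higher percentage, which is roughly why the network can be more confident about Violet than about me.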