Visual web searches


Peyman Milanfar, Professor of Electrical Engineering, UC Santa Cruz: We've had a way of searching for information on the Web, which is extremely effective, but we haven't had a way of searching for information visually. I like to think of a Web search nowadays as kind of a one-dimensional problem. You type in a search term and related images come up. What you'd like to be able to do is search for images that you might be interested in with another image. So this is the notion of basically being able to find similarity between images without having a huge number of examples.

There are two general components to any algorithm that is trying to recognize or detect particular objects in images. One step is to extract visually salient features out of the given image. And the second step is to be able to compare these visually salient features between two images. We came up with a way of measuring these visually salient components, not only in images but also in clips of video. Let's say you have a particular football play that you have on a video clip.

Now you want to see if a particular other team is running the same defensive play. Well, you can compare the videos. I might give a picture of Lance Armstrong to the system and the system might give back to me pictures of people biking in the street: men and women not wearing the uniform, on the sidewalk, on the beach, anywhere. Faces are another good example.

So if you provide a query which is just this picture of this female, you can detect, in target images, people of different races, different ages, faces at different scales and so on; even kids with face paint on, people wearing glasses, et cetera. So it's fairly widely applicable. There's also of course a number of different applications to security, defense, surveillance.
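The two steps described earlier, extracting salient features from an image and then comparing those features between a query and a target, could be sketched roughly as follows. This is only an illustration under stated assumptions: the intensity histogram and the cosine similarity are simple stand-ins chosen for brevity, not the measure the speaker's group developed, and the function names and the tiny grayscale images are hypothetical.

```python
import math

def extract_features(image, bins=4, max_value=256):
    """Step 1 (stand-in): normalized intensity histogram of a grayscale image.

    `image` is a list of rows of integer pixel values in [0, max_value).
    """
    hist = [0.0] * bins
    count = 0
    for row in image:
        for pixel in row:
            hist[pixel * bins // max_value] += 1.0
            count += 1
    if count == 0:
        return hist  # empty image: all-zero feature vector
    return [h / count for h in hist]

def compare_features(a, b):
    """Step 2 (stand-in): cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_targets(query, targets):
    """Return target names sorted from most to least similar to the query."""
    q = extract_features(query)
    scores = {
        name: compare_features(q, extract_features(img))
        for name, img in targets.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical 2x2 grayscale images: a dark query, a dark target, a bright target.
query = [[10, 20], [30, 40]]
targets = {
    "dark": [[5, 15], [25, 60]],
    "bright": [[200, 220], [240, 250]],
}
ranking = rank_targets(query, targets)
```

Here `ranking` lists the dark target ahead of the bright one, because its histogram matches the query's. A single query image is enough to rank the targets; no set of training examples is involved, which is the property the speaker emphasizes.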

Those are sort of the obvious applications. So one other thing you could do, for instance, is not just this search capability. Let's say you take all of the images or video sequences that you have stored on your disk at home, on your computer. Now you want to, let's say, categorize these into pictures of flowers, pictures of the outdoors, pictures of cars versus pictures of people.

This would provide an automated mechanism for measuring similarity between every pair of images that you have in your database and then using those similarity measures to cluster all of the images that we have into different categories in an automatic fashion. And I think as humans, we have to kind of translate what we are looking for often from a visual object to text in order to interface with the Web or with a database. 

Whereas what we really want to do is pick up an object and say, well, I wonder if I can find something on the shelf that looks like this. I mean, the idea of uploading an image into Google is something that, when I mention it to people, it's like a light bulb goes off. Oh, of course, why shouldn't you be able to do that, or even with a video clip? And I think once you have that idea out there, it will seem very natural for people to be able to use it in an everyday setting and it will happen.
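The automatic categorization described earlier, measuring similarity between every pair of stored images and then clustering on those measures, might look something like the sketch below. It rests on assumptions: each image has already been reduced to a feature vector, cosine similarity stands in for the actual similarity measure, and grouping is done by linking any pair above a hypothetical threshold. The file names and vectors are invented for illustration.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Stand-in similarity measure between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_by_similarity(features, threshold=0.9):
    """Group items whose pairwise similarity meets the threshold.

    `features` maps an item name to its feature vector. Any two items with
    similarity >= threshold are linked, and linked items end up in the same
    group (single-linkage grouping via union-find).
    """
    parent = {name: name for name in features}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Compare every pair of items once and merge the groups of similar pairs.
    for a, b in combinations(features, 2):
        if cosine_similarity(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical feature vectors for four images on disk.
features = {
    "rose.jpg": [0.9, 0.1, 0.0],
    "tulip.jpg": [0.8, 0.2, 0.0],
    "sedan.jpg": [0.0, 0.1, 0.9],
    "truck.jpg": [0.1, 0.0, 0.9],
}
clusters = cluster_by_similarity(features)
```

With these made-up vectors the two flower images fall into one group and the two vehicle images into another, without anyone labeling them in text first.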