Monday, December 21, 2009

Why is visual search so hard to do?

Superfish is centered on visual search—the use of images, rather than text, as information retrieval queries. Our vision is, no pun intended, to connect any real-world object, through its captured images, to all relevant information.

Robust visual search—retrieving relevant information on any image—is really hard to do. Many researchers, in academia and industry, are working to crack the visual search nut. Google has recently released Goggles in their labs, and Microsoft’s Bing has included a mode of visual search for some time. Dozens of other research teams and start-ups are busy working on this problem. Yet, the state of the art is not quite there. Why is it so hard? Why can’t computers perform a task so simple for us humans?

Human ability to sense and recognize images is the result of hundreds of millions of years of natural selection. Computer-based machine vision and visual search have been around for mere decades. In their quest for robust visual search, researchers face several hurdles.

The first of these is that, contrary to the title of a recently popular book, the world is not really flat. Visual search for flat objects already exists, and it is not bad at all. For example, there are pretty good optical character recognition (OCR) and bar-code readers in use today. But we live in a three-dimensional world where objects take on dissimilar visual forms when viewed from different angles. The same shoe looks completely different from the front, back, side, top and bottom. While even a young child can abstract a real-world object from its myriad appearances, computers can only compare images by their apparent features. Superfish employs algorithms that handle complex geometries to recognize an object regardless of the angle from which its image was captured.


The first image is the query image of a 3D object. The next two are results from our algorithm.
Notice how we are able to find similar items even with the object rotated.
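Our production algorithms are proprietary and far more involved, but the principle behind viewpoint-tolerant matching can be sketched with a toy descriptor. The idea: describe a local patch by a histogram of gradient orientations, then rotate the histogram so its dominant orientation comes first. An in-plane rotation of the patch then leaves the descriptor essentially unchanged. All function names here are illustrative, not part of any real system.

```python
import numpy as np

def orientation_histogram(patch, bins=36):
    """Histogram of gradient orientations over a patch, weighted by magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    idx = (ang / (2 * np.pi) * bins).astype(int) % bins
    return np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)

def canonical_descriptor(patch, bins=36):
    """Roll the histogram so the dominant orientation comes first,
    making the descriptor invariant to in-plane rotation."""
    h = orientation_histogram(patch, bins)
    h = np.roll(h, -int(np.argmax(h)))
    n = np.linalg.norm(h)
    return h / n if n > 0 else h

# A toy patch with directional structure: noise plus a horizontal ramp.
rng = np.random.default_rng(0)
patch = rng.random((32, 32)) + np.linspace(0, 4, 32)[None, :]
rotated = np.rot90(patch)  # exact 90-degree rotation of the "object"

d1 = canonical_descriptor(patch)
d2 = canonical_descriptor(rotated)
print(float(d1 @ d2))  # close to 1.0: same object despite the rotation
```

Real systems must also cope with out-of-plane rotation, scale and perspective, which is where the "complex geometries" mentioned above come in; this sketch only shows the in-plane case.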




Second, unless professionally shot, most pictures have a lot of “noise.” The resolution may be poor, the lighting dim, the main subject partially covered, or the background dominant. We describe such images as having a low signal-to-noise ratio. It would be nice if every object were photographed under perfect lighting on a white background, but the great majority of images captured by camera or cell phone are full of confusing “noise.”

Some visual search algorithms employ rules of thumb, or heuristics, that try to ferret out visual features that are unique to specific classes of images, but such heuristics significantly limit the scope and scale of visual search. Because our goal is to enable visual search from common images to very large indexes of common images, we had no choice but to develop algorithms that rely on mathematics and statistics, rather than heuristics, to extract the signal from its noisy environment.
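As one example of a statistical filter from the published literature (not our proprietary algorithms), the nearest-neighbor distance-ratio test, due to Lowe, accepts a candidate match only when the best database descriptor is significantly closer than the runner-up. Unlike a class-specific heuristic, it makes no assumption about what the image depicts. This numpy sketch is purely illustrative:

```python
import numpy as np

def ratio_test_matches(desc_q, desc_db, ratio=0.75):
    """For each query descriptor, accept its nearest database descriptor
    only if it is much closer than the second nearest (Lowe's ratio test).
    Ambiguous matches, typical of noisy regions, are discarded."""
    matches = []
    for i, d in enumerate(desc_q):
        dists = np.linalg.norm(desc_db - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

rng = np.random.default_rng(1)
db = rng.random((100, 16))                # a toy "index" of 100 descriptors
q = db[:5] + 0.01 * rng.random((5, 16))   # noisy copies of the first five
print(ratio_test_matches(q, db))          # the five true matches survive the filter
```

The same statistical idea scales to any image class, which is the point of preferring mathematics over rules of thumb.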

The query image (left) and the result from our algorithms (right).
Notice how both images have background and lighting "noise."




Third, image search features offered on Google and Microsoft’s Bing are heavily dependent on keyword tags and other text-based image-related descriptors. Such tags are useful in classifying images into categories such as people, places and objects, and subsequently can augment ranking and filtering algorithms. But sadly, most images captured from the real world are not tagged, thus rendering them invisible to search engines that rely on tags.

More importantly, many objects we search for simply cannot be described with words. A specific pattern in a curtain fabric, for example, may be indescribable even for fabric experts. While our minds hold no words for this great dictionary of visual features, such a vocabulary can in fact be extracted and used for visual search. While we certainly appreciate and leverage the augmenting role of textual, temporal and locational tags, we seek to maximize the use of visual vocabularies for maximal relevance.
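One standard way such a vocabulary is extracted in the published literature is the bag-of-visual-words model: cluster local descriptors pooled from many images into codewords, then describe each image as a histogram over those codewords, so images can be compared without a single textual tag. The sketch below illustrates that general technique, not our own algorithms:

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain k-means: learn a 'visual vocabulary' of k codewords."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(descriptors, vocab):
    """Represent one image as a normalized histogram over visual words."""
    words = np.argmin(((descriptors[:, None] - vocab) ** 2).sum(-1), axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(2)
all_desc = rng.random((500, 8))        # descriptors pooled from many images
vocab = kmeans(all_desc, k=10)         # the learned visual vocabulary
img_a = bag_of_words(rng.random((60, 8)), vocab)
img_b = bag_of_words(rng.random((60, 8)), vocab)
similarity = img_a @ img_b             # compare images purely by visual words
```

The histograms play the role that keyword tags play in text search, but they are computed from pixels alone.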

Lastly, size matters. A robust visual search solution must index and process billions of images quickly and accurately, and must handle image queries in real time. All this processing is very expensive, both in server costs and in time. Engineering a solution that operates on arbitrary and “noisy” images to deliver high relevance at low latency on an unbounded index is a significant challenge. While our first application, Window Shopper, demonstrates visual search on a small subset of all the images one can think of, our algorithmic work has been based, from day one, on scalability and high performance.
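To give a flavor of how query time can be decoupled from index size, here is a sketch of locality-sensitive hashing with random hyperplanes, one published technique for approximate nearest-neighbor search at scale. Descriptors that point in similar directions tend to land in the same hash bucket, so a query inspects one bucket instead of the whole index. This is an illustration of the general idea, not a description of our system:

```python
import numpy as np

class HyperplaneLSH:
    """Hash each descriptor by its sign pattern against random hyperplanes.
    Similar descriptors usually share a bucket, so query cost depends on
    bucket size, not on the total number of indexed images."""

    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}

    def key(self, v):
        return tuple((self.planes @ v > 0).astype(int))

    def add(self, v, label):
        self.buckets.setdefault(self.key(v), []).append(label)

    def query(self, v):
        return self.buckets.get(self.key(v), [])

rng = np.random.default_rng(3)
index = HyperplaneLSH(dim=32)
base = rng.standard_normal(32)
index.add(base, "product-123")      # "product-123" is an illustrative label
# A slightly perturbed copy of the same signature usually hashes to the same bucket.
print(index.query(base + 0.001 * rng.standard_normal(32)))
```

Production systems layer many refinements on top (multiple hash tables, re-ranking of bucket contents), but the core trade of a little accuracy for a massive speedup is the same.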

As we move forward with our visual search technology, these are a few of the key challenges we tackle. While we’re not there yet, we believe our foundation is sound and deep. Our goal is a world where anyone can connect any image to relevant information through visual search.

Saturday, December 12, 2009

Hello World!


After years of research and development we are ready to introduce Superfish to the world. Say Hi. Yes, it’s been difficult and it’s taken longer than we thought, but here it is.

Superfish is a visual search engine, software that tries to achieve what we humans do so well – perceive similarity in images. In fact, this is a notoriously difficult problem which smart scientists and engineers have been working on for decades. Our work builds on theirs and adds our own innovative ideas.

The explosion of images on the Web has accelerated the need for visual search, with several teams trying to crack the code (friendly nod to the guys at Google and Microsoft). While we are making our first steps, we think Superfish is quite different from anything you’ve seen so far. Other technologies have demonstrated how effective visual search can be on bar codes, text and flat objects like books and DVD covers. Those are neat applications but, alas, the world is not really flat!

So, we set out to visually search more complex objects that matter to people. Things we wear, for example – like shoes and clothing and jewelry. While books and DVDs have names and catalog numbers to search with, fashion items don’t. In the shoe store we can ask for another shoe like this one or tell the clerk we don’t like this style. Online there’s a ton of fashion, which is great, but we aren’t able to use visual cue shortcuts to cut through it all. Can visual search help?
We think it can.

Our first product is called Window Shopper, and it uses visual search to help you discover the products you want. We engineered it as a browser add-on that allows you to select a product image at some online store (no need to go to our destination site; launch it from your favorite store), quickly see similar products, most likely in other stores, and focus or expand your search based on style. Say you are looking at a high heel shoe at Nordstrom and want to see similar products elsewhere on the Web. Click, and you will get similar style shoes at other stores, from different brands at different prices.

It’s pretty cool, we think. And it actually works. Well, most of the time. Sometimes Superfish may return really funny results, like a pair of trousers that it perceives to look like wine glasses. In most cases, though, Superfish returns really nice and relevant results. Considering the difficulty of visual search, we feel it is dramatically better than where we started out, though we of course recognize that it is still 20% shy of perfect. Improving this requires a lot of work: beefing up our image index, learning from usage patterns and improving our algorithms. It also means we will rely on you to use our beta products, point out problems and push us to constantly improve the technology and user experience.

The Superfish technology was designed to search on any image. As we move forward, we will be adding more categories, enabling more features and extending visual search to more use cases. Our vision is that one day, we hope not too distant, you will be able to search visually on any object, wherever you are, whenever you feel like it.

We hope you will add Window Shopper to your browser and check it out. We would love to get your feedback, ideas, comments and suggestions, on this blog or via the feedback form on our web site.

Happy fishing.
The Superfish Team