Image Based Product Recommendations

Product recommendations are a great way to boost sales and customer satisfaction. AI and deep learning techniques, such as vector search, can help create accurate, user-specific recommendations that aid the customer's buying journey. We have previously discussed using vector search to create recommendations based on contextual learning. This article continues with that advanced AI capability, but this time with images. Image-based product recommendations leverage image descriptors to create recommendations based on visual prompts.

Why Use Image Recommendations?

Recommendations are key to boosting sales. Cross-sell and upsell recommendations are widely known to have a big impact on sales. We are also all visual creatures, which is why visual merchandising is so important to any merchandising strategy.

The Importance Of Sight

The first thing we do when we land on an eCommerce page is look at it. Creating recommendations based on sight therefore makes sense. Multiple features can be extracted from images and then used for similarity computation.

When it comes to words, people need to have at least a vague description of what they are looking for, so recommendations based on words alone might not always work. Sight is different. It is a lot more abstract and vague, and often comes down to personal preference, nothing deeper than that.

Offering recommendations based on sight appeals to the human psychology of shopping: sometimes we just like how things look. It removes the fickleness of ‘labels’ and appeals to the senses the same way brick-and-mortar stores do.

Image Recommendation vs Image Classification

Image classification assigns an image to a category based on its features: certain elements are recognised in order to put the image into a context or category. Image recommendation instead finds other images that contain similar objects.

The goal is not to classify, but to find other images with the same elements. Descriptive and distinctive features are utilized in order to create recommendations.

Low-level Feature Extraction

Deep learning models can be trained to generate recommendations based on the features that images share. This is a new way of producing recommendations, one that builds on the visual understanding shoppers already have.

So what kind of features make this work? The feature descriptors of each image need to be considered. Some examples are:

  • Color
  • Texture
  • Shapes
  • Defining features

These can all be used to identify certain elements in an image, which can then be compared against other images in order to create recommendations.
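
As a rough illustration of a low-level descriptor, the sketch below turns an image's colors into a feature vector using a normalized color histogram. It is only a minimal example under stated assumptions: the image is represented as a plain list of (red, green, blue) tuples, and the function name color_histogram, the bins_per_channel parameter and the toy pixel data are all hypothetical, not part of any particular product or library.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in the range 0-255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized color histogram as a flat feature vector."""
    if not pixels:
        raise ValueError("image has no pixels")
    bin_width = 256 // bins_per_channel
    histogram = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin, clamping the top edge into the last bin.
        r_bin = min(r // bin_width, bins_per_channel - 1)
        g_bin = min(g // bin_width, bins_per_channel - 1)
        b_bin = min(b // bin_width, bins_per_channel - 1)
        # Flatten the three bin indices into a single position in the vector.
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        histogram[index] += 1.0
    total = float(len(pixels))
    # Divide by the pixel count so images of different sizes are comparable.
    return [count / total for count in histogram]


# Toy "images" (hypothetical data): mostly red and mostly blue, plus one white pixel.
red_shirt = [(250, 10, 10)] * 9 + [(255, 255, 255)]
blue_shirt = [(10, 10, 250)] * 9 + [(255, 255, 255)]

red_vector = color_histogram(red_shirt)
blue_vector = color_histogram(blue_shirt)
```

Texture and shape descriptors would produce vectors in the same way, so several descriptors can be compared with the same similarity measure. In practice a deep learning model would typically supply richer features than a plain histogram.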

Similarity Computation

Feature vectors can be generated for each of the training and testing images. Once the vectors have been generated, each image is tested against the predefined clusters, and the cosine similarities between images and clusters are computed. This is repeated with different feature descriptors. The recommendations shown are the images with the highest similarity. This kind of system works, but at this stage its reliability can be inconsistent.
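
To make the cosine similarity step concrete, here is a minimal sketch that scores one query vector against a set of named vectors and ranks them. The vectors could stand for cluster centroids or for individual product images. The names cosine_similarity and rank_by_similarity, and the three-dimensional toy vectors, are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], candidates: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return the top_k candidate names with their scores, most similar first."""
    scored = [(name, cosine_similarity(query, vector))
              for name, vector in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature vectors for three catalog items and one query image.
catalog = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_jeans": [0.0, 0.1, 0.9],
    "pink_top": [0.7, 0.2, 0.1],
}
query_vector = [0.8, 0.1, 0.1]

ranked = rank_by_similarity(query_vector, catalog)
```

With this toy data the ranking places red_dress first and blue_jeans last. Repeating the ranking with vectors from a different descriptor (texture, shape) can give a different order, which is one reason results from a single descriptor can be inconsistent.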

Solution: Hybrid Model

A hybrid vector search that combines contextual understanding (labels and keywords) with image features is the way forward. We need more than just good visual matches. Combining visual matches with contextual understanding provides the most human-like recommendations possible.

Creating a rules engine within vector search that can understand both visuals and semantics will create the best possible recommendations. This will hugely improve the merchandising experience and give you the opportunity to push ahead.
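
One simple way to picture the hybrid idea is a weighted blend of a visual similarity score and a text (context) similarity score. The sketch below is an assumption about how such a blend might look, not a description of any specific engine: hybrid_score, rank_hybrid, the visual_weight parameter and the example scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def hybrid_score(visual_similarity: float, text_similarity: float,
                 visual_weight: float = 0.5) -> float:
    """Blend two similarity scores; visual_weight is the share given to visuals."""
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must be between 0 and 1")
    return visual_weight * visual_similarity + (1.0 - visual_weight) * text_similarity


def rank_hybrid(candidates: Dict[str, Tuple[float, float]],
                visual_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Rank candidates given (visual_similarity, text_similarity) pairs."""
    scored = [(name, hybrid_score(visual, text, visual_weight))
              for name, (visual, text) in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical scores against a query for a red dress:
# (visual similarity, text/context similarity).
candidates = {
    "red_dress": (0.95, 0.90),
    "red_fire_extinguisher": (0.97, 0.05),  # looks similar, wrong context
    "blue_dress": (0.40, 0.85),             # right context, different look
}

hybrid_ranking = rank_hybrid(candidates, visual_weight=0.5)
visual_only_ranking = rank_hybrid(candidates, visual_weight=1.0)
```

With visuals alone, the red fire extinguisher ranks first because it merely looks similar. With an equal blend, the red dress leads and the extinguisher drops to last place, which is the behaviour the hybrid approach is after. A real rules engine would likely add business rules on top of a score like this.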


Making use of the opportunities that image recommendations present is a sound plan. The closer the eCommerce world comes to offering recommendations like those offered by in-store assistants, the closer it comes to the ideal eCommerce experience. Image-based recommendations are a great step towards achieving this.