Product similarity search

I would like some advice on how best to try out Marqo for my use case. I am completely new to this, but suspect that Marqo could perform this task well. I would like to know whether it is worth spending time experimenting, and any hints on how best to achieve it would be welcome.

I have a list of products sold by company A, as a CSV, and a second list of products sold by company B. I would like to match each product in company A’s list to the equivalent product in company B’s list. If company B does not stock the same product, I would like a list of the most similar candidates.

For each product in both lists we always have a “product_name” text field, which is basically the name shown on a product page on an ecommerce site, and a “brand” text field, which should be the brand name of the product. We usually have some form of “Category” text for the product. We also have a selling price for each product in a particular currency (usually the same currency for both companies).

Sometimes we have an image of the product, given as a URL. Sometimes we have

I imagine that for each product in company A I might get

a) There is no product in company B that is even considered “similar”: they do not stock the same product, nor anything else that gets a high enough “similarity” score to be considered.

b) Company B stocks EXACTLY the same product - the Similarity score is such that there is one clear “winner” and we should have high enough confidence this is the “same” product. (though there may well be a number of “similar” products)

c) Company B stocks a number of products that are similar enough to be considered “equivalent”, if not EXACTLY the same product. We would want to capture these along with some ranking by similarity.

Apologies if this is too much a newbie question!

Hey Harry,

Here are a few things to try:

  • run hybrid search so the lexical component biases towards results that explicitly share text in the description. You can weight the lexical component more highly by reducing the alpha, like this:

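    import marqo

    # Assumes a Marqo instance at the default local URL; adjust as needed.
    mq = marqo.Client(url="http://localhost:8882")

    # A lower alpha weights the lexical component more heavily.
    results = mq.index("my-hybrid-index").search(
        q="<my product text field>",
        search_method="HYBRID",
        hybrid_parameters={
            "alpha": 0.3
        },
    )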
    
  • add filtering to narrow down potential matches. For example, you could filter out products where:

    • The brand names don’t match
    • The price difference is more than 30%
    • The categories are completely different
  • run some queries and manually inspect the results. Try to find the similarity scores at which the products begin to seem dissimilar. We do something like this quite often, using LLMs to annotate.
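
As a rough illustration of the filtering and score-threshold steps above, here is a minimal sketch in plain Python that post-filters search hits and sorts each company A product into "no match", "exact" or "similar". Everything named here is an assumption for illustration: the threshold values, the `passes_filters` and `classify` helpers, and the idea that each hit carries `brand` and `price` fields next to Marqo's `_score`. The score scale depends on the search method and model, so the numbers are placeholders to replace after inspecting real results. The category check is left out because deciding when two categories are "completely different" needs a mapping between the two companies' category schemes.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds: placeholders to tune by inspecting real results.
SIMILAR_MIN = 0.60     # below this, a hit is not treated as similar
EXACT_MIN = 0.90       # at or above this, the top hit may be the same product
CLEAR_MARGIN = 0.05    # lead the top hit needs over the runner-up
MAX_PRICE_DIFF = 0.30  # the 30% price rule from the list above


def passes_filters(product_a: Dict, hit: Dict) -> bool:
    """Keep a hit only if the brand matches and the price is within 30%.

    Assumes company A's price is a positive number.
    """
    if product_a["brand"].strip().lower() != hit["brand"].strip().lower():
        return False
    price_gap = abs(hit["price"] - product_a["price"]) / product_a["price"]
    return price_gap <= MAX_PRICE_DIFF


def classify(product_a: Dict, hits: List[Dict]) -> Tuple[str, List[Dict]]:
    """Return (outcome, ranked candidates) for one company A product."""
    candidates = sorted(
        (h for h in hits
         if h["_score"] >= SIMILAR_MIN and passes_filters(product_a, h)),
        key=lambda h: h["_score"],
        reverse=True,
    )
    if not candidates:
        return "no_match", []
    top = candidates[0]["_score"]
    runner_up = candidates[1]["_score"] if len(candidates) > 1 else 0.0
    # "exact" requires a high score and a clear lead over the next candidate.
    if top >= EXACT_MIN and top - runner_up >= CLEAR_MARGIN:
        return "exact", candidates
    return "similar", candidates


# Made-up sample data standing in for one product and its search hits.
product_a = {"brand": "Acme", "price": 100.0}
hits = [
    {"_id": "b1", "brand": "acme", "price": 105.0, "_score": 0.95},
    {"_id": "b2", "brand": "Acme", "price": 110.0, "_score": 0.70},
    {"_id": "b3", "brand": "Other", "price": 100.0, "_score": 0.99},
    {"_id": "b4", "brand": "Acme", "price": 200.0, "_score": 0.97},
]
outcome, ranked = classify(product_a, hits)
print(outcome, [h["_id"] for h in ranked])
```

Filtering after the search keeps the sketch independent of any particular index setup; the same brand and price rules could instead be applied at query time if those fields are indexed.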

Thank you very much for the pointers Nick, much appreciated.