Unfortunately, directly indexing pre-computed vectors into Marqo is not supported at the moment.
Running inference as part of the add_documents pipeline (which creates the vectors for you) is an intrinsic part of Marqo.
I can offer some alternative approaches that may or may not be useful in your case:
If your vectors come from a supported model architecture, you could load your model into Marqo during index creation; you can refer to the documentation for loading generic CLIP and SBERT models. Marqo will then create the vectors for you.
You can also search directly with vectors once your data is indexed via the context parameter in the search API.
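As a rough sketch of the second option, a context search might look like the following. The index name, endpoint URL, and the 384-dimensional placeholder vector are assumptions, and the exact shape of the context payload can vary between Marqo versions, so check the search API docs for your release:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")  # assumed local endpoint

results = mq.index("my-index").search(
    # A weighted query with weight 0 so that only the context vector
    # drives the search; the vector dimension must match the index model.
    q={"placeholder query": 0.0},
    context={"tensor": [{"vector": [0.1] * 384, "weight": 1.0}]},
)
```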
Thank you for your insightful answer. Using the CLIP and SBERT models is not an option as I want to index documents with audio features extracted using essentia.
But maybe I’m just overlooking something and the features from Marqo are already supporting what I want to achieve.
I see. At this stage we don't support any models that extract features directly from audio.
If you are just looking to work with vectors that you already have, then a vector database that doesn't include inference might be a better fit for your use case. There are some good open-source options: HNSWLib is a more minimal implementation of vector search, while OpenSearch is more complex to use but also more feature-rich.
Marqo is targeted towards working directly with text and images where the vectors are created internally using a model of your choice.
For example, an index created as
mq.create_index("my-index", model="hf/e5-base")
would then use the e5-base model to create the vectors. Adding documents like
mq.index("my-index").add_documents(
    [
        {
            "text": "The EMU is a spacesuit that provides environmental protection.",
        },
    ],
    tensor_fields=["text"],
)
would take the "text" field and turn it into a vector using the hf/e5-base model. The same process is applied at search time, where the query text is converted into a vector to perform the vector search.
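So a search against the index above would be as simple as the following sketch (the query string here is just an example):

```python
results = mq.index("my-index").search("what does the EMU provide?")
```

The query is encoded with the same hf/e5-base model that was used at indexing time, which is exactly the inference step that makes bring-your-own-vector workflows awkward in Marqo today.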