Given a document containing text and images, will Marqo vectorize the text in the document as well as the images? The images could be pictures placed directly in the document or pictures with text embedded in them, in a standard image format.
The fields that get vectorised depend on the ones you specify as tensorFields. Text and images must be contained in separate fields, and images are provided via URLs. For example, this document:
{
    "_id": "doc1",
    "text": "This is a field with some text in it",
    "image": "https://image.com/image.jpg"
}
could have the text vectorised with tensorFields=["text"] (tensor_fields in the Python client), or the image with tensorFields=["image"]. Alternatively, you could combine the two into a single vector with:
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Index settings: treat URL fields as images and use a CLIP model
# that can embed both text and images.
settings = {
    "treat_urls_and_pointers_as_images": True,
    "model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
}
mq.create_index("my-first-multimodal-index", **settings)

document = {
    "_id": "doc1",
    "text": "This is a field with some text in it",
    "image": "https://image.com/image.jpg"
}

# Combine "text" and "image" into one vector stored under "text_image_field".
mq.index("my-first-multimodal-index").add_documents(
    [document],
    tensor_fields=["text_image_field"],
    mappings={
        "text_image_field": {
            "type": "multimodal_combination",
            "weights": {"text": 0.1, "image": 0.9},
        }
    },
)
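To give a feel for what the weights in the mapping do, here is a rough sketch in plain Python. It assumes the combined vector is a weighted sum of the per-field vectors scaled to unit length; that is an assumption made for illustration, not a description of Marqo's exact internals. The combine_vectors helper and the toy 3-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math


def combine_vectors(vectors, weights):
    """Weighted sum of same-length vectors, scaled to unit length.

    Hypothetical helper for illustration only; not part of the marqo package.
    """
    dim = len(next(iter(vectors.values())))
    combined = [0.0] * dim
    for field, vector in vectors.items():
        weight = weights[field]
        for i, value in enumerate(vector):
            combined[i] += weight * value
    norm = math.sqrt(sum(v * v for v in combined))
    if norm == 0:
        return combined  # nothing to scale
    return [v / norm for v in combined]


# Toy stand-ins for the embeddings of the "text" and "image" fields.
field_vectors = {
    "text": [1.0, 0.0, 0.0],
    "image": [0.0, 1.0, 0.0],
}
weights = {"text": 0.1, "image": 0.9}

combined = combine_vectors(field_vectors, weights)
print(combined)
```

With weights of 0.1 and 0.9, the combined vector points mostly in the image direction, so under this assumption searches against the combined field would be influenced far more by the image than by the text.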
The indexing example uses the Python client, but you could of course call the API from any language of your choice.
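As a sketch of what calling the API directly might look like, the snippet below builds the equivalent HTTP request using only the Python standard library. The endpoint path (/indexes/<index name>/documents) and the body keys (documents, tensorFields, mappings) are assumptions based on my understanding of the REST API, so check them against the API reference for your Marqo version. The build_add_documents_request helper is a hypothetical name, not part of any client.

```python
import json
import urllib.request


def build_add_documents_request(base_url, index_name, documents,
                                tensor_fields, mappings=None):
    """Build (but do not send) a POST request that adds documents.

    Assumed endpoint: POST {base_url}/indexes/{index_name}/documents
    """
    body = {"documents": documents, "tensorFields": tensor_fields}
    if mappings is not None:
        body["mappings"] = mappings
    return urllib.request.Request(
        url=f"{base_url}/indexes/{index_name}/documents",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_documents_request(
    "http://localhost:8882",
    "my-first-multimodal-index",
    documents=[
        {
            "_id": "doc1",
            "text": "This is a field with some text in it",
            "image": "https://image.com/image.jpg",
        }
    ],
    tensor_fields=["text_image_field"],
    mappings={
        "text_image_field": {
            "type": "multimodal_combination",
            "weights": {"text": 0.1, "image": 0.9},
        }
    },
)

# To actually send it (needs a running Marqo instance):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The same JSON body could be sent with curl or any HTTP library; only the request construction is shown here so the snippet runs without a server.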