Is Marqo's text embedding multilingual?

Can Marqo work with multilingual documents? If so, how do I get that working?

Yep, Marqo can be used with multilingual models.

You can use a few different models through the custom models API. The one below, paraphrase-multilingual-mpnet-base-v2 from Sentence-Transformers, should be a good starting point. From its documentation:

“We used the following 50+ languages: ar, bg, ca, cs, da, de, el,
en, es, et, fa, fi, fr, fr-ca, gl, gu, he, hi, hr, hu, hy, id, it, ja,
ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, pt-br, ro, ru,
sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh-cn, zh-tw.”
See the Pretrained Models page in the Sentence-Transformers documentation.

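import marqo

# Assumes a Marqo server running locally on the default port; adjust the URL for your setup.
mq = marqo.Client(url="http://localhost:8882")

settings = {
    "index_defaults": {
        "treat_urls_and_pointers_as_images": False,
        # Split each text field into chunks of 2 sentences with no overlap.
        "text_preprocessing": {
            "split_length": 2,
            "split_overlap": 0,
            "split_method": "sentence"
        },
        # Any unique alias for the custom model described in model_properties.
        "model": "unique-model-alias",
        "model_properties": {
            "name": "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
            "dimensions": 768,
            "tokens": 128,  # can be raised, up to a 512-token context length
            "type": "sbert"
        },
        "normalize_embeddings": True,
    },
}
response = mq.create_index("my-generic-model-index", settings_dict=settings)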
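
If you want to experiment with the token limit, here is a minimal sketch that builds the same settings dict from a helper and rejects values above the 512-token limit mentioned in the comment above. The function name build_multilingual_settings and the MAX_TOKENS constant are hypothetical, made up for illustration; they are not part of the Marqo API.

```python
# Hypothetical helper, not part of Marqo: builds the settings dict from the
# post with a configurable token limit.
MAX_TOKENS = 512  # upper bound taken from the comment in the settings above


def build_multilingual_settings(tokens=128, alias="unique-model-alias"):
    """Return index settings for the multilingual SBERT model used above."""
    if not 1 <= tokens <= MAX_TOKENS:
        raise ValueError(f"tokens must be between 1 and {MAX_TOKENS}, got {tokens}")
    return {
        "index_defaults": {
            "treat_urls_and_pointers_as_images": False,
            "text_preprocessing": {
                "split_length": 2,
                "split_overlap": 0,
                "split_method": "sentence",
            },
            "model": alias,
            "model_properties": {
                "name": "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
                "dimensions": 768,
                "tokens": tokens,
                "type": "sbert",
            },
            "normalize_embeddings": True,
        },
    }


# Example: same settings as above, but with a 256-token context.
settings = build_multilingual_settings(tokens=256)
```

The returned dict can be passed to mq.create_index(..., settings_dict=settings) exactly as in the snippet above.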