Is marqo's text embedding multilingual?

esoteric1 · July 7, 2023, 12:47am

Can Marqo work with multilingual documents? If so, how do I get that working?

owen-elliott · July 7, 2023, 12:53am

Edited for Marqo 2.x

Yep, Marqo can be used with multilingual models.

Out of the box we support a number of performant multilingual models. I would recommend the multilingual e5 models.

You can use them like so:

import marqo
mq = marqo.Client()
mq.create_index(
    "my-multilingual-index", 
    model="hf/multilingual-e5-base"
)

You can also use other models through the custom models API. The one below, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, should be a good one to start with. The Sentence Transformers documentation says of its multilingual models:

"We used the following 50+ languages: ar, bg, ca, cs, da, de, el,
en, es, et, fa, fi, fr, fr-ca, gl, gu, he, hi, hr, hu, hy, id, it, ja,
ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, pt-br, ro, ru,
sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh-cn, zh-tw." (see "Pretrained Models" in the Sentence Transformers documentation)


settings = {
    "textPreprocessing": {
        "splitLength": 2,
        "splitOverlap": 0,
        "splitMethod": "sentence"
    },
    "model": "unique-model-alias",
    "modelProperties": {
        "name": "sentence-transformers/paraphrase-multilingual-mpnet-base-v2",
        "dimensions": 768,
        "tokens": 512, 
        "type": "sbert"
    },
    "normalizeEmbeddings": True,
}
response = mq.create_index("my-generic-model-index", settings_dict=settings)
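
If you want to try more than one Sentence Transformers checkpoint, the settings dictionary above can be wrapped in a small helper. This is only a sketch: `sbert_index_settings` is a hypothetical name, not part of Marqo, and it simply reproduces the keys shown above. The `dimensions` value is assumed to match the model's actual embedding size.

```python
from typing import Any, Dict


def sbert_index_settings(
    model_name: str,
    dimensions: int,
    tokens: int = 512,
    alias: str = "unique-model-alias",
) -> Dict[str, Any]:
    """Build a settings dict with the same shape as the example above.

    Hypothetical helper: it only fills in the model-specific values and
    leaves the text preprocessing options as they were in the example.
    """
    return {
        "textPreprocessing": {
            "splitLength": 2,
            "splitOverlap": 0,
            "splitMethod": "sentence",
        },
        # Alias under which the custom model is referenced.
        "model": alias,
        "modelProperties": {
            "name": model_name,
            "dimensions": dimensions,  # assumed to equal the model's output size
            "tokens": tokens,
            "type": "sbert",
        },
        "normalizeEmbeddings": True,
    }


settings = sbert_index_settings(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2", 768
)
```

The resulting `settings` can be passed to `mq.create_index(..., settings_dict=settings)` exactly as in the example above.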

OneEyedBlackCatDevel · July 15, 2024, 7:43pm

Hi,
It seems that the API has changed since your reply.
Can you point me to the documentation or give an updated example, please?

owen-elliott · July 16, 2024, 12:47am

Hi @OneEyedBlackCatDevel, thanks for spotting this. Our API did change when we upgraded to version 2. For a good medium-sized multilingual model I would recommend using an e5 model:

import marqo

mq = marqo.Client()

response = mq.create_index("my-index", model="hf/multilingual-e5-base")

Multilingual E5 models support 94 languages.
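
To check that the index behaves multilingually, one option is to add a few documents in different languages and query in yet another language. The following is a minimal sketch, assuming a Marqo 2.x instance at the default local URL and the `my-index` index created above. The sample documents, the `text` field name, and the helper names `index_and_search` and `demo` are made up for illustration.

```python
from typing import Any, List

# Hypothetical sample documents; texts and field names are illustrative only.
DOCS = [
    {"_id": "en-1", "text": "The cat is sleeping on the sofa."},
    {"_id": "de-1", "text": "Die Katze schläft auf dem Sofa."},
    {"_id": "fr-1", "text": "Le train pour Paris part à midi."},
]


def index_and_search(client: Any, index_name: str, query: str, limit: int = 3) -> List[str]:
    """Add DOCS to an existing index and return the ids of the top hits."""
    index = client.index(index_name)
    # tensor_fields names the fields that are embedded by the index's model.
    index.add_documents(DOCS, tensor_fields=["text"])
    results = index.search(q=query, limit=limit)
    return [hit["_id"] for hit in results["hits"]]


def demo() -> None:
    """Run the sketch against a live Marqo instance (not called automatically)."""
    import marqo  # imported here so the helper above works without a server

    mq = marqo.Client()  # assumes the default local Marqo URL
    # Spanish query against English, German, and French documents.
    print(index_and_search(mq, "my-index", "¿Dónde duerme el gato?"))
```

With a multilingual model I would expect the two cat sentences to rank above the train sentence for that Spanish query, but the exact ordering and scores depend on the model, so treat that as something to verify rather than a guarantee.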
