Where are embedding models stored?

We would like to avoid downloading models (e.g., multilingual-clip/XLM-Roberta-Large-Vit-L-14) more than once, even if the Marqo container is shut down. This presumably means mapping an internal path to a persistent Docker volume. But where are these models stored?

As far as I can tell, the cache directory is:

/app/src/marqo/cache

Is that correct?

Hello @rcking. Yes, that is correct. You can mount a volume to that folder to cache models.

Some more details from the code base:

Please let me know if you have further questions. Thanks.

Thanks @papa99do. Sorry in advance for the long reply…

I do not think that marqo behaves as expected. Here is what I tried:

On a local machine, with internet connection:

$ docker run -d --name marqo -it -p 8882:8882 -v /vespa:/opt/vespa/var -v /marqo:/app/src/marqo/cache marqoai/marqo:2.11

$ curl -X POST -H 'Content-type: application/json' http://localhost:8882/indexes/test -d '{"model": "hf/e5-base-v2"}'

$ curl localhost:8882/models
{"models":[{"model_name":"hf/e5-base-v2","model_device":"cpu"},{"model_name":"open_clip/ViT-B-32/laion2b_s34b_b79k","model_device":"cpu"}]}
$ curl localhost:8882/indexes
{"results":[{"indexName":"test"}]}

Then I added some documents and carried out a query successfully.

Then I stopped the container, disconnected from the internet, and restarted the container. The query still worked.

Then I stopped and removed the container. Then I tried to run Marqo again (this time with no internet connection).

$ docker logs marqo

loading for: model_name=hf/e5-base-v2 and properties={'name': 'intfloat/e5-base-v2', 'dimensions': 768, 'tokens': 512, 'type': 'hf', 'model_size': 0.438, 'text_query_prefix': 'query: ', 'text_chunk_prefix': 'passage: ', 'notes': ''}
INFO:marqo.tensor_search.index_meta_cache:Last index cache refresh at 1724658687.0970526
/usr/local/lib/python3.8/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
ERROR:marqo.s2_inference.s2_inference:Error loading model hf/e5-base-v2 on device cpu with normalization=True.
Error message is Marqo encountered an error loading the Hugging Face model = `intfloat/e5-base-v2` using AutoTokenizer Please ensure that the model is a valid Hugging Face model and retry.
 Original error message = We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like intfloat/e5-base-v2 is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 168, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 73, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib64/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

+ many more exceptions

In other words, there is something ephemeral about the ModelCache object (I suspect); Marqo has “forgotten” that the model is cached, attempts to load it from the internet, and fails due to the missing connection.

Addendum: Interestingly, if I restore the internet connection and re-run Marqo (having removed the container previously), it starts fine and apparently loads the models from the cache! But it seems there is a check against Hugging Face before this happens…?

Not good behaviour for an offline deployment…
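For what it's worth, the Hugging Face libraries document a forced offline mode via environment variables (this is the "offline mode" the error message above links to). I haven't verified that Marqo's loading path honours them, but passing them into the container (e.g. with `docker run -e`) might be worth testing:

```python
# Hedged sketch: these environment variables are documented by Hugging Face
# for offline operation; whether Marqo's model-loading code respects them
# is an open question.
import os

os.environ["HF_HUB_OFFLINE"] = "1"        # hf_hub_download serves cached files only
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers skips network lookups
```

With these set, the libraries should resolve models from the local cache without attempting any HTTP requests.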

Found a workaround. First, I downloaded e5-base-v2 from Hugging Face and put it in a local cache directory. Then I turned off my network interface.

Then:

docker run -d --name marqo -it -p 8882:8882 -e MARQO_MODELS_TO_PRELOAD='[]' -v /vespa:/opt/vespa/var -v /marqo/.cache/huggingface/hub:/hfcache marqoai/marqo:2.11

curl -X POST -H 'Content-type: application/json' http://localhost:8882/indexes/test -d '{"model": "local/e5-base-v2", "modelProperties": {"dimensions": 768, "tokens": 512,"type": "hf","model_size": 0.438,"text_query_prefix": "query: ","text_chunk_prefix": "passage: ","notes": "","localpath": "/hfcache/models--intfloat--e5-base-v2/snapshots/1c644c92ad3ba1efdad3f1451a637716616a20e8"}}'

Then I can add documents to the test index, and query them.

But most importantly, I can also shut down and remove the Marqo container, then restart it with the command above, and everything works, offline.

@rcking, thanks for sharing your findings with us. Glad to hear that you’ve got Marqo working offline. We will do some more investigation to see why you need to place the model in a local folder to start Marqo. We will keep you updated on the results of our investigation. Thanks again.

I think the problem might be in from_hf.py:

def download_model_from_hf(
        location: HfModelLocation,
        auth: Optional[HfAuth] = None,
        download_dir: Optional[str] = None):
    """Downloads a pretrained model from HF, if it doesn't exist locally. The basename of the
    location's filename is used as the local filename.

    hf_hub_download downloads the model if it does not yet exist in the cache.

    Args:
        location: repo_id and filename to be downloaded.
        auth: contains HF API token for model access
        download_dir: The location where the model
            should be stored

    Returns:
        Path to the downloaded model
    """
    download_kwargs = location.dict(exclude_unset=True) # Ignore unset values to avoid adding None to params
    if auth is not None:
        download_kwargs = {**download_kwargs, **auth.dict()}
    try:
        return hf_hub_download(**download_kwargs, cache_dir=download_dir)
    except RepositoryNotFoundError:
        # TODO: add link to HF model auth/loc
        raise ModelDownloadError(
            "Could not find the specified Hugging Face model repository. Please ensure that the request's model_auth's "
            "`hf` credentials and the index's model_location are correct. "
            "If the index's model_location is not correct, please create a new index with the corrected model_location"
        )

It appears that hf_hub_download is used to resolve the file path of the cached model; but that function raises an exception if Hugging Face cannot be reached, even when nothing actually needs to be downloaded.
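One possible fix (a sketch of my suggestion, not Marqo's actual code): attempt the download with `local_files_only=True` first, and only fall back to a network download on a genuine cache miss. `hf_hub_download` accepts a `local_files_only` keyword, and its `LocalEntryNotFoundError` is a `FileNotFoundError` subclass in recent huggingface_hub releases; the downloader is injected here so the logic can be shown without the library:

```python
# Sketch of an offline-first download wrapper (a suggestion, not Marqo's code).
# `download_fn` stands in for huggingface_hub.hf_hub_download, which accepts
# a `local_files_only` keyword and raises LocalEntryNotFoundError (a
# FileNotFoundError subclass in recent huggingface_hub releases) when the
# requested file is absent from the local cache.

def download_with_cache_fallback(download_fn, **download_kwargs):
    """Prefer the local cache; hit the network only on a cache miss."""
    try:
        # Offline attempt: returns the cached path without any HTTP request.
        return download_fn(local_files_only=True, **download_kwargs)
    except FileNotFoundError:
        # Cache miss: fall back to a normal (online) download.
        return download_fn(local_files_only=False, **download_kwargs)
```

With this shape, a lost connection only matters when the file truly isn't cached, which would restore the offline restart behaviour described above.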
