Issues with Docker

Hi Marqo team.

I have issues with the Docker image: it only starts successfully some of the time. When it starts, everything works, but most of the time it fails with:

marqo@marqo-server:~$ sudo docker run -m=6g --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
Preparing to start Marqo-OS…
Marqo-OS not found; starting Marqo-OS…
Marqo-OS started successfully.
Starting Marqo throttling…
Marqo throttling successfully started.
INFO:ModelsForStartup:pre-loading ['hf/all_datasets_v4_MiniLM-L6', 'ViT-L/14'] onto devices=['cpu']

###########################################################
###########################################################

STARTING DOWNLOAD OF MARQO ARTEFACTS################

###########################################################
###########################################################

INFO:DeviceSummary:found devices [{'id': -1, 'name': ['cpu']}]
INFO:SetBestAvailableDevice:Best available device set to: cpu
loading for: model_name=hf/all_datasets_v4_MiniLM-L6 and properties={'name': 'flax-sentence-embeddings/all_datasets_v4_MiniLM-L6', 'dimensions': 384, 'tokens': 128, 'type': 'hf', 'notes': ''}
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████████████████████| 612/612 [00:00<00:00, 47.4kB/s]
Downloading pytorch_model.bin: 100%|███████████████████████████████████████████████████| 90.9M/90.9M [00:01<00:00, 49.1MB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████| 535/535 [00:00<00:00, 45.9kB/s]
Downloading (…)solve/main/vocab.txt: 100%|███████████████████████████████████████████████| 232k/232k [00:00<00:00, 2.45MB/s]
Downloading (…)/main/tokenizer.json: 100%|███████████████████████████████████████████████| 466k/466k [00:00<00:00, 10.1MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████| 112/112 [00:00<00:00, 57.7kB/s]
INFO:marqo.s2_inference.s2_inference:loaded hf/all_datasets_v4_MiniLM-L6 on device cpu with normalization=True at time=2023-11-06 22:08:01.437866.
INFO:ModelsForStartup:hf/all_datasets_v4_MiniLM-L6 cpu run succesfully!
loading for: model_name=ViT-L/14 and properties={'name': 'ViT-L/14', 'dimensions': 768, 'notes': 'CLIP ViT-L/14', 'type': 'clip'}
ERROR:marqo.s2_inference.s2_inference:Error loading model ViT-L/14 on device cpu with normalization=True.
Error message is <urlopen error [Errno 104] Connection reset by peer>
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 1425, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/src/marqo/s2_inference/s2_inference.py", line 155, in _update_available_models
    AvailableModelsKey.model: _load_model(
  File "/app/src/marqo/s2_inference/s2_inference.py", line 357, in _load_model
    model.load()
  File "/app/src/marqo/s2_inference/clip_utils.py", line 261, in load
    self.model, self.preprocess = clip.load(self.model_type, device='cpu', jit=False, download_root=ModelCache.clip_cache_path)
  File "/usr/local/lib/python3.8/dist-packages/clip/clip.py", line 121, in load
    model_path = _download(_MODELS[name], download_root or os.path.expanduser("~/.cache/clip"))
  File "/usr/local/lib/python3.8/dist-packages/clip/clip.py", line 60, in _download
    with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 416, in main
    run(
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 587, in run
    server.run()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 68, in serve
    config.load()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/config.py", line 467, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/app/src/marqo/tensor_search/api.py", line 79, in <module>
    on_start(OPENSEARCH_URL)
  File "/app/src/marqo/tensor_search/on_start_script.py", line 33, in on_start
    thing_to_start.run()
  File "/app/src/marqo/tensor_search/on_start_script.py", line 159, in run
    _ = _preload_model(model=model, content=test_string, device=device)
  File "/app/src/marqo/tensor_search/on_start_script.py", line 186, in _preload_model
    _ = vectorise(
  File "/app/src/marqo/s2_inference/s2_inference.py", line 63, in vectorise
    _update_available_models(
  File "/app/src/marqo/s2_inference/s2_inference.py", line 172, in _update_available_models
    raise ModelLoadError(
marqo.s2_inference.errors.ModelLoadError: Unable to load model=ViT-L/14 on device=cpu with normalization=True. If you are trying to load a custom model, please check that model_properties={'name': 'ViT-L/14', 'dimensions': 768, 'notes': 'CLIP ViT-L/14', 'type': 'clip'} is correct and Marqo has access to the weights file.
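Looking at the traceback, the innermost failure is the plain urllib.request.urlopen call that clip.load uses to fetch the ViT-L/14 weights, with no retry, so a single transient reset takes down the whole startup. Just to illustrate the failure mode, a generic retry-with-backoff wrapper (a hypothetical helper, not part of Marqo or CLIP) is the kind of thing that would paper over a transient "Connection reset by peer":

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(fn: Callable[[], T], attempts: int = 3, backoff: float = 1.0) -> T:
    """Retry fn on network errors with exponential backoff.

    Hypothetical helper, not part of Marqo; it just sketches the pattern
    that would survive a transient 'Connection reset by peer' during a
    model weight download.
    """
    for i in range(attempts):
        try:
            return fn()
        except OSError:  # ConnectionResetError is a subclass of OSError
            if i == attempts - 1:
                raise
            time.sleep(backoff * (2 ** i))
    raise RuntimeError("unreachable")
```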

Also: I have installed the NVIDIA drivers and CUDA according to the manual; nvidia-smi works, PyTorch sees CUDA, and the NVIDIA container runtime for Docker is installed. But when I start the container with --gpu=all it fails as well.

Thanks for looking at this.

Hi @karl! It could be a couple of things; would you be able to try starting Marqo without any preloaded models? You can disable model preloading by adding
-e MARQO_MODELS_TO_PRELOAD='[]'
to the Docker command. See below as well:
https://marqo.pages.dev/1.4.0/Troubleshooting/troubleshooting/#ram-and-vram
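For example, adapting your original command (only the preload setting is new; memory limit, ports, and image tag are unchanged from your post):

```shell
# Same run command as before, but with no models preloaded at startup,
# so Marqo can come up without downloading any model weights first.
sudo docker run -m=6g --name marqo -it --privileged \
  -p 8882:8882 \
  --add-host host.docker.internal:host-gateway \
  -e MARQO_MODELS_TO_PRELOAD='[]' \
  marqoai/marqo:latest
```

If it then starts reliably, the intermittent failures are almost certainly the model downloads being interrupted, not Marqo itself.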


I checked all of this. After a fresh install I managed to get it up and running,
but I can't get GPU support working.

Hi @karl, what OS are you running on? I presume you have followed the guide for using Marqo with a GPU?

Any additional info you can provide on the issue you are having with getting the GPU working will help us debug the problem.
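In the meantime, two quick checks can narrow it down. One thing to note from your earlier message: the Docker flag is --gpus all (plural); --gpu=all is not a valid flag. Assuming the NVIDIA Container Toolkit is installed, something like the following (the CUDA image tag is just an example) verifies that Docker itself can reach the GPU before involving Marqo:

```shell
# 1. Confirm the NVIDIA container runtime works at all, outside of Marqo.
#    (The CUDA image tag is an example; any recent nvidia/cuda base image works.)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# 2. If that prints your GPU, start Marqo with GPU access.
#    Note the flag is "--gpus all", not "--gpu=all".
sudo docker run --name marqo -it --privileged --gpus all \
  -p 8882:8882 --add-host host.docker.internal:host-gateway \
  marqoai/marqo:latest
```

If step 1 fails, the problem is in the driver/toolkit setup rather than Marqo; if step 1 works but step 2 fails, please post the exact error output.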