Issues with Docker

Hi Marqo team.

I have issues with the Docker image: it only starts successfully some of the time. When it starts, everything works, but most of the time it fails with:

marqo@marqo-server:~$ sudo docker run -m=6g --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
Preparing to start Marqo-OS…
Marqo-OS not found; starting Marqo-OS…
Marqo-OS started successfully.
Starting Marqo throttling…
Marqo throttling successfully started.
INFO:ModelsForStartup:pre-loading ['hf/all_datasets_v4_MiniLM-L6', 'ViT-L/14'] onto devices=['cpu']

###########################################################
###########################################################

STARTING DOWNLOAD OF MARQO ARTEFACTS################

###########################################################
###########################################################

INFO:DeviceSummary:found devices [{'id': -1, 'name': ['cpu']}]
INFO:SetBestAvailableDevice:Best available device set to: cpu
loading for: model_name=hf/all_datasets_v4_MiniLM-L6 and properties={'name': 'flax-sentence-embeddings/all_datasets_v4_MiniLM-L6', 'dimensions': 384, 'tokens': 128, 'type': 'hf', 'notes': ''}
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████████████████████| 612/612 [00:00<00:00, 47.4kB/s]
Downloading pytorch_model.bin: 100%|███████████████████████████████████████████████████| 90.9M/90.9M [00:01<00:00, 49.1MB/s]
Downloading (…)okenizer_config.json: 100%|█████████████████████████████████████████████████| 535/535 [00:00<00:00, 45.9kB/s]
Downloading (…)solve/main/vocab.txt: 100%|███████████████████████████████████████████████| 232k/232k [00:00<00:00, 2.45MB/s]
Downloading (…)/main/tokenizer.json: 100%|███████████████████████████████████████████████| 466k/466k [00:00<00:00, 10.1MB/s]
Downloading (…)cial_tokens_map.json: 100%|█████████████████████████████████████████████████| 112/112 [00:00<00:00, 57.7kB/s]
INFO:marqo.s2_inference.s2_inference:loaded hf/all_datasets_v4_MiniLM-L6 on device cpu with normalization=True at time=2023-11-06 22:08:01.437866.
INFO:ModelsForStartup:hf/all_datasets_v4_MiniLM-L6 cpu run succesfully!
loading for: model_name=ViT-L/14 and properties={'name': 'ViT-L/14', 'dimensions': 768, 'notes': 'CLIP ViT-L/14', 'type': 'clip'}
ERROR:marqo.s2_inference.s2_inference:Error loading model ViT-L/14 on device cpu with normalization=True.
Error message is <urlopen error [Errno 104] Connection reset by peer>
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1354, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 1425, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/src/marqo/s2_inference/s2_inference.py", line 155, in _update_available_models
    AvailableModelsKey.model: _load_model(
  File "/app/src/marqo/s2_inference/s2_inference.py", line 357, in _load_model
    model.load()
  File "/app/src/marqo/s2_inference/clip_utils.py", line 261, in load
    self.model, self.preprocess = clip.load(self.model_type, device='cpu', jit=False, download_root=ModelCache.clip_cache_path)
  File "/usr/local/lib/python3.8/dist-packages/clip/clip.py", line 121, in load
    model_path = _download(_MODELS[name], download_root or os.path.expanduser("~/.cache/clip"))
  File "/usr/local/lib/python3.8/dist-packages/clip/clip.py", line 60, in _download
    with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1397, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/lib/python3.8/urllib/request.py", line 1357, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 104] Connection reset by peer>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/uvicorn", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 416, in main
    run(
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/main.py", line 587, in run
    server.run()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 61, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 68, in serve
    config.load()
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/config.py", line 467, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/app/src/marqo/tensor_search/api.py", line 79, in <module>
    on_start(OPENSEARCH_URL)
  File "/app/src/marqo/tensor_search/on_start_script.py", line 33, in on_start
    thing_to_start.run()
  File "/app/src/marqo/tensor_search/on_start_script.py", line 159, in run
    _ = _preload_model(model=model, content=test_string, device=device)
  File "/app/src/marqo/tensor_search/on_start_script.py", line 186, in _preload_model
    _ = vectorise(
  File "/app/src/marqo/s2_inference/s2_inference.py", line 63, in vectorise
    _update_available_models(
  File "/app/src/marqo/s2_inference/s2_inference.py", line 172, in _update_available_models
    raise ModelLoadError(
marqo.s2_inference.errors.ModelLoadError: Unable to load model=ViT-L/14 on device=cpu with normalization=True. If you are trying to load a custom model, please check that model_properties={'name': 'ViT-L/14', 'dimensions': 768, 'notes': 'CLIP ViT-L/14', 'type': 'clip'} is correct and Marqo has access to the weights file.
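Looking at the traceback, the innermost failure is the plain urllib.request.urlopen call that clip.load uses to fetch the ViT-L/14 weights, with no retry, so a single transient reset takes down the whole startup. Just to illustrate the failure mode, a generic retry-with-backoff wrapper (a hypothetical helper, not part of Marqo or CLIP) is the kind of thing that would paper over a transient "Connection reset by peer":

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(fn: Callable[[], T], attempts: int = 3, backoff: float = 1.0) -> T:
    """Retry fn on network errors with exponential backoff.

    Hypothetical helper, not part of Marqo; it just sketches the pattern
    that would survive a transient 'Connection reset by peer' during a
    model weight download.
    """
    for i in range(attempts):
        try:
            return fn()
        except OSError:  # ConnectionResetError is a subclass of OSError
            if i == attempts - 1:
                raise
            time.sleep(backoff * (2 ** i))
    raise RuntimeError("unreachable")
```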

Also: I have installed the NVIDIA drivers and CUDA according to the manual; nvidia-smi works, PyTorch sees CUDA, and the NVIDIA container runtime for Docker is installed. But when I start the container with --gpu=all it fails as well.

Thanks for looking at this.

Hi @karl! It could be a couple of things; would you be able to try starting Marqo without any preloaded models? You can disable model preloading by adding
-e MARQO_MODELS_TO_PRELOAD='[]'
to the Docker command. See below as well:
https://marqo.pages.dev/1.4.0/Troubleshooting/troubleshooting/#ram-and-vram
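For example, adapting your original command (only the preload setting is new; memory limit, ports, and image tag are unchanged from your post):

```shell
# Same run command as before, but with no models preloaded at startup,
# so Marqo can come up without downloading any model weights first.
sudo docker run -m=6g --name marqo -it --privileged \
  -p 8882:8882 \
  --add-host host.docker.internal:host-gateway \
  -e MARQO_MODELS_TO_PRELOAD='[]' \
  marqoai/marqo:latest
```

If it then starts reliably, the intermittent failures are almost certainly the model downloads being interrupted, not Marqo itself.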


I checked all of this. After a fresh install I managed to get it up and running,
but I can't get GPU support working.

Hi @karl, what OS are you running on? I presume you have followed the guide for using Marqo with a GPU?

Any additional info you can provide on the issue you are having with getting the GPU working will help us debug the problem.
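In the meantime, two quick checks can narrow it down. One thing to note from your earlier message: the Docker flag is --gpus all (plural); --gpu=all is not a valid flag. Assuming the NVIDIA Container Toolkit is installed, something like the following (the CUDA image tag is just an example) verifies that Docker itself can reach the GPU before involving Marqo:

```shell
# 1. Confirm the NVIDIA container runtime works at all, outside of Marqo.
#    (The CUDA image tag is an example; any recent nvidia/cuda base image works.)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# 2. If that prints your GPU, start Marqo with GPU access.
#    Note the flag is "--gpus all", not "--gpu=all".
sudo docker run --name marqo -it --privileged --gpus all \
  -p 8882:8882 --add-host host.docker.internal:host-gateway \
  marqoai/marqo:latest
```

If step 1 fails, the problem is in the driver/toolkit setup rather than Marqo; if step 1 works but step 2 fails, please post the exact error output.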