Marqo looks great, and I'm very excited to develop with it. I have built some wrappers around it in my application server to make HTTP calls to Marqo. Most of the endpoints (create index, delete index) work, but adding documents returns an error.
172.17.0.1:57134 - "POST /indexes/data/documents HTTP/1.1" 500 Internal Server Error
Body:
{ "documents" : [{"_id":"1","title":"Fat cat","description":"The fat cat sits on the mat in the sunshine"},{"_id":"2","title":"Brown fox","description":"The quick brown fox jumps over the lazy dog"}], "tensorFields" : ["description"] }
Docker shows:
2023-09-03 22:23:10 During handling of the above exception, another exception occurred:
2023-09-03 22:23:10
2023-09-03 22:23:10 Traceback (most recent call last):
2023-09-03 22:23:10 File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/base.py", line 43, in call_next
2023-09-03 22:23:10 message = await recv_stream.receive()
2023-09-03 22:23:10 File "/usr/local/lib/python3.8/dist-packages/anyio/streams/memory.py", line 118, in receive
2023-09-03 22:23:10 raise EndOfStream
2023-09-03 22:23:10 anyio.EndOfStream
Sending a blank tensorFields value returns no error: "tensorFields" : [""]
Response:
{"errors":false,"processingTimeMs":91.782602999956,"index_name":"data","items":[{"_id":"1","result":"created","status":201},{"_id":"2","result":"created","status":201}]}
As far as I can tell, the tensorFields definition is correct?
curl -XPOST .../indexes/pandu-test-1/documents \
-H "Content-Type: application/json" \
-d '{ "documents" : [{"_id":"1","title":"Fat cat","description":"The fat cat sits on the mat in the sunshine"},{"_id":"2","title":"Brown fox","description":"The quick brown fox jumps over the lazy dog"}], "tensorFields" : ["description"] }' | jq
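When chasing a 500 like this, it can help to first rule out a malformed request body. A minimal sketch that validates the JSON locally before sending it — the host, port, and index name are placeholders for your own setup, and the actual POST is left commented out so the sketch runs without a live server:

```shell
# Hedged sketch: store the add-documents body, validate it locally, then send it.
BODY='{"documents":[{"_id":"1","title":"Fat cat","description":"The fat cat sits on the mat in the sunshine"},{"_id":"2","title":"Brown fox","description":"The quick brown fox jumps over the lazy dog"}],"tensorFields":["description"]}'

# Confirm the body is well-formed JSON before blaming the payload:
echo "$BODY" | python3 -m json.tool > /dev/null && echo "body is valid JSON"

# Then send it (placeholder host/index; uncomment against a running Marqo):
# curl -XPOST http://localhost:8882/indexes/pandu-test-1/documents \
#   -H "Content-Type: application/json" -d "$BODY" | jq
```

If the body validates but the server still returns 500, the problem is on the server side rather than in the payload, which is what the Docker logs above suggest.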
Is it possible that the Marqo container has been killed? Hard-to-explain 500s usually occur when Docker has killed the Marqo container.
Out of the box, Docker tends to kill Marqo because it under-provisions memory. When using Docker Desktop, I raise the default provisioned memory from 2GB to 8GB. This lets Marqo fit the machine learning models and the vector search engine comfortably in memory.
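If you run the container from the CLI rather than Docker Desktop, a sketch of the same idea — image name and port are the ones from Marqo's getting-started docs, and the 8g figure mirrors the suggestion above; adjust for your machine. These commands need a running Docker daemon, so they are illustrative rather than something to run as-is:

```shell
# Hedged sketch: give the Marqo container an explicit memory limit so
# Docker's default provisioning doesn't OOM-kill it.
docker rm -f marqo 2>/dev/null
docker run --name marqo -d -p 8882:8882 --memory=8g marqoai/marqo:latest

# Afterwards, check whether Docker killed the container out of memory:
docker inspect marqo --format '{{.State.OOMKilled}}'
```

`docker inspect` reporting `OOMKilled: true` is the quickest way to confirm this class of hard-to-explain 500.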
Hi, thanks for the replies. I did increase the allocated memory, but that wasn't the issue. It came down to the split_overlap value used when creating the index.
I had it set to 5, which may have been too large for the short sentences I was testing on. Dropping the value to 0 or 1 allowed the documents to be inserted successfully.
Perhaps you might want to consider trapping this case and returning an error, or defaulting to 0 when the content is too short for the specified overlap?
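For anyone hitting the same thing, the workaround above can be sketched as a create-index call that sets split_overlap explicitly. The field names follow Marqo's `index_defaults.text_preprocessing` settings; the split_length and split_method values are illustrative, and the host/port and index name are placeholders. The POST itself is commented out so the sketch runs without a server:

```shell
# Hedged sketch: index settings with split_overlap set explicitly.
SETTINGS='{
  "index_defaults": {
    "text_preprocessing": {
      "split_length": 2,
      "split_overlap": 0,
      "split_method": "sentence"
    }
  }
}'

# Sanity-check the settings JSON locally:
echo "$SETTINGS" | python3 -m json.tool > /dev/null && echo "settings OK"

# Then create the index (placeholder host/index; uncomment against a running Marqo):
# curl -XPOST http://localhost:8882/indexes/data \
#   -H "Content-Type: application/json" -d "$SETTINGS"
```

With split_overlap at 0 (or 1), the add-documents request from the original post goes through.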
I can confirm that on my end a split_overlap value of 5 returns a 500 error, while a value of 0 or 1 succeeds.