Error 500 Adding Documents

Marqo looks great. Very excited to develop with it. I have built some wrappers around it in my application server to make HTTP calls to Marqo. Most of the functions (create/delete index) are working, but adding documents returns an error.

172.17.0.1:57134 - "POST /indexes/data/documents HTTP/1.1" 500 Internal Server Error

Body:
{ "documents" : [{"_id":"1","title":"Fat cat","description":"The fat cat sits on the mat in the sunshine"},{"_id":"2","title":"Brown fox","description":"The quick brown fox jumps over the lazy dog"}], "tensorFields" : ["description"] }

Docker shows:
2023-09-03 22:23:10 During handling of the above exception, another exception occurred:
2023-09-03 22:23:10
2023-09-03 22:23:10 Traceback (most recent call last):
2023-09-03 22:23:10 File "/usr/local/lib/python3.8/dist-packages/starlette/middleware/base.py", line 43, in call_next
2023-09-03 22:23:10 message = await recv_stream.receive()
2023-09-03 22:23:10 File "/usr/local/lib/python3.8/dist-packages/anyio/streams/memory.py", line 118, in receive
2023-09-03 22:23:10 raise EndOfStream
2023-09-03 22:23:10 anyio.EndOfStream

Sending a blank tensorFields value returns no error: "tensorFields" : [""]
Response:
{"errors":false,"processingTimeMs":91.782602999956,"index_name":"data","items":[{"_id":"1","result":"created","status":201},{"_id":"2","result":"created","status":201}]}

As far as I can tell, the tensorFields definition is correct?

Hi Artie.

So that I can reproduce this, could you please let me know which index settings (or defaults) you're using when creating your index?

Also, how much memory is allocated to the container? You can view this by running docker stats.
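
For example (the container name here is just a guess; substitute whatever name your Marqo container is running under):

    # One-off snapshot of CPU and memory usage for the Marqo container
    docker stats marqo --no-stream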

Hi Artie! Thanks for posting this issue here!

Just tried to reproduce it without much luck:

 curl -XPOST .../indexes/pandu-test-1/documents \
      -H "Content-Type: application/json" \
      -d '{ "documents" : [{"_id":"1","title":"Fat cat","description":"The fat cat sits on the mat in the sunshine"},{"_id":"2","title":"Brown fox","description":"The quick brown fox jumps over the lazy dog"}], "tensorFields" : ["description"] }' | jq

Response:

{
  "errors": false,
  "processingTimeMs": 190.603825999915,
  "index_name": "pandu-test-1",
  "items": [
    {
      "_id": "1",
      "result": "updated",
      "status": 200
    },
    {
      "_id": "2",
      "result": "updated",
      "status": 200
    }
  ]
}

Is it possible that the Marqo container has been killed? Hard-to-explain 500s usually occur when Marqo has been killed by Docker.

Docker, out of the box, tends to kill Marqo because it under-provisions memory. When using Docker Desktop, I raise the default provisioned memory from 2GB to 8GB. This allows Marqo to fit the machine learning models and the vector search engine comfortably in memory.
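
If you start the container with docker run directly rather than through Docker Desktop, you can set an explicit limit instead. This is only a sketch; the port mapping and image tag follow the standard Marqo quickstart and may differ on your setup:

    # Run Marqo with an 8GB memory limit (docker run on Linux)
    docker run --name marqo -p 8882:8882 --memory=8g marqoai/marqo:latest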

Hi, thanks for the replies. I did increase the allocated memory, but that wasn't the issue. It came down to the split_overlap value when creating the index.

I had this set to 5, which may have been too large for the short sentences I was testing on. Dropping the value down to 0 or 1 allowed the documents to be inserted successfully.

Perhaps you might want to consider trapping this and returning an error when it occurs, or defaulting to 0 if the content is too short for the specified overlap?

Thanks for that update, Artie. I'll try to reproduce it. Do you mind sending the full index settings and some sample documents you are indexing?

Hi, the Docker instance has 16GB available to it.

Below is the content transmitted:
Create

{
  "index_defaults": {
    "text_preprocessing": {
      "split_length": 2,
      "split_overlap": 5,
      "split_method": "word"
    },
    "treat_urls_and_pointers_as_images": false,
    "model": "hf/all_datasets_v4_MiniLM-L6",
    "normalize_embeddings": true,
    "image_preprocessing": {
      "patch_method": null
    },
    "ann_parameters": {
      "space_type": "cosinesimil",
      "parameters": {
        "ef_construction": 128,
        "m": 16
      }
    }
  },
  "number_of_shards": 3,
  "number_of_replicas": 0
}

Insert:

{ "documents" : [{"_id":"1","title":"Fat cat","description":"The fat cat sits on the mat in the sunshine"},{"_id":"2","title":"Brown fox","description":"The quick brown fox jumps over the lazy dog"}], "tensorFields" : ["description"] }
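
For reference, sent with curl these calls would look roughly as follows. The host and port assume a default local Marqo setup, and create_settings.json is just the create payload above saved to a file:

    # Create the index with the settings above (split_overlap of 5 is what triggers the 500)
    curl -XPOST http://localhost:8882/indexes/data \
         -H "Content-Type: application/json" \
         -d @create_settings.json

    # Add the documents (this is the request that returns the 500)
    curl -XPOST http://localhost:8882/indexes/data/documents \
         -H "Content-Type: application/json" \
         -d '{ "documents" : [{"_id":"1","title":"Fat cat","description":"The fat cat sits on the mat in the sunshine"},{"_id":"2","title":"Brown fox","description":"The quick brown fox jumps over the lazy dog"}], "tensorFields" : ["description"] }'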

I can confirm that on my end a split_overlap value of 5 returns a 500 error. A value of 0 or 1 is successful.

Thanks for that example! I managed to reproduce it on my end.

I created a GitHub issue for this. We’ll plan to implement a fix soon. Thanks again for bringing this up!