MARQO_MAX_VECTORISE_BATCH_SIZE 16 Maximum batch size to process in parallel (when, for example, adding documents).
What is the function of this parameter?
I have implemented a parallelized ingest process using Celery, with 8 concurrent workers. However, ingest is only about three times faster than single-threaded ingest. Furthermore, the ingest time does not improve when I increase MARQO_MAX_VECTORISE_BATCH_SIZE to 32.
Is it possible that this parameter is not presently used and/or transferred to Vespa?
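For reference, this is roughly how I am setting the parameter when starting the container (an illustrative invocation; the image tag and port are from the standard single-instance Docker setup, adjust to your deployment):

```shell
# Start Marqo with a larger vectorise batch size (illustrative)
docker run --name marqo -p 8882:8882 \
    -e MARQO_MAX_VECTORISE_BATCH_SIZE=32 \
    marqoai/marqo:latest
```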
That parameter controls the number of documents we will vectorize per batch on Marqo’s inference engine.
Note that this is not being passed to Vespa.
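Conceptually, it splits the documents in each add_documents call into fixed-size batches before they reach the inference engine, so each batch is one forward pass through the embedding model. A simplified sketch of that chunking (not the actual Marqo code; names are illustrative):

```python
def vectorise_batches(docs, batch_size=16):
    """Split docs into batches of at most batch_size for vectorisation.

    Mirrors the effect of MARQO_MAX_VECTORISE_BATCH_SIZE: each returned
    batch would be embedded in one inference call. Illustrative only.
    """
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
```

Raising the batch size only helps while the model and hardware can actually process larger batches faster; once the inference device is saturated, bigger batches change the shape of the work, not the throughput.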
I suspect the bottleneck is in the underlying Vespa store. Could you give me a better understanding of how your service is currently deployed (e.g. a cluster on Kubernetes, a single instance running locally on Docker)? If it is a single instance, could you give me some idea of the disk I/O, memory consumption, CPU utilization, GPU utilization, and GPU memory over time?
The ideal graph to diagnose what’s happening here would map ingest throughput against each of these metrics, so we can see where it plateaus.
Scaling the Vespa feed nodes had no effect. It seems the bottleneck was the text chunk embedding itself. We were able to increase throughput by placing multiple Marqo instances behind a load balancer.
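For anyone hitting the same wall, the load-balancer setup was roughly this (an illustrative nginx config; the replica hostnames and ports are ours, adjust to your deployment):

```nginx
# Spread ingest traffic across multiple Marqo replicas
upstream marqo_cluster {
    least_conn;              # route each request to the least-busy replica
    server marqo-1:8882;
    server marqo-2:8882;
    server marqo-3:8882;
}

server {
    listen 8882;
    location / {
        proxy_pass http://marqo_cluster;
    }
}
```

Each replica runs its own inference engine, so embedding work scales out horizontally even though they all feed the same Vespa backend.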