What is the function of MARQO_MAX_VECTORISE_BATCH_SIZE?

rcking · October 2, 2024, 8:25am

MARQO_MAX_VECTORISE_BATCH_SIZE (default: 16): Maximum batch size to process in parallel (when, for example, adding documents).

What is the function of this parameter?

I have implemented a parallelized ingest process using Celery, with 8 concurrent workers. However, ingestion is only three times faster than single-threaded ingestion. Furthermore, the ingest time does not improve when I increase MARQO_MAX_VECTORISE_BATCH_SIZE to 32.

Is it possible that this parameter is not presently used and/or transferred to Vespa?

Marqo-Robertson · October 2, 2024, 12:06pm

Hey,

That parameter controls the number of documents we will vectorize per batch on Marqo’s inference engine.

https://github.com/marqo-ai/marqo/blob/58ab19a5ac93ebf8a103c0c02b01ebe166de6e0c/src/marqo/s2_inference/s2_inference.py#L116C1-L145C45

          def _encode_without_cache(model_cache_key: str, content: Union[str, List[str], List[Image], List[bytes]],
                                    normalize_embeddings: bool, modality: Modality, **kwargs) -> List[List[float]]:
              try:
                  model = _available_models[model_cache_key][AvailableModelsKey.model]
                  encoder = get_encoder(model)
          
                  if isinstance(content, str):
                      vectorised = model.encode(content, normalize=normalize_embeddings, modality=modality, **kwargs)
                  elif isinstance(content, (torch.Tensor, torch.FloatTensor)):
                      vectorised = model.encode(content, normalize=normalize_embeddings, modality=modality, **kwargs)
                  else:
                      vector_batches = []
                      batch_size = _get_max_vectorise_batch_size()
                      
                      for batch in generate_batches(content, batch_size=batch_size):
                          if modality is None:
                              modality = infer_modality(batch[0] if isinstance(batch[0], (str, bytes)) else batch)
          
                          encoded_batch = encoder.encode(batch, modality=modality, normalize=normalize_embeddings, **kwargs)

This file has been truncated.

Note that this is not being passed to Vespa.
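
To illustrate the batching in the excerpt above, here is a minimal sketch. `generate_batches_sketch` and `count_encoder_calls` are hypothetical stand-ins written for this example, not Marqo's actual helpers. It only counts how many encoder calls a given batch size would produce for one list of content.

```python
from typing import Iterator, List, Sequence


def generate_batches_sketch(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items.

    Hypothetical stand-in for the batching helper in the excerpt;
    the real implementation may differ.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def count_encoder_calls(num_items: int, batch_size: int) -> int:
    """Count how many batches (one encoder call each) a list of num_items produces."""
    items = [f"chunk {i}" for i in range(num_items)]
    return sum(1 for _ in generate_batches_sketch(items, batch_size))


if __name__ == "__main__":
    for n in (10, 40):
        for size in (16, 32):
            print(f"{n} items, batch size {size}: {count_encoder_calls(n, size)} encoder call(s)")
```

Under this sketch, a list that is already smaller than the batch size is encoded in a single call either way, so raising the limit from 16 to 32 would only change anything for lists longer than 16 items. Whether that explains the numbers in this thread is an assumption, not something the excerpt confirms.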

I suspect the slowdown is a performance bottleneck in the underlying Vespa store. Could you tell me more about how your service is currently deployed (e.g. a cluster on Kubernetes, or an instance running locally in Docker)? If it is a single instance, could you give me some idea of the disk I/O, memory consumption, CPU utilization, GPU utilization, and GPU memory over time?

The ideal graph for diagnosing this would plot ingest throughput against each of these metrics from the moment ingestion starts, so we can see where it plateaus.
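
As a rough sketch of the throughput side of such a graph, the function below buckets ingest events into fixed time windows and reports documents per second for each window. The function name, the event format, and the 10-second window are assumptions made for this example. The resource metrics would still need to come from whatever monitoring is already in place.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def throughput_per_window(events: List[Tuple[float, int]], window_s: float = 10.0) -> Dict[int, float]:
    """Return documents per second for each time window.

    events: (seconds since ingestion started, documents indexed by that request).
    The result maps a window index (0 = first window) to docs/sec in that window.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    totals: Dict[int, int] = defaultdict(int)
    for elapsed, num_docs in events:
        totals[int(elapsed // window_s)] += num_docs
    return {window: totals[window] / window_s for window in sorted(totals)}


if __name__ == "__main__":
    # Made-up sample events, purely to show the shape of the output.
    sample = [(1.0, 50), (4.0, 50), (12.0, 30)]
    for window, rate in throughput_per_window(sample).items():
        print(f"window {window}: {rate:.1f} docs/sec")
```

Plotting these per-window rates on the same time axis as CPU, GPU, memory, and disk I/O would show which resource saturates when throughput stops growing.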

rcking · October 2, 2024, 12:55pm

Thanks for your quick feedback!

In this case, I would expect a performance increase if we increase the number of document-processing nodes in the Vespa cluster, right?

rcking · October 9, 2024, 12:34pm

Scaling the Vespa feed nodes had no effect. It seems that the bottleneck was the text chunk embedding. We were able to increase throughput by placing multiple Marqo instances behind a load balancer.
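
For anyone who wants to try the same idea without a dedicated load balancer, here is a minimal client-side round-robin sketch. It is not the setup described above, which used a real load balancer. The endpoint URLs and the `send` callable are hypothetical placeholders, and `send` would wrap whatever client call you already use to add documents.

```python
from itertools import cycle
from typing import Callable, Dict, Iterable, List, Sequence


def distribute_batches(
    batches: Iterable[List[dict]],
    endpoints: Sequence[str],
    send: Callable[[str, List[dict]], None],
) -> Dict[str, int]:
    """Send each batch to the next endpoint in turn.

    Returns the number of documents sent to each endpoint.
    `send` is a caller-supplied function (hypothetical here) that posts
    one batch of documents to one Marqo instance.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    sent = {url: 0 for url in endpoints}
    targets = cycle(endpoints)
    for batch in batches:
        url = next(targets)
        send(url, batch)
        sent[url] += len(batch)
    return sent


if __name__ == "__main__":
    # Hypothetical instance URLs; replace with your own.
    urls = ["http://marqo-a:8882", "http://marqo-b:8882"]
    docs = [[{"_id": str(i), "text": f"doc {i}"}] for i in range(5)]
    counts = distribute_batches(docs, urls, send=lambda url, batch: None)
    print(counts)
```

Each instance runs its own inference engine, so spreading requests this way adds embedding capacity, which matches the bottleneck identified above.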
