Performance improvements
Store more, clean faster: CleanStorage job optimized for S3 and MinIO
We are glad to announce changes to the cleanStorage job logic, starting from Service Jobs version 5.8.1 and API Service version 5.9.0.
Previously, the job could clear only one attachment per request to the binary storage, and no more than 500,000 attachments per job execution. The current implementation cleans blobs in bulk, up to 200,000 per request to the S3 storage.
The chunk_size environment variable remains for the service jobs, so no deployment changes are needed. Under the hood, the job now splits the chunk_size value into fixed batches of 200,000 and performs the deletion over the resulting number of iterations. For example, if chunk_size = 2,000,000, the job runs 10 iterations and cleans 200,000 attachments per iteration.
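The batching arithmetic described above can be sketched as follows. This is an illustrative model only, not the actual job implementation; the function name `plan_iterations` and the constant name are ours.

```python
# Illustrative sketch of the job's internal batching (not the real implementation).
BATCH_SIZE = 200_000  # fixed internal batch size stated in the release note

def plan_iterations(chunk_size: int) -> list[int]:
    """Split chunk_size into fixed 200k batches; the last batch may be smaller."""
    full, remainder = divmod(chunk_size, BATCH_SIZE)
    batches = [BATCH_SIZE] * full
    if remainder:
        batches.append(remainder)
    return batches

# chunk_size = 2 million -> 10 iterations of 200,000 deletions each
print(len(plan_iterations(2_000_000)))  # 10
```

A chunk_size that is not a multiple of 200,000 would simply leave a smaller final batch, e.g. 250,000 splits into one batch of 200,000 and one of 50,000.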
The new logic is 30 times faster for both S3 and MinIO binary storages.
Use case 1: if chunk_size is set to 200,000, the cleanStorage job deletes at most 200,000 attachments from the S3 storage in a single iteration. If the attachment_deletion table contains fewer than 200,000 entries, all of them are deleted.
Use case 2: if chunk_size is set to 2,000,000, the cleanStorage job deletes all attachments from the S3 storage within 10 iterations, in batches of 200,000.
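For use case 2, the setting might look like the fragment below. This is a hypothetical example: the exact variable name, casing, and location depend on your deployment manifests.

```shell
# Hypothetical deployment fragment: configure the cleanStorage chunk size.
# 2,000,000 results in 10 internal iterations of 200,000 deletions each.
export CHUNK_SIZE=2000000
```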
This implementation significantly optimizes cleanStorage job performance: it keeps your binary storage within its limits and frees you from worrying about attachment cleanup. With binary cleanup now this much faster, storage backlog issues of this kind should no longer occur.