Python→Redis→RabbitMQ

What we had, What we thought, What we did

Talha Hanif Butt
5 min read · Mar 7, 2020

So, another week gone, and it was a great one: my thesis supervisor accepted my idea (after about two months of literature review and discussions, which followed roughly 9–10 months of initial work on a couple of other ideas), my course supervisor accepted my idea (after two weeks of discussions, two meetings really), and I learned two new words: Redis and RabbitMQ. Today I will focus only on Redis and RabbitMQ; I will talk about the accepted ideas once I get something out of them (hopefully soon).

Source: https://www.educba.com/rabbitmq-vs-redis/

Why?

We are working on object detection and tagging, for which we started off with compiling OpenCV, which I already wrote about here:

After compilation came optimization, which I already wrote about here:

After optimization, we deployed our application, which I also wrote about here:

What we had?

Until this point, we were using plain Python threads and queues to handle multiple processes, but some questions remained unanswered, the most important being:

How much of the multiprocessing, synchronization, and scaling burden can be taken off the developer?

What we thought?

To answer the question above, we first tried Redis because of RQ Workers.

A worker is a Python process that typically runs in the background and exists solely as a work horse to perform lengthy or blocking tasks that you don’t want to perform inside web processes. RQ uses workers to perform tasks that are added to the queue.

RQ provides FIFO semantics per queue, but how many queues you create is up to you. For the simplest cases, the default queue suffices.

With multiple queues, all queues are equally important to RQ; none has a higher priority as far as RQ is concerned. But when you start a worker, you define queue priority by the order of its arguments.

First, all work on the high-priority queue is done (with FIFO semantics), then the low-priority queue is emptied. If work is enqueued on the high-priority queue in the meantime, that work takes precedence over the low-priority queue again once the currently running job finishes.

Problem

With Redis Queue, getting a callback out of a job running in a thread requires a join, which blocks the caller; execution ends up effectively single-threaded.

What we did?

As a solution to the above mentioned problem, we tried RabbitMQ because:

RabbitMQ is a message broker: it accepts and forwards messages. You can think about it as a post office: when you put the mail that you want posting in a post box, you can be sure that Mr. or Ms. Mailperson will eventually deliver the mail to your recipient. In this analogy, RabbitMQ is a post box, a post office and a postman.

By default, RabbitMQ will send each message to the next consumer, in sequence. On average every consumer will get the same number of messages. This way of distributing messages is called round-robin.
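Round-robin itself is easy to picture with a toy Python model (pure illustration, not RabbitMQ code): six messages dealt out in sequence to three consumers, each ending up with the same number of messages:

```python
from itertools import cycle

# Hypothetical consumer names; messages are just sequential ids
consumers = ["consumer-1", "consumer-2", "consumer-3"]
assignment = {name: [] for name in consumers}

rr = cycle(consumers)  # endlessly repeats consumer-1, -2, -3, -1, ...
for msg_id in range(6):
    assignment[next(rr)].append(msg_id)

print(assignment)
# {'consumer-1': [0, 3], 'consumer-2': [1, 4], 'consumer-3': [2, 5]}
```

Each consumer gets every third message, regardless of how long any individual message takes to process, which is exactly what causes the fairness problem discussed below.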

Message acknowledgment

Doing a task can take a few seconds. You may wonder what happens if one of the consumers starts a long task and dies with it only partly done.

There are two scenarios:

Once RabbitMQ delivers a message to a consumer, it immediately marks it for deletion. In this case, if you kill a worker, we lose the message it was just processing. We also lose all the messages that were dispatched to this particular worker but not yet handled.

But we don’t want to lose any tasks. If a worker dies, we’d like the task to be delivered to another worker.

In order to make sure a message is never lost, RabbitMQ supports message acknowledgments. An ack(nowledgement) is sent back by the consumer to tell RabbitMQ that a particular message has been received and processed, and that RabbitMQ is free to delete it.

If a consumer dies (its channel is closed, connection is closed, or TCP connection is lost) without sending an ack, RabbitMQ will understand that a message wasn’t processed fully and will re-queue it. If there are other consumers online at the same time, it will then quickly redeliver it to another consumer. That way you can be sure that no message is lost, even if the workers occasionally die.

There aren’t any message timeouts; RabbitMQ will redeliver the message when the consumer dies. It’s fine even if processing a message takes a very, very long time.

Message durability

When RabbitMQ quits or crashes it will forget the queues and messages unless you tell it not to. Two things are required to make sure that messages aren’t lost: we need to mark both the queue and messages as durable.

We can also mark our messages as persistent — by supplying a delivery_mode property with a value 2.

Fair dispatch

In a situation with two workers, when all odd messages are heavy and even messages are light, one worker will be constantly busy and the other one will do hardly any work. Well, RabbitMQ doesn’t know anything about that and will still dispatch messages evenly.

This happens because RabbitMQ just dispatches a message when the message enters the queue. It doesn’t look at the number of unacknowledged messages for a consumer. It just blindly dispatches every n-th message to the n-th consumer.

In order to defeat that we can use the basic.qos method with the prefetch_count=1 setting. This tells RabbitMQ not to give more than one message to a worker at a time. Or, in other words, don’t dispatch a new message to a worker until it has processed and acknowledged the previous one. Instead, it will dispatch it to the next worker that is not still busy.
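The effect of prefetch_count=1 can be approximated with a toy model (pure Python, not real RabbitMQ behaviour): compare blind round-robin against "give the next job to the least busy worker", with odd messages heavy (cost 9) and even ones light (cost 1):

```python
def dispatch_round_robin(jobs, n_workers):
    """Blind round-robin: every n-th job goes to the n-th worker."""
    loads = [0] * n_workers
    for i, cost in enumerate(jobs):
        loads[i % n_workers] += cost
    return loads


def dispatch_prefetch_one(jobs, n_workers):
    """Rough model of prefetch_count=1: a worker only takes a new job
    when it is the least busy, so heavy jobs stop piling up on one worker."""
    loads = [0] * n_workers
    for cost in jobs:
        loads[loads.index(min(loads))] += cost
    return loads


jobs = [9, 1, 9, 1, 9, 1]  # odd messages heavy, even messages light
print(dispatch_round_robin(jobs, 2))   # [27, 3]  one worker does nearly everything
print(dispatch_prefetch_one(jobs, 2))  # [19, 11] load is far more balanced
```

In real RabbitMQ the equivalent is a single call before consuming: `channel.basic_qos(prefetch_count=1)`.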
