RabbitMQ Tuning – part 1

As any good programmer or devops engineer knows, there is no single method to improve a RabbitMQ installation across CPU usage, memory consumption, and round-trip performance. It is very hard to optimise software like this without knowledge of the environment: the people behind RabbitMQ know their job, so there are no quick wins for optimisation, only balancing tips based on the installation.

Increase connection and file descriptor limits

RabbitMQ installations running production workloads may need system limits and kernel parameters tuned in order to handle a decent number of concurrent connections and queues. You can increase the default limit of 1024 open file descriptors found in most operating systems to at least 65536 for production. This is a generic optimisation that has no bad drawback on your setup.
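
Before changing anything, it is worth checking the limit your shell and the running RabbitMQ node actually see. A minimal sketch, assuming a Linux system where the Erlang VM process is named beam.smp (the usual name):

# Limit for the current shell
ulimit -n

# System-wide ceiling on open file handles
cat /proc/sys/fs/file-max

# Limit applied to the running RabbitMQ (Erlang VM) process, if started
pid=$(pidof beam.smp) && grep "open files" /proc/$pid/limits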

In order to do this, you should first find the correct rabbitmq-server file using locate rabbitmq-server.
Then open the file with sudo vim /etc/default/rabbitmq-server (or your location, if different). It is a file which will contain the line ulimit -n 65536; uncomment the ulimit command if needed.
Then add EOF as the last line. The file will now appear like this:

# This file is sourced by /etc/init.d/rabbitmq-server. Its primary
# reason for existing is to allow adjustment of system limits for the
# rabbitmq-server process.
#
# Maximum number of open file handles. This will need to be increased
# to handle many simultaneous connections. Refer to the system
# documentation for ulimit (in man bash) for more information.
#
ulimit -n 65536
EOF

Now you have to restart RabbitMQ. Don’t restart the operating system, though.
Run sudo rabbitmqctl stop, then start RabbitMQ again with sudo rabbitmq-server -detached (rabbitmqctl has no start command; the server is started through the rabbitmq-server script).
That’s it. Now go to the RabbitMQ management UI, and you’ll see that the number of available sockets has increased.
In some configurations you may instead need to restart RabbitMQ using service rabbitmq-server restart.
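
You can also verify the new limit from the command line: rabbitmqctl status reports the file descriptor limits the node is running with (the exact output format varies between versions):

sudo rabbitmqctl status | grep -A 3 file_descriptors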

Change management-plugin settings

Another quick and easy optimisation is related to monitoring. Many RabbitMQ tutorials suggest activating the RabbitMQ management plugin to get statistics and a UI for your queues. First of all, it is not a single plugin but a mix of different plugins that introduce a web UI, statistics, a web API, etc., and it has great value. However, in production it is quite common to delegate only queue management to the RabbitMQ machine (or container), while monitoring is delegated to an external application. Considering that, and the fact that statistics use a huge amount of memory (especially if you have a huge number of queues and messages), you can decide to disable the statistics completely or increase the collection interval. My suggestion, however, is to disable the statistics but avoid disabling the management plugin completely, because it provides a lot of useful functions that don’t impact memory. Below you can see a rabbitmq.config that disables statistics collection.

[
  {rabbit, [
    %% ... other rabbit settings ...
    {collect_statistics, none},
    {collect_statistics_interval, 360000}
  ]},
  {rabbitmq_management, [
    {rates_mode, none},
    {stats_event_max_backlog, 150},
    {sample_retention_policies,
     %% List of {MaxAgeInSeconds, SampleEveryNSeconds}
     [{global,   [{60, 10}, {720, 120}, {7200, 1200}, {21600, 3600}]},
      {basic,    [{60, 10}, {720, 120}]},
      {detailed, [{20, 10}]}]}
  ]}
].


The key point of this optimisation is the ability to find the right modelling schema for your queue usage.
After this change it’s useful to clean the statistics database in order to free memory. RabbitMQ lets you do this without a restart.
The statistics database is stored in the memory of the stats process prior to RabbitMQ 3.6.2, and in ETS tables from RabbitMQ 3.6.2 onwards. To restart the database with versions earlier than 3.6.2, use

rabbitmqctl eval 'exit(erlang:whereis(rabbit_mgmt_db), please_terminate).'

Starting with RabbitMQ 3.6.2 and up to 3.6.5, use

rabbitmqctl eval 'supervisor2:terminate_child(rabbit_mgmt_sup_sup, rabbit_mgmt_sup),
rabbit_mgmt_sup_sup:start_child().'
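
To confirm that memory was actually freed on a pre-3.6.2 node, you can inspect the stats process directly. A minimal sketch, assuming the management plugin is enabled so that rabbit_mgmt_db is a registered process:

# Returns {memory, Bytes} for the stats process (fails if the process is not registered)
rabbitmqctl eval 'erlang:process_info(erlang:whereis(rabbit_mgmt_db), memory).'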

Use the right type of queue

Resources are not infinite, and sometimes a cluster is not necessary: you can save money and time by identifying the right type of queue to use. If you cannot afford to lose any messages, make sure that your queue is declared as “durable” and your messages are sent with delivery mode “persistent”.
In order to avoid losing messages on system restart, we need to ensure that they are saved on durable storage.
Persistent messages are heavier, as they have to be written to disk. Keep in mind that lazy queues will have the same effect on performance, even though you are sending transient messages. For high performance, use transient messages.
In some cases a system can still work well even if it loses a message (consider logs, or information without value after a system restart), so durable queues can be avoided. Also consider tuning the system to keep the number of unprocessed messages as low as possible, to avoid losing messages during a system restart.
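
As an illustration, here is how you might declare a durable queue and publish a persistent message with the rabbitmqadmin tool that ships with the management plugin (the queue name and payload are just placeholders):

# Declare a queue that survives a broker restart
rabbitmqadmin declare queue name=orders durable=true

# Publish a message with delivery_mode=2 (persistent)
rabbitmqadmin publish exchange=amq.default routing_key=orders payload="hello" properties='{"delivery_mode": 2}'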

System optimisations

net.ipv4.tcp_tw_reuse allows reusing TCP sockets in TIME-WAIT state for new connections. This can limit the number of open sockets.
Don’t recycle TCP connections (recycling can break clients behind NAT).
Operating systems limit the maximum number of concurrently open file handles, which includes network sockets, so you must be sure enough open files are available. fs.file-max doesn’t impact performance, but setting it to 200000 lets you forget about it without side effects.
net.ipv4.tcp_fin_timeout can improve TCP connection reuse.

sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_tw_recycle=0
sysctl -w fs.file-max=200000
sysctl -w net.ipv4.tcp_fin_timeout=10
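
Keep in mind that sysctl -w changes are lost on reboot. To make them persistent, add the same keys to /etc/sysctl.conf (or to a file under /etc/sysctl.d/, like the hypothetical one below) and reload:

cat <<'EOF' | sudo tee /etc/sysctl.d/99-rabbitmq.conf
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
fs.file-max = 200000
net.ipv4.tcp_fin_timeout = 10
EOF
sudo sysctl -p /etc/sysctl.d/99-rabbitmq.conf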

You should also protect against SYN flood attacks by enabling syncookies. This is useful to avoid connections blocking after a RabbitMQ restart or under heavy load and mass reconnections. For the same reason, also increase the number of pending connections that the system can handle.

sysctl -w net.ipv4.tcp_syncookies=1
sysctl -w net.ipv4.tcp_syn_retries=2
sysctl -w net.ipv4.tcp_synack_retries=2
sysctl -w net.ipv4.tcp_max_syn_backlog=4096
sysctl -w net.core.somaxconn=4096
sysctl -w net.core.netdev_max_backlog=65536
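
You can read the values back at any time to confirm they took effect (sysctl accepts multiple keys in one call):

sysctl net.ipv4.tcp_syncookies net.ipv4.tcp_max_syn_backlog net.core.somaxconn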

In order to accept more aggressive input on your machine, you can use these options: window scaling uses extra bits to increase the TCP window size, and disabling slow start after idle keeps established connections fast after a pause.

sysctl -w net.ipv4.tcp_window_scaling=1
sysctl -w net.ipv4.tcp_slow_start_after_idle=0
sysctl -w net.ipv4.tcp_no_metrics_save=0
