1. Introduction
A dummy Kafka client (producer+consumer) that measures end-to-end latency of using Kafka.
The latencies are measured by sending messages to the given Kafka topic, consuming them
and measuring the time it took for the message to be produced and consumed.
The recorded latencies are reported in Prometheus format on a metrics endpoint (/q/metrics).
The metrics contain values for several quantiles of the latencies, including the median, the 95th percentile, and the 99th percentile.
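The measurement idea described above can be sketched in a few lines. This is only an illustration, not the client's actual implementation: it assumes the send time is carried in the message payload as an 8-byte big-endian integer, and the function names `encode_payload`, `latency_ms`, and `percentile` are hypothetical.

```python
import math
import struct


def encode_payload(sent_at_ms: int) -> bytes:
    """Pack the send timestamp (milliseconds) into an 8-byte payload (assumed format)."""
    return struct.pack(">q", sent_at_ms)


def latency_ms(payload: bytes, received_at_ms: int) -> int:
    """Return the end-to-end latency: receive time minus the embedded send time."""
    (sent_at_ms,) = struct.unpack(">q", payload[:8])
    return received_at_ms - sent_at_ms


def percentile(samples, pct: int):
    """Nearest-rank percentile of the samples; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

A consumer would call `latency_ms` for every received message and feed the results into `percentile` to obtain values comparable to the reported quantiles.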
If the client is given the required ACLs, it reports latencies per broker. It can also automatically increase the number of partitions in a topic to match the number of brokers and reassign partitions to brokers, so that every broker is produced to and consumed from.
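To make the partition handling more concrete, here is a minimal sketch of one way such a plan could be computed. It is an assumption for illustration only: the round-robin strategy and the function name `plan_partitions` are hypothetical and may differ from what the client really does.

```python
def plan_partitions(broker_ids, current_partition_count):
    """Return (target partition count, {partition: leader broker id}).

    The topic is grown to at least one partition per broker, and leaders are
    spread round-robin so that every broker leads at least one partition.
    """
    if not broker_ids:
        raise ValueError("at least one broker is required")
    brokers = sorted(broker_ids)
    # Never shrink the topic: Kafka only allows increasing the partition count.
    target = max(current_partition_count, len(brokers))
    leaders = {p: brokers[p % len(brokers)] for p in range(target)}
    return target, leaders
```

For three brokers and a topic with a single partition, the plan grows the topic to three partitions with one leader per broker.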
2. Getting started
2.1. Using docker-compose
Use the docker compose file found at the root of this repository.
Start everything using
docker-compose up -d
This will start a Kafka broker, the kafka-synth-client and Prometheus, so you can store and visualize the metrics. Prometheus will be available at http://localhost:9090. You can try the following query:
synth_client_e2e_latency_ms
If you want to test against your own Kafka cluster, be sure to change the Kafka parameters in the docker-compose file. Everything prefixed with KAFKA_ will be used in the Kafka client.
Refer to the chapter Parameters for more information on the available parameters.
2.2. Required ACLs
In order to work properly, the client’s user needs to have ACL permissions to do the following operations:
- Describe, Read, Write, Alter the configured topic
- Read the configured consumer group
For information on how to configure the consumer group and the topic, see the Parameters section.
An example of setting this up for a Strimzi cluster is available in the examples/ folder.
2.3. Available Metrics
The synth client exposes the following metrics in Prometheus format (on the /q/metrics endpoint):
2.3.1. End to end produce/consume latency in milliseconds
# HELP synth_client_e2e_latency_ms End-to-end latency of the synthetic client
# TYPE synth_client_e2e_latency_ms summary
synth_client_e2e_latency_ms{broker="0",fromRack="rack1",partition="0",toRack="rack1",quantile="0.5",} 6.125
synth_client_e2e_latency_ms{broker="0",fromRack="rack1",partition="0",toRack="rack1",quantile="0.8",} 7.125
synth_client_e2e_latency_ms{broker="0",fromRack="rack1",partition="0",toRack="rack1",quantile="0.9",} 9.375
synth_client_e2e_latency_ms{broker="0",fromRack="rack1",partition="0",toRack="rack1",quantile="0.95",} 9.375
synth_client_e2e_latency_ms{broker="0",fromRack="rack1",partition="0",toRack="rack1",quantile="0.99",} 10.375
This latency describes the time it took for a produced message to be consumed. The latency is measured in milliseconds. The synth client reports the median, 80th, 90th, 95th, and 99th percentile of the latencies. These metrics are handy if you have a service level objective (SLO) for the end-to-end latency of your Kafka messages (e.g. 95% of messages should be consumable within 30ms of being produced).
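As an illustration of the SLO use case, the following sketch reads metrics text like the sample above and lists every series whose chosen quantile exceeds a threshold. It is a simplified parser written for this example, not a full implementation of the Prometheus text format (for instance, it does not handle label values containing `}` or escaped quotes), and the function names are hypothetical.

```python
import re

# Matches lines of the form: metric_name{label="value",...} 1.23
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, value) for labelled samples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def slo_violations(text, metric, quantile, threshold_ms):
    """Return (labels, value) for every series of `metric` above the threshold."""
    return [
        (labels, value)
        for name, labels, value in parse_samples(text)
        if name == metric
        and labels.get("quantile") == quantile
        and value > threshold_ms
    ]
```

Calling `slo_violations(text, "synth_client_e2e_latency_ms", "0.95", 30.0)` on the sample output returns an empty list, meaning the example SLO of 30ms at the 95th percentile is met.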
2.3.2. Acknowledgement latency in milliseconds
# HELP synth_client_ack_latency_ms Ack latency of the synthetic client
# TYPE synth_client_ack_latency_ms summary
synth_client_ack_latency_ms{broker="0",partition="0",rack="eu-west",quantile="0.5",} 4.1875
synth_client_ack_latency_ms{broker="0",partition="0",rack="eu-west",quantile="0.8",} 5.1875
synth_client_ack_latency_ms{broker="0",partition="0",rack="eu-west",quantile="0.9",} 6.1875
synth_client_ack_latency_ms{broker="0",partition="0",rack="eu-west",quantile="0.95",} 7.1875
synth_client_ack_latency_ms{broker="0",partition="0",rack="eu-west",quantile="0.99",} 9.4375
This latency describes the time it took for a produced message to be acknowledged by the broker. The latency is measured in milliseconds.
The synth client reports the median, 80th, 90th, 95th, and 99th percentile of the latencies. These metrics are handy if
you would like to know how long it takes for your message to be acknowledged by the broker.
This is especially interesting if you configure the producer with acks=all (you can do this in the synth client by setting the KAFKA_ACKS environment variable to all), as this will only acknowledge the message once it has been received by all in-sync replicas.
In this case, we are effectively monitoring the time it takes to replicate the message across those replicas.
The rack label indicates the "rack" or environment in which the client is running.
This label is useful for distinguishing between latencies in different environments.
2.3.3. Time since last message in seconds
# HELP synth_client_time_since_last_consumption_seconds
# TYPE synth_client_time_since_last_consumption_seconds gauge
synth_client_time_since_last_consumption_seconds{rack="rack1",} 0.175
This metric describes the number of seconds since the last message was consumed by the client. Since the client produces messages at a constant rate (at least one message per second), a value that is much higher than 1 second indicates that there are issues with either the production or the consumption of messages. This metric is a good candidate for alerting, as it can indicate that the Kafka cluster is not functioning as expected or is unreachable.
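A simple alerting condition on this gauge could look like the sketch below. The threshold of 5 seconds and the requirement of three consecutive samples are assumptions chosen for illustration, and `should_alert` is a hypothetical helper, not part of the client.

```python
def should_alert(gauge_samples, threshold_seconds=5.0, min_consecutive=3):
    """Return True if the most recent samples all exceed the threshold.

    gauge_samples is a chronological list of scraped values of
    synth_client_time_since_last_consumption_seconds. Requiring several
    consecutive breaches avoids alerting on a single slow scrape.
    """
    if len(gauge_samples) < min_consecutive:
        return False
    recent = gauge_samples[-min_consecutive:]
    return all(value > threshold_seconds for value in recent)
```

With the defaults, `should_alert([0.2, 6.0, 7.0, 8.0])` fires, while a series that recovers, such as `[6.0, 7.0, 0.2]`, does not.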
2.3.4. Producer error rate per second
# HELP synth_client_producer_error_rate
# TYPE synth_client_producer_error_rate gauge
synth_client_producer_error_rate{rack="rack1",} 0.0
The average per-second number of record sends that resulted in errors. An increase in this metric can indicate issues with reaching the Kafka cluster or issues with the Kafka cluster itself. This is another good candidate for alerting.
3. Parameters
You can tweak the configuration using environment variables. Below is an overview of some key configuration parameters.
Parameter | Default | Description |
---|---|---|
| <mandatory> | Kafka bootstrap servers |
| <mandatory> | The Kafka topic to produce to and consume from. |
| 8 | The size of each Kafka message in bytes. |
| 1 | The number of messages to produce per second. |
| "default" | Some identifier of the environment in which the client is running. For example "eu-west-1a". This is useful for measuring latencies between clients that are running in different environments. Can be left unset if not needed. |
| 8081 | The port on which the metrics endpoint will be exposed. |
Furthermore, any environment variable prefixed with KAFKA_ will be interpreted as a Kafka Producer/Consumer configuration property.
For example, setting KAFKA_GROUP_ID will set the value of the group.id consumer property.
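The naming convention can be illustrated with a short sketch. It assumes the mapping is "strip the KAFKA_ prefix, lowercase, replace underscores with dots", which matches the KAFKA_GROUP_ID example but is otherwise an assumption; the function name `kafka_properties` is hypothetical.

```python
def kafka_properties(environ, prefix="KAFKA_"):
    """Translate prefixed environment variables into Kafka property names."""
    props = {}
    for key, value in environ.items():
        # Ignore unrelated variables and a bare prefix with no property name.
        if key.startswith(prefix) and len(key) > len(prefix):
            name = key[len(prefix):].lower().replace("_", ".")
            props[name] = value
    return props
```

For example, `kafka_properties({"KAFKA_GROUP_ID": "synth", "KAFKA_ACKS": "all"})` yields `{"group.id": "synth", "acks": "all"}`.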