Apache Pulsar as a Message Queue: A Guide to Scalable Messaging

Running messaging systems at scale calls for a platform that supports high throughput, low latency, and multi-tenancy. Apache Pulsar, a cloud-native, distributed messaging and streaming platform, is designed to address these needs. While Pulsar is best known as a streaming platform, it also works well as a message queue, offering a flexible alternative to brokers such as RabbitMQ or Kafka.

In this guide, we’ll explore the core concepts of Apache Pulsar as a message queue, its features, and how to implement it in a practical use case.


Why Use Apache Pulsar for Messaging?

Apache Pulsar provides a unified messaging solution that combines the best of both worlds: message queuing and event streaming. Key reasons to use Pulsar for message queues include:

  1. Multi-Tenancy: Supports isolation of workloads across tenants in a single cluster.
  2. Topic-Level Durability: Messages are persisted to disk with configurable retention policies.
  3. Scalable Queues: Topics can be partitioned for horizontal scaling.
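  4. Built-In Message Acknowledgments: Per-message acknowledgments give at-least-once delivery by default; effectively-once processing additionally relies on broker-side deduplication or transactions.
  5. Subscription Models: Offers flexible consumption patterns, including exclusive, failover, shared, and key-shared modes.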
  6. Geo-Replication: Ensures message availability across multiple regions.

Messaging Patterns Supported by Pulsar

  1. Point-to-Point (Queues):

    • Producers send messages to a single topic.
    • Consumers in a shared subscription mode process messages in a load-balanced manner.
  2. Publish-Subscribe:

    • Producers send messages to a topic.
    • Multiple consumers (subscriptions) receive copies of the same message.
  3. Streaming:

    • Similar to Kafka, Pulsar supports partitioned topics for parallel processing of message streams.
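The difference between the first two patterns comes down to how subscriptions are used. The sketch below is a toy, in-memory model with no Pulsar involved. The function names are made up for illustration, and it assumes simple round-robin dispatch inside a shared subscription, which is a simplification of what a broker actually does.

```python
from collections import defaultdict
from itertools import cycle


def dispatch_queue(messages, consumers):
    """Shared-subscription style: each message goes to exactly one consumer."""
    received = defaultdict(list)
    # Assumption: plain round-robin; a real broker also considers consumer permits.
    for msg, consumer in zip(messages, cycle(consumers)):
        received[consumer].append(msg)
    return dict(received)


def dispatch_pubsub(messages, subscriptions):
    """Publish-subscribe style: every subscription sees every message."""
    return {sub: list(messages) for sub in subscriptions}


if __name__ == "__main__":
    tasks = ["t1", "t2", "t3", "t4"]
    print(dispatch_queue(tasks, ["worker-a", "worker-b"]))
    print(dispatch_pubsub(tasks, ["thumbnails", "audit-log"]))
```

In queue mode the two workers split the four tasks between them; in publish-subscribe mode both subscriptions receive all four.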

A Use Case: Task Queue for Image Processing

Imagine a system where users upload images that need to be processed (e.g., generating thumbnails). Instead of processing images synchronously, the application enqueues tasks in a Pulsar topic, and multiple workers consume the tasks asynchronously to process them.
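One way to represent such a task is a small value type that serializes to JSON bytes, since Pulsar message payloads are raw bytes. This is only a sketch: the class name ImageTask is hypothetical, and the two fields mirror the example message used later in this guide.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ImageTask:
    """Hypothetical task record; field names follow the guide's sample message."""
    task_id: str
    file: str

    def to_bytes(self) -> bytes:
        # Encode as UTF-8 JSON so any consumer language can decode it.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, payload: bytes) -> "ImageTask":
        fields = json.loads(payload.decode("utf-8"))
        return cls(task_id=fields["task_id"], file=fields["file"])
```

A producer would send `ImageTask("12345", "image.png").to_bytes()`, and a worker would call `ImageTask.from_bytes(msg.data())` on the other side.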


Step-by-Step Guide: Implementing Pulsar as a Message Queue

Step 1: Set Up Apache Pulsar

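  1. Download Apache Pulsar: Download a binary release from the official website. Older releases such as 3.0.0 are served from the Apache archive rather than the main download mirror:

    wget https://archive.apache.org/dist/pulsar/pulsar-3.0.0/apache-pulsar-3.0.0-bin.tar.gz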
    tar -xzf apache-pulsar-3.0.0-bin.tar.gz
    cd apache-pulsar-3.0.0
  2. Start Pulsar in Standalone Mode:

    bin/pulsar standalone

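    This starts a local single-node Pulsar instance: a broker and a BookKeeper bookie, plus a local metadata store (ZooKeeper in older releases; recent releases default to an embedded RocksDB store instead).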
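Standalone mode can take a little while to come up, so it may help to wait for the broker port before running any client code. The helper below is a minimal sketch using only the standard library. The function name is made up, and an open port only shows that something is listening, not that the broker is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means a listener is bound to the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not listening yet; retry shortly
    return False
```

For the local setup in this guide, `wait_for_port("localhost", 6650)` checks the binary protocol port used by the client examples.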


Step 2: Create a Pulsar Topic

You can create a topic explicitly or let Pulsar create it on the first publish. For a task queue, use a non-partitioned topic:

bin/pulsar-admin topics create persistent://public/default/task-queue
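The topic name above follows the pattern domain://tenant/namespace/topic, which is how Pulsar ties a topic to a tenant and namespace. The helper below is an illustrative sketch for splitting and rebuilding such names. TopicName and parse_topic are hypothetical names, and the parser only handles fully qualified names, not the short forms Pulsar also accepts.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four components."""
    domain, sep, rest = name.partition("://")
    parts = rest.split("/", 2)
    valid_domain = domain in ("persistent", "non-persistent")
    if not sep or not valid_domain or len(parts) != 3 or not all(parts):
        raise ValueError(f"expected domain://tenant/namespace/topic, got {name!r}")
    return TopicName(domain, *parts)
```

For the task queue, `parse_topic("persistent://public/default/task-queue")` yields the tenant `public` and the namespace `default`.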

Step 3: Send Messages to the Topic

Use the Pulsar client library to produce messages. Below is an example in Python.

Install Pulsar Client:

pip install pulsar-client

Producer Code:

import pulsar

# Connect to Pulsar
client = pulsar.Client('pulsar://localhost:6650')

# Create a producer
producer = client.create_producer('persistent://public/default/task-queue')

# Send a message
message = '{"task_id": "12345", "file": "image.png"}'
producer.send(message.encode('utf-8'))
print(f"Message sent: {message}")

# Close the client
producer.close()
client.close()
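In an application, the send call usually sits behind a small function so that payload encoding lives in one place. The sketch below assumes a producer object with a Pulsar-style `send(content, properties=...)` method. The function name and the `task_type` property are illustrative choices, not part of the original example.

```python
import json


def enqueue_task(producer, task_id: str, file: str) -> bytes:
    """Serialize a task and hand it to a Pulsar-style producer.

    `producer` is anything exposing send(content, properties=...).
    """
    payload = json.dumps({"task_id": task_id, "file": file}).encode("utf-8")
    # Properties are string key/value metadata carried alongside the payload.
    # The "task_type" key is a made-up example.
    producer.send(payload, properties={"task_type": "thumbnail"})
    return payload
```

With the producer created above, the call would be `enqueue_task(producer, "12345", "image.png")`. Because the function only depends on `send`, it can be exercised in tests with a stub instead of a live broker.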

Step 4: Consume Messages from the Topic

Use a shared subscription for a worker pool that processes messages in a load-balanced manner.

Consumer Code:

import pulsar

# Connect to Pulsar
client = pulsar.Client('pulsar://localhost:6650')

# Create a consumer on a shared subscription so that several workers
# can split the messages between them
consumer = client.subscribe(
    'persistent://public/default/task-queue',
    subscription_name='image-processors',
    consumer_type=pulsar.ConsumerType.Shared
)

# Consume messages until interrupted (Ctrl+C)
try:
    while True:
        msg = consumer.receive()
        try:
            print(f"Received message: {msg.data().decode('utf-8')}")
            # Acknowledge the message after processing
            consumer.acknowledge(msg)
        except Exception as e:
            # Negative acknowledgment asks the broker to redeliver the message
            print(f"Failed to process message: {e}")
            consumer.negative_acknowledge(msg)
except KeyboardInterrupt:
    print("Stopping consumer")
finally:
    # Close the consumer and the client
    consumer.close()
    client.close()
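The acknowledge-or-retry decision can be separated from the receive loop, which keeps the processing logic testable without a broker. This is a sketch under the assumption that the consumer and message objects behave like the Pulsar Python client's (`acknowledge`, `negative_acknowledge`, `data`). The name handle_message is hypothetical.

```python
def handle_message(consumer, msg, process) -> bool:
    """Run `process` on the payload, then ack on success or nack on failure.

    Returns True if the message was acknowledged.
    """
    try:
        process(msg.data())
    except Exception as exc:
        # A negative acknowledgment asks the broker to redeliver later.
        print(f"Failed to process message: {exc}")
        consumer.negative_acknowledge(msg)
        return False
    # Acknowledge only after processing finished without raising.
    consumer.acknowledge(msg)
    return True
```

Inside the loop this becomes `handle_message(consumer, consumer.receive(), make_thumbnail)`, where `make_thumbnail` stands for whatever function does the actual image work.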

Step 5: Monitor Pulsar

Use Pulsar’s admin tools to monitor the topic and subscription:

  1. View Topic Stats:

    bin/pulsar-admin topics stats persistent://public/default/task-queue
  2. List Subscriptions:

    bin/pulsar-admin topics subscriptions persistent://public/default/task-queue
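The stats command prints JSON, so the backlog of each subscription can be pulled out with a few lines of code. The sketch below assumes the output contains a `subscriptions` object whose entries carry a `msgBacklog` field. The function name is made up, and the sample values in the usage block are invented for illustration, not real output.

```python
import json


def backlog_by_subscription(stats_json: str) -> dict:
    """Map each subscription name to its unacknowledged message count."""
    stats = json.loads(stats_json)
    return {
        name: sub.get("msgBacklog", 0)
        for name, sub in stats.get("subscriptions", {}).items()
    }


if __name__ == "__main__":
    # Invented sample shaped like the stats output; not captured from a broker.
    sample = '{"subscriptions": {"image-processors": {"msgBacklog": 42}}}'
    print(backlog_by_subscription(sample))
```

A steadily growing backlog for `image-processors` would suggest the worker pool is not keeping up with the producers.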

Advanced Features of Pulsar for Messaging

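  1. Dead Letter Topics (DLQ): Route messages that repeatedly fail processing to a dead-letter topic. Recent releases of the Python client expose this as ConsumerDeadLetterPolicy; the subscription is declared as shared to match the worker pool from Step 4:

    consumer = client.subscribe(
        'persistent://public/default/task-queue',
        subscription_name='image-processors',
        consumer_type=pulsar.ConsumerType.Shared,
        dead_letter_policy=pulsar.ConsumerDeadLetterPolicy(
            max_redeliver_count=3,
            dead_letter_topic='persistent://public/default/dlq'
        )
    )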
  2. Partitioned Topics: For high-throughput systems, partition a topic and scale consumers:

    bin/pulsar-admin topics create-partitioned-topic persistent://public/default/task-queue-partitioned --partitions 3
  3. Batching: Optimize performance by sending messages in batches:

    producer = client.create_producer(
        'persistent://public/default/task-queue',
        batching_enabled=True,
        batching_max_messages=100
    )
  4. Geo-Replication: Replicate messages across multiple Pulsar clusters for disaster recovery:

    bin/pulsar-admin namespaces set-clusters public/default --clusters cluster-a,cluster-b
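To make the dead-letter behaviour from item 1 more concrete, here is a toy model of the retry accounting. It runs entirely in memory and does not talk to Pulsar. It assumes a message gets one initial delivery plus at most `max_redeliver_count` redeliveries before it is dead-lettered, and the function name is invented for this sketch.

```python
def deliver_with_dlq(process, payload, max_redeliver_count: int):
    """Return ("acked" | "dead-lettered", number_of_delivery_attempts)."""
    attempts = 0
    # Assumption: one initial delivery plus up to max_redeliver_count redeliveries.
    while attempts <= max_redeliver_count:
        attempts += 1
        try:
            process(payload)
            return "acked", attempts
        except Exception:
            continue  # stands in for a negative acknowledgment
    return "dead-lettered", attempts
```

Under this model, a handler that always fails with `max_redeliver_count=3` sees the message four times before it moves to the dead-letter topic, while a handler that succeeds on a retry never triggers the policy.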

When to Use Pulsar as a Message Queue

Pros

  • Scalability: Partitioned topics enable horizontal scaling.
  • Durability: Messages are stored persistently in BookKeeper.
  • Flexibility: Supports both queue and pub/sub patterns.
  • Multi-Tenancy: Ideal for large organizations with isolated workloads.
  • Geo-Replication: Ensures high availability across regions.

Cons

  • Operational Complexity: Requires setup and management of brokers, ZooKeeper, and BookKeeper.
  • Memory Usage: Brokers and bookies run on the JVM, and their heap and direct-memory settings may need careful tuning for resource efficiency.

Best Practices for Pulsar Queues

  1. Use Dead Letter Topics: Prevent message loss and diagnose failures.
  2. Optimize Acknowledgment: Use manual acknowledgment to control message processing precisely.
  3. Scale Consumers: Add consumers for shared subscriptions to handle high message throughput.
  4. Monitor Resource Usage: Regularly monitor broker and BookKeeper performance to avoid bottlenecks.

Conclusion

Apache Pulsar offers a versatile and scalable platform for building robust messaging systems. Its ability to function as a message queue, combined with advanced features like dead-letter topics, partitioning, and geo-replication, makes it an excellent choice for modern distributed applications.

By following the steps in this guide, you can set up and use Pulsar as a message queue to build high-performance, reliable systems.