As it happens to me from time to time, today I ended up sleepless1 at 2 am with nothing to do. So after contemplating life, the universe and everything for a while I decided to put my time to good use by playing with MQTT.

In case you haven’t heard it before, the name MQTT stands for MQ Telemetry Transport, and it is a publish/subscribe protocol aimed at sensor networks and embedded devices. MQTT is an application (layer 7) protocol that uses TCP or other reliable transports and is standardized at the wire format level (in this aspect it is similar to AMQP, where one client library can connect to any AMQP broker, instead of requiring vendor-specific clients like JMS or ODBC/JDBC).

The operations implemented by MQTT are the bare minimum for this kind of service:

  • Connect / Disconnect
  • Subscribe to a topic
  • Unsubscribe from a topic
  • Publish a message to a topic

There is no concept of a “queue” in MQTT: every message sent to a topic is dispatched to all of its subscribers, which is somewhat confusing given that the first two letters of the protocol’s name are precisely “MQ”2. MQTT does not impose any particular format on the message data, so it handles JSON, XML and binary payloads equally well. This simplicity, plus the lack of restrictions, makes it well suited for embedded systems and IoT devices.

Deploying Mosquitto

The simplest way to try MQTT these days seems to be to use Mosquitto, an Eclipse.org project that implements a full MQTT server. Luckily for me several people have done the hard work of packaging Mosquitto as a Docker image, so the only thing I had to do was pull the image and wire up some folders, and I was up and running. This is the command I used to do that:

docker run -it \
    -p [server_ip]:[server_port]:1883 \
    -p [server_ip]:[server_port]:9001 \
    -v /storage/docker/work/mqtt/config:/mosquitto/config \
    -v /storage/docker/work/mqtt/data:/mosquitto/data \
    -v /storage/docker/work/mqtt/log:/mosquitto/log \
    --restart=always --name mosquitto eclipse-mosquitto

By default the image will listen on ports 1883 (MQTT) and 9001 (MQTT over Websocket). I had to remap those ports because I already have other things listening there. I also mapped host directories to the configuration, data and log directories in the container, to make them persistent3.
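For reference, a minimal mosquitto.conf in the mapped config directory might look something like the sketch below. It uses Mosquitto's standard directives and matches the volume mounts above; allow_anonymous is fine for late-night experiments but should be turned off for anything real:

```conf
# /storage/docker/work/mqtt/config/mosquitto.conf
listener 1883

listener 9001
protocol websockets

persistence true
persistence_location /mosquitto/data/

log_dest file /mosquitto/log/mosquitto.log

# For testing only; require authentication in production.
allow_anonymous true
```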

Playing with the Python client

Connecting to an MQTT server is surprisingly straightforward using the Paho client for Python, and the no-nonsense design of the API makes it very easy to work with topics and messages. With just a handful of lines of code I was able to publish and consume simple messages:

import paho.mqtt.client as paho
import threading
import time

# ------------ SUBSCRIBER CODE -----------------
# Callback function for every received message
def processMessage(client, userdata, msg):
  print("Message from " + msg.topic + ": " + msg.payload.decode())

# Create receiver
client = paho.Client()
client.connect("[SERVER_IP]", [SERVER_PORT], 60)
client.on_message = processMessage
client.subscribe("/test/topic")

# Start receiver on a separate thread
loopThread = threading.Thread(target=client.loop_forever)
loopThread.start()

# ------------ PUBLISHER CODE -----------------
# Create publisher
publisher = paho.Client()
publisher.connect("[SERVER_IP]", [SERVER_PORT], 60)

# Publish 3 messages
for i in range(3):
  publisher.publish("/test/topic", "Message number " + str(i))
  time.sleep(0.5)

# Clean up
publisher.disconnect()
client.disconnect()
print("End")


Protocol analysis

Since by default MQTT doesn’t use SSL I was able to capture and examine the contents of the MQTT conversation between clients and broker. I was surprised by how compact the protocol actually is. The message headers are almost nonexistent and the data is packed so as to fit as much as possible in a single TCP segment or Link Layer frame.
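To see just how little framing MQTT adds, you can hand-assemble a minimal MQTT 3.1.1 PUBLISH packet (QoS 0) for the same topic and payload used above and count the bytes; the layout below follows the public spec, and the numbers are easy to check against a capture:

```python
# Hand-assemble a minimal MQTT 3.1.1 PUBLISH packet (QoS 0) to count
# the protocol's overhead on top of the payload.
topic = b"/test/topic"
payload = b"Message number 0"

# Variable header: 2-byte big-endian topic length, then the topic itself
# (no packet identifier at QoS 0).
variable_header = len(topic).to_bytes(2, "big") + topic
remaining = variable_header + payload

# Fixed header: packet type 3 (PUBLISH) in the high nibble, flags 0,
# then the remaining length as a variable-length integer (one byte here).
packet = bytes([0x30, len(remaining)]) + remaining

print(len(packet))                 # 31 bytes on the wire in total
print(len(packet) - len(payload))  # only 15 bytes of protocol overhead
```

Fifteen bytes of overhead for an 11-character topic name: the fixed header is just two bytes, and everything else is the topic itself.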

Here are some Wireshark captures of both the publisher and the subscriber. I annotated the TCP stream with the different segments exchanged between clients and server.

First, here is the exchange when the publisher connects to the broker and pushes three messages (client-to-server messages are in red, server-to-client messages in blue):

And here is the client side of it, with the three messages being pushed to the client by the MQTT broker:


Next step: Home Assistant and the Google Home that I got for Christmas!
Stay tuned.

1 Go ahead, make the “Sleepless in Seattle” joke. You know you want to.
2 Apparently MQTT was once part of IBM’s MQ series of products. Hence the prefix.
3 I know I could have used Docker volumes, but this works better with my backup strategy.

Most programmers understand Garbage Collection but very few know that memory can get fragmented and filled with holes just like hard drives, with far more serious consequences.


All languages, interpreted or compiled, give programmers the ability to allocate and release spans of memory. Objects, structures or simple blobs of addressable space, they can be created, used and returned to the memory pool once they are no longer needed. But there’s a catch:

Even with the most efficient memory manager, even with the best-in-class garbage collection algorithm, there is no guarantee that after a piece of code has done its thing the memory will have the same capacity to hold data. Let this sink in for a second: you write your code to the best of your ability, your debugger and profilers tell you there is plenty of memory to go around, and yet your program crashes because it ran out of memory, and there is nothing you can do.

Consider, for example, a piece of code that takes a string already stored in memory, and simply adds an extra character at the end. Regardless of the language, and except in very special circumstances, the program will need to allocate a new chunk of memory to hold the new string, copy the data over (adding the character at the end) and then free the old memory block.
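In CPython this is easy to observe: appending one character to a large string produces a brand-new, slightly larger object, leaving the old buffer behind to be reclaimed:

```python
import sys

s = "x" * 1_000_000  # one million characters in a single buffer
t = s + "!"          # a new, larger buffer is allocated and the data copied over

# The new object is a fresh, larger allocation; the old buffer becomes
# garbage as soon as `s` is no longer referenced.
print(sys.getsizeof(t) > sys.getsizeof(s))  # True
```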

[Image: contiguous memory allocation]

Rinse and repeat. A million times. Ten million times. Across days, weeks or months. The memory space of any non-trivial program becomes a series of holes where new data may not fit. Granted, with today’s computers and heap sizes, a condition like this is unlikely to happen on server-class hardware, but on low-end devices it is a real possibility.
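The effect is easy to reproduce with a toy first-fit allocator over a 100-cell heap (a hypothetical simulation, not any real allocator): fill the heap, free every other block, and a request far smaller than the total free space still fails:

```python
# Toy heap: 100 one-byte cells, first-fit allocation.
HEAP = 100
used = [False] * HEAP

def alloc(size):
    """First-fit: find `size` contiguous free cells and mark them used."""
    run = 0
    for i in range(HEAP):
        run = run + 1 if not used[i] else 0
        if run == size:
            start = i - size + 1
            for j in range(start, i + 1):
                used[j] = True
            return start
    return None  # no contiguous run is big enough

def free(start, size):
    for j in range(start, start + size):
        used[j] = False

# Fill the heap with twenty 5-cell blocks, then free every other one.
blocks = [alloc(5) for _ in range(20)]
for start in blocks[::2]:
    free(start, 5)

print(used.count(False))  # 50 cells are free in total...
print(alloc(6))           # ...but no 6 contiguous cells exist: None
```

Half the heap is free, yet a 6-byte allocation fails: the free space is scattered across 5-cell holes.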

The solution to this problem is a process called memory compaction, which physically relocates all objects in the application’s heap and rewrites references and pointers so that all free memory becomes a single contiguous block again:

[Image: memory compaction]
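A compaction pass can be sketched as sliding every live block down to the bottom of the heap and rewriting its “pointer” (here, just a start index) as it moves; afterwards, all free space is one contiguous run. This is a toy illustration, not a real collector:

```python
# Live blocks in a 100-cell heap, as (start, size) pairs.
blocks = [(10, 5), (25, 5), (60, 5)]

def compact(blocks):
    """Slide live blocks to the start of the heap, updating their addresses."""
    relocated, cursor = [], 0
    for start, size in sorted(blocks):
        relocated.append((cursor, size))  # the block's new address is `cursor`
        cursor += size
    return relocated, cursor  # cursor marks the first free cell

live, free_start = compact(blocks)
print(live)        # [(0, 5), (5, 5), (10, 5)]
print(free_start)  # 15 -- everything from cell 15 up is one free block
```

A real runtime also has to find and patch every pointer into each moved block, which is why compaction requires cooperation from the whole runtime and cannot be bolted on afterwards.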

Now, not all (in fact, very few) runtimes do this. The grand list of languages and runtimes that compact memory is:

  1. JRE: Java, Scala, Groovy, etc.
  2. .NET CLR: C#, F#, Visual Basic and others
  3. Lisp (in many implementations)

The only viable alternative to memory compaction is not to use dynamic memory allocation at all, relying solely on statically defined variables and stack-local variables. As you can imagine this limits the algorithms that can be implemented, but it is the only method that guarantees the program will never run out of space for its objects, and it makes it possible to calculate in advance exactly how much memory the program will need.
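The same idea can be sketched in Python as a fixed-capacity object pool: every slot is created up front, so the footprint is known before the program runs, and exhaustion is a bounded, predictable failure. The Pool class and its method names are illustrative, not from any particular library:

```python
# A toy fixed-capacity pool: all storage is created up front (illustrative).
class Pool:
    def __init__(self, capacity):
        # Pre-build every slot; no allocation happens after this point.
        self._free = [[0, 0] for _ in range(capacity)]

    def acquire(self):
        if not self._free:
            # Running out is a known, bounded failure mode.
            raise MemoryError("pool exhausted")
        return self._free.pop()

    def release(self, slot):
        slot[0] = slot[1] = 0  # reset the slot for its next user
        self._free.append(slot)

pool = Pool(3)
a = pool.acquire()
b = pool.acquire()
pool.release(a)
c = pool.acquire()
print(c is a)  # True: the slot is reused, never reallocated
```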

Not surprisingly, static memory management is the preferred method for systems implemented on microcontrollers and other devices with very limited amounts of memory. It is only practical in assembler, C and (to a certain degree) C++.

A plea for sanity in memory management

With this in mind, I’d like to end this article by asking you to please stop using scripting languages like Python, Ruby or PHP for projects that must run for months or years at a time, even if they are not running on limited hardware. Just stop.

Use real languages backed by a real runtime that guarantees your program will run for as long as it needs to, or take matters into your own hands and do your own memory management in C. All other options will be problematic in the long run.