The Real Reasons Behind Most ZigBee Interoperability Problems

December 25th, 2015

Interoperability is a buzzword that we hear often when talking about wireless protocols, including ZigBee. As a trusted but still young standard, ZigBee can raise many questions for anyone reading the official documentation. However, that is not the topic of this blog. With over a decade of experience in wireless communications software development and 7 years working closely with ZigBee, we have seen many cases where developers reinvent the wheel even though the specification gives an adequate description. Our extensive experience integrating and working with a large number of sensors from different manufacturers has given us the insight we are sharing in this blog.

The area with the most room for creativity, and hence for mistakes, is the application layer, where profiles come into play.

Let us start with one simple flag: the “Manufacturer specific” flag in the ZCL header, misuse of which can cause a variety of problems. Its proper use is to extend the functionality of the ZCL (HA) by adding attributes or whole clusters that are not provided officially. For example, we cannot guess why the “Temperature measurement” cluster has a “Tolerance” attribute while the “Humidity measurement” cluster does not. The point is that if you want a “Tolerance” attribute in your humidity sensor, you need to make it a manufacturer-specific attribute. Or, to take another example, let’s say you are working on a ZigBee-based pet tracking system. We promise there is no “Animal tracker” cluster in any specification. You will need to implement it yourself and, yes, it will be manufacturer-specific.
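To make the flag concrete, here is a minimal sketch of how a receiver might decode a ZCL header. It assumes the usual ZCL layout: a frame control byte in which bit 2 is the manufacturer-specific flag, followed by a 2-byte little-endian manufacturer code only when that bit is set, then the transaction sequence number and the command ID. The names ZclHeader and parse_zcl_header are ours, invented for illustration, and do not come from any particular stack.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZclHeader:
    frame_type: int                   # bits 0-1 of frame control
    manufacturer_specific: bool       # bit 2
    server_to_client: bool            # bit 3 (direction)
    disable_default_response: bool    # bit 4
    manufacturer_code: Optional[int]  # present only if bit 2 is set
    sequence: int
    command_id: int
    payload: bytes


def parse_zcl_header(frame: bytes) -> ZclHeader:
    """Decode a ZCL header (hypothetical helper, layout as assumed above)."""
    if len(frame) < 3:
        raise ValueError("ZCL frame too short")
    frame_control = frame[0]
    manufacturer_specific = bool(frame_control & 0x04)
    offset = 1
    manufacturer_code = None
    if manufacturer_specific:
        # The 2-byte code exists only when the flag is set.
        if len(frame) < 5:
            raise ValueError("manufacturer-specific ZCL frame too short")
        (manufacturer_code,) = struct.unpack_from("<H", frame, offset)
        offset += 2
    return ZclHeader(
        frame_type=frame_control & 0x03,
        manufacturer_specific=manufacturer_specific,
        server_to_client=bool(frame_control & 0x08),
        disable_default_response=bool(frame_control & 0x10),
        manufacturer_code=manufacturer_code,
        sequence=frame[offset],
        command_id=frame[offset + 1],
        payload=frame[offset + 2:],
    )


# Example frames; 0x1234 is an arbitrary code, not a real manufacturer.
vendor_frame = bytes([0x04, 0x34, 0x12, 0x2A, 0x00])
standard_frame = bytes([0x18, 0x2A, 0x0A, 0x00, 0x00])
vendor_header = parse_zcl_header(vendor_frame)
standard_header = parse_zcl_header(standard_frame)
```

Because the flag changes where the sequence number and command ID sit in the frame, a device that sets it on a standard command is effectively sending a frame that a standards-following receiver will not match against the standard command.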

A common mistake with this flag is marking standard attributes and commands with it. We ran into this while working with IAS sensors, and it made us wonder why the standard enrollment procedure would need a manufacturer code at all. Do developers really consider their manufacturer code safer from intruders than the entire ZigBee security system?

Fortunately, this is easy to debug, because the only thing we need to know in this case is the manufacturer code. There is a way to obtain it using only ZigBee tools: the code is carried in the node descriptor. If the node descriptor request does not work, the code can be requested from the manufacturer. And when you have no contact there, a ZigBee sniffer can help too. If there is a coordinator that the device successfully enrolls with, capturing that enrollment procedure with the sniffer will reveal the code. Another way is to write any attribute in the cluster in question; the response will probably carry the code. Moreover, configuring and binding the cluster may cause some manufacturer-specific attribute to be reported along with the code. So the key is just to be patient.
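As an illustration of the node descriptor route, the sketch below pulls the manufacturer code out of a raw node descriptor. It assumes the 13-byte descriptor layout as we read it in the ZigBee specification, with the manufacturer code as a little-endian 16-bit value at byte offset 3, and it assumes you have already stripped the ZDP response fields that precede the descriptor. The function name and the sample bytes are ours and purely illustrative.

```python
import struct

NODE_DESCRIPTOR_LEN = 13      # assumed descriptor size in bytes
MANUFACTURER_CODE_OFFSET = 3  # assumed offset of the 16-bit code


def manufacturer_code_from_node_descriptor(descriptor: bytes) -> int:
    """Return the manufacturer code from a raw node descriptor.

    Hypothetical helper: `descriptor` must hold only the descriptor
    itself, not the status and address fields of the ZDP response.
    """
    if len(descriptor) < NODE_DESCRIPTOR_LEN:
        raise ValueError("node descriptor too short")
    (code,) = struct.unpack_from("<H", descriptor, MANUFACTURER_CODE_OFFSET)
    return code


# Made-up descriptor; 0x1234 is an arbitrary code, not a real manufacturer.
sample_descriptor = bytes([
    0x02, 0x40, 0x80,  # logical type/flags, APS flags/band, MAC capabilities
    0x34, 0x12,        # manufacturer code, little-endian
    0x50,              # maximum buffer size
    0x00, 0x00,        # maximum incoming transfer size
    0x00, 0x00,        # server mask
    0x00, 0x00,        # maximum outgoing transfer size
    0x00,              # descriptor capability
])
sample_code = manufacturer_code_from_node_descriptor(sample_descriptor)
```

The same two bytes are what you would look for in a sniffer capture of a manufacturer-specific ZCL frame, right after the frame control byte.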

The mistake is worse when the device also confuses ZDP discovery tools: for example, the cluster is not returned in a simple descriptor or match descriptor response, yet some of its commands are supported and they are manufacturer-specific. In this case, discovery does not work and you will need either a technical contact or a lot of time to experiment.

In such a situation, all we know is that the device in our hands is a ZigBee device and what it is used for, so we can predict its cluster list. The only thing we can do without the manufacturer’s help is to send commands to the predicted clusters and wait for a response carrying some status.

The next issue is misunderstanding attribute semantics. When a cluster has more than two or three attributes and its logic becomes complicated, the meaning of an existing attribute is easy to misread. Just imagine setting the temperature on a thermostat and the room still being too cold or too hot. Now take this HVAC system and try to guess which setpoint the “Setpoint Raise/Lower” command operates on. It depends on the command’s mode as well as on the system’s current mode. But some developers prefer a single, clear attribute, and of course that cuts out the existing logic. In this case, misunderstanding the specification can even lead to attribute duplication.

One of the last common problems concerns a very useful HA extension: poll control. Even though implementing it is strongly recommended, it is often ignored. The real problems start when the device has its own long poll interval that is much longer than the default one. If we leave things as they are, many packets destined for such a sleepy device will be lost. Therefore, we should increase the timeout after which expired indirect packets are deleted. This does come with a risk: if the timeout is too long, the queue will most likely overflow. That is why, after increasing the indirect queue timeout, the updated coordinator should be tested in a large network with many sleepy devices connected.
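The trade-off can be roughed out with some back-of-the-envelope arithmetic before going to the test network. The sketch below is only an estimate under our own assumptions: the long poll interval is expressed in quarter-seconds (the unit used by the poll control cluster), the safety factor of 1.5 is an arbitrary choice, and the queue estimate pessimistically assumes every queued message waits out the full timeout. The function names are hypothetical and not part of any stack.

```python
import math

QUARTER_SECOND_MS = 250


def min_indirect_timeout_ms(long_poll_interval_qs: int,
                            safety_factor: float = 1.5) -> int:
    """Smallest indirect timeout that outlasts one long poll interval.

    The safety factor (an assumption) leaves headroom for a late poll.
    """
    return math.ceil(long_poll_interval_qs * QUARTER_SECOND_MS * safety_factor)


def worst_case_queue_entries(timeout_ms: int,
                             msgs_per_device_per_minute: int,
                             sleepy_devices: int) -> int:
    """Rough upper estimate of indirect queue occupancy.

    Assumes no message is collected before it expires, so each one
    occupies a slot for the whole timeout.
    """
    return math.ceil(
        timeout_ms * msgs_per_device_per_minute * sleepy_devices / 60000
    )


# Example: a device polling every 30 s (120 quarter-seconds).
timeout_ms = min_indirect_timeout_ms(120)
entries = worst_case_queue_entries(timeout_ms, 3, 20)
```

If the estimated number of entries is anywhere near the size of the coordinator's indirect queue, that is a strong hint the large-network test is likely to show overflow.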

To close, we want to add a few words about a mistake that does not break interoperability, but is frustrating and easily avoided. Unfortunately, as of today there are not as many reportable attributes as we might want, and everybody who faces this problem solves it in his or her own way. We have seen “Write attributes” sent to the client cluster and even reports that were never configured. This is the only problem described here that can be attributed to a lack of functionality in the official specifications. We are sure it will be addressed in one of the next updates. But we are just as sure that devices that skip the configure/bind logic before sending reports will not disappear for many years.

We hope this blog gave enough examples to show that most interoperability problems at the application layer appear because the ZigBee Alliance documents are not fully understood. With the growth of ZigBee technology and the number of well-designed devices, such misunderstandings may make a product less competitive and less widely supported. It is key to take the time to understand and follow the standard to avoid these issues and ensure the success of your products.