March 16, 2018

Manually controlling OpenWrt hardware watchdog

Introduction to hardware watchdogs

The function of a hardware watchdog is to monitor whether the system is working correctly and, if it goes into an unknown state (i.e. freezes or stops working as expected), to return it to a known state, usually by resetting or rebooting the device.

Why should you care?

Why should you care about hardware watchdogs? Well, you shouldn't, unless you are an embedded engineer or you are making devices and displays for the "real world". As you can clearly see from this photo, some people should definitely think about it:

Know your enemy :)

The Linux and OpenWrt hardware watchdog consists of three parts:

Hardware (usually part of the embedded chip or CPU)
Middleware (usually a Linux kernel driver that communicates directly with the hardware watchdog)
Software - a tool or system daemon (that enables, disables and configures watchdog settings)

Hardware

The focus of this article is using the hardware watchdog on OpenWrt, and most embedded devices that OpenWrt runs on have a real hardware watchdog in their SoC.

To be honest, I haven't researched whether our laptops, desktops and servers have a hardware watchdog or not.

Middleware

The middle layer between the software tools and the hardware is the Linux kernel driver. You shouldn't need to worry about this part, as it is "automagically" handled by OpenWrt. Once the hardware watchdog driver is loaded, you will see a new device, /dev/watchdog, appear.

Software and confusion

Originally OpenWrt used the watchdog daemon to manage the hardware watchdog (there is a very detailed man page explaining exactly what the watchdog daemon does).

If for any reason the watchdog process stopped 'tickling' the hardware watchdog through the /dev/watchdog device, the hardware watchdog would trigger a hardware reset.
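The tickling described above can be illustrated with a few lines of shell. This is only a minimal sketch, not what the watchdog daemon actually does: the function name tickle and its three arguments are made up for this example, and it assumes the driver treats any write to the device as a keep-alive ping. Passing a regular file instead of /dev/watchdog gives a harmless dry run.

```shell
#!/bin/sh
# Hypothetical helper: ping a watchdog device COUNT times,
# sleeping INTERVAL seconds after each ping.
# Usage: tickle [device] [count] [interval]
tickle() {
    dev="${1:-/dev/watchdog}"
    count="${2:-3}"
    interval="${3:-5}"
    i=0
    {
        while [ "$i" -lt "$count" ]; do
            # Assumption: any byte written counts as a keep-alive ping.
            printf '1' >&3
            i=$((i + 1))
            sleep "$interval"
        done
    } 3>"$dev"   # one descriptor stays open for the whole loop
}
```

On drivers that support magic close, the descriptor is closed here without the magic character, so the timer should stay armed once the loop ends and the device resets when the timeout expires. That is exactly the "process stopped tickling" case.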

The confusion started in 2013, when OpenWrt developers removed the watchdog daemon and implemented the watchdog feature in procd.

The biggest issue was that there was no mechanism to take control back from procd so that you could manually tickle, or not tickle, the hardware watchdog.

This caused a lot of confusion and many unanswered questions about how to properly use the hardware watchdog in OpenWrt for custom projects.

Some projects, like sudomesh, created a custom version of procd to take watchdog control back from it.

Magicclose

In the middle of 2017 we finally got a patch (thanks to Hans Dedecker) that implemented the magicclose feature, which makes procd release its file lock on /dev/watchdog when the watchdog feature in procd is stopped.
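The name matches the "Magic Close" convention of the Linux watchdog driver API: a driver that supports it only disarms the timer if the character 'V' was the last thing written before the device is closed. The sketch below shows that convention from the shell. It is not taken from the procd patch, disarm_watchdog is a made-up name, and it only has an effect on drivers that support magic close and were not built with the nowayout option.

```shell
#!/bin/sh
# Hypothetical helper: ask the kernel driver to stop the watchdog timer
# by writing the magic character 'V' right before the device is closed.
# Usage: disarm_watchdog [device]
disarm_watchdog() {
    dev="${1:-/dev/watchdog}"
    # The redirection opens the device, printf writes 'V', and the
    # descriptor is closed as soon as printf returns.
    printf 'V' > "$dev"
}
```

After a successful disarm nothing watches the system any more, so this is only worth doing deliberately, for example before a long maintenance task.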

This feature finally gave control of the hardware watchdog back to users. Freedom!!! :)

How to manually control the hardware watchdog?

If you just stop procd from tickling the hardware watchdog, you still can't tickle the watchdog manually:

root@OpenWrt:~# ubus call system watchdog '{"stop": true}'
{
        "status": "stopped",
        "timeout": 30,
        "frequency": 5,
        "magicclose": false
}
root@OpenWrt:~# echo 1 > /dev/watchdog 
-ash: can't create /dev/watchdog: Resource busy

But if you enable the magicclose feature first, then you can:

root@OpenWrt:~# ubus call system watchdog '{"magicclose": true}'
{
        "status": "running",
        "timeout": 30,
        "frequency": 5,
        "magicclose": true
}

root@OpenWrt:~# ubus call system watchdog '{"stop": true}'
{
        "status": "offline",
        "timeout": 0,
        "frequency": 0,
        "magicclose": true
}
root@OpenWrt:~# echo 1 > /dev/watchdog
root@OpenWrt:~#
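Once procd has let go of the device, a custom project can decide for itself when to tickle. One possible approach, sketched here under assumptions, is to ping the watchdog only while a health check passes, so that a failing check eventually leads to a reset. The function name health_tickle is invented for this example, and the ping target in the usage comment is a placeholder address.

```shell
#!/bin/sh
# Hypothetical helper: ping the watchdog only if a health check succeeds.
# Usage: health_tickle <device> <check command> [args...]
health_tickle() {
    dev="$1"
    shift
    if "$@"; then
        # Same kind of write as the manual echo shown above.
        printf '1' > "$dev"
    else
        echo "health check failed, not tickling $dev" >&2
        return 1
    fi
}

# Possible use on a device, after the two ubus calls shown above
# (192.168.1.1 is only a placeholder for a host worth checking):
#   while health_tickle /dev/watchdog ping -c 1 -W 2 192.168.1.1; do
#       sleep 5
#   done
```

The loop ends on the first failed check and stops tickling. If the driver keeps the timer armed across a plain close, the hardware watchdog then resets the device once the timeout runs out.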

Photos:

Blue screen of death by Acid Pix.
Freedom by Victor Freitas