Finding spurious relaxes

Sources of spurious relaxes

In a dual kernel configuration, an application does not want to switch to the regular Linux execution mode unexpectedly, aka secondary mode. When short and bounded response time is a strong requirement, the application wants to run in primary mode exclusively, because only this mode is deemed to deliver real-time capabilities.

There are several reasons for an application to be moved out of real-time mode automatically by the Xenomai co-kernel. This happens when the application is:

  • invoking a regular Linux syscall

  • triggering a processor fault

  • receiving a Linux-originated signal (i.e. not a Xenomai-originated one), possibly sent by the Xenomai kernel upon detecting a consistency issue with the application

At any rate, we want to know where in the application code the signal was received, which will be a strong hint to find out why we switched away from real-time mode.

The reason Xenomai has to move the application out of real-time mode upon such events stems from the way a dual kernel system works. As the Cobalt kernel may preempt the regular Linux kernel at any time in the course of its execution for dealing with a high priority event, including when the latter assumes to be traversing some atomic section, special care has to be taken for re-entering a valid Linux context before a regular kernel service can be delivered.

Manually debugging spurious relaxes

Xenomai provides a mechanism for sending a misbehaving application a regular Linux signal immediately when it switches to secondary mode unexpectedly. When run over GDB, the backtrace of the current thread receiving the signal can be examined. This feature is available with both Xenomai 2 and Xenomai 3.

Each time a spurious migration to secondary mode happens, the thread is sent a SIGDEBUG signal, along with additional information in the siginfo_t data. First of all, the application should trap such signal to a handler, like this:

#include <signal.h>

    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = sigdebug_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGDEBUG, &sa, NULL);

The sigdebug_handler() routine would have to retrieve and decode the current backtrace, writing out the cause of the SIGDEBUG notification and hopefully a meaningful call stack:

#include <sys/types.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <execinfo.h>

static const char *sigdebug_msg[] = {
        [SIGDEBUG_UNDEFINED] = "latency: received SIGXCPU for unknown reason",
        [SIGDEBUG_MIGRATE_SIGNAL] = "received signal",
        [SIGDEBUG_MIGRATE_SYSCALL] = "invoked syscall",
        [SIGDEBUG_MIGRATE_FAULT] = "triggered fault",
        [SIGDEBUG_MIGRATE_PRIOINV] = "affected by priority inversion",
        [SIGDEBUG_NOMLOCK] = "Xenomai: process memory not locked "
        "(missing mlockall?)",
        [SIGDEBUG_WATCHDOG] = "Xenomai: watchdog triggered "
        "(period too short?)",
};

void sigdebug_handler(int sig, siginfo_t *si, void *context)
{
        const char fmt[] = "Mode switch (reason: %s), aborting. Backtrace:\n";
        unsigned int cause = sigdebug_reason(si);
        static char buffer[256];
        static void *bt[200];
        unsigned int n;

        if (cause > SIGDEBUG_WATCHDOG)
                cause = SIGDEBUG_UNDEFINED;

        switch(cause) {
        case SIGDEBUG_UNDEFINED:
        case SIGDEBUG_NOMLOCK:
        case SIGDEBUG_WATCHDOG:
                /* These errors are lethal, something went really wrong. */
                n = snprintf(buffer, sizeof(buffer),
                             "%s\n", sigdebug_msg[cause]);
                write(STDERR_FILENO, buffer, n);
                exit(EXIT_FAILURE);
        }

        /* Retrieve the current backtrace, and decode it to stdout. */
        n = snprintf(buffer, sizeof(buffer), fmt, sigdebug_msg[cause]);
        n = write(STDERR_FILENO, buffer, n);
        n = backtrace(bt, sizeof(bt)/sizeof(bt[0]));
        backtrace_symbols_fd(bt, n, STDERR_FILENO);

        signal(sig, SIG_DFL);
        kill(getpid(), sig);
}

Then, you need to enable the warn_upon_switch capability on a per-thread basis. For instance, a POSIX-based application would run this code from the thread to monitor for spurious relaxes:

    /* Ask Xenomai to warn us upon switches to secondary mode. */
    pthread_setmode_np(0, PTHREAD_WARNSW, NULL);

Alternatively, an application based on the native API would run this code instead:

    /* Ask Xenomai to warn us upon switches to secondary mode. */
    rt_task_set_mode(0, T_WARNSW, NULL);

Debugging spurious relaxes with Xenomai 3 and slackspot

Xenomai 3 introduces a new utility called slackspot, for pinpointing code locations which caused transitions to secondary mode.

slackspot combines dedicated kernel support exporting trace information about spurious relaxes through the /proc/xenomai/debug/relax virtual file, and a userland utility called sbin/slackspot, which parses the virtual file output to display the program backtrace, so that we can locate the offending code easily.

The motivation to centralize tracing information about relaxes directly into kernel space is fourfold:

  • it allows to gather all the trace data into a single location and keep it safe there, with no external log file involved.

  • enabling the tracing does not impose any requirement on the application (aside of being compiled with debug symbols for best interpreting that information). We only need a kernel config switch for this (i.e. CONFIG_XENO_OPT_DEBUG_TRACE_RELAX).

  • the data is collected and can be made available exactly the same way regardless of the application emitting the relax requests, or whether it is still alive when the trace data are displayed.

  • the kernel is able to provide accurate and detailed trace information, such as the relative offset of instructions causing relax requests within dynamic shared objects, without having to guess it roughly from /proc/pid/maps, or relying on ldd’s –function-relocs feature, which both require to run on the target system to get the needed information. Instead, we allow a build host to use a cross-compilation toolchain later to extract the source location, from the raw data the kernel has provided on the target system.

A typical debug session with slackspot

Here is a typical debugging session, involving a target board running the Cobalt-based application, and a host system, providing the cross-compilation toolchain used to build that application. We use the netcat utility to pull the trace data over the wire from the target, and process it locally on our debug host:

target> netcat -l -p <port> -c < /proc/xenomai/debug/relax

host> nc <target-ip> <port> | CROSS_COMPILE=ppc_6xx- ./slackspot --path=/opt/rootfs/MPC5200/lib:$HOME/frags/relax --filter thread=Task*

Thread[828] "Task 2" started by /relax:
   #0  0xfff00000 ???
   #1  0x000001bb ???
   #2  0x00064393 _IO_file_doallocate() in ??:?
   #3  0x00073d6f _IO_doallocbuf() in ??:?
   #4  0x00072d87 _IO_file_overflow() in ??:?
   #5  0x00075f83 __overflow() in ??:?
   #6  0x0006997b putchar() in ??:?
   #7  0x100017db task2_func() in /home/rpm/frags/relax/relax.c:23
   #8  0x000078d7 task_entry() in /home/rpm/git/xenomai-forge/lib/alchemy/task.c:235
   #9  0x00005a6b start_thread() in pthread_create.c:?
   #10 0x000d389f __clone() in ??:?

Thread[828] "Task 2" started by /relax (4 times):
   #0  0x000c443f write() in ??:?
   #1  0x00072553 _IO_file_write() in ??:?
   #2  0x000721cf _IO_file_seek() in ??:?
   #3  0x000724c7 _IO_do_write() in ??:?
   #4  0x00072c2f _IO_file_sync() in ??:?
   #5  0x00064a4f _IO_fflush() in ??:?
   #6  0x100017eb task2_func() in /home/rpm/frags/relax/relax.c:24
   #7  0x000078d7 task_entry() in /home/rpm/git/xenomai-forge/lib/alchemy/task.c:235
   #8  0x00005a6b start_thread() in pthread_create.c:?
   #9  0x000d389f __clone() in ??:?

The slackspot output here shows that something wrong is happening at lines 23 and 24 from the example file, as the application calls buffered standard I/O routines from primary mode, which in turn invoke regular kernel services, causing the thread to relax and leave the real-time mode.