Christof Meerwald@blog.www
Sat Aug 30 20:02:21 2025 GMT: Out of Order: Debugging a NetBSD virtio Hang

On a NetBSD 10.1 VPS I noticed that it would occasionally become somewhat unresponsive: it would still respond to pings, but any attempt to connect via SSH would just hang. Strangely, when logging in on the console, it would then continue as if nothing had happened.

System logs didn't point to anything obvious. lighttpd showed a "warning: clock jumped 48611 secs" message in its error log, so maybe it was related to the timecounter? The machine used hpet as its timecounter, so maybe that wasn't emulated correctly by qemu?

Eventually, I set up a background job on the VPS to periodically send the process status to a remote system; that continued to work even when the VPS became unresponsive, except that most processes were hanging in tstile.

Now the nice thing about NetBSD is that you can break into a (rather minimal) in-kernel debugger from the console at any time. Using the in-kernel debugger, I managed to figure out that most processes were waiting on a lock held by ioflush, itself waiting for an I/O operation to complete:

    db(0)> bt/a ffffdadd0c6dc500
    trace: pid 0 lid 195 at 0xffff8981544e0d60
    sleepq_block() at netbsd:sleepq_block+0x13a
    cv_wait() at netbsd:cv_wait+0xb7
    biowait() at netbsd:biowait+0x42
    wapbl_buffered_flush() at netbsd:wapbl_buffered_flush+0xa2
    wapbl_write_commit() at netbsd:wapbl_write_commit+0x28
    wapbl_flush() at netbsd:wapbl_flush+0x552
    ffs_sync() at netbsd:ffs_sync+0x176
    VFS_SYNC() at netbsd:VFS_SYNC+0x22
    sched_sync() at netbsd:sched_sync+0x90
    db(0)>

With that information, attention turned to the virtio block driver - maybe it was losing some notification from the host? The protocol is described in Virtual I/O Device (VIRTIO) Version 1.0, and section 2.4.9, Virtqueue Notification Suppression, seems particularly relevant here. Could there be a race condition when the device asks for notifications again after having suppressed them?

Adding some printfs showed that the hangs did indeed occur when the NetBSD driver didn't send a notification for an I/O request even though the device had set the flag to request notifications (and so never saw the I/O request). And, of course, manually triggering the notification from the in-kernel debugger brought it all back to life again.

So how did we get into that state? Surely, the device would first set the "notification" flag, and then check one more time for any pending requests before relying on notifications for any further I/O requests. Conversely, the NetBSD virtio driver first updates the "available request index/counter" before checking the "notification" flag. The relevant part of the code is in virtio_enqueue_commit, essentially doing:

    vq->vq_avail->idx = virtio_rw16(sc, vq->vq_avail_idx);
    vq_sync_aring_header(sc, vq, BUS_DMASYNC_PREWRITE);
    // ...
    vq_sync_uring_header(sc, vq, BUS_DMASYNC_POSTREAD);
    flags = virtio_rw16(sc, vq->vq_used->flags);

So what do these vq_sync_aring_header and vq_sync_uring_header calls do? On x86-64, they basically map to sfence and lfence: the sfence ensures that all earlier stores have completed, and the lfence ensures that all earlier loads have completed. So it should all be fine, right? Except that the one thing those fences don't do is ensure that stores can't be reordered with respect to subsequent loads. So, in fact, the store to the available index can become globally visible after the read of the flags has already been performed. And that's exactly the issue here.
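To make the reordering concrete, here is a minimal store-buffering litmus test that mirrors the driver/device handshake - a sketch for x86-64 with GCC or Clang, not the actual driver code. avail_idx and used_flags are hypothetical stand-ins for vq_avail->idx and vq_used->flags, and the FENCE() macro switches between the sfence+lfence pair described above and a full mfence (build with -DUSE_MFENCE):

    /* sb_litmus.c - store-buffering litmus test (sketch, x86-64, GCC/Clang).
     * Build:  cc -O2 -pthread sb_litmus.c              (sfence + lfence)
     *         cc -O2 -pthread -DUSE_MFENCE sb_litmus.c (full fence)
     * avail_idx/used_flags are illustrative stand-ins, not the driver's
     * vq_avail->idx / vq_used->flags. */
    #include <pthread.h>
    #include <stdio.h>

    static volatile int avail_idx, used_flags; /* shared "ring" fields */
    static volatile int r_driver, r_device;    /* what each side read */
    static volatile int go;                    /* crude start barrier */

    #ifdef USE_MFENCE
    #define FENCE() __asm__ __volatile__("mfence" ::: "memory")
    #else
    /* sfence orders stores against stores, lfence loads against loads,
     * but neither orders the store above against the load below */
    #define FENCE() __asm__ __volatile__("sfence\n\tlfence" ::: "memory")
    #endif

    static void *driver(void *arg)  /* publish request, then check flag */
    {
        (void)arg;
        while (!go)
            ;
        avail_idx = 1;          /* "a new request is available" */
        FENCE();
        r_driver = used_flags;  /* "does the device want a notification?" */
        return NULL;
    }

    static void *device(void *arg)  /* request notifications, then re-check */
    {
        (void)arg;
        while (!go)
            ;
        used_flags = 1;         /* "please notify me again" */
        FENCE();
        r_device = avail_idx;   /* "any requests I might have missed?" */
        return NULL;
    }

    int main(void)
    {
        long lost = 0;

        for (int i = 0; i < 200000; i++) {
            pthread_t t0, t1;

            avail_idx = used_flags = r_driver = r_device = go = 0;
            pthread_create(&t0, NULL, driver, NULL);
            pthread_create(&t1, NULL, device, NULL);
            go = 1;
            pthread_join(t0, NULL);
            pthread_join(t1, NULL);

            /* both sides read the old value: the driver skips the
             * notification and the device never sees the request */
            if (r_driver == 0 && r_device == 0)
                lost++;
        }
        printf("lost notifications: %ld\n", lost);
        return 0;
    }

Without USE_MFENCE, the "lost notifications" count should come out non-zero on hardware that reorders stores past loads across sfence+lfence; with the full fence, the store-buffering outcome is architecturally forbidden on x86-64.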
At this point it's worth pointing out that the VPS showing this issue runs on an AMD Ryzen 9 9950X 16-Core Processor, and my tests show that these store-load reorderings are a lot more common (maybe by a factor of 100 for these cases) on this CPU than on older AMD CPUs.

What's the fix then? It's relatively straightforward: we need a full memory fence (mfence) here instead of the separate sfence and lfence. The only issue is that NetBSD currently doesn't seem to have the right abstractions to cover that case; see NetBSD Problem Report #59618 for some more details.

Sun Sep 22 17:42:43 2024 GMT: Debugging an early NetBSD kernel panic

Tried to install NetBSD in an AMD64 VM with little success. Although the NetBSD loader worked fine, as soon as it transferred control to the NetBSD kernel, the VM immediately rebooted, with no diagnostics appearing on the console that would give a hint of what could be wrong. One of the things I tried was adding a …

Looking at the dump:

    dump 0x197dfc0 8
    e0 df 97 81 ff ff ff ff

    dump 0x197dfe0 128
    6b 65 72 6e 65 6c 20 64 69 61 67 6e 6f 73 74 69
    63 20 61 73 73 65 72 74 69 6f 6e 20 22 6c 70 5f
    6d 61 78 20 3e 3d 20 63 6f 72 65 5f 6d 61 78 22
    20 66 61 69 6c 65 64 3a 20 66 69 6c 65 20 22 2e
    2e 2f 2e 2e 2f 2e 2e 2f 2e 2e 2f 61 72 63 68 2f
    78 38 36 2f 78 38 36 2f 63 70 75 5f 74 6f 70 6f
    6c 6f 67 79 2e 63 22 2c 20 6c 69 6e 65 20 31 36
    33 20 0a 00 00 00 00 00 00 00 00 00 00 00 00 00

There you go, and in this case it was:

    kernel diagnostic assertion "lp_max >= core_max" failed: file "../../../../arch/x86/x86/cpu_topology.c", line 163

Sun Apr 25 15:55:42 2021 GMT: vidmini - limit webcam resolution

Being fed up with the limited configurability of some of the more widely used video-conferencing software, I have put together a small LD_PRELOAD-able shared library that limits the resolutions a webcam reports. Have a look at the README or the source code.

Sun Apr 26 18:45:55 2020 GMT: Tizen on Orange Pi PC

Made some significant progress on running Tizen on an Orange Pi PC (and hopefully any other SBC with a similar Mali GPU). The main issue was that the alignments in TBM (Tizen Buffer Manager) weren't in sync with what the actual GPU driver expected. With that fixed, rendering now works fine, and I was also able to enable Full HD (1920x1080) resolution. The next step is to get the Home Screen app auto-started and to find some way to get a "Home" button (to be able to switch between apps without completely closing them).

Mon Apr 20 19:02:27 2020 GMT: SIGCHLD si_pid Linux kernel bug

While trying to get Tizen working on my Orange Pi PC, I noticed some strange behaviour in the Linux kernel: SIGCHLD signals sent to the parent process don't always have the si_pid field set correctly. I tracked this down to a bug in the Linux kernel's handling of multithreaded process termination, see SIGCHLD signal sometimes sent with si_pid==0 (Linux 5.6.5). Luckily, a patch was posted less than 24 hours later. (A minimal reproduction sketch follows after the list of older posts below.)

Sat Feb 29 23:41:48 2020 GMT: Rock Pi S Review
Thu Oct 10 17:48:58 2019 GMT: kmscube Running on Orange Pi PC with Mainline Kernel
Fri Apr 19 19:51:15 2019 GMT: Outdoor Maps in Galaxy Store
Sun Dec 23 10:59:23 2018 GMT: Adding CircleCI builds
Sat Jul 21 09:57:47 2018 GMT: Roundup running on Python 3
Thu Jul 19 06:43:19 2018 GMT: Roundup Updated for wg21.cmeerw.net
Wed May 02 14:00:35 2018 GMT: Perforce Acquires PRQA
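As promised in the SIGCHLD entry above, here is a minimal sketch of the kind of test that can expose the si_pid issue: the parent installs an SA_SIGINFO handler and repeatedly forks children that become multithreaded before exiting. This is an illustration under stated assumptions (the file name and iteration count are mine, and actually observing si_pid==0 depends on timing and on running an affected kernel around Linux 5.6.5), not the reproducer from the bug report:

    /* sigchld_sipid.c - sketch of a test for SIGCHLD si_pid delivery
     * (illustrative, not the reproducer from the bug report).
     * Build: cc -pthread sigchld_sipid.c */
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static void handler(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        char buf[64];
        /* on an affected kernel this sometimes shows si_pid=0 when a
         * multithreaded child terminates; write() is async-signal-safe,
         * snprintf is tolerated here for brevity */
        int n = snprintf(buf, sizeof buf, "SIGCHLD si_pid=%d\n",
                         (int)info->si_pid);
        if (n > 0)
            write(STDOUT_FILENO, buf, (size_t)n);
    }

    static void *idle_thread(void *arg)
    {
        (void)arg;
        return NULL;
    }

    int main(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGCHLD, &sa, NULL);

        for (int i = 0; i < 100; i++) {
            pid_t pid = fork();
            if (pid == 0) {
                /* child: become multithreaded, then terminate */
                pthread_t t;
                pthread_create(&t, NULL, idle_thread, NULL);
                pthread_join(t, NULL);
                _exit(0);
            }
            /* reap the child, retrying if the handler interrupts us */
            while (waitpid(pid, NULL, 0) < 0)
                ;
        }
        return 0;
    }

On a fixed kernel, every line should show the terminating child's actual PID.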
This Web page is licensed under the Creative Commons Attribution - NonCommercial - Share Alike License. Any use is subject to the Privacy Policy.
Revision: 1.14, cmeerw.org/blog/1033.html Last modified: Sat Aug 30 20:02:21 2025
Christof Meerwald <cmeerw@cmeerw.org> XMPP: cmeerw@cmeerw.org