Distracting adventures in ZFS upgrades

Sep 04 2015 Published by under linux

Last week I wanted to play around with some software packages for logging and charting environmental measurements and events (specifically two packages, OpenHAB and EmonCMS).
Wanting to save time (sweet irony!), rather than building up a VM and manually configuring the tools, I figured I’d use Docker. Except that the workstation I wanted to use was running Debian Squeeze and was still on kernel 3.2, which doesn’t support Docker. Oh, and it had a ZfsOnLinux (ZoL) raidz for the root filesystem.
So getting to Docker involved upgrading the kernel, ZFS and, by the way, the NVIDIA drivers.
Mistake #1: I should have just built a Xubuntu 14.04 VM and run Docker inside that!

Before upgrading the kernel from Debian backports, I decided to make sure ZfsOnLinux was updated. I anticipated (correctly, as it turned out) that ZoL would cause the most problems. Knowing that upgrading ZoL would be fraught with danger, I read all the documentation and upgrade advice and took all the recommended precautions.

But of course, after two cycles of apt-get, dpkg-reconfigure, rebuilding the initramfs and so on, I rebooted and BAM! A variant of the dreaded “failed to mount the root filesystem” error. Reported close by was a missing kernel module error for something called zcommon.

After a bit of digging, and breaking the virtual glass on an emergency boot partition, I worked out that I had missed upgrading one of the packages required by ZoL. Why it was not an automatic dependency I don’t know, but after installing something called “libnvpair” the system booted further. And then stopped again.
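Before rebooting after a ZoL upgrade, it can help to compare the installed versions of all the ZoL packages so that a straggler like this stands out. The helper below is a hypothetical sketch, not part of ZoL: it expects the output of ‘dpkg-query -W -f='${Package} ${Version}\n'’, and the package names in ZOL_PACKAGES are my assumption about the usual ZoL Debian package set, so adjust them for your repository. It treats the most common upstream version as the upgrade target and reports packages that differ, plus listed packages that are not installed at all.

```python
from collections import Counter

# Assumed ZoL-related Debian package names; adjust to match your repository.
ZOL_PACKAGES = {
    "spl", "spl-dkms", "zfs-dkms", "zfsutils", "zfs-initramfs",
    "libzfs2", "libzpool2", "libnvpair1", "libuutil1",
}


def upstream_version(debian_version: str) -> str:
    """Drop any epoch ('1:') and Debian revision ('-1...') from a version string."""
    return debian_version.split(":", 1)[-1].split("-", 1)[0]


def zol_version_report(dpkg_output: str) -> tuple[dict[str, str], list[str], list[str]]:
    """Given 'package version' lines, return (versions, stale, missing)."""
    versions = {}
    for line in dpkg_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ZOL_PACKAGES:
            versions[parts[0]] = upstream_version(parts[1])
    stale = []
    if versions:
        # Heuristic: the most common upstream version is the one being upgraded to.
        target = Counter(versions.values()).most_common(1)[0][0]
        stale = sorted(name for name, ver in versions.items() if ver != target)
    # Listed packages with no installed version at all are worth a second look.
    missing = sorted(ZOL_PACKAGES - set(versions))
    return versions, stale, missing
```

A package showing up in the stale list (or unexpectedly in the missing list) is a hint to finish the upgrade before regenerating the initramfs.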

This one took rather more work to track down. Semi-helpfully, the entire error message was:

Manually import the root pool at the command prompt and then exit.
Hint: Try: zpool import -R / -N ${ZFS_RPOOL}

At this point the initramfs was dropping my system to a rescue shell, with the above message advising me to import the ZFS pool containing the root filesystem. So I tried its helpful suggestion to execute the ‘zpool import’ command, which actually succeeded, and after some more fiddling to manually mount various file systems the system proceeded to boot. However, this manual process only got me out of trouble once; the underlying problem still needed to be resolved.

To get further I had to instrument the initramfs script scripts/zfs with a bunch of echo statements and rebuild the initramfs. (On Debian, the script files bundled in when rebuilding the initramfs are located under /usr/share/initramfs-tools/scripts.) This let me reboot and work out where the zpool import was failing (or whether it was even being called at all).

As it turned out, zpool was not being called at all in a way that would work for my partitioning scheme. The logic in scripts/zfs runs through a whole bunch of permutations trying to locate the pool, but if a variable called ROOT is empty it skips executing zpool entirely. The solution was to add ‘root=zfs:AUTO’ to the kernel arguments in my grub configuration. Previously my kernel did not require this argument, but after upgrading ZoL from 0.6.2 to 0.6.4, it did.
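A quick scan of a grub.cfg for kernel entries that lack the new argument is an easy sanity check after this kind of upgrade. This is a minimal, hypothetical sketch assuming GRUB 2 syntax, where each kernel is loaded by a line of the form ‘linux <image> <arguments>’; it only checks that some ‘root=zfs:’ argument is present, not that the pool name is right.

```python
def entries_missing_zfs_root(grub_cfg: str) -> list[tuple[int, str]]:
    """Return (line number, text) for each GRUB 'linux' command lacking root=zfs:..."""
    missing = []
    for lineno, raw in enumerate(grub_cfg.splitlines(), start=1):
        tokens = raw.split()
        # GRUB 2 loads a kernel with: linux <image> [arguments...]
        # Comment lines start with '#', so they are skipped here too.
        if not tokens or tokens[0] != "linux":
            continue
        kernel_args = tokens[2:]
        if not any(arg.startswith("root=zfs:") for arg in kernel_args):
            missing.append((lineno, raw.strip()))
    return missing
```

Running it over both the generated /boot/grub/grub.cfg and any hand-maintained copy shows which menu entries still need the argument.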

So, what caused this? There were a lot of threads, a year or so old, discussing upgrade errors related to ZfsOnLinux, but none of them quite matched my specific scenario.

One possibility is this:
* I run a separate boot filesystem from the usual /boot, containing a hand-crafted grub which can launch various tools such as GParted, minimal Linux installs for rescue purposes, memtest86 and others.
* Whenever I upgrade the kernel on this system I need to copy the vmlinuz and initramfs files from /boot (which is never used by my grub) over to this separate boot filesystem.
* I wonder if ZoL added the root=zfs:AUTO option to the Debian grub update facility, and I neglected to check the generated /boot/grub/grub.cfg for changes and apply them to my real grub.cfg. And wham!

However, I couldn’t find any references to zfs in /etc/grub.d, so this hypothesis may well be wrong. By Occam’s razor, perhaps it’s just that my setup on this particular workstation is more complex or unusual than that of most ZfsOnLinux users. Anyway, onward and upward.

I’m shortly to decide which of OpenHAB or EmonCMS I’ll be using for my Hackaday Prize finals entry. Stay tuned!


Experiments with hardening OpenWRT: applying the grsecurity patches

Dec 14 2014 Published by under infosec

A well-known set of security enhancements to the Linux kernel is the grsecurity patch. This is a (large) patch that applies cleanly against selected supported stock Linux kernel versions. It brings with it PaX, which protects against various well-known memory exploits, plus a number of other hardening features, including logging of time and mount changes. In particular it enables features such as a non-executable stack (NX) on platforms that do not provide NX in hardware, such as MIPS devices and older x86.
UPDATE: Unfortunately, software NX protection is not in fact supported for 32-bit MIPS devices, which would have been very useful. Whilst I was teaching myself I managed to mix things up, so bear this in mind when reading the rest of this blog entry. Otherwise, the usefulness of grsecurity and the mechanism for patching it into OpenWRT are still valid.

Note also that a more detailed procedure you can use to rebase the patches is at https://github.com/pastcompute/openwrt-cc-ar71xx-hardened/wiki .

OpenWRT hardening

OpenWRT is a widely adopted embedded/router Linux distribution. It would benefit greatly from including grsecurity, particularly given that most MIPS platforms do not support NX protection in hardware. However, for a long time the differences between the OpenWRT kernel and the kernel revisions supported by grsecurity were significant, and would likely have taken extreme effort to get working, let alone working securely.

This is a shame, because there is malware targeted at consumer embedded routers, and it must only be a matter of time before OpenWRT is targeted. OpenWRT is widely regarded as relatively secure compared to many consumer devices, at least if configured properly, but eventually some bug will allow a remote binary to be dropped. It would be helpful if the system could be hardened to stay one step ahead of things.

The OpenWRT development trunk (destined to become the next release, ‘Chaos Calmer’, in due course) has recently migrated most devices to the 3.14 kernel tree. Serendipitously, this aligns with the long-term supported grsecurity revision, 3.14. When I noticed this I figured I’d take a look at whether it was feasible to deploy grsecurity with OpenWRT.

Applying grsecurity – patch

In late November I pulled the latest OpenWRT sources; the kernel version was 3.14.25, which I noticed matched the current grsecurity stable branch, 3.14.25.

The grsecurity patch applies cleanly against a stock kernel. OpenWRT, on the other hand, starts with a stock kernel and then applies a series of patches designed to extend hardware support to many obscure embedded things not present in the mainline kernel, along with patches that reduce the memory footprint. Some of the generic patches have been submitted upstream but may not yet have been accepted, and some could be backports from later kernels. An example of a generic patch is a simplified crash report.

Anyway, I had two choices, and tried them both: apply grsecurity, then the OpenWRT patches; or start with the OpenWRT-patched kernel and apply grsecurity on top. In both cases there were a number of rejects, but there seemed to be fewer when I applied grsecurity last. I also decided this order would be easier for me to maintain going forward, a decision that was later validated.

OpenWRT kernel patches are stored in two locations: generic patches that apply to any platform, and platform-specific patches. My work is tested against the Carambola2, an embedded MIPS board supported by the ‘ar71xx’ platform in OpenWRT, so in my case the platform-specific patches were the ar71xx ones.

To make life easy I wrote a script that takes a directory of OpenWRT kernel patches, applies them to a git kernel repository and auto-commits. This allowed me to use gitg and git difftool to examine things efficiently. It also worked well with using a kernel tree external to OpenWRT, so I didn’t have to worry yet about integrating patches into OpenWRT. This script is on GitHub; it should be easily adaptable for other experiments.
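The idea behind such a script can be sketched in a few lines of Python. This is my own minimal illustration, not the script on GitHub, and it rests on assumptions: the patch directories are something like target/linux/generic/patches-3.14 and target/linux/ar71xx/patches-3.14 (paths assumed), patches are applied in lexical filename order (generic first, then platform), each one is applied with ‘git apply --index’ and committed with its filename in the message, and the grsecurity patch goes last, matching the order I settled on. The plan is built separately from running it so it can be reviewed before touching the tree.

```python
import subprocess
from collections.abc import Sequence
from pathlib import Path


def plan_patch_commits(patch_dirs: Sequence[Path], extra_last: Sequence[Path] = ()) -> list[list[str]]:
    """Build git commands that apply each patch in order and commit it separately."""
    patches: list[Path] = []
    for directory in patch_dirs:
        # OpenWRT patch names carry numeric prefixes (e.g. 100-foo.patch), so a
        # lexical sort within each directory gives the intended order.
        patches.extend(sorted(p for p in directory.iterdir() if p.suffix == ".patch"))
    # Anything that must go on top (here: the grsecurity patch) is applied last.
    patches.extend(extra_last)
    commands = []
    for patch in patches:
        commands.append(["git", "apply", "--index", str(patch)])
        commands.append(["git", "commit", "--quiet", "-m", f"Apply {patch.name}"])
    return commands


def run_plan(commands: list[list[str]], kernel_tree: Path) -> None:
    """Run the planned commands in the kernel git repository; stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=kernel_tree, check=True)
```

Committing each patch on its own is what makes gitg and git difftool useful afterwards: every OpenWRT change, and then grsecurity, shows up as a separate commit.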

Note: to use an external tree, managed by git, use config options like the following:

There were four primary rejects that required fixing. Each involved inspecting the case and working out what OpenWRT had changed that got in the way. Generally the cause was that one or the other had modified the end of the same structure or macro, but luckily nothing significant turned up and I was able to reconcile things easily. The hardest was in vmstat.c, which OpenWRT modifies for MIPS and which grsecurity also modifies to add extra memory protections. At this point I attempted to build the system, and discovered three other minor cases that broke the build. These mispatches were essentially due to code moving by one or two lines, or to new code using internal kernel APIs modified by grsecurity, and were also easily repaired. The most difficult mispatch to understand was where OpenWRT rewrites the kernel module loader code, apparently to make better use of MIPS memory structures; it took me a little while to work out how to fix things.

The end result is on github at https://github.com/pastcompute/openwrt-cc-linux-3.14.x-grsecurity

Applying grsecurity – OpenWRT quirks

One strange bug I had to work around was a new dependency in the kernel build process: extra tools that grsecurity adds were not being built in the correct order relative to other kernel prerequisites.

In the end I had to patch how OpenWRT builds the kernel to perform an extra ‘make olddefconfig’ to sort things out.

I also had to run ‘make kernel_menuconfig’ and turn on grsecurity.

As the system built, I eventually hit another problem area: building packages. This was a bit of an ‘OH-NO’ moment, as I thought it had the potential to become a big rabbit hole. Luckily, only one package turned out to be affected: compat-wireless. This package builds some extra user space tools and WiFi drivers, and used a macro, ACCESS_ONCE, which grsecurity changes to be more secure; making everything work again required using a new macro, ACCESS_ONCE_RW. There were quite a number of calls to this macro, but luckily it turned out to be fixable using sed!
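The exact sed expression isn't given, so here is a hedged Python equivalent of that kind of fix: a blanket rewrite of every ‘ACCESS_ONCE(’ call to ‘ACCESS_ONCE_RW(’. The assumption is that a blanket rewrite is acceptable for this package; it gives up grsecurity's read-only checking on calls that only read, and a more careful fix would convert only the calls used as write targets. The function names are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical blanket rewrite. The "(" must follow ACCESS_ONCE directly, so
# calls already converted to ACCESS_ONCE_RW( are never rewritten twice.
ACCESS_ONCE_CALL = re.compile(r"\bACCESS_ONCE\(")


def convert_access_once(source: str) -> tuple[str, int]:
    """Return the rewritten source text and the number of calls converted."""
    return ACCESS_ONCE_CALL.subn("ACCESS_ONCE_RW(", source)


def convert_tree(root: Path) -> int:
    """Rewrite every .c and .h file under root in place; return total conversions."""
    total = 0
    for path in sorted(root.rglob("*.[ch]")):
        text = path.read_text(errors="surrogateescape")
        new_text, count = convert_access_once(text)
        if count:
            path.write_text(new_text, errors="surrogateescape")
            total += count
    return total
```

Note that a call written with a space, such as ‘ACCESS_ONCE (x)’, is not matched and would need handling separately.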

Booting OpenWRT with grsecurity – modules not loading

I was then able to complete an INITRAMFS image, which I TFTP’d into my Carambola2 via U-Boot.

Amazingly the system booted and provided me with a prompt.

I then discovered that no kernel modules were loading. After a bit of digging, it turned out that a grsecurity option, CONFIG_GRKERNSEC_RANDSTRUCT, auto-enables CONFIG_MODVERSIONS. One thing I learned at this point is that OpenWRT does not support CONFIG_MODVERSIONS=y, due to the way it packages modules with its packaging system. One iteration later, with the setting disabled, everything appeared to be “working”.
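A small pre-flight check on the kernel .config can catch this before a build and flash cycle. This is a hypothetical sketch: the rule table is mine and covers only the two options described in this post. It reads ‘CONFIG_X=value’ lines, treats ‘# CONFIG_X is not set’ comments as unset, and flags either option if it is enabled.

```python
import re

# Assumed rule table, based only on the incompatibilities described in this post.
INCOMPATIBLE = {
    "CONFIG_MODVERSIONS": "OpenWRT's module packaging does not support modversions",
    "CONFIG_GRKERNSEC_RANDSTRUCT": "auto-enables CONFIG_MODVERSIONS",
}

SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_kconfig(text: str) -> dict[str, str]:
    """Map each CONFIG_ option to its value; '# ... is not set' lines are omitted."""
    options = {}
    for line in text.splitlines():
        match = SET_RE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def openwrt_config_problems(text: str) -> list[str]:
    """List enabled options (=y or =m) that would stop OpenWRT modules loading."""
    options = parse_kconfig(text)
    return [
        f"{name}={options[name]}: {reason}"
        for name, reason in INCOMPATIBLE.items()
        if options.get(name) in ("y", "m")
    ]
```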

Testing OpenWRT with grsecurity

Of course, all this work is moot if we can’t prove it works.

Easy to check is auditing. For example, we now had these messages:

However, the acid test would be enforcement of the NX flag. Here I used the code from http://wiki.gentoo.org/wiki/Hardened/PaX_Quickstart to test incorrect memory protections. Result:


Revisiting Checksec, and tweaking PAX

In an earlier blog post I wrote about experimenting with checksec. Here I used it to double-check that the binaries were built with NX protection. Most were, due to a patch I previously submitted to OpenWRT for MIPS. However, openssl was missing NX. It turns out that OpenSSL, on top of everything else it has been discussed for this year, uses assembler in parts of its encryption code! I was able to fix this by adding the relevant linker ‘.note.GNU-stack’ directive.
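To show what this check boils down to: checksec derives the NX result from the PT_GNU_STACK program header (the GNU_STACK line in ‘readelf -l’ output) and whether its flags include execute. On most GNU toolchains, an object assembled without a ‘.note.GNU-stack’ section is treated by the linker as needing an executable stack, so the whole binary ends up with the execute flag set. Below is a minimal Python illustration of that test, my own sketch rather than checksec's code; it handles 32/64-bit and big/little-endian ELF files (ar71xx is big-endian 32-bit MIPS). A binary with no PT_GNU_STACK header at all is reported as "missing", which should generally be treated as executable too.

```python
import struct

PT_GNU_STACK = 0x6474E551
PF_X = 0x1


def gnu_stack_status(elf: bytes) -> str:
    """Return 'nx', 'executable' or 'missing' for the PT_GNU_STACK header."""
    # Illustrative re-implementation of the PT_GNU_STACK check, not checksec's own code.
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = elf[4], elf[5]
    if ei_data not in (1, 2):
        raise ValueError("unknown byte order")
    end = "<" if ei_data == 1 else ">"  # ELFDATA2LSB / ELFDATA2MSB
    if ei_class == 1:  # ELFCLASS32: e_phoff at 28, e_phentsize/e_phnum at 42
        (phoff,) = struct.unpack_from(end + "I", elf, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", elf, 42)
        flags_offset = 24  # p_flags comes after p_memsz in Elf32_Phdr
    elif ei_class == 2:  # ELFCLASS64: e_phoff at 32, e_phentsize/e_phnum at 54
        (phoff,) = struct.unpack_from(end + "Q", elf, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", elf, 54)
        flags_offset = 4  # p_flags directly follows p_type in Elf64_Phdr
    else:
        raise ValueError("unknown ELF class")
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(end + "I", elf, base)
        if p_type == PT_GNU_STACK:
            (p_flags,) = struct.unpack_from(end + "I", elf, base + flags_offset)
            return "executable" if p_flags & PF_X else "nx"
    return "missing"
```

Usage: ‘gnu_stack_status(open(path, "rb").read())’ on each binary in the image.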

The PaX component can be tweaked using the paxctl command, so I had to build that with the OpenWRT toolchain to try it out. I discovered that it doesn’t work for files on the JFFS2 partition, only in the ramdisk. Furthermore, to enable soft mode you need to add a kernel boot command line argument. To do this for OpenWRT, edit a file called target/linux/$KERNEL_PLATFORM/generic/config-default, where in my case $KERNEL_PLATFORM is ar71xx.
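The PaX soft mode argument is ‘pax_softmode=1’. Assuming the file carries the built-in command line as a kernel config entry of the form CONFIG_CMDLINE="..." (an assumption; check how your target actually passes its command line), adding the argument idempotently could look like this hypothetical helper. A file without a CONFIG_CMDLINE line is left unchanged, and an existing ‘pax_softmode=’ setting is never overridden.

```python
import re

# Assumes the built-in command line is a kernel config entry: CONFIG_CMDLINE="..."
CMDLINE_RE = re.compile(r'^(CONFIG_CMDLINE=")(.*)(")\s*$')


def add_kernel_arg(config_text: str, arg: str = "pax_softmode=1") -> str:
    """Add arg to CONFIG_CMDLINE unless an argument with the same key is already there."""
    key = arg.split("=", 1)[0]
    lines = []
    for line in config_text.splitlines():
        match = CMDLINE_RE.match(line)
        if match:
            args = match.group(2).split()
            if not any(a.split("=", 1)[0] == key for a in args):
                args.append(arg)
            line = match.group(1) + " ".join(args) + match.group(3)
        lines.append(line)
    return "\n".join(lines) + "\n"
```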

Moving Targets

Right in the middle of all this, OpenWRT bumped the kernel to 3.14.26, so I had to exercise a workflow for keeping the patch current. As it happened, the grsecurity patch was also updated to 3.14.26, which I presume made life easier.

After downloading the stock kernel and pulling the latest OpenWRT, I again re-created the patch series, then applied grsecurity 3.14.26. The same four rejects were present again, so, fingers crossed, I cherry-picked all my work from 3.14.25 onto 3.14.26. As luck would have it, this was one smooth rebase!

Recap of OpenWRT grsecurity caveats

  • CONFIG_GRKERNSEC_RANDSTRUCT is not compatible with the OpenWRT build system; because it enables CONFIG_MODVERSIONS, using it will prevent modules from loading
  • Some packages may need to be modified to support NX: generally, those that use assembly language without the proper linker directive.
  • For some reason paxctl only seems to work on files in /tmp, not in the JFFS2 overlay. This is probably only a problem when debugging
  • Your experience with the debugger gdb will probably be sub-optimal unless you put the debug target on /tmp and use paxctl to mark it with exceptions


After concluding the above, I converted the change set from my local Linux working copy into a set of additional patches on OpenWRT and rebuilt everything to double check.

The branch ‘ar71xx-3.14.26-grsecurity’ in https://github.com/pastcompute/openwrt-cc-ar71xx-hardened has all the work, along with some extra minor fixes I made to some other packages related to checksec scan results.

THIS MAY EXPLODE YOUR COMPUTER AND GET YOU POWNED! This has been working for me on one device with minimal testing and is just a proof of concept.


Fixing my annoying kernel bug(s) – Part 2

May 01 2012 Published by under linux

This blog entry details part of the problem outlined in these posts.

This is a lengthy technical post:
There is a detailed manual for building a Debian kernel at http://users.wowway.com/~zlinuxman/Kernel.htm. I was familiar with much of the content already but it was still a very helpful reference; for example, using the ‘src’ group to avoid root was a useful thing to learn.

The information on patching the kernel for Phenom was cobbled together from various websites, including the Gentoo forums.

My system is currently built from Debian squeeze with a bunch of packages from various other repositories including the Debian backports and packages I manually backported from Wheezy (testing).

I could have applied the necessary patch to this kernel but I decided at the same time to have another go at getting to the latest 3-series kernel.

For a long time I was stuck on a 2.6.39 kernel, as I wasn’t able to successfully build a later kernel package from the Debian sources in testing. I could have tried to build from the kernel.org sources, but where possible I have tried to maintain my system using .deb packages. In the intervening year, however, 3.2 has been released in backports, which saved me a lot of potential problems.

So I upgraded my kernel and applied the patch. Here is Yet Another Tutorial on building a kernel the ‘Debian Way’. This will yield a .deb package that you can install without clobbering any other kernels.


  • Install various prerequisites; these will vary depending on your system.
    A fresh system will need many others; I needed these for LZ compression and for ‘make xconfig’
  • You need to have the Debian backports in your APT sources.list file.

    Having added this, do a sudo apt-get update.
  • The ideal method these days is to do most of the work without dropping to root or using sudo. To achieve this, add your account to the ‘src’ group and set up permissions accordingly.

    At this point you will need to log out and log back in (although you could ssh back in as yourself, and I read somewhere recently that this may not be strictly necessary any more with the right ‘magic’ incantations…)
  • I like to experiment with virtualisation so along the way I downloaded a patch that may be necessary for this from http://users.wowway.com/~zlinuxman/kernel-package/linuxv3.diff (This is also an attachment to this post)
    Apply like:
  • I also made my own patches to do an optimized build for my AMD64 Phenom:
    File phenom_1.patch:

    File phenom_2.patch:

    File phenom_3.patch:
  • And of course, the patch to fix my Firewire subsystem crash as described in the previous post:


  1. Install the Debian source package and unpack the tree:
  2. Note: it turns out that this is in fact a 3.2.9 kernel. For some reason the Debian version is 3.2.4-1~bpo60+1; go figure…

  3. Apply the patches (this assumes the patch files are in /usr/src):
  4. Configure the kernel build:
    I started by copying the default config from the binary backports kernel, and tweaking it for my own purposes (not shown here)
  5. Finally, build the Debian packages.

    Here, CONCURRENCY_LEVEL=4 sets the number of concurrent make processes used, for taking advantage of a multi-core system.
    From the above settings, the actual package will become ‘linux-image-3.2.9-xxx-preempt-amd64’ with a Debian version of ‘1~yyy.00.00’ and cat /proc/version output of ‘3.2.9-xxx-preempt-amd64’
    Using this mechanism means you can have concurrent ‘flavours’ of a kernel installed while still being able to upgrade within each flavour.
  6. Installation:

    This should also trigger any DKMS modules to rebuild if present.
    My NVidia 280.13 driver rebuilds fine with this version.
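Step 5 can be sketched as a small Python wrapper that makes each piece of the make-kpkg invocation explicit. This is a reconstruction under assumptions, not necessarily the exact command used here: it relies on make-kpkg's --rootcmd, --initrd, --append-to-version and --revision options and the kernel_image and kernel_headers targets, and ‘-xxx-preempt-amd64’ and ‘1~yyy.00.00’ are the same placeholders used above, not real values. Appending ‘-xxx-preempt-amd64’ to the 3.2.9 upstream version is what produces the package name linux-image-3.2.9-xxx-preempt-amd64.

```python
import os


def make_kpkg_command(append: str, revision: str, jobs: int = 0) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for building Debian kernel packages with make-kpkg."""
    # Assumed flag set; the placeholder values passed in come from the text above.
    env = dict(os.environ)
    # make-kpkg reads CONCURRENCY_LEVEL to decide how many parallel make jobs to run;
    # default to the number of CPUs when no explicit job count is given.
    env["CONCURRENCY_LEVEL"] = str(jobs or os.cpu_count() or 1)
    argv = [
        "make-kpkg",
        "--rootcmd", "fakeroot",   # build as a normal user (a member of 'src')
        "--initrd",
        "--append-to-version", append,
        "--revision", revision,
        "kernel_image", "kernel_headers",
    ]
    return argv, env
```

From the top of the unpacked, patched and configured tree, ‘argv, env = make_kpkg_command("-xxx-preempt-amd64", "1~yyy.00.00", jobs=4)’ followed by ‘subprocess.run(argv, env=env, check=True)’ would run the build.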


Of course the proof is in the pudding.

After rebooting, I repeated the sequence necessary to trigger the fault: and it did not recur. Woot!



Fixing my annoying kernel bug(s) – Part 1

May 01 2012 Published by under linux

This blog entry details part of the problem outlined in this post.

Regularly enough to be almost annoying, I was getting a kernel fault popup (see the stack trace following this blog post). For a long time this was not quite annoying enough to do something about, because the computer wasn’t crashing and there were no obvious side effects. Eventually, however, I realised that each time it occurred a new instance of my external backup drive was being mounted automagically, so, being a little cautious about potential data loss, I decided to try to get to the bottom of things.

After a few days of taking notes and some experimentation, I discovered the following:

  • It would happen with regularity after waking the computer up from suspend to RAM.
  • I could force it to happen by ‘ejecting’ the external backup drive.

I initially suspected something to do with FireWire or ACPI (shudder), but looking at the stack trace, and the coincidence with removing the drive, it seemed in fact to be an issue somewhere in the SCSI subsystem. I then worked out that the following commands would always reproduce the problem:
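Judging by the call trace at the end of this post (driver_unbind reached via sysfs_write_file from a bash process), one such trigger is writing the FireWire unit's name into the driver's sysfs ‘unbind’ file. As a hedged sketch only, assuming the driver directory is /sys/bus/firewire/drivers/firewire_sbp2 and a unit name like fw1.0 (the name seen in the log), that step looks like the following. The helper name is hypothetical; it needs root and detaches the drive, so unmount it first.

```python
from pathlib import Path

# Assumed sysfs location of the firewire_sbp2 driver; check yours with ls.
DEFAULT_DRIVER_DIR = Path("/sys/bus/firewire/drivers/firewire_sbp2")


def unbind_unit(unit: str, driver_dir: Path = DEFAULT_DRIVER_DIR) -> None:
    """Detach a FireWire unit (e.g. 'fw1.0') from its driver via the sysfs 'unbind' file."""
    # Devices bound to a driver appear as entries in the driver's sysfs directory.
    if not (driver_dir / unit).exists():
        raise FileNotFoundError(f"{unit} is not bound to {driver_dir.name}")
    # Equivalent to running, as root: echo fw1.0 > .../unbind
    (driver_dir / "unbind").write_text(unit)
```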

At this stage I stumbled over an almost identical stack trace on the lkml.org mailing list archive, which luckily short-circuited my experimentation. Learning about /sys and SCSI device manipulation is useful, maybe, but I had a lot of other things to do as well.

The patch for the problem is documented at https://lkml.org/lkml/2012/2/8/246.

The next stage was working out how to apply it to my system. This is described in the next blog post.


The offending stack trace:

Mar 16 22:44:47 atlantis3 kernel: [ 2020.140704] sd 15:0:0:0: [sdh] Stopping disk
Mar 16 22:44:48 atlantis3 kernel: [ 2021.495923] firewire_sbp2: released fw1.0, target 15:0:0
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849206] ------------[ cut here ]------------
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849232] WARNING: at /build/buildd-linux-2.6_3.2.4-1~bpo60+1-amd64-Ns0wYl/linux-2.6-3.2.4/debian/build/source_amd64_none/fs/sysfs/inode.c:323 sysfs_hash_and_remove+0x30/0x8b()
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849242] Hardware name: To be filled by O.E.M.
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849248] sysfs: can not remove 'bsg', no directory
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849253] Modules linked in: nls_utf8 nls_cp437 vfat fat rfcomm bridge stp bnep speedstep_lib cpufreq_powersave cpufreq_userspace powernow_k8 ppdev cpufreq_stats lp mperf cpufreq_conservative nfsd lockd nfs_acl auth_rpcgss sunrpc kvm_amd kvm binfmt_misc ext3 jbd fuse ext2 it87 hwmon_vid loop btusb joydev bluetooth rfkill usbhid hid snd_usb_audio snd_usbmidi_lib cx22702 cx88_dvb cx88_vp3054_i2c videobuf_dvb dvb_core rc_winfast tuner_simple tuner_types tda9887 ir_lirc_codec lirc_dev ir_mce_kbd_decoder tda8290 snd_hda_codec_realtek firewire_sbp2 ir_sony_decoder snd_hda_intel snd_hda_codec ir_jvc_decoder ir_rc6_decoder snd_hwdep tuner ir_rc5_decoder cx8800 cx88_alsa ir_nec_decoder snd_pcm_oss snd_mixer_oss cx8802 cx88xx rc_core i2c_algo_bit tveeprom snd_pcm gspca_ov519 gspca_main v4l2_common snd_seq_midi videodev snd_rawmidi snd_seq_midi_event media snd_seq usblp v4l2_compat_ioctl32 videobuf_dma_sg snd_timer snd_seq_device videobuf_core btcx_risc sp5100_tco k10temp edac_core parpo
Mar 16 22:44:51 atlantis3 kernel: rt_pc parport snd i2c_piix4 tpm_tis tpm edac_mce_amd i2c_core tpm_bios soundcore processor evdev pcspkr thermal_sys mxm_wmi wmi snd_page_alloc button ext4 mbcache jbd2 crc16 dm_mod nbd btrfs zlib_deflate crc32c libcrc32c usb_storage uas sg sr_mod cdrom sd_mod crc_t10dif ata_generic ohci_hcd ehci_hcd firewire_ohci firewire_core crc_itu_t pata_jmicron ahci libahci libata xhci_hcd r8169 mii scsi_mod usbcore usb_common [last unloaded: scsi_wait_scan]
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849487] Pid: 9293, comm: bash Not tainted 3.2.0-0.bpo.1-amd64 #1
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849493] Call Trace:
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849507]  [] ? warn_slowpath_common+0x78/0x8c
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849517]  [] ? warn_slowpath_fmt+0x45/0x4a
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849527]  [] ? sysfs_hash_and_remove+0x30/0x8b
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849538]  [] ? kobject_get+0x12/0x17
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849547]  [] ? mutex_lock+0xd/0x2c
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849555]  [] ? bsg_unregister_queue+0x3f/0x78
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849587]  [] ? __scsi_remove_device+0x34/0xb7 [scsi_mod]
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849613]  [] ? scsi_remove_device+0x20/0x2b [scsi_mod]
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849627]  [] ? sbp2_remove+0x77/0x138 [firewire_sbp2]
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849639]  [] ? __device_release_driver+0x7f/0xca
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849648]  [] ? device_release_driver+0x1d/0x28
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849665]  [] ? driver_unbind+0x56/0x8b
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849674]  [] ? sysfs_write_file+0xe0/0x11c
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849682]  [] ? vfs_write+0xa4/0xff
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849690]  [] ? sys_write+0x45/0x6e
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849699]  [] ? system_call_fastpath+0x16/0x1b
Mar 16 22:44:51 atlantis3 kernel: [ 2023.849706] ---[ end trace d8b356d84e0828d4 ]---
Mar 16 22:44:51 atlantis3 kernel: [ 2024.653272] firewire_sbp2: released fw1.1, target 16:0:0
Mar 16 22:46:06 atlantis3 kerneloops: Submitted 1 kernel oopses to www.kerneloops.org


Patching and Building a custom Linux Kernel in Debian

Apr 10 2012 Published by under linux

These posts cover a topic which seems to be documented to varying degrees across the net, but nothing quite matched what I wanted to do. In the end they are the result of multiple sources of information and inspiration (and perspiration…)

For some time I had been getting a kernel fault report popup with irritating regularity. Eventually I isolated it to something going wrong with my external FireWire drive after my computer resumed from suspend (specifically Suspend to RAM).
Chasing this down required working through the following tasks:

  1. Disabling the proprietary NVidia driver and activating ‘nv’ (I was unable to successfully configure nouveau to work with my particular dual-head configuration), so that my kernel was no longer ‘TAINTED’, which would have led me into a brick wall if I had been required to report a kernel bug.
  2. Consistently replicating the fault, which included learning about a bunch of stuff in the Linux /sys filesystem.
  3. Finally getting a 3-series kernel to work on Debian Squeeze. It turns out that by now 3.2 has been packaged into Debian backports, which got me past an earlier roadblock with kernel upgrades. Upgrading to the latest kernel would reveal whether the problem had already been resolved (it had not, at least as of 3.2.9).
  4. Rebuilding the kernel from source (something I have done many times before, but it doesn’t hurt to recap) and applying the necessary patches.
  5. Re-enabling NVidia – which involved verifying my DKMS setup was still working.

I haven’t blogged recently due to various family mini-crises to do with pets, sickness and other issues, as well as extra busyness at work.

As it is getting late, this post will conclude with the command line used to build and install my kernel, and I will expand on this in the next post.

Things to note:

  • The above will build a kernel using the same configuration as the installed Debian backports 3.2 kernel, assuming the backports kernel and source packages have been installed. There are no changes or patches yet.
  • Your user must be in the ‘src’ group for the make-kpkg command to work as-is.
  • The 3.2 kernel in backports (as of March 2012) was in fact version 3.2.9 although this is not indicated in the Debian version for some reason.
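Since the Debian version string doesn't reveal the upstream sublevel, the reliable place to look is the top of the kernel source tree's Makefile, which starts with VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION assignments (running ‘make kernelversion’ in the tree reports the same thing). A small sketch that reads them, following the same rule the Makefile uses (the sublevel is included whenever it is non-empty, so a plain 3.2 release reads as 3.2.0):

```python
import re
from pathlib import Path

FIELDS = ("VERSION", "PATCHLEVEL", "SUBLEVEL", "EXTRAVERSION")


def kernel_version_from_makefile(text: str) -> str:
    """Combine the version assignments at the top of a kernel Makefile, e.g. '3.2.9'."""
    values = {}
    for field in FIELDS:
        # First 'FIELD = value' line; the value may be empty (typical for EXTRAVERSION).
        match = re.search(rf"^{field}[ \t]*=[ \t]*(\S*)[ \t]*$", text, re.MULTILINE)
        values[field] = match.group(1) if match else ""
    version = values["VERSION"]
    if values["PATCHLEVEL"]:
        version += "." + values["PATCHLEVEL"]
        if values["SUBLEVEL"]:
            version += "." + values["SUBLEVEL"]
    return version + values["EXTRAVERSION"]


def kernel_version(tree: Path) -> str:
    """Read the upstream version from an unpacked kernel source tree."""
    return kernel_version_from_makefile((tree / "Makefile").read_text())
```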

