Building MongoDB for 32-bit ARM on Debian/Ubuntu

Pre-built MongoDB 3.2 packages for 32-bit ARM systems with hardware floating-point support (e.g. ODROID-XU3/XU4/C1, Raspberry Pi 1/2, BeagleBoard, etc.) running Ubuntu 18.04 or compatible Debian-based distributions are available here.

Recently, I upgraded my ODROID-XU4, an eight-core embedded ARM-based platform with USB 3.0, Gigabit Ethernet, and 2GB RAM, to Ubuntu 18.04. But after doing so, I discovered that MongoDB is no longer available in the repository, as upstream seemed to have stopped supporting it. This is problematic, because the UniFi Controller software, which manages my Ubiquiti UniFi access points, is installed on this system and it requires MongoDB. As a result, I decided to manually build the latest compatible version of MongoDB for this system, since I have no desire to purchase hardware or migrate to a different platform. It turned out that this was quite a rabbit hole, and I ended up encountering a number of bugs in various other pieces of software.

Background: MongoDB

It turns out that two key decisions made by the MongoDB developers have led to this situation. First, as of version 3.2, MongoDB deprecated all 32-bit builds, and as of version 3.4, it has effectively dropped support for all 32-bit platforms. (Note that 32-bit ARM was never an officially supported platform.) Second, as of version 3.3.11, MongoDB has officially added support for 64-bit ARM systems.

Fundamentally, these changes are due to limitations of the underlying storage engines that MongoDB uses. Before version 3.2, MongoDB defaulted to the MMAPv1 storage engine, which essentially relies on memory-mapped files. Following an acquisition, MongoDB incorporated the WiredTiger storage engine, which became the new default in version 3.2. Among other improvements, WiredTiger adds support for B-tree-based indexes, finer-grained locking, and database compression, and, by targeting 64-bit systems, avoids the 2GB database storage limit that constrains 32-bit builds.
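
To see why a 32-bit address space is so limiting for MMAPv1, consider the following minimal sketch (not MongoDB code; the file name is hypothetical) of a storage engine that memory-maps its data file: the whole file must fit into the process's virtual address space alongside code, heap, and stack, which on a 32-bit system caps usable data at roughly 2GB.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical data file, standing in for an MMAPv1 collection file. */
    int fd = open("collection.0", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the entire file into memory; all reads then become plain loads.
     * On a 32-bit system this mapping competes with everything else for
     * at most 4GB (in practice closer to 2-3GB) of virtual addresses. */
    void *data = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    printf("mapped %lld bytes at %p\n", (long long)st.st_size, data);

    munmap(data, st.st_size);
    close(fd);
    return 0;
}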

Debian Packaging

Since I’m building for a Debian-based system, I’d like to reuse the existing package management infrastructure and install a proper Debian package. As detailed in the Debian packaging documentation, this requires the source to be stored in a special structure, with package metadata in a debian/control file, changes in a debian/changelog file, a build script in the debian/rules file, and so on. Fortunately, a MongoDB package for Debian already exists, so I was able to reuse the existing Debian package source for MongoDB and directly port the upstream changes from MongoDB 3.2.20 into this repository. Afterwards, I could simply use the dpkg-buildpackage tool to compile the source code and build the package. Since I’m only interested in building binary packages (-b), and don’t want to sign my changes (-uc), all I needed to do was invoke the tool with the appropriate arguments:

odroid:~/mongodb$ dpkg-buildpackage -b -uc

Problem #1: Mozilla SpiderMonkey

/usr/include/stdc-predef.h:40:1: fatal error: js-config.h: No such file or directory

Unfortunately, at some point during the compilation process, I encountered an error about a missing header file. After doing some digging, I discovered that the MongoDB source includes a stripped-down copy of the Mozilla SpiderMonkey JavaScript engine under the src/third_party/mozjs-38 directory. Since armhf isn’t a supported architecture, the MongoDB source doesn’t provide an appropriate configuration file for SpiderMonkey. This happens to be a known issue; the MongoDB developers helpfully provide a get_sources.sh script to automatically download the complete SpiderMonkey source and generate the appropriate SpiderMonkey configuration for the current build target.

Problem #2: Disk Space

No space left on device

Compilation of MongoDB gets a little farther, but at some point the disk runs out of space. Unfortunately, the compilation process needs more space than expected (eventually more than 16GB), and since this is an embedded ARM platform with only 32GB of eMMC flash storage, there isn’t enough left. I end up moving the build onto another machine with more disk space, mounting it remotely over NFS, and reconfiguring the build to save space by omitting debug symbols (DEB_BUILD_OPTIONS=nodbgsym). Since I’d like to reuse the previous partial build, I also need to specify that the build directory should not be cleaned (-nc).

odroid:/mnt/data/sources/mongodb$ DEB_BUILD_OPTIONS=nodbgsym dpkg-buildpackage -b -uc -nc

Problem #3: GNU gold and Vectored I/O

/usr/bin/arm-linux-gnueabihf-ld.gold: fatal error: build/opt/mongo/db/fts/fts_element_iterator.o: file too short: read only 18202 of 26977 bytes at 137446

The disk now has plenty of free space, but at some point during the build, the linker still fails with an I/O error. This seems very odd; perhaps some files were truncated when the disk ran out of space, so I clean the build and restart. But the error doesn’t go away, and keeps recurring with different files at different offsets. Clearly, something else is going on.

The error is being produced by GNU gold, a newer linker that is faster than the GNU ld linker. After searching through the source code, I discover that it is being generated by the function File_read::do_readv within gold/fileread.cc (reproduced below), when the number of bytes returned by readv is smaller than expected.

void File_read::do_readv(off_t base, const Read_multiple& rm, size_t start, size_t count) {
  ...
  ssize_t got = ::readv(this->descriptor_, iov, iov_index);
  ...
  if (got != want)
    gold_fatal(_("%s: file too short: read only %zd of %zd bytes at %lld"),
               this->filename().c_str(), got, want,
               static_cast<long long>(base + first_offset));
}

Compared to the traditional read function, the readv function supports vectored I/O, which transfers data into multiple buffers with a single system call, reducing system call overhead. But much like read, readv is not guaranteed to fill all of the supplied buffers, and may perform a partial read. GNU gold doesn’t seem to handle this behavior correctly, which appears to be a previously undiscovered bug, now reported to the GNU Bugzilla.
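
For illustration, the sketch below shows one conventional way of tolerating short reads (readv_full and its parameters are my own names, not anything from gold): keep calling readv, advancing the iovec array past buffers that are already full, until the requested number of bytes has arrived or end-of-file is reached.

#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Read exactly 'want' bytes into the supplied iovec array, retrying on
 * short reads. Note that the iovec entries are modified as buffers fill. */
ssize_t readv_full(int fd, struct iovec *iov, int iovcnt, size_t want)
{
    size_t total = 0;

    while (total < want && iovcnt > 0) {
        ssize_t got = readv(fd, iov, iovcnt);
        if (got < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: just retry */
            return -1;          /* real error: let the caller inspect errno */
        }
        if (got == 0)
            break;              /* end of file: the file really is too short */
        total += (size_t)got;

        /* Skip over buffers that are now completely filled... */
        while (iovcnt > 0 && (size_t)got >= iov->iov_len) {
            got -= (ssize_t)iov->iov_len;
            ++iov;
            --iovcnt;
        }
        /* ...and resume partway through a partially filled buffer. */
        if (iovcnt > 0 && got > 0) {
            iov->iov_base = (char *)iov->iov_base + got;
            iov->iov_len -= (size_t)got;
        }
    }
    return (ssize_t)total;
}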

But why does this occur? Conveniently, a warning happened to be printed in the kernel log, as shown below. It turns out that versions of the Linux kernel between 4.13 and 4.14 contain a bug where the inlined page_copy_sane function erroneously prints this warning for compound pages. But my current kernel is 4.14.47-132, which already includes that bugfix, so the warning, and the resulting short read, appear to be legitimate kernel behavior that user space is expected to handle.

WARNING: CPU: 6 PID: 937 at lib/iov_iter.c:695 copy_page_to_iter+0x2c4/0x4c4
Modules linked in: rpcsec_gss_krb5 lzo lzo_compress zram binfmt_misc spidev evdev spi_s3c64xx uio_pdrv_genirq uio exynos_gpiomem gpio_keys sch_fq_codel nfsd sit tunnel4 ip_tunnel ip_tables ipv6 extcon_usb_gpio
CPU: 6 PID: 937 Comm: arm-linux-gnuea Tainted: G        W       4.14.47-132 #1
Hardware name: ODROID-XU4
[<c0110b14>] (unwind_backtrace) from [<c010ce98>] (show_stack+0x10/0x14)
[<c010ce98>] (show_stack) from [<c08a8854>] (dump_stack+0x84/0x98)
[<c08a8854>] (dump_stack) from [<c01242cc>] (__warn+0xec/0x104)
[<c01242cc>] (__warn) from [<c0124394>] (warn_slowpath_null+0x20/0x28)
[<c0124394>] (warn_slowpath_null) from [<c046ae88>] (copy_page_to_iter+0x2c4/0x4c4)
[<c046ae88>] (copy_page_to_iter) from [<c021ebc8>] (generic_file_read_iter+0x2ac/0x8e8)
[<c021ebc8>] (generic_file_read_iter) from [<c038fb80>] (nfs_file_read+0x60/0x98)
[<c038fb80>] (nfs_file_read) from [<c0283324>] (do_iter_readv_writev+0x104/0x158)
[<c0283324>] (do_iter_readv_writev) from [<c02846a0>] (do_iter_read+0xe0/0x1a0)
[<c02846a0>] (do_iter_read) from [<c0285d58>] (vfs_readv+0x50/0x68)
[<c0285d58>] (vfs_readv) from [<c0285dc4>] (do_readv+0x54/0xdc)
[<c0285dc4>] (do_readv) from [<c010893c>] (__sys_trace_return+0x0/0x10)
---[ end trace 31bb0ea7bbe390aa ]---

Since I’m short on time, and the memory paging behavior of the Linux kernel is quite complex, I work around this bug by switching back to GNU ld, which does not perform vectored I/O. Another option would be LLVM lld, which can be even faster than GNU gold. This involves modifying the SConstruct file used by MongoDB’s SCons build system to change the linker selection.

    # This tells clang/gcc to use the gold linker if it is available - we prefer the gold linker
    # because it is much faster.
    if myenv.ToolchainIs('gcc', 'clang'):
#        AddToLINKFLAGSIfSupported(myenv, '-fuse-ld=gold')
        pass  # fall back to the default linker (GNU ld) instead of gold

Problem #4: RAM and the OOM Killer

Linking build/opt/mongo/mongoperf
collect2: fatal error: ld terminated with signal 9 [Killed]

Compilation finishes successfully, but something appears to be killing the linker process. Examining the kernel log shows that the kernel out-of-memory killer has been triggered.

Out of memory: Kill process 5733 (arm-linux-gnuea) score 374 or sacrifice child
Killed process 5733 (arm-linux-gnuea) total-vm:1173892kB, anon-rss:595904kB, file-rss:4kB, shmem-rss:0kB
oom_reaper: reaped process 5733 (arm-linux-gnuea), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Since dpkg-buildpackage will run as many concurrent compilation jobs as there are available processors, the system is simply running out of memory. The ODROID-XU4 platform has only 2GB of LPDDR3 physical memory, and isn’t configured with disk-backed swap, to avoid premature wear of the flash storage. One simple workaround is to enable the zram module, which trades CPU time for effective memory by providing swap space backed by compressed RAM, at roughly a 2:1 compression ratio. On Debian-based systems, zram can be automatically configured and enabled at system startup by installing the appropriate package.

odroid:/mnt/data/sources/mongodb$ sudo apt-get install zram-config

Another workaround is to limit the total number of concurrent compilation jobs, by passing the appropriate parameter (-j) to dpkg-buildpackage. Additionally, switching to a more efficient linker such as GNU gold or LLVM lld could further lower linker memory usage, but due to problem #3, I would need to rebuild GNU gold without vectored I/O, or switch the entire compilation toolchain from GCC to LLVM, which could result in other problems.

odroid:/mnt/data/sources/mongodb$ DEB_BUILD_OPTIONS=nodbgsym dpkg-buildpackage -b -uc -nc -j2

Problem #5: Floating-Point Alignment on ARM

[db_test:jsobj] 2018-06-09T10:12:54.633-0400 2018-06-09T10:12:54.628-0400 I -        [testsuite]         going to run test: JsobjTests::BSONObjTests::ToStringNumber
[db_test:jsobj] 2018-06-09T10:12:54.650-0400 2018-06-09T10:12:54.649-0400 F -        [testsuite] Invalid access at address: 0x229e48e
[db_test:jsobj] 2018-06-09T10:12:54.664-0400 2018-06-09T10:12:54.664-0400 F -        [testsuite] Got signal: 7 (Bus error).
...
[db_test] 2018-06-09T10:12:54.760-0400 DBTest jsobj failed, so stopping...
[executor:db_test:job0] 2018-06-09T10:12:54.760-0400 Received a StopExecution exception: DBTest jsobj failed

At this point, compilation and linking finish successfully, but something in the testsuite is crashing at runtime. Again, the kernel log contains useful information, and shows that an alignment exception has occurred.

Alignment trap: not handling instruction ed988b00 at [<01172c38>]
Unhandled fault: alignment exception (0x011) at 0x0229e48e
pgd = ec7f8000
[0229e48e] *pgd=489fa835, *pte=54da975f, *ppte=54da9c7f

Disassembling the faulting instruction 0xed988b00 produces the following assembly output. Note that although the binary is compiled for the Thumb-2 instruction set, which mixes 16-bit and 32-bit encodings, this word is a single 32-bit VFP instruction encoded as two 16-bit halfwords, not two separate 16-bit instructions.

vldr d8, [r8]

The faulting instruction loads a 64-bit double-precision value into VFP register d8 from the address held in register r8, which matches the faulting address 0x0229e48e reported by the kernel. This address is not 4-byte aligned, which violates the alignment requirements of the vldr instruction. Compared to x86, the ARM instruction set typically has stricter alignment requirements for data accesses. Although the Linux kernel supports fixing up unaligned accesses for user-mode code upon encountering an alignment exception, this is not recommended for performance reasons, and is limited to a small subset of the ARM instruction set. Extensions such as Vector Floating-Point (VFP) and Advanced SIMD (NEON) impose additional alignment requirements that the kernel’s fixup code does not handle, making it important to fix these unaligned accesses within the application itself.

Note that the ODROID-XU4 has a hardware floating-point unit that implements the VFP extension, so I’m running the armhf port of Ubuntu, which uses the hard-float ABI (hardfp) and executes floating-point operations in hardware, rather than the soft-float ABI (softfp) used when floating-point may be emulated in software. In fact, Ubuntu only supports two different ARM ports, which are based on their upstream Debian counterparts: armhf (32-bit little-endian ARM with hardware floating-point), and arm64 (64-bit little-endian ARM with hardware floating-point). In comparison, Debian also supports armel (32-bit little-endian ARM), which does not require hardware floating-point support.

In the C programming language, one common cause of alignment exceptions is casting to or from a pointer to a floating-point datatype without respecting alignment requirements, because the underlying memory may not be properly aligned. This is usually fixed by wrapping the floating-point type in a packed struct or union, which tells the compiler that the data may be unaligned so that it generates alignment-safe accesses, or by copying the bytes out with memcpy. Although a similar bug did exist in older versions of MongoDB, it was fixed in MongoDB 2.2.0, so this isn’t the cause of the current alignment exception.
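
For reference, here is a hypothetical sketch (not MongoDB’s actual code) of the unsafe pattern and the two usual fixes, reading a double out of an arbitrarily aligned byte buffer such as a BSON document:

#include <stddef.h>
#include <string.h>

/* Unsafe: the cast promises 8-byte alignment that buf + offset may not have,
 * so on ARM the compiler is free to emit a vldr that faults. */
double read_double_unsafe(const unsigned char *buf, size_t offset)
{
    return *(const double *)(buf + offset);
}

/* Safer: a packed struct tells the compiler the data may be unaligned,
 * so it emits alignment-safe loads (GCC/Clang extension). */
struct packed_double {
    double value;
} __attribute__((packed));

double read_double_packed(const unsigned char *buf, size_t offset)
{
    return ((const struct packed_double *)(buf + offset))->value;
}

/* Equally safe and fully portable: copy the bytes with memcpy, which the
 * compiler typically optimizes into an appropriate alignment-safe load. */
double read_double_memcpy(const unsigned char *buf, size_t offset)
{
    double value;
    memcpy(&value, buf + offset, sizeof(value));
    return value;
}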

Another potential cause is incorrect handling of data endianness. Although the ARM architecture itself is bi-endian, meaning that it can support either big-endian or little-endian byte ordering, in practice most ARM-based Linux systems run little-endian. However, many networking protocols transmit data in big-endian order, so byte swapping needs to be performed when data crosses the network boundary. It turns out that this is the root cause of the current alignment exception, and it can be resolved by backporting the fix from MongoDB 3.3.3.
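
As a generic illustration (not the actual MongoDB patch, and using GCC/Clang builtins), an endianness-conversion helper can be written so that it is also alignment-safe, by loading the raw bytes with memcpy and swapping only when the host byte order requires it:

#include <stdint.h>
#include <string.h>

/* Read a little-endian 64-bit value from an arbitrarily aligned buffer. */
static uint64_t read_le64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));      /* alignment-safe load of the raw bytes */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap64(v);      /* swap only on big-endian hosts */
#endif
    return v;                      /* on little-endian ARM, no swap needed */
}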

Problem #6: TCMalloc and libunwind

[cpp_unit_test:check_quorum_for_config_change_test] 2018-06-09T18:23:16.894-0400 Starting Program build/opt/mongo/db/repl/check_quorum_for_config_change_test... build/opt/mongo/db/repl/check_quorum_for_config_change_test
[cpp_unit_test:check_quorum_for_config_change_test] 2018-06-09T18:23:17.095-0400 Program build/opt/mongo/db/repl/check_quorum_for_config_change_test started with pid 28629.
[cpp_unit_test:check_quorum_for_config_change_test] 2018-06-09T18:23:17.751-0400 2018-06-09T18:23:17.750-0400 F -        [main] Invalid access at address: 0xaaaaaaab
[cpp_unit_test:check_quorum_for_config_change_test] 2018-06-09T18:23:17.752-0400 2018-06-09T18:23:17.752-0400 F -        [main] Got signal: 11 (Segmentation fault)

Even more of the testsuite executes successfully now, but another testcase is still crashing. This time, there’s no kernel log message, but that’s only because the ODROID-XU4 kernel was not built with the appropriate configuration option. Instead, I enable the generation of core dumps for the current shell session, and re-execute the failing test.

odroid:/mnt/data/sources/mongodb$ ulimit -c unlimited

This time, a core file is generated, which can be fed into GNU gdb to inspect the backtrace.

odroid:/mnt/data/sources/mongo$ gdb build/opt/mongo/db/repl/check_quorum_for_config_change_test core
Core was generated by `build/opt/mongo/db/repl/check_quorum_for_config_change_test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xb68b2ce6 in _ULarm_step () from /usr/lib/arm-linux-gnueabihf/libunwind.so.8
(gdb) bt
#0  0xb68b2ce6 in _ULarm_step () from /usr/lib/arm-linux-gnueabihf/libunwind.so.8
#1  0xb6f455e8 in ?? () from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#2  0xb6f45a3e in GetStackTrace(void**, int, int) () from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#3  0xb6f3a36a in tcmalloc::PageHeap::GrowHeap(unsigned int) () from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#4  0xb6f3a5e6 in tcmalloc::PageHeap::New(unsigned int) () from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#5  0xb6f395ce in tcmalloc::CentralFreeList::Populate() () from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#6  0xb6f39760 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**) ()
   from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#7  0xb6f397de in tcmalloc::CentralFreeList::RemoveRange(void**, void**, int) () from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#8  0xb6f3ba56 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, unsigned int) ()
   from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#9  0xb6f47856 in tc_realloc () from /usr/lib/arm-linux-gnueabihf/libtcmalloc.so.4
#10 0xb6d1b5e6 in OPENSSL_LH_insert () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
#11 0xb6d22be0 in OBJ_NAME_add () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
#12 0xb6d0f15c in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

A bit of digging reveals that MongoDB uses TCMalloc, an optimized concurrent memory allocator that is faster than the standard GNU C Library (glibc) memory allocator. When new allocations trigger growth of the free page heap, the TCMalloc implementation records a stack trace by using libunwind to walk the call stack. This is shown by the call to GetStackTrace within PageHeap::GrowHeap, which ends up calling the underlying implementation through the GET_STACK_TRACE_OR_FRAMES macro.
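
Conceptually, that underlying implementation walks the current thread’s stack with libunwind’s local unwinding API, roughly as in the minimal sketch below (my own simplification, not TCMalloc’s code). The crash above occurs inside the unw_step calls, which with local-only unwinding on ARM show up in the backtrace as _ULarm_step.

#define UNW_LOCAL_ONLY         /* only unwind the current process */
#include <libunwind.h>

/* Capture up to max_depth return addresses from the current call stack. */
static int capture_stack(void **frames, int max_depth)
{
    unw_context_t context;
    unw_cursor_t cursor;
    int depth = 0;

    unw_getcontext(&context);            /* snapshot the current registers */
    unw_init_local(&cursor, &context);   /* begin unwinding this thread */

    while (depth < max_depth && unw_step(&cursor) > 0) {
        unw_word_t ip;
        unw_get_reg(&cursor, UNW_REG_IP, &ip);   /* frame's instruction pointer */
        frames[depth++] = (void *)ip;
    }
    return depth;
}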

It turns out that libunwind only supports three different unwind mechanisms on ARM, as follows:

  1. UNW_ARM_METHOD_DWARF, which relies on DWARF debugging information, and requires compilation with the -g flag.
  2. UNW_ARM_METHOD_EXIDX, which relies on ARM-specific unwind tables, and requires compilation with the -funwind-tables flag.
  3. UNW_ARM_METHOD_FRAME, which is for the deprecated ARM Procedure Call Standard (APCS).

Unfortunately, most ELF binaries are built with neither DWARF debugging information nor ARM unwind tables, and attempting to unwind frames that follow the modern ARM Architecture Procedure Call Standard (AAPCS) as if they used the old, deprecated standard is broken behavior, resulting in the current segmentation fault. As a workaround, TCMalloc supports a libgcc-based unwinder and an emergency malloc feature, which require rebuilding it with the --enable-stacktrace-via-backtrace and --enable-emergency-malloc configuration flags. An alternative short-term workaround is to override the UNW_ARM_UNWIND_METHOD environment variable; setting it to 4 restricts libunwind to the EXIDX method and keeps it away from the broken frame-chain unwinder. This allows the testsuite to complete, and the build to finish.

odroid:/mnt/data/sources/mongodb$ UNW_ARM_UNWIND_METHOD=4 DEB_BUILD_OPTIONS=nodbgsym dpkg-buildpackage -b -uc -nc -j2