[Haskell-cafe] GHC FreeBSD memory model (was: 8.4.3 release)

Viktor Dukhovni ietf-dane at dukhovni.org
Fri Jul 6 05:15:12 UTC 2018


On Thu, Jul 05, 2018 at 10:20:30PM -0400, Ben Gamari wrote:

> Hmm, it's possible this could be fixed fairly easily in that case. The
> original reason for disabling the two-step allocator is #12695, where
> GHC failed to build due to MAP_NORESERVE being undefined. I had assumed
> that this meant that reservation-only mappings weren't defined; however,
> now that I look again at the mmap(2) manpage, it looks like I
> space should not be reserved for the mapping.

Right, absence of MAP_NORESERVE does not mean that all the pages
will be reserved in advance.  However, the VM behaviour may be
configuration-dependent:

    https://wiki.freebsd.org/SystemTuning#SYSCTL_TUNING

    The vm.overcommit sysctl defines the overcommit behaviour of
    the vm subsystem. The virtual memory system always does accounting
    of the swap space reservation, both total for system and per-user.
    Corresponding values are available through sysctl vm.swap_total,
    that gives the total bytes available for swapping, and
    vm.swap_reserved, that gives number of bytes that may be needed
    to back all currently allocated anonymous memory. Setting bit
    0 of the vm.overcommit sysctl causes the virtual memory system
    to return failure to the process when allocation of memory
    causes vm.swap_reserved to exceed vm.swap_total. Bit 1 of the
    sysctl enforces the RLIMIT_SWAP limit (see getrlimit(2)). Root is
    exempt from this limit. Bit 2 allows most of the physical memory
    to be counted as allocatable, except wired and free reserved
    pages (accounted by the vm.stats.vm.v_wire_count and
    vm.stats.vm.v_free_target sysctls, respectively).
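A small sketch, assuming a C program wants to check this setting at run time: it reads vm.overcommit with sysctlbyname(3) (FreeBSD-specific, so it is only compiled under __FreeBSD__) and tests bit 0 as described in the text above. The function and macro names are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical names for the vm.overcommit bits described above. */
#define OVERCOMMIT_SWAP_CHECK  0x1 /* bit 0: fail if vm.swap_reserved would exceed vm.swap_total */
#define OVERCOMMIT_RLIMIT_SWAP 0x2 /* bit 1: enforce RLIMIT_SWAP (root exempt) */
#define OVERCOMMIT_COUNT_PHYS  0x4 /* bit 2: count most physical memory as allocatable */

/* True if anonymous allocations may exceed the available swap,
   i.e. bit 0 is clear (as with the default value 0). */
bool swap_overcommit_allowed(int overcommit)
{
    return (overcommit & OVERCOMMIT_SWAP_CHECK) == 0;
}

/* Read vm.overcommit into *value. Returns 0 on success, -1 on failure;
   always -1 on non-FreeBSD systems, where this sysctl does not exist. */
int read_vm_overcommit(int *value)
{
#ifdef __FreeBSD__
    size_t len = sizeof(*value);
    return sysctlbyname("vm.overcommit", value, &len, NULL, 0);
#else
    (void)value;
    return -1;
#endif
}
```

With the default setting (0), swap_overcommit_allowed() returns true; with vm.overcommit=1 it returns false.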

I have the default vm.overcommit setting:

    $ sysctl vm.overcommit
    vm.overcommit: 0

in which "bit 0" is off, so processes can allocate more anonymous
memory than the available swap.  With vm.overcommit=1, the 1TB mmap()
fails (surprisingly with EINVAL rather than ENOMEM), while a 1MB
mmap() succeeds.
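To reproduce that comparison, a probe along these lines (a sketch; probe_anon_mmap is a hypothetical name) attempts a plain anonymous read/write mapping and reports errno. Per the observation above, with vm.overcommit=1 a call with 1ULL << 40 would return EINVAL while a call with 1ULL << 20 returns 0.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc; harmless on FreeBSD */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try an anonymous read/write mapping of len bytes.
   Returns 0 if it succeeded (the mapping is released again),
   otherwise the errno value reported by mmap(). */
int probe_anon_mmap(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, len);
    return 0;
}
```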

> My only question is how one performs reservation-and-commit mappings on
> FreeBSD. This is important since without committing any access to the
> mapping may crash the program in the case of OOM. Does FreeBSD simply
> not provide a way to safely map known-good memory?

With vm.overcommit=0, I don't know offhand of a way to map anonymous
memory that is guaranteed not to segfault later on first access.

One way to avoid issues when vm.overcommit=1 is to use the MAP_GUARD
flag to reserve an address range without allocating any pages, and
then, as needed, explicitly map pages (with MAP_FIXED) within the
reserved range.

    $ cat foo.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <err.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t heapmax = 1ULL << 40; /* 1TB */
        size_t heaplen = 1ULL << 20; /* 1MB */
        unsigned char *heap;
        unsigned char *heap2;

        /* Reserve 1TB of address space; no pages are allocated */
        heap = mmap(NULL, heapmax, PROT_NONE, MAP_GUARD, -1, 0);
        if (heap == MAP_FAILED)
            err(1, "mmap(MAP_GUARD)");
        printf("%p(%zu)\n", (void *)heap, heapmax);

        /* Map (allocate) the last 1MB of the reserved range */
        heap2 = &heap[heapmax - heaplen];
        heap2 = mmap(heap2, heaplen, PROT_READ|PROT_WRITE,
                     MAP_ANON|MAP_PRIVATE|MAP_FIXED, -1, 0);
        if (heap2 == MAP_FAILED)
            err(1, "mmap(MAP_FIXED)");

        /* Use the allocated space */
        heap2[0] = 'A';
        heap2[heaplen-1] = 'z';
        printf("%p(%zu) %c %c\n", (void *)heap2, heaplen,
               heap2[0], heap2[heaplen-1]);
        sleep(20); /* stay alive while the memory map is inspected */
        return 0;
    }

    $ ./foo & sleep 1; pid=$! ; cat /proc/$pid/map; ps -o vsz= -o rss= $pid ; wait
    [1] 84867
    0x800e00000(1099511627776)
    0x10800d00000(1048576) A z
    0x400000 0x401000 1 0 0xfffff80f04f75780 r-x 2 0 0x9210 COW NC vnode /tmp/foo NCH -1
    0x600000 0x601000 1 0 0xfffff80b872ed1e0 rw- 1 0 0x3000 NCOW NNC default - CH 1001
    0x800600000 0x800621000 33 0 0xfffff80011df3780 r-x 173 76 0x1000 COW NC vnode /libexec/ld-elf.so.1 NCH -1
    0x800621000 0x800642000 22 0 0xfffff80b872ed690 rw- 1 0 0x3000 NCOW NNC default - CH 1001
    0x800820000 0x800821000 1 0 0xfffff80d348e3c30 rw- 1 0 0x3000 COW NNC vnode /libexec/ld-elf.so.1 CH 1001
    0x800821000 0x800822000 1 0 0xfffff80f432df690 rw- 1 0 0x3000 NCOW NNC default - CH 1001
    0x800822000 0x8009b4000 402 0 0xfffff8000d720e10 r-x 173 76 0x1000 COW NC vnode /lib/libc.so.7 NCH -1
    0x8009b4000 0x800bb4000 0 0 0xfffff8035779ea50 --- 1 0 0x2000 NCOW NNC default - NCH -1
    0x800bb4000 0x800bc0000 12 0 0xfffff8084feace10 rw- 1 0 0x3000 COW NNC vnode /lib/libc.so.7 CH 1001
    0x800bc0000 0x800bda000 5 0 0xfffff80682dcfc30 rw- 2 0 0x3000 NCOW NNC default - CH 1001
    0x800c00000 0x800e00000 9 0 0xfffff80682dcfc30 rw- 2 0 0x3000 NCOW NNC default - CH 1001
    0x800e00000 0x10800d00000 0 0 0 --- 0 0 0x0 COW NC none - NCH -1
    0x10800d00000 0x10800e00000 2 0 0xfffff8018644c1e0 rw- 1 0 0x3000 NCOW NNC default - CH 1001
    0x10800e00000 0x10801000000 6 0 0xfffff80b76b20d20 rw- 1 0 0x3000 NCOW NNC default - CH 1001
    0x7fffdffff000 0x7ffffffdf000 0 0 0 --- 0 0 0x0 NCOW NNC none - NCH -1
    0x7ffffffdf000 0x7ffffffff000 3 0 0xfffff80c6b72c1e0 rw- 1 0 0x3000 NCOW NNC default - CH 1001
    0x7ffffffff000 0x800000000000 0 0 0xfffff800117b6000 r-x 100 0 0x6 NCOW NNC default - NCH -1
    9340 2072
    [1]+  Done                    ./foo

Note the (1TB - 1MB) guard "hole" in the process memory map:

    0x800e00000 0x10800d00000 0 0 0 --- 0 0 0x0 COW NC none - NCH -1

followed by a 1MB mapping:

    0x10800d00000 0x10800e00000 2 0 0xfffff8018644c1e0 rw- 1 0 0x3000 NCOW NNC default - CH 1001
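The sizes can be checked from the first two fields of each line (start and end address). A minimal sketch, with map_region_size as a hypothetical helper: for the two lines above it yields 0x10800d00000 - 0x800e00000 = 0xfffff00000 bytes (1TB - 1MB) and 0x10800e00000 - 0x10800d00000 = 0x100000 bytes (1MB).

```c
#include <stdio.h>

/* Parse the start and end addresses (the first two fields of a
   FreeBSD /proc/<pid>/map line; %llx accepts the "0x" prefix) and
   return the region size in bytes, or 0 if the line cannot be parsed. */
unsigned long long map_region_size(const char *line)
{
    unsigned long long start, end;
    if (sscanf(line, "%llx %llx", &start, &end) != 2 || end < start)
        return 0;
    return end - start;
}
```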

Note also that the "vsz" (9340 KB) does not include the hole.  The
guard gives us exclusive use of a virtual address range that is not
initially allocated, and later allocations within it can proceed
incrementally.  Whether those later allocations are subject to
overcommit (and hence to OOM segfaults on first access) depends on
vm.overcommit.
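The reserve-then-commit pattern in foo.c can be packaged as a pair of helpers, roughly the shape a two-step allocator needs. This is a sketch, not GHC's code: reserve_region, commit_region and release_region are hypothetical names, and the fallbacks for systems without MAP_GUARD (a PROT_NONE mapping, with MAP_NORESERVE where that flag exists) are assumptions for illustration only.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON/MAP_NORESERVE on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve len bytes of address space without allocating pages.
   Returns the base address, or NULL on failure. */
void *reserve_region(size_t len)
{
#if defined(MAP_GUARD)
    /* FreeBSD: a guard reservation, as in foo.c. */
    void *p = mmap(NULL, len, PROT_NONE, MAP_GUARD, -1, 0);
#elif defined(MAP_NORESERVE)
    /* Assumed fallback: inaccessible mapping, no swap reservation. */
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_ANON | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
#else
    /* Assumed fallback: plain inaccessible mapping. */
    void *p = mmap(NULL, len, PROT_NONE, MAP_ANON | MAP_PRIVATE, -1, 0);
#endif
    return p == MAP_FAILED ? NULL : p;
}

/* Map fresh read/write anonymous pages over [addr, addr + len) inside
   a reservation. addr and len must be page-aligned.
   Returns 0 on success, -1 on failure. */
int commit_region(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}

/* Release a whole reservation, committed parts included. */
int release_region(void *addr, size_t len)
{
    return munmap(addr, len);
}
```

A heap would call reserve_region() once for its maximum size and then commit_region() piece by piece as it grows. Whether first access to committed pages can still fail under memory pressure depends on vm.overcommit, as noted above.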

-- 
	Viktor.

