"parted" incorrectly reports "loop" rather than "msdos" for the partition
table type, when the (fake) partition table comes from the "--mbr" option
of "mkfs.fat" (in dosfstools-4.2+), and the FAT variant in question is
FAT16 or FAT32. (See RHBZ#2026224.) Work this around by
- parsing the partition table ourselves, and
- overriding "loop" with "msdos" when appropriate.
Note that when the FAT variant is FAT12, "parted" fails to parse the fake
MBR partition table completely (see RHBZ#2026220), which we cannot work
around. However, FAT12 should be a rare corner case in libguestfs usage --
"mkfs.fat" auto-chooses FAT12 only below 9MB disk size, and even "-F 12"
can only be forced up to and including 255MB disk size.
Add the helper function "has_bogus_mbr" to the Utils module; we'll use it
elsewhere too.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1931821
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20211125094954.9713-5-lersek@redhat.com>
[lersek@redhat.com: drop "fun" keyword, and use partial application, in
the definition of "sec0at" [Rich]]
Since commit 994ca1f8eb ("daemon: Reimplement 'part_get_mbr_part_type'
API in OCaml.", 2018-05-02), we've not had any calls to
print_partition_table() that would pass a "false" argument for the
"add_m_option" parameter.
Remove the parameter, and inside part_get_mbr_part_type(), remove the dead
branch.
Relatedly, update the comment on the
"print_partition_table_machine_readable" OCaml function, originally from
commit 32e661f421 ("daemon: Reimplement ‘part_list’ API in OCaml.",
2017-07-27). Because print_partition_table() now passes "-m" to "parted"
unconditionally, and there are no use cases left that would *forbid* "-m",
"print_partition_table_machine_readable" is almost equivalent to
print_partition_table() -- modulo the enforcement of the "BYT;" header.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20211125094954.9713-4-lersek@redhat.com>
The directory that readdir() and closedir() work on is BUS_PATH
("/sys/bus/virtio/drivers/9pnet_virtio"), not "/sys/block". Fix the error
messages that are sent when readdir() or closedir() fails.
(The invalid "sys/block" pathname could be a leftover from when the
directory reading logic was (perhaps) copied from "daemon/sync.c".)
Fixes: 5f10c33503
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20211125094954.9713-3-lersek@redhat.com>
Search the usage output of "mkfs.fat" for "--mbr[="; cache the result for
further invocations. If the option is supported, pass "--mbr=n" to
"mkfs.fat". This will prevent the creation of a bogus partition table
whose first (and only) entry describes a partition that contains the
partition table.
(Such a bogus partition table breaks "parted", which is a tool used by
libguestfs extensively, both internally and in public libguestfs APIs.)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1931821
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20211125094954.9713-2-lersek@redhat.com>
Another test was removing all disk_*.img files, which removed the file
used by this test. This was ultimately caused by us using parallel
tests.
Put the disk image files back into the tests/gdisk/ subdirectory to
avoid this race.
Fixes: commit 6d32773e81
"my" variable $output masks earlier declaration in same scope at ./gdisk/test-expand-gpt.pl line 73.
"my" variable $end_sectors masks earlier declaration in same scope at ./gdisk/test-expand-gpt.pl line 78.
Actually cargo caches downloaded libraries. The previous change
caused cargo to download and rebuild these after make clean which is
overly aggressive. Use make distclean instead.
Updates: commit 1834f19d20
Commit e597fc5317 ("daemon/yara: fix undefined behavior due to Yara 4.0
API changes", 2021-10-12) prevents the daemon from using such a Yara
version that precedes 4.0.0.
If only yara < 4 is found, treat the library as absent, rather than
attempting and failing to compile the yara module of the daemon. Note the
version requirement in the documentation too.
Suggested-by: Eric Blake <eblake@redhat.com>
Suggested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211013133611.21599-4-lersek@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Eliminate the AC_CHECK_LIB / AC_CHECK_HEADER tests for Yara, for the
following reasons:
- Upstream Yara has provided a pkg-config file since 2015, so the
(now-fixed) pkg-config check should always find it, without the
AC_CHECK_LIB / AC_CHECK_HEADER fallback branch.
- In a subsequent patch, we'll want to test for the incompatible Yara API
changes described at
<https://github.com/VirusTotal/yara/wiki/Backward-incompatible-changes-in-YARA-4.0-API>.
That's easy to do with pkg-config, but impossible with AC_CHECK_*,
without a custom test. Namely, both AC_CHECK_DECLS and AC_CHECK_TYPES
appear unable to check the parameter list of a function pointer typedef
(namely YR_CALLBACK_FUNC and YR_COMPILER_CALLBACK_FUNC). And writing a
dedicated test for this is overkill.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211013133611.21599-3-lersek@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
The upstream Yara project has always provided its "libyara/yara.pc.in"
file with "Name: yara" -- ever since its introduction in commit
334bd1a671ca ("Add support for pkg-config", 2015-01-08).
In spite of this, when the (optional) Yara dependency was added to
libguestfs in commit 2e24129da3 ("appliance: add yara dependency",
2017-05-02), PKG_CHECK_MODULES was invoked with [libyara] as
list-of-modules.
(That was clearly a bug. I'm unsure what Debian calls their Yara
pkg-config module, but calling it anything else than "yara" would be a
distribution bug: upstream projects provide pkg-config files specifically
so that dependent projects can find their dependencies *regardless of
distribution*.)
As a consequence, on Fedora today, the PKG_CHECK_MODULES macro always
fails, and only the AC_CHECK_LIB / AC_CHECK_HEADER branch finds Yara.
In a subsequent patch, we'll want to add a version requirement to the
PKG_CHECK_MODULES macro invocation, so at first, fix the pkg-config
identifier of Yara.
Fixes: 2e24129da3
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211013133611.21599-2-lersek@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Currently, the Yara test case ("yara/test-yara-scan.sh") fails, with the
following obscure error message:
> ><fs> yara-scan /text.txt
> libguestfs: error: deserialise_yara_detection_list:
Namely, the Yara rule match list serialization / de-serialization, between
the daemon and the library, is broken. It is caused by the following
incompatible pointer passing (i.e., undefined behavior), in function
do_internal_yara_scan(), file "daemon/yara.c":
> r = yr_rules_scan_fd (rules, fd, 0, yara_rules_callback, (void *) path, 0);
^^^^^^^^^^^^^^^^^^^
The prototype of yara_rules_callback() is:
> static int
> yara_rules_callback (int code, void *message, void *data)
however, in Yara commit 2b121b166d25 ("Track string matches using
YR_SCAN_CONTEXT.", 2020-02-27), which was included in the upstream v4.0.0
release, the rules callback prototype was changed as follows:
> diff --git a/libyara/include/yara/types.h b/libyara/include/yara/types.h
> index cad095cd70c2..f415033c4aa6 100644
> --- a/libyara/include/yara/types.h
> +++ b/libyara/include/yara/types.h
> @@ -661,6 +644,7 @@ struct YR_MEMORY_BLOCK_ITERATOR
>
>
> typedef int (*YR_CALLBACK_FUNC)(
> + YR_SCAN_CONTEXT* context,
> int message,
> void* message_data,
> void* user_data);
Therefore, the yara_rules_callback() function is entered with a mismatched
parameter list in the daemon, and so it passes garbage to
send_detection_info(), for serializing the match list.
This incompatible change was in fact documented by the Yara project:
https://github.com/VirusTotal/yara/wiki/Backward-incompatible-changes-in-YARA-4.0-API#scanning-callback
Gcc too warns about the incompatible pointer type, under
"-Wincompatible-pointer-types". However, libguestfs is built without
"-Werror" by default, so the warning is easy to miss, and the bug only
manifests at runtime.
(The same problem exists for yr_compiler_set_callback() /
compile_error_callback():
https://github.com/VirusTotal/yara/wiki/Backward-incompatible-changes-in-YARA-4.0-API#compiler-callback
except that this instance of the problem is not triggered by the test
case, as the rule list always compiles.)
Rather than simply fixing the parameter lists, consider the following
approach.
If Yara's YR_CALLBACK_FUNC and YR_COMPILER_CALLBACK_FUNC typedefs were not
for *pointer* types but actual *function* prototypes, then we could use
them directly in the declarations of our callback functions. Then any
future changes in the param lists would force a "conflicting types"
*compilation error* (not a warning). Illustration:
/* this is *not* a pointer type */
typedef int HELLO_FUNC (void);
/* function declarations */
static HELLO_FUNC my_hello_good;
static HELLO_FUNC my_hello_bad;
/* function definition, with explicit parameter list */
static int my_hello_good (void) { return 1; }
/* function definition with wrong param list -> compilation error */
static int my_hello_bad (int i) { return i; }
Unfortunately, given that the Yara-provided typedefs are already pointers,
we can't do this, in standard C anyway. Type derivation only allows for
array, structure, union, function, and pointer type derivation; it does
not allow "undoing" previous derivations.
However, using gcc's "typeof" keyword, the idea is possible. Given
YR_CALLBACK_FUNC, the expression
(YR_CALLBACK_FUNC)NULL
is a well-defined null pointer, and the function designator expression
*(YR_CALLBACK_FUNC)NULL
has the desired function type.
Of course, evaluating this expression would be undefined behavior, but in
the GCC extension expression
typeof (*(YR_CALLBACK_FUNC)NULL)
the operand of the "typeof" operator is never evaluated, as it does not
have a variably modified type. We can therefore use this "typeof" in the
same role as HELLO_FUNC had in the above example.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211011223627.20856-4-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
[lersek@redhat.com: clean up whitespace in "YR_RULE *rule"]
gcc emits the following warning:
> proto.c: In function ‘send_file_complete’:
> proto.c:437:10: error: ‘buf’ may be used uninitialized
> [-Werror=maybe-uninitialized]
> 437 | return send_file_chunk (g, 0, buf, 0);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In theory, passing the 1-byte array "buf", with indeterminate contents, to
xdr_bytes() ultimately, could be fine -- assuming xdr_bytes() never reads
the contents of the buffer, due to the buffer size being zero. However,
the xdr_bytes() manual does not seem to guarantee this (it also does not
explicitly permit passing a NULL buffer alongside size=0, which would be
even simpler for the caller).
In order to shut up the compiler, just zero-initialize the buffer --
that's simpler than adding diagnostics pragmas. The "maybe-uninitialized"
warning is otherwise very useful, so keep it globally enabled (per
WARN_CFLAGS / WERROR_CFLAGS).
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211011223627.20856-3-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
In OCaml 4.13:
Alert ocaml_deprecated_cli: Setting a warning with a sequence of lowercase or uppercase letters,
like 'CDEFLMPSUVYZX', is deprecated.
Use the equivalent signed form: +C+D+E+F+L+M+P+S+U+V+Y+Z+X+52-3.
(cherry picked from
guestfs-tools commit fa4f59e1d99c08d7e0bae2a7cb54f254a6506d67)
The current "arg_string_list" and "free_string_list" implementations go
back to commit b6f01f3260 ("Add Go (language) bindings.", 2013-07-03).
There are two problems with them:
- "free_string_list" doesn't actually free anything,
- at the *first* such g.Internal_test() call site that passes an
Ostringlist member inside the Optargs argument, namely:
> g.Internal_test ("abc",
> string_addr ("def"),
> []string{},
> false,
> 0,
> 0,
> "123",
> "456",
> []byte{'a', 'b', 'c', '\000', 'a', 'b', 'c'},
> &guestfs.OptargsInternal_test{Ostringlist_is_set: true,
> Ostringlist: []string{}
> }
> )
the "golang/run-bindtests" case crashes:
> panic: runtime error: cgo argument has Go pointer to Go pointer
>
> goroutine 1 [running]:
> libguestfs.org/guestfs.(*Guestfs).Internal_test.func7(0xc000018180,
> 0xadfb60, 0xadfb80, 0xc000010048, 0x0, 0x0, 0x0, 0xae3e10, 0xae3e30,
> 0xade3a0, ...)
> golang/src/libguestfs.org/guestfs/guestfs.go:6729 +0xa9
> libguestfs.org/guestfs.(*Guestfs).Internal_test(0xc000018180, 0x4ee3a5,
> 0x3, 0xc000061be8, 0xc000061af8, 0x0, 0x0, 0xc000061a00, 0x0, 0x0, ...)
> golang/src/libguestfs.org/guestfs/guestfs.go:6729 +0x3c9
> main.main()
> golang/bindtests/bindtests.go:77 +0x151e
> exit status 2
> FAIL run-bindtests (exit status: 1)
In Daniel P. Berrangé's words [1],
> You're allowed to pass a Go pointer to C via CGo, but the memory that
> points to is not allowed to contained further Go pointers. So the struct
> fields must strictly use a C pointer.
One pattern to solve the problem has been shown on stackoverflow [2].
Thus, rewrite the "arg_string_list" and "free_string_list" functions
almost entirely in C, following that example.
While this approach is not the most idiomatic Go, as a solution exists
without C helper functions [3], it should still be acceptable, at least as
an incremental improvement, for letting "golang/run-bindtests" pass.
[1] https://listman.redhat.com/archives/libguestfs/2021-September/msg00118.html
[2] https://stackoverflow.com/questions/35924545/golang-cgo-panic-runtime-error-cgo-argument-has-go-pointer-to-go-pointer
[3] https://listman.redhat.com/archives/libguestfs/2021-September/msg00106.html
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210921192939.32468-1-lersek@redhat.com>
Tested-by: "Richard W.M. Jones" <rjones@redhat.com>
Acked-by: "Richard W.M. Jones" <rjones@redhat.com>
According to xfs_admin(8):
> -c 0|1 Enable (1) or disable (0) lazy-counters in the filesys‐
> tem.
>
> Lazy-counters may not be disabled on Version 5 su‐
> perblock filesystems (i.e. those with metadata CRCs en‐
> abled).
>
> [...]
According to mkfs.xfs(1):
> -m global_metadata_options
> Section Name: [metadata]
> These options specify metadata format options that ei‐
> ther apply to the entire filesystem or aren't easily
> characterised by a specific functionality group. The
> valid global_metadata_options are:
>
> [...]
>
> crc=value
> This is used to create a filesystem which
> maintains and checks CRC information in all
> metadata objects on disk. The value is ei‐
> ther 0 to disable the feature, or 1 to en‐
> able the use of CRCs.
>
> [...]
>
> By default, mkfs.xfs will enable metadata
> CRCs.
Consistently with the above, the first "xfs_admin" test case in
"generator/actions_core.ml", which attempts to disable lazy counters,
always fails:
> 534/550 test_xfs_admin_0
> libguestfs: error: xfs_admin: /dev/sda1: Cannot disable lazy-counters on V5 fs
We can resolve this test failure in three ways:
(1) Extend do_mkfs() [daemon/mkfs.c], possibly even introduce
do_mkfs_xfs(), and permit the caller to specify "-m crc=0" for
mkfs.xfs. Then use this option when the temporary filesystem is
created in the XFS test that disables lazy counters.
(2) Extend the "guestfs_int_xfsinfo" structure in the libguestfs-common
project, with an "xfs_crc" field. Extend parse_xfs_info()
[daemon/xfs.c] to populate the field from "meta-data=...crc=[01]".
Modify the test case to check the following post-condition:
xfs_crc || xfs_lazycount == 0
instead of the current
xfs_lazycount == 0
effectively ignoring "xfs_lazycount" when "xfs_crc" is set.
(3) Remove the test altogether that attempts to disable lazy counters
after filesystem creation.
Given that new XFS filesystems are created with metadata CRCs enabled by
default, and several XFS features depend on metadata CRCs being enabled,
this patch implements option (3).
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210920052335.3358-4-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
The "test-md-and-lvm-devices" test case creates, among other things, a
RAID0 array (md127) that spans two *differently sized* block devices
(sda1: 20MB, lv0: 16MB).
In Linux v3.14, the layout of such arrays was changed incompatibly and
undetectably. If an array were created with a pre-v3.14 kernel and
assembled on a v3.14+ kernel, or vice versa, data could be corrupted.
In Linux v5.4, a mitigation was added, requiring the user to specify the
layout version of such RAID0 arrays explicitly, as a module parameter. If
the user fails to specify a layout version, the v5.4+ kernel refuses to
assemble such arrays. This is why "test-md-and-lvm-devices" currently
fails, with any v5.4+ appliance kernel.
Until we implement a more general solution (see the bugzilla link below),
work around the issue by sizing sda1 and lv0 identically. For this,
increase the size of sdb1 to 24MB: when one 4MB extent is spent on LVM
metadata, the resultant lv0 size (20MB) will precisely match the size of
sda1.
This workaround only affects sizes, and does not interfere with the
original purpose of this test case, which is to test various *stackings*
between disk partitions, software RAID (md), and LVM logical volumes.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=2005485
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210920052335.3358-3-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
[lersek@redhat.com: remove stray empty line]
In commit 6d32773e81 ("tests: Run the tests in parallel.", 2021-03-18),
the "abs_srcdir" macro value that the 9p test would see changed from
".../tests/9p" to just ".../tests" -- the last component got dropped.
(Said commit updated some "abs_srcdir"-based references accordingly, for
example under "tests/disks", but "tests/9p/test-9p.sh" was missed.)
Therefore, the guest-visible location of the "/test-9p.sh" file changed to
"/9p/test-9p.sh", and a non-recursive listing of the guest-visible root
directory would not return the file. Thus, the test fails now.
Restore the host-side base directory to ".../tests/9p".
Fixes: 6d32773e81
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210920052335.3358-2-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
bash can read input from a spawned process, and even provide input to
such process. This feature relies on /dev/fd/ being present. In the
past udev silently created this symlink, so this bash feature worked
more or less by accident. With recent systemd versions, such as 246
which is included in Leap 15.3, the symlink is not created anymore. As
a result scripts, such as /sbin/dhclient-script, fail to work
properly.
This symlink should have been created in version 1 of this variant of /init.
https://bugzilla.opensuse.org/show_bug.cgi?id=1190501
Signed-off-by: Olaf Hering <olaf@aepfle.de>
systemd-sysvinit contains the reboot command, which is used to
properly stop the VM. This was required by other packages, and as a
result always available. Since Leap 15.3 it will not be installed, and
as a result the VM will just panic because /init died.
If the appliance is started with --network, dhclient will run
/usr/sbin/dhclient-script, which in turn may call /sbin/netconfig to
update /etc/resolv.conf. Install sysconfig-netconfig to make sure DNS
resolving actually works.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Pass $(HIVEX_LIBS) with -cclib under the "daemon_utils_tests_LINK" target;
otherwise the OCaml compiler does not tell the linker where "-lhivex" can
be found, and the linking step fails if "-lhivex" is not on a system
library path.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210908133542.19002-3-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
"ocamlc -where" is supposed to "print the location of the standard library
and exit". While this directory contains core OCaml C header files, it
does not contain hivex-related C header files. Trim "guestfsd_CPPFLAGS"
accordingly.
Furthermore, the hivex module for OCaml may exist elsewhere than under the
OCaml standard library directory. Invoke "ocamlfind query hivex" to find
this module. This is what AC_CHECK_OCAML_PKG(hivex) does too, in
"m4/guestfs-ocaml.m4" and "m4/ocaml.m4".
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210908133542.19002-2-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
This warning is bogus, caused by the analyzer cannot track that len ==
0 if roots == NULL. I just changed the code to make it easier to
analyze, this doesn't fix any real bug.
guestfs-c.c: In function 'guestfs_finalize':
guestfs-c.c:85:9: error: dereference of NULL '0B' [CWE-476] [-Werror=analyzer-null-dereference]
85 | caml_remove_generational_global_root (roots[i]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Found by GCC -fanalyzer:
xattr.c:478:32: error: '%zu' directive output may be truncated writing between 1 and 19 bytes into a region of size 16 [-Werror=format-truncation=]
478 | snprintf (num, sizeof num, "%zu", nr_attrs);
| ^
xattr.c:478:32: note: directive argument in the range [0, 2305843009213693950]
/usr/include/bits/stdio2.h:71:10: note: '__builtin___snprintf_chk' output between 2 and 20 bytes into a destination of size 16
71 | return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
| ^
Short log:
Laszlo Ersek (2):
Makefile.am: supply missing $(LIBGUESTFS_CFLAGS)
Makefile.am: use $(LIBGUESTFS_LIBS) for linking OCaml test programs
Richard W.M. Jones (4):
tools: Refactor create_standard_options
tools: Add optional --program-name flag.
mltools/test-getopt.sh: Allow test to work for tools version 2
mltools: Add JSON_parser.object_get_bool
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
In commit 6d32773e81 ("tests: Run the tests in parallel.", 2021-03-18),
the working directory relative to which "test-parallel-mount-local" would
be launched (by the test machinery) changed from "tests/mount-local" to
just "tests".
While the relative pathname of the "guestunmount" executable was updated
inside "test-parallel-mount-local" accordingly, the relative pathname of
the FUSE client ("test-parallel-mount-local" itself, just invoked with
"--test") was not. This issue guarantees that the exec call fails in the
child, and so the test case always hangs.
Because we had removed "mount-local" from the end of the working
directory, prepend it now to the relative pathname of the FUSE client
executable.
Fixes: 6d32773e81
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210902135124.15191-3-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Each worker thread of "test-parallel-mount-local" performs the following
steps (among others):
(1) it starts an appliance dedicated to that thread, using a private
scratch disk image,
(2) exports a dedicated FUSE mount point on the host, exposing the file
system on the appliance's disk,
(3) launches a child process for manipulating the particular FUSE mount
point on the host,
(4) enters a FUSE request processing loop, translating requests between
the host kernel (coming in via the FUSE mount point) and the
appliance.
Items to note:
- The child process from step (3) consists of a single thread of execution
(see fork() in POSIX): a duplicate of the parent process's respective
worker thread.
- The child process from step (3) blocks on any FUSE mount point access on
the host until the worker thread in the parent process starts processing
FUSE requests, in step (4).
- The FUSE request processing in step (4), in the worker thread living in
the parent process, terminates if and only if the child process unmounts
the FUSE mount point originating from (2).
Should the exec call in step (3) fail for any reason, the child currently
jumps to the "error" label. This is wrong: under the error label, we call
guestfs_close() on the appliance -- but the appliance is owned by the
parent process's worker thread, not the child. What happens is that the
child kills off the appliance while the parent's worker thread is in the
FUSE request processing loop (4).
The "error" label was never meant to be reached by the child process -- if
exec fails for any reason, exit the child immediately. The parent will
remain in the FUSE request processing loop (4) forever, but no state will
be corrupted. For example, using another (interactive) session on the
host, the FUSE mount points can be interacted with, and if all of them are
manually unmounted, the FUSE request processing (4) completes in every
worker thread.
This patch does not fix the primary issue with
"test-parallel-mount-local", but removes "chaos" from the symptoms. The
next patch will fix the actual regression in this test case.
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20210902135124.15191-2-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>