In https://bugzilla.redhat.com/show_bug.cgi?id=2082806 we've been
tracking an insidious qemu bug which intermittently prevents the
libguestfs appliance from starting. The symptoms are that SeaBIOS
starts and displays its messages, but the kernel isn't reached. We
found that the kernel does in fact start, but when it tries to set up
page tables and jump to protected mode it gets a triple fault which
causes the emulated CPU in qemu to reset (qemu exits).
This seems to only affect TCG (not KVM).
Yesterday I found that this is caused by using -cpu max which enables
the "la57" feature (5-level page tables[0]), and that we can make the
problem go away using -cpu max,la57=off. Note that I still don't
fully understand the qemu bug, so this is only a workaround.
I chose to disable 5-level page tables for both TCG and KVM, partly to
make the patch simpler, and partly because I guess it's not a feature
(ie. 57 bit linear addresses) that is useful for the libguestfs
appliance case, where we have limited physical memory and no need to
run any programs with huge address spaces.
I tested this by running both the direct & libvirt paths overnight. I
expect that this patch will fail with old qemu/libvirt which doesn't
understand the "la57" feature, but this is only intended as a
temporary workaround.
[0] Article about 5-level page tables as background:
https://lwn.net/Articles/717293/
Thanks: Laszlo Ersek
Fixes: https://answers.launchpad.net/ubuntu/+source/libguestfs/+question/701625
Acked-by: Laszlo Ersek <lersek@redhat.com>
Representing "iface" in the "drive_create_data" and "drive" structures is
now useless; the direct backend ignores "iface", while the libvirt one
rejects it unless it is empty. Unify both backends -- make them both
ignore "iface". (Which only relaxes the libvirt backend, so it cannot
cause compatibility problems.) This lets us remove the fields. Update the
documentation as well.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1844341
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20220504134155.11832-3-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Currently the guestfs_readdir() API can not list long directories, due to
it sending back the whole directory listing in a single guestfs protocol
response, which is limited to GUESTFS_MESSAGE_MAX (approx. 4MB) in size.
Introduce the "internal_readdir" action, for transferring the directory
listing from the daemon to the library through a FileOut parameter.
Rewrite guestfs_readdir() on top of this new internal function:
- The new "internal_readdir" action is a daemon action. Do not repurpose
the "readdir" proc_nr (138) for "internal_readdir", as some distros ship
the binary appliance to their users, and reusing the proc_nr could
create a mismatch between library & appliance with obscure symptoms.
Replace the old proc_nr (138) with a new proc_nr (511) instead; a
mismatch would then produce a clear error message. Assume the new action
will first be released in libguestfs-1.48.2.
- Turn "readdir" from a daemon action into a non-daemon one. Call the
daemon action guestfs_internal_readdir() manually, receive the FileOut
parameter into a temp file, then deserialize the dirents array from the
temp file.
This patch sneakily fixes an independent bug, too. In the pre-patch
do_readdir() function [daemon/readdir.c], when readdir() returns NULL, we
don't distinguish "end of directory stream" from "readdir() failed". This
rewrite fixes this problem -- I didn't see much value separating out the
fix for the original do_readdir().
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1674392
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20220502085601.15012-2-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
If the appliance is a QCOW2 image, function get_root_uuid_with_file()
fails to read ext filesystem signature (0x53EF at offset 0x438) from it.
This results in the following error:
libguestfs: error: /usr/lib64/guestfs/appliance/root: appliance is not
an extfs filesystem
The error itself is harmless, but misleading. So let's skip retrieving
the signature and UUID in case the image contains QCOW2 header. It's
safe because in this case we'll retrieve it later from RAW image dumped
from that QCOW2 by "qemu-img dd".
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
This was a feature that allowed you to add drives to the appliance
after launching it. It was complicated to implement, and only worked
for the libvirt backend (not "direct", which is the default backend).
It also turned out to be a bad idea. The original concept was that
appliance creation was slow, so to examine multiple guests you should
launch the handle once then hot-add the disks from each guest in turn
to manipulate them. However this is terrible from a security point of
view, especially for multi-tenant, because the drives from one guest
might compromise the appliance and thus the filesystems/drives from
subsequent guests.
It also turns out that hotplugging is very slow. Nowadays appliance
creation should be faster than hotplugging.
The main use case for this was virt-df, but virt-df no longer uses it
after we discovered the problems outlined above.
User-Mode Linux was an alternative hypervisor that could run the
appliance, instead of using qemu. It had many limitations including
lack of network, and UML support in Linux has been semi-broken for a
long time. It was also slower than KVM on baremeal in general and had
various corner cases which were much slower including the emulated
serial port which made bulk uploads and downloads painful. Also of
course it lacked qemu-specific features like qcow2 or any
network-backed disk, so many disk images could not be opened this way.
This was never supported in RHEL.
See-also: https://bugzilla.redhat.com/1144197
This experimental feature allowed you (in theory) to connect to an
existing instance of the libguestfs daemon. (Again, in theory) it
allowed you to attach to running guests. This didn't work well in
practice. If you want to do this, install qemu-guest-agent inside
your guest instead.
This also disables the --live options in guestfish and guestmount.
(The option now prints an error).
This was never supported in RHEL.
The daemon tests relied on this connection method to perform tests on
a bare daemon, so this removes those tests. They were not especially
valuable.
See-also: https://bugzilla.redhat.com/798980
GCC 12 gives a warning about our previous attempt to check the length
of the socket path. In the ensuing discussion it was pointed out that
it is easier to get snprintf to do the hard work. snprintf will
return an int >= UNIX_PATH_MAX if the path is too long, or < 0 if
there are other errors such as locale/encoding problems. So we should
just check for those two cases instead.
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/NPKWMTSJ2A2ABNJJEH6WTZIAEFTX6CQY/
Thanks: Martin Sebor and Laszlo Ersek
The 169.254.0.0/16 network specification (for the appliance) is currently
duplicated between the direct backend and the libvirt backend. In a
subsequent patch, we're going to need the network specification in yet
another spot; extract it now to the NETWORK_ADDRESS and NETWORK_PREFIX
macros (simply as strings).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2034160
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211223103701.12702-3-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
The <qemu:commandline> trick we use for adding our virtio-net-pci device
in the libvirt backend can conflict with libvirtd's and QEMU's PCI address
assignment. Try to mitigate that by placing our device in slot 0x1e on the
root bus. In practice this could only conflict with a "dmi-to-pci-bridge"
device model, which libvirtd itself places in slot 0x1e. However, given
the XMLs we generate, and modern QEMU versions, libvirtd has no reason to
auto-add "dmi-to-pci-bridge". Refer to
<https://libvirt.org/formatdomain.html#controllers>.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2034160
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211223103701.12702-2-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2030709
Thanks: label@rockylinux.org
---
RWMJ notes: I fixed the original patch so it compiled. This patch
sets osinfo to "rocky8", which doesn't exist in the osinfo db yet.
Arguably we might want to set this to "centos8", but we can see what
libosinfo decides to do. Here is partial virt-inspector output on a
Rocky Linux disk image:
$ ./run virt-inspector -a disk.img
<?xml version="1.0"?>
<operatingsystems>
<operatingsystem>
<root>/dev/rl/root</root>
<name>linux</name>
<arch>x86_64</arch>
<distro>rocky</distro>
<product_name>Rocky Linux 8.5 (Green Obsidian)</product_name>
<major_version>8</major_version>
<minor_version>5</minor_version>
<package_format>rpm</package_format>
<package_management>dnf</package_management>
<hostname>localhost.localdomain</hostname>
<osinfo>rocky8</osinfo>
<mountpoints>
<mountpoint dev="/dev/rl/root">/</mountpoint>
<mountpoint dev="/dev/sda1">/boot</mountpoint>
</mountpoints>
<filesystems>
<filesystem dev="/dev/rl/root">
<type>xfs</type>
<uuid>fed8331f-9f25-40cd-883e-090cd640559d</uuid>
</filesystem>
<filesystem dev="/dev/rl/swap">
<type>swap</type>
<uuid>6da2c121-ea7d-49ce-98a3-14a37fceaadd</uuid>
</filesystem>
<filesystem dev="/dev/sda1">
<type>xfs</type>
<uuid>4efafe61-2d20-4d93-8055-537e09bfd033</uuid>
</filesystem>
</filesystems>
gcc emits the following warning:
> proto.c: In function ‘send_file_complete’:
> proto.c:437:10: error: ‘buf’ may be used uninitialized
> [-Werror=maybe-uninitialized]
> 437 | return send_file_chunk (g, 0, buf, 0);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In theory, passing the 1-byte array "buf", with indeterminate contents, to
xdr_bytes() ultimately, could be fine -- assuming xdr_bytes() never reads
the contents of the buffer, due to the buffer size being zero. However,
the xdr_bytes() manual does not seem to guarantee this (it also does not
explicitly permit passing a NULL buffer alongside size=0, which would be
even simpler for the caller).
In order to shut up the compiler, just zero-initialize the buffer --
that's simpler than adding diagnostics pragmas. The "maybe-uninitialized"
warning is otherwise very useful, so keep it globally enabled (per
WARN_CFLAGS / WERROR_CFLAGS).
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211011223627.20856-3-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
qemu 6.1 has decided to change qemu-img create so that a backing
format (-F) is required if a backing file (-b) is specified. Since we
don't want to change the libguestfs API to force callers to specify
this because that would be an API break, autodetect it.
This is similar to commit c8c181e8d9 ("launch: libvirt: Autodetect
backing format for readonly drive overlays").
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1998820
Windows Server 2022 preview is identified as <osinfo>win2k16</osinfo>.
Although current osinfo-db does not have an entry "win2k22", return
this instead.
osinfo-db issue to add win2k22:
https://gitlab.com/libosinfo/osinfo-db/-/issues/82
Inspection information for the guest:
type: windows
distro: windows
product_name: Windows Server 2022 Datacenter
product_variant: Server
version: 10.0
arch: x86_64
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1997446
Reported-by: Yongkui Guo
On RISC-V there is no default machine type. Invoking QEMU requires to
specify a board model with the -M option. So let's define MACHINE_TYPE
as virt.
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
QEMU has deprecated this option:
commit 166310299a1e7824bbff17e1f016659d18b4a559
Author: Daniel P. Berrangé
Date: Tue Oct 20 17:08:27 2020 +0100
os: deprecate the -enable-fips option and QEMU's FIPS enforcement
The -enable-fips option was added a long time ago to prevent the use of
single DES when VNC when FIPS mode is enabled. It should never have been
added, because apps are supposed to unconditionally honour FIPS mode
based on the '/proc/sys/crypto/fips_enabled' file contents.
In addition there is more to achieving FIPS compliance than merely
blocking use of certain algorithms. Those algorithms which are used
need to perform self-tests at runtime.
QEMU's built-in cryptography provider has no support for self-tests,
and neither does the nettle library.
If QEMU is required to be used in a FIPS enabled host, then it must be
built with the libgcrypt library enabled, which will unconditionally
enforce FIPS compliance in any algorithm usage.
Thus there is no need to keep either the -enable-fips option in QEMU, or
QEMU's internal FIPS checking methods.
This gnulib feature abstracts away threads, locks and TLS, and also
allowed libguestfs to be linked with or without pthread. However
since pthread these days is part of glibc and so every program is
using pthread, and we want to get rid of gnulib as a dependency, just
use pthread directly.
Note this requires libvirt >= 7.1.0 which was only released in March 2021.
With an older libvirt you will see this error:
Original error from libvirt: unsupported configuration: Invalid mode attribute 'maximum' [code=67 int1=-1]
In theory we could check if this is supported by looking at the
libvirt capabilities and fall back, but this commit does not do that,
in the expectation that most people will be using the default backend
(direct) and on Fedora/RHEL we will add an explicit minimum version
dependency to the package.
qemu support has been around quite a bit longer (at least since 2017).
Fixes: commit 30f74f38bd
QEMU has a newish feature (from about 2017 / qemu 2.9) called -cpu max
which is supposed to select the best CPU, ideal for libguestfs.
After this change, on x86-64:
KVM TCG
Direct -cpu max -cpu max
(non-libvirt)
Libvirt <cpu mode="host-passthrough"> <cpu mode="host-model">
<model fallback="allow"/> <model fallback="allow"/>
</cpu> </cpu>
Thanks: Daniel Berrangé
Previously this was in common/utils. However it is not used anywhere
else, and guestfs-tools wants to remove gnulib dependencies, so move
this to libguestfs.
Bug originally reported here by trysis:
https://stackoverflow.com/questions/64273334/test-x-in-mounted-filesystem
If the user is root then we override normally access controls in FUSE,
see https://bugzilla.redhat.com/show_bug.cgi?id=1106548.
However this causes test -x to mark all files as executable. We
shouldn't let root execute any file, only ones which have the 'x' bit
set. Therefore this narrows the fix in bug 1106548 so it only applies
to read and write bits.
To test this I created a disk with guestfish which had an executable
and a non-executable file:
$ guestfish -N fs -m /dev/sda1
><fs> touch /file1
><fs> touch /file2
><fs> chmod 0755 /file1
><fs> ll /
total 24
drwxr-xr-x 3 root root 4096 Oct 12 14:04 .
drwxr-xr-x 19 root root 4096 Oct 12 14:04 ..
-rwxr-xr-x 1 root root 0 Oct 12 14:04 file1
-rw-r--r-- 1 root root 0 Oct 12 14:04 file2
drwx------ 2 root root 16384 Oct 12 14:04 lost+found
I then mounted and tested it as non-root:
$ guestmount -a test1.img -m /dev/sda1 /tmp/mnt -v -x
$ ls -l /tmp/mnt
total 16
-rwxr-xr-x. 1 root root 0 Oct 12 15:04 file1
-rw-r--r--. 1 root root 0 Oct 12 15:04 file2
drwx------. 2 root root 16384 Oct 12 15:04 lost+found
$ test -x /tmp/mnt/file1; echo $?
0
$ test -x /tmp/mnt/file2; echo $?
1
and as root:
$ sudo guestmount -a test1.img -m /dev/sda1 /tmp/mnt -v -x
$ test -x /tmp/mnt/file1; echo $?
0
$ test -x /tmp/mnt/file2; echo $?
0
In the debug output for non-root we can see the difference:
libguestfs: /file1: testing access mask X_OK: caller UID:GID = 1000:1000, file UID:GID = 0:0, file mode = 100755, result = OK
libguestfs: /file2: testing access mask X_OK: caller UID:GID = 1000:1000, file UID:GID = 0:0, file mode = 100644, result = EACCESS
and for root:
libguestfs: /file1: testing access mask X_OK: caller UID:GID = 0:0, file UID:GID = 0:0, file mode = 100755, result = OK
libguestfs: /file2: testing access mask X_OK: caller UID:GID = 0:0, file UID:GID = 0:0, file mode = 100644, result = OK
After this commit the root output changes to this (ie. same decision
as non-root):
libguestfs: /file1: testing access mask X_OK: caller UID:GID = 0:0, file UID:GID = 0:0, file mode = 100755, result = OK
libguestfs: /file2: testing access mask X_OK: caller UID:GID = 0:0, file UID:GID = 0:0, file mode = 100644, result = EACCESS
When guestfs_lvm_canonical_lv_name was called with a /dev/dm* or
/dev/mapper* name which was not an LV then a noisy error would be
printed. This would typically have happened with encrypted disks, and
now happens very noticably when inspecting Windows BitLocker-
encrypted guests.
This commit hides this error in all cases, although it is still logged
to debug. See comment and the thread below for detailed rationale.
https://www.redhat.com/archives/libguestfs/2020-October/thread.html#00055
This commit deprecates luks-open/luks-open-ro/luks-close for the more
generic sounding names cryptsetup-open/cryptsetup-close, which also
correspond directly to the cryptsetup commands.
The optional cryptsetup-open readonly flag is used to replace the
functionality of luks-open-ro.
The optional cryptsetup-open crypttype parameter can be used to select
the type (corresponding to cryptsetup open --type), which allows us to
open BitLocker-encrypted disks with no extra effort. As a convenience
the crypttype parameter may be omitted, and libguestfs will use a
heuristic (based on vfs-type output) to try to determine the correct
type to use.
The deprecated functions and the new functions are all (re-)written in
OCaml.
There is no new test here, unfortunately. It would be nice to test
Windows BitLocker support in this new API, however the Linux tools do
not support creating BitLocker disks, and while it is possible to
create one under Windows, the smallest compressed disk I could create
is 37M because of a mixture of the minimum support size for BitLocker
disks and the fact that encrypted parts of NTFS cannot be compressed.
Also synchronise with common module.
This brings libguestfs into line with other projects which have a
separate include/ directory for the public header.
It's also the case that <guestfs.h> has never particularly belonged in
the lib/ subdirectory. Some tools add -Ilib/ but they only need
<guestfs.h> and not any other headers from that directory, and
separating out the public header allows us to clean those up. This is
certainly the case for examples, and some language bindings and some
tests.
In future I'm hopeful we can use this as the basis to tease out other
dependencies, as a prelude to separating them out from the repo.
For the appliance of the QCOW2 format, the function get_root_uuid()
fails to get the UUID of the disk image.
In this case, let us read the first 256k bytes of the disk image with
the 'qemu-img dd' command. Then pass the read block to the 'file'
command.
Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Argon2 is the default LUKS Password-Based Key Derivation Function
(PBKDF) for some new guests such as RHEL 8.2 and Fedora. It is
designed to be "memory hard", meaning that by design it requires large
amounts of memory, making it expensive to brute-force. Unfortunately
the default for guests which had more than a few GB of RAM at install
time is to require about 1 GB of RAM to decrypt the block device,
which is considerably larger than the default available in the
libguestfs appliance.
To make it possible to open these encrypted disks we need to make the
appliance larger. This could be done as a one-off, and the current
workaround is simply to set LIBGUESTFS_MEMSIZE=2048 or a similar
amount. However since we don't know in advance whether we could be
dealing with an encrypted disk, partition, etc. or what PBKDF it uses,
the only way to deal with this in all circumstances is to increase the
default memsize. This commit increases it quite a lot (768 -> 1280)
which is unfortunate.
Note as there is some confusion on this point: Since libguestfs does
not attempt to decrypt disks in parallel, you only need ~ 1GB in
total, not per encrypted disk.
For a reproducer, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1837765#c14
At the moment it is empty, so probably it does not exist. Remove it to
avoid adding spurious content to the pkg-config file in case that
variable will get a value in the future.
We use a similar trick to libvirt to allow external C programs that
use libguestfs to be compiled against the built (but not installed)
libguestfs with:
../libguestfs/run ./configure
make
What actually happens is we have a second pkg-config file
(lib/local/libguestfs.pc) which points to the locally built
libguestfs. The ./run script sets up PKG_CONFIG_PATH to point to this
directory. Assuming that ./configure is using pkg-config/pkgconf and
not some other half-baked solution it will pick up the libguestfs.pc
file from here which will set CFLAGS and LIBS appropriately.
Appliance device names are not reliable since the kernel no longer
enumerates virtio-scsi devices serially. Instead get the UUID of the
appliance and pass this as the parameter.
Note this requires supermin >= 5.1.18 (from around July 2017).