In a following commit, g->hv may now be NULL if the user does not
explicitly either call guestfs_set_hv or set LIBGUESTFS_HV.
There were multiple places in the code that assumed g->hv was not
NULL, so they were all revised so that now g->hv is only not NULL if
set by the user and (in the normal case) is NULL meaning "use the
default hypervisor".
For the direct backend we call get_default_hv_direct early on and set
data->qemu in the handle. We need to modify the KVM test so that the
current qemu binary is passed along.
We also have to change execve to execvpe since the path to qemu is not
absolute.
Add a generic way for backends to report the default hypervisor
(ie. QEMU) back to the main code.
The direct implemention reflects the current way that the hypervisor
is chosen at configure time (see m4/guestfs-qemu.m4).
For the libvirt backend, we are already getting this from libvirt
domcapabilities, so we can just return that field.
Note this may return NULL (roughly "no data"), particularly in the
libvirt case because that requires us to have launched the appliance
already.
https://bugzilla.redhat.com/2082806 is a private kernel bug (sigh) for
RHEL 8, where some missing backports caused 5-level page tables to
fail when qemu emulated them. This was fixed back in 2023, so let's
remove this workaround now.
Previously we tested if KVM was available, and cached that for the
qemu binary. I think this was actually wrong. For example, if the
machine restarts, then the cache is still around, but KVM might be
enabled or disabled because of a new host kernel.
In any case, let's radically simplify this.
Test for KVM on each run. Consequently we can remove all the qemu
test caching stuff as it is no longer used anywhere.
I also tightened up the code that runs the QMP query-kvm command, so
now any unexpected output will cause a runtime failure. This command
ought to work, and if it breaks we ought to know about it and fix it.
It's always a good idea to provide proper entropy to the libguestfs
appliance (eg. for any cryptographic operations). We already assume
virtio, so we can assume virtio-rng is present.
Previously we parsed qemu -help output to get the version of qemu and
to test for other features. After prior commits, this is no longer
done, so we can remove the qemu -help invocation and associated code
completely.
This function is now only used in one place, to print the version of
qemu in direct mode, when debugging is enabled.
Remove this function and replace with a direct command invocation
('qemu --version'). We only need to run this command when debugging
is enabled, and we copy all of the output to the debug channel.
I have made the assumption here that qemu -version does not try to
create a display device. (The previous invocation of qemu -help
actually ran 'qemu -display none -help' indicating that this was not
always the case.)
This is actually an improvement on before, since now we get to see the
full output of 'qemu --version'. The new output looks like:
libguestfs: begin testing qemu features
libguestfs: command: run: /usr/bin/qemu-kvm
libguestfs: command: run: \ -version
libguestfs: qemu: QEMU emulator version 10.1.0 (qemu-10.1.0-8.fc44)
libguestfs: qemu: Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
We already effectively assume that qemu is much newer than the 1.3.0
baseline previously documented. As one example, commit 47857751a7
("lib: direct: Remove test for qemu mandatory locking") assumes that
qemu >= 2.10. Because passt support is desirable in general, let's
assume that qemu is at least version 7.2.0, and document it.
qemu 7.2.0 was released in December 2022, nearly 3 years ago, and RHEL
9 is currently on qemu 9.1.0.
This keeps the SLIRP fallback path in case passt isn't installed, but
we should remove that fallback in future too.
We can safely assume that qemu supports -nodefaults and
-no-user-config, since these have been supported since forever.
-no-hpet was deprecated in qemu 8.0 and the option removed in early
2024, replaced with -machine hpet=off. HPET defaults to 'on' in
upstream qemu, and to 'off' in downstream RHEL rebuilds.
Since (for libguestfs) we can assume an up to date Linux kernel is
running inside the guest, and that the kernel will do the right thing
with regards to timers, we don't need to mess with qemu defaults. In
practice, Linux chooses kvm-clock.
Thanks: Thomas Huth, Daniel Berrange
We tested for QEMU >= 2.10 support for mandatory locking. I believe
this is for all practical purposes always enabled now (and qemu 2.10
is ancient history) so simply assume it's true always.
On QEMU 7.2.0+, if "passt" is available, ask QEMU for passt ("stream")
rather than SLIRP ("user") networking.
For this, we need to run passt ourselves. Given that passt daemonizes by
default, start it with our traditional function guestfs_int_cmd_run(). Ask
passt to save its PID file, because in case something goes wrong before
we're completely sure the appliance (i.e. QEMU) is up and running, we'll
need to kill passt, the *grandchild*, ourselves.
Pass "--one-off" to passt (same as libvirt). This way, once we have proof
that QEMU has connected to passt (because the appliance shows signs of
life), we need not clean up passt ourselves -- once QEMU exits, passt will
see an EOF on the unix domain socket, and exit as well.
Passt is way more flexible than SLIRP, and passt normally intends to
imitate the host environment in the guest as much as possible. This means
that, when switching from SLIRP to passt, the guest would see changes to
the following:
- guest IP address,
- guest subnet mask,
- host (= gateway) IP address,
- host (= gateway) MAC address.
Extract the SLIRP defaults into the new macros NETWORK_GW_IP and
NETWORK_GW_MAC, and pass them explicitly to passt. In particular,
"tests/rsync/test-rsync.sh" fails without setting the host address
(NETWORK_GW_IP) properly.
(These artifacts can be verified in the appliance with "virt-rescue
--network", by running "ip addr", "ip route", and "ip neighbor" at the
virt-rescue prompt. There are four scenarios: two libguest backends, times
passt being installed or not installed.)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184967
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20230714132213.96616-8-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Run this command across the source:
perl -pi.bak -e 's/(20[012][0-9])-20[12][012]/$1-2023/g' `git ls-files`
and remove changes to po{,-docs}/*.po{,t} (these will be regenerated
later when we run 'make dist').
setenv can call malloc and is not safe to call here. Glibc is usually
tolerant of this and we haven't had problems before, but if you use
GLIBC_TUNABLES glibc.malloc.check=1 (or any alternate malloc / libc
which serializes) then you would see hangs if starting multiple
libguestfs handles from different threads at the same time.
This commit also updates the common submodule to pick up:
commit 3c64bcdeaf684f05f46f3928b55aadafdfe72720
Author: Richard W.M. Jones <rjones@redhat.com>
Date: Fri Oct 14 11:07:21 2022 +0100
utils: Add function for copying the environment and adding new entries
libguestfs is currently calling setenv at an unsafe location between
fork and exec. To fix this we need a way to copy and modify the
environment before fork and then we can pass the modified environ to
execve-like functions. nbdkit already does the same so use that code.
This function is copied and adapted from here under a compatible license:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/utils/environ.c
Thanks: Siddhesh Poyarekar
These were added in libguestfs 1.14, but never really used. Only a
handful of probes were available. When I was benchmarking libguestfs
in 2016 I didn't even use these probes because better/simpler
techniques were available.
In https://bugzilla.redhat.com/show_bug.cgi?id=2082806 we've been
tracking an insidious qemu bug which intermittently prevents the
libguestfs appliance from starting. The symptoms are that SeaBIOS
starts and displays its messages, but the kernel isn't reached. We
found that the kernel does in fact start, but when it tries to set up
page tables and jump to protected mode it gets a triple fault which
causes the emulated CPU in qemu to reset (qemu exits).
This seems to only affect TCG (not KVM).
Yesterday I found that this is caused by using -cpu max which enables
the "la57" feature (5-level page tables[0]), and that we can make the
problem go away using -cpu max,la57=off. Note that I still don't
fully understand the qemu bug, so this is only a workaround.
I chose to disable 5-level page tables for both TCG and KVM, partly to
make the patch simpler, and partly because I guess it's not a feature
(ie. 57 bit linear addresses) that is useful for the libguestfs
appliance case, where we have limited physical memory and no need to
run any programs with huge address spaces.
I tested this by running both the direct & libvirt paths overnight. I
expect that this patch will fail with old qemu/libvirt which doesn't
understand the "la57" feature, but this is only intended as a
temporary workaround.
[0] Article about 5-level page tables as background:
https://lwn.net/Articles/717293/
Thanks: Laszlo Ersek
Fixes: https://answers.launchpad.net/ubuntu/+source/libguestfs/+question/701625
Acked-by: Laszlo Ersek <lersek@redhat.com>
This was a feature that allowed you to add drives to the appliance
after launching it. It was complicated to implement, and only worked
for the libvirt backend (not "direct", which is the default backend).
It also turned out to be a bad idea. The original concept was that
appliance creation was slow, so to examine multiple guests you should
launch the handle once then hot-add the disks from each guest in turn
to manipulate them. However this is terrible from a security point of
view, especially for multi-tenant, because the drives from one guest
might compromise the appliance and thus the filesystems/drives from
subsequent guests.
It also turns out that hotplugging is very slow. Nowadays appliance
creation should be faster than hotplugging.
The main use case for this was virt-df, but virt-df no longer uses it
after we discovered the problems outlined above.
The 169.254.0.0/16 network specification (for the appliance) is currently
duplicated between the direct backend and the libvirt backend. In a
subsequent patch, we're going to need the network specification in yet
another spot; extract it now to the NETWORK_ADDRESS and NETWORK_PREFIX
macros (simply as strings).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2034160
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211223103701.12702-3-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
QEMU has deprecated this option:
commit 166310299a1e7824bbff17e1f016659d18b4a559
Author: Daniel P. Berrangé
Date: Tue Oct 20 17:08:27 2020 +0100
os: deprecate the -enable-fips option and QEMU's FIPS enforcement
The -enable-fips option was added a long time ago to prevent the use of
single DES when VNC when FIPS mode is enabled. It should never have been
added, because apps are supposed to unconditionally honour FIPS mode
based on the '/proc/sys/crypto/fips_enabled' file contents.
In addition there is more to achieving FIPS compliance than merely
blocking use of certain algorithms. Those algorithms which are used
need to perform self-tests at runtime.
QEMU's built-in cryptography provider has no support for self-tests,
and neither does the nettle library.
If QEMU is required to be used in a FIPS enabled host, then it must be
built with the libgcrypt library enabled, which will unconditionally
enforce FIPS compliance in any algorithm usage.
Thus there is no need to keep either the -enable-fips option in QEMU, or
QEMU's internal FIPS checking methods.
Appliance device names are not reliable since the kernel no longer
enumerates virtio-scsi devices serially. Instead get the UUID of the
appliance and pass this as the parameter.
Note this requires supermin >= 5.1.18 (from around July 2017).
Nowadays there are hard drives and operating systems which support
"4K native" sector size. In this mode physical and logical block size
exposed to the operating system is equal to 4096 bytes.
GPT partition table (as a known example) being created in this mode will
place GPT header at LBA1 which is 4096 bytes. libguetfs is unable to
recognize partition table on such physical block devices or disk images.
The reason is that libguestfs appliance will look for a GPT header at
LBA1 which is seen at 512 byte offset.
In order to fix the issue we need a way to provide correct logical block
size for attached disks. Fortunately QEMU and libvirt already provides
a way to specify physical/logical block size per disk basis.
After discussion in a mailing list we agreed that physical block size is
rarely used and is not so important. Thus both physical and logical
block size will be set to the same value.
In this patch one more optional parameter 'blocksize' is added
to add_drive_opts API method. Valid values are 512 and 4096.
add_drive_scratch has the same optional parameter for a consistency and
testing purpose.
add-domain and add_libvirt_dom will pass logical_block_size value from
libvirt XML to add_drive_opts method.
Enhance the UEFI firmware lookup function with the information on the
libvirt firmware autoselection, allowing it to return a value to use for
the appliance.
At the moment no firmware is selected this way, so there is no behaviour
change.
We've been carrying this exact patch in RHEL 7 for several years. It
reverts the change made in 2014 where we switched to using the virbr0
bridge for libguestfs networking instead of SLIRP. We thought SLIRP
was going to become unsupported in qemu, but recently there have been
more encouraging signs since it looks like SLIRP will be spun off as a
separate project, running as a modular process and properly secured
and supported.
This reverts commit 224de20b9a.
This option was removed from qemu for no apparent reason except to
break existing consumers. It does the same as -no-user-config, added
in May 2012, so use that instead.
When using the direct backend, you should see the result of testing
the qemu binary for the availability of KVM:
libguestfs: qemu KVM: enabled
Thanks: Andrea Bolognani.
Ancient qemu 1.5 (in RHEL 7) does not understand the
file.file.filename= and file.driver= parameters. Go back to using the
old-style file= and format= parameters when we're not trying to set
the file.backing.file.locking=off parameter.
Fixes commit 9fe592808c.
Thanks: Yongkui Guo, Václav Kadlčík.
QEMU does not accept options unrecognized by the block driver
in use. Disable locking only for read-only disks that are
file-backed, as that's the only block driver it is supported
with.
Signed-off-by: Lars Seipel <ls@slrz.net>
Rather unnecessarily this function returned the parsed qemu version.
This complicates further refactoring, so I have changed the function
not to return this, and instead there is a separate function you have
to call to get the version struct (‘guestfs_int_qemu_version’).
Apart from a tiny amount of extra copying this is simply refactoring
of the interface between the direct-mode backend and the qemu query
functions.
This feature allows you to use different image formats for the fixed
appliance. The raw format is used by default.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
If SIGTERM is blocked in the main program, then it ends up still being
blocked in the subprocess after we fork. This means that we cannot
kill qemu by sending SIGTERM to it. This commit fixes the problem by
unblocking SIGTERM unconditionally after fork.
Thanks: wtfuzz on IRC for reporting and analysis.
PCI devices don't exist/work. You would see errors such as:
qemu-system-s390x: -device virtio-rng-pci,rng=rng0: MSI-X support is mandatory in the S390 architecture
In its current form this is very hard to implement because it requires
us to "unparse" the options, including removing any shell quoting.
It wasn't implemented at all for the libvirt backend.
Also contrary to the documentation, the configure script did not use
these options for testing, but constructed its own set of qemu test
options.
virtio-scsi has been supported in qemu since 2012, and it is superior
in every respect to virtio-blk. There's no reason to still be using
virtio-blk.
virtio-scsi support was initially added in 2012
(commit 0c0a7d0d86).
You can still use virtio-blk using the (deprecated) iface parameter,
but don't do that in new code.