On QEMU 7.2.0+, if "passt" is available, ask QEMU for passt ("stream")
rather than SLIRP ("user") networking.
For this, we need to run passt ourselves. Given that passt daemonizes by
default, start it with our traditional function guestfs_int_cmd_run(). Ask
passt to save its PID file, because in case something goes wrong before
we're completely sure the appliance (i.e. QEMU) is up and running, we'll
need to kill passt, the *grandchild*, ourselves.
Pass "--one-off" to passt (same as libvirt). This way, once we have proof
that QEMU has connected to passt (because the appliance shows signs of
life), we need not clean up passt ourselves -- once QEMU exits, passt will
see an EOF on the unix domain socket, and exit as well.
Passt is way more flexible than SLIRP, and passt normally intends to
imitate the host environment in the guest as much as possible. This means
that, when switching from SLIRP to passt, the guest would see changes to
the following:
- guest IP address,
- guest subnet mask,
- host (= gateway) IP address,
- host (= gateway) MAC address.
Extract the SLIRP defaults into the new macros NETWORK_GW_IP and
NETWORK_GW_MAC, and pass them explicitly to passt. In particular,
"tests/rsync/test-rsync.sh" fails without setting the host address
(NETWORK_GW_IP) properly.
(These artifacts can be verified in the appliance with "virt-rescue
--network", by running "ip addr", "ip route", and "ip neighbor" at the
virt-rescue prompt. There are four scenarios: two libguest backends, times
passt being installed or not installed.)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184967
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20230714132213.96616-8-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Introduce a small function for creating pathnames for PID files.
guestfs_int_make_pid_path() is something of an amalgamation of
guestfs_int_make_temp_path() [1] and guestfs_int_create_socketname() [2]:
- it creates a pathname under sockdir, like [2],
- it uses the handle's unique counter, like [1],
- it takes a name like both [1] and [2], but the name is not size-limited
like in [2], plus we hardcode the suffix from [1] as ".pid".
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184967
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20230714132213.96616-7-lersek@redhat.com>
Consider the following inverted call tree (effectively a dependency tree
-- callees are at the top and near the left margin):
lazy_make_tmpdir() [lib/tmpdirs.c]
guestfs_int_lazy_make_tmpdir() [lib/tmpdirs.c]
guestfs_int_make_temp_path() [lib/tmpdirs.c]
guestfs_int_lazy_make_sockdir() [lib/tmpdirs.c]
guestfs_int_create_socketname() [lib/launch.c]
lazy_make_tmpdir() is our common workhorse / helper function that
centralizes the mkdtemp() function call.
guestfs_int_lazy_make_tmpdir() and guestfs_int_lazy_make_sockdir() are the
next level functions, both calling lazy_make_tmpdir(), just feeding it
different dirname generator functions, and different "is_runtime_dir"
qualifications. These functions create temp dirs for various, more
specific, purposes (see the manual and "lib/guestfs-internal.h" for more
details).
On a yet higher level are guestfs_int_make_temp_path() and
guestfs_int_create_socketname() -- they serve for creating *entries* in
those specific temp directories.
The discrepancy here is that, although all the other functions live in
"lib/tmpdirs.c", guestfs_int_create_socketname() is defined in
"lib/launch.c". That makes for a confusing code reading; move the function
to "lib/tmpdirs.c", just below its sibling function
guestfs_int_make_temp_path().
While at it, correct the leading comment on
guestfs_int_create_socketname() -- the socket pathname is created in the
socket directory, not in the temporary directory.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184967
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20230714132213.96616-6-lersek@redhat.com>
We generate the <interface type="user"> element on libvirt 3.8.0+ already.
For selecting passt rather than SLIRP, we only need to insert the child
element <backend type='passt'>. Make that child element conditional on
libvirt 9.0.0+, plus "passt --help" being executable.
For the latter, place the new helper function guestfs_int_passt_runnable()
in "lib/launch.c" -- we're going to use the same function for the direct
backend as well.
This change exposes a number of (perceived) shortcomings in libvirt; I've
filed <https://bugzilla.redhat.com/show_bug.cgi?id=2222766> about those.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184967
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20230714132213.96616-3-lersek@redhat.com>
Currently we #define NETWORK_ADDRESS as "169.254.0.0". That's a bug in
libguestfs, but it's been invisible -- thus far it's been canceled out by
*independent* bugs in *both* QEMU and libvirt.
(1) With the direct backend, the current definition of NETWORK_ADDRESS
results in the following QEMU command line option:
-netdev user,id=usernet,net=169.254.0.0/16
According to the QEMU documentation, the "net" property here is supposed
to specify the *exact* IP address that the guest will receive. The guest
however receives 169.254.2.15 (I've checked that with virt-rescue).
In other words, libguestfs doesn't do what the QEMU documentation says,
and QEMU doesn't do what the QEMU documentation says either. The end
result has been good enough -- but only until now.
(2) With the libvirt backend, the current definition of NETWORK_ADDRESS
results in the following domain XML snippet:
<interface type="user">
<model type="virtio"/>
<ip family="ipv4" address="169.254.0.0" prefix="16"/>
</interface>
which libvirt translates, in turn, to
-netdev {"type":"user","net":"169.254.0.0/16","id":"hostnet0"}
According to the domain XML documentation, the @address attribute here is
again supposed to specify the *exact* IP address that the guest will
receive. However, the guest receives 169.254.2.15 (I've checked that with
virt-rescue).
In other words, libguestfs doesn't do what the libvirt documentation says,
and libvirt doesn't do what the libvirt documentation says either. The end
result has been good enough -- but only until now.
Where things break down though is the subsequent passt enablement, in the
rest of this series. For example, when using the libvirt backend together
with passt, libvirt translates the @address attribute to passt's
"--address" option, but passt takes the address *verbatim*, and not as a
subnet base address. That difference is visible in the appliance; for
example, when running virt-rescue on a Fedora 38 image, and issuing "ip
addr". Namely, after enabling passt for the libvirt backend, the
guest-visible IP address changes from 169.254.2.15 to 169.254.0.0, which
is an IP address that makes no sense for an endpoint.
Fix the latent bug by specifying the actual guest IP address we want, in
NETWORK_ADDRESS.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184967
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20230714132213.96616-2-lersek@redhat.com>
The last (only?) caller of guestfs_int_cmd_clear_close_files() disappeared
in commit e4c3968880 ("lib/info: Remove /dev/fd hacking and pass a true
filename to qemu-img info.", 2018-01-23), part of v1.37.36.
Simplify the code by removing guestfs_int_cmd_clear_close_files().
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20230711113906.107340-1-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
We require libvirt >= 0.10.2, and we included code to check this at
configure-, compile- and run-time. Remove the checks at compile and
run time (keep the ./configure check). Libvirt 0.10.2 was released
over 10 years ago so it's safe to assume that everyone has it by now.
Run this command across the source:
perl -pi.bak -e 's/(20[012][0-9])-20[12][012]/$1-2023/g' `git ls-files`
and remove changes to po{,-docs}/*.po{,t} (these will be regenerated
later when we run 'make dist').
Add an API to return the build ID of the guest. This to allow a
future change to be able to distinguish between Windows 10 and Windows 11
which can only be done using the build ID.
For Windows we can read the CurrentBuildNumber key from the registry.
For Linux there happens to be a BUILD_ID field in /etc/os-release.
I've never seen a Linux distro that actually uses this.
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
setenv can call malloc and is not safe to call here. Glibc is usually
tolerant of this and we haven't had problems before, but if you use
GLIBC_TUNABLES glibc.malloc.check=1 (or any alternate malloc / libc
which serializes) then you would see hangs if starting multiple
libguestfs handles from different threads at the same time.
This commit also updates the common submodule to pick up:
commit 3c64bcdeaf684f05f46f3928b55aadafdfe72720
Author: Richard W.M. Jones <rjones@redhat.com>
Date: Fri Oct 14 11:07:21 2022 +0100
utils: Add function for copying the environment and adding new entries
libguestfs is currently calling setenv at an unsafe location between
fork and exec. To fix this we need a way to copy and modify the
environment before fork and then we can pass the modified environ to
execve-like functions. nbdkit already does the same so use that code.
This function is copied and adapted from here under a compatible license:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/utils/environ.c
Thanks: Siddhesh Poyarekar
These were added for GCC 11. The problem has been fixed in GCC 12.
On macOS (clang) these produced errors like this:
tsk.c:75:32: error: unknown warning group '-Wanalyzer-file-leak', ignored [-Werror,-Wunknown-warning-option]
^
On macOS, several pages of errors like:
In file included from readdir.c:26:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/rpc/xdr.h:126:3: error: type name requires a specifier or qualifier
bool_t (*x_getlong)(struct __rpc_xdr *, int *);
^
launch.c:191:3: error: implicit declaration of function 'sigemptyset' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
sigemptyset (&sigset);
^
launch.c:192:3: error: implicit declaration of function 'sigaddset' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
sigaddset (&sigset, SIGTERM);
^
launch.c:193:3: error: implicit declaration of function 'sigprocmask' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
sigprocmask (SIG_UNBLOCK, &sigset, NULL);
^
3 errors generated.
These were added in libguestfs 1.14, but never really used. Only a
handful of probes were available. When I was benchmarking libguestfs
in 2016 I didn't even use these probes because better/simpler
techniques were available.
qemu (7.0) does not support -cpu max for TCG.
Note this change is necessary but not sufficient for getting
libguestfs to run on RISC-V, because there is also currently no
working path to make -kernel work.
On riscv64:
readdir.c: In function ‘guestfs_impl_readdir’:
readdir.c:127:3: error: implicit declaration of function ‘unlink’ [-Werror=implicit-function-declaration]
127 | unlink (tmpfn);
| ^~~~~~
I also changed the #include lines to make them look a bit more
like use in other files.
In https://bugzilla.redhat.com/show_bug.cgi?id=2082806 we've been
tracking an insidious qemu bug which intermittently prevents the
libguestfs appliance from starting. The symptoms are that SeaBIOS
starts and displays its messages, but the kernel isn't reached. We
found that the kernel does in fact start, but when it tries to set up
page tables and jump to protected mode it gets a triple fault which
causes the emulated CPU in qemu to reset (qemu exits).
This seems to only affect TCG (not KVM).
Yesterday I found that this is caused by using -cpu max which enables
the "la57" feature (5-level page tables[0]), and that we can make the
problem go away using -cpu max,la57=off. Note that I still don't
fully understand the qemu bug, so this is only a workaround.
I chose to disable 5-level page tables for both TCG and KVM, partly to
make the patch simpler, and partly because I guess it's not a feature
(ie. 57 bit linear addresses) that is useful for the libguestfs
appliance case, where we have limited physical memory and no need to
run any programs with huge address spaces.
I tested this by running both the direct & libvirt paths overnight. I
expect that this patch will fail with old qemu/libvirt which doesn't
understand the "la57" feature, but this is only intended as a
temporary workaround.
[0] Article about 5-level page tables as background:
https://lwn.net/Articles/717293/
Thanks: Laszlo Ersek
Fixes: https://answers.launchpad.net/ubuntu/+source/libguestfs/+question/701625
Acked-by: Laszlo Ersek <lersek@redhat.com>
Representing "iface" in the "drive_create_data" and "drive" structures is
now useless; the direct backend ignores "iface", while the libvirt one
rejects it unless it is empty. Unify both backends -- make them both
ignore "iface". (Which only relaxes the libvirt backend, so it cannot
cause compatibility problems.) This lets us remove the fields. Update the
documentation as well.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1844341
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20220504134155.11832-3-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Currently the guestfs_readdir() API can not list long directories, due to
it sending back the whole directory listing in a single guestfs protocol
response, which is limited to GUESTFS_MESSAGE_MAX (approx. 4MB) in size.
Introduce the "internal_readdir" action, for transferring the directory
listing from the daemon to the library through a FileOut parameter.
Rewrite guestfs_readdir() on top of this new internal function:
- The new "internal_readdir" action is a daemon action. Do not repurpose
the "readdir" proc_nr (138) for "internal_readdir", as some distros ship
the binary appliance to their users, and reusing the proc_nr could
create a mismatch between library & appliance with obscure symptoms.
Replace the old proc_nr (138) with a new proc_nr (511) instead; a
mismatch would then produce a clear error message. Assume the new action
will first be released in libguestfs-1.48.2.
- Turn "readdir" from a daemon action into a non-daemon one. Call the
daemon action guestfs_internal_readdir() manually, receive the FileOut
parameter into a temp file, then deserialize the dirents array from the
temp file.
This patch sneakily fixes an independent bug, too. In the pre-patch
do_readdir() function [daemon/readdir.c], when readdir() returns NULL, we
don't distinguish "end of directory stream" from "readdir() failed". This
rewrite fixes this problem -- I didn't see much value separating out the
fix for the original do_readdir().
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1674392
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20220502085601.15012-2-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
If the appliance is a QCOW2 image, function get_root_uuid_with_file()
fails to read ext filesystem signature (0x53EF at offset 0x438) from it.
This results in the following error:
libguestfs: error: /usr/lib64/guestfs/appliance/root: appliance is not
an extfs filesystem
The error itself is harmless, but misleading. So let's skip retrieving
the signature and UUID in case the image contains QCOW2 header. It's
safe because in this case we'll retrieve it later from RAW image dumped
from that QCOW2 by "qemu-img dd".
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
This was a feature that allowed you to add drives to the appliance
after launching it. It was complicated to implement, and only worked
for the libvirt backend (not "direct", which is the default backend).
It also turned out to be a bad idea. The original concept was that
appliance creation was slow, so to examine multiple guests you should
launch the handle once then hot-add the disks from each guest in turn
to manipulate them. However this is terrible from a security point of
view, especially for multi-tenant, because the drives from one guest
might compromise the appliance and thus the filesystems/drives from
subsequent guests.
It also turns out that hotplugging is very slow. Nowadays appliance
creation should be faster than hotplugging.
The main use case for this was virt-df, but virt-df no longer uses it
after we discovered the problems outlined above.
User-Mode Linux was an alternative hypervisor that could run the
appliance, instead of using qemu. It had many limitations including
lack of network, and UML support in Linux has been semi-broken for a
long time. It was also slower than KVM on baremeal in general and had
various corner cases which were much slower including the emulated
serial port which made bulk uploads and downloads painful. Also of
course it lacked qemu-specific features like qcow2 or any
network-backed disk, so many disk images could not be opened this way.
This was never supported in RHEL.
See-also: https://bugzilla.redhat.com/1144197
This experimental feature allowed you (in theory) to connect to an
existing instance of the libguestfs daemon. (Again, in theory) it
allowed you to attach to running guests. This didn't work well in
practice. If you want to do this, install qemu-guest-agent inside
your guest instead.
This also disables the --live options in guestfish and guestmount.
(The option now prints an error).
This was never supported in RHEL.
The daemon tests relied on this connection method to perform tests on
a bare daemon, so this removes those tests. They were not especially
valuable.
See-also: https://bugzilla.redhat.com/798980
GCC 12 gives a warning about our previous attempt to check the length
of the socket path. In the ensuing discussion it was pointed out that
it is easier to get snprintf to do the hard work. snprintf will
return an int >= UNIX_PATH_MAX if the path is too long, or < 0 if
there are other errors such as locale/encoding problems. So we should
just check for those two cases instead.
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/NPKWMTSJ2A2ABNJJEH6WTZIAEFTX6CQY/
Thanks: Martin Sebor and Laszlo Ersek
The 169.254.0.0/16 network specification (for the appliance) is currently
duplicated between the direct backend and the libvirt backend. In a
subsequent patch, we're going to need the network specification in yet
another spot; extract it now to the NETWORK_ADDRESS and NETWORK_PREFIX
macros (simply as strings).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2034160
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211223103701.12702-3-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
The <qemu:commandline> trick we use for adding our virtio-net-pci device
in the libvirt backend can conflict with libvirtd's and QEMU's PCI address
assignment. Try to mitigate that by placing our device in slot 0x1e on the
root bus. In practice this could only conflict with a "dmi-to-pci-bridge"
device model, which libvirtd itself places in slot 0x1e. However, given
the XMLs we generate, and modern QEMU versions, libvirtd has no reason to
auto-add "dmi-to-pci-bridge". Refer to
<https://libvirt.org/formatdomain.html#controllers>.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2034160
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211223103701.12702-2-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2030709
Thanks: label@rockylinux.org
---
RWMJ notes: I fixed the original patch so it compiled. This patch
sets osinfo to "rocky8", which doesn't exist in the osinfo db yet.
Arguably we might want to set this to "centos8", but we can see what
libosinfo decides to do. Here is partial virt-inspector output on a
Rocky Linux disk image:
$ ./run virt-inspector -a disk.img
<?xml version="1.0"?>
<operatingsystems>
<operatingsystem>
<root>/dev/rl/root</root>
<name>linux</name>
<arch>x86_64</arch>
<distro>rocky</distro>
<product_name>Rocky Linux 8.5 (Green Obsidian)</product_name>
<major_version>8</major_version>
<minor_version>5</minor_version>
<package_format>rpm</package_format>
<package_management>dnf</package_management>
<hostname>localhost.localdomain</hostname>
<osinfo>rocky8</osinfo>
<mountpoints>
<mountpoint dev="/dev/rl/root">/</mountpoint>
<mountpoint dev="/dev/sda1">/boot</mountpoint>
</mountpoints>
<filesystems>
<filesystem dev="/dev/rl/root">
<type>xfs</type>
<uuid>fed8331f-9f25-40cd-883e-090cd640559d</uuid>
</filesystem>
<filesystem dev="/dev/rl/swap">
<type>swap</type>
<uuid>6da2c121-ea7d-49ce-98a3-14a37fceaadd</uuid>
</filesystem>
<filesystem dev="/dev/sda1">
<type>xfs</type>
<uuid>4efafe61-2d20-4d93-8055-537e09bfd033</uuid>
</filesystem>
</filesystems>
gcc emits the following warning:
> proto.c: In function ‘send_file_complete’:
> proto.c:437:10: error: ‘buf’ may be used uninitialized
> [-Werror=maybe-uninitialized]
> 437 | return send_file_chunk (g, 0, buf, 0);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In theory, passing the 1-byte array "buf", with indeterminate contents, to
xdr_bytes() ultimately, could be fine -- assuming xdr_bytes() never reads
the contents of the buffer, due to the buffer size being zero. However,
the xdr_bytes() manual does not seem to guarantee this (it also does not
explicitly permit passing a NULL buffer alongside size=0, which would be
even simpler for the caller).
In order to shut up the compiler, just zero-initialize the buffer --
that's simpler than adding diagnostics pragmas. The "maybe-uninitialized"
warning is otherwise very useful, so keep it globally enabled (per
WARN_CFLAGS / WERROR_CFLAGS).
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211011223627.20856-3-lersek@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
qemu 6.1 has decided to change qemu-img create so that a backing
format (-F) is required if a backing file (-b) is specified. Since we
don't want to change the libguestfs API to force callers to specify
this because that would be an API break, autodetect it.
This is similar to commit c8c181e8d9 ("launch: libvirt: Autodetect
backing format for readonly drive overlays").
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1998820
Windows Server 2022 preview is identified as <osinfo>win2k16</osinfo>.
Although current osinfo-db does not have an entry "win2k22", return
this instead.
osinfo-db issue to add win2k22:
https://gitlab.com/libosinfo/osinfo-db/-/issues/82
Inspection information for the guest:
type: windows
distro: windows
product_name: Windows Server 2022 Datacenter
product_variant: Server
version: 10.0
arch: x86_64
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1997446
Reported-by: Yongkui Guo
On RISC-V there is no default machine type. Invoking QEMU requires to
specify a board model with the -M option. So let's define MACHINE_TYPE
as virt.
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
QEMU has deprecated this option:
commit 166310299a1e7824bbff17e1f016659d18b4a559
Author: Daniel P. Berrangé
Date: Tue Oct 20 17:08:27 2020 +0100
os: deprecate the -enable-fips option and QEMU's FIPS enforcement
The -enable-fips option was added a long time ago to prevent the use of
single DES when VNC when FIPS mode is enabled. It should never have been
added, because apps are supposed to unconditionally honour FIPS mode
based on the '/proc/sys/crypto/fips_enabled' file contents.
In addition there is more to achieving FIPS compliance than merely
blocking use of certain algorithms. Those algorithms which are used
need to perform self-tests at runtime.
QEMU's built-in cryptography provider has no support for self-tests,
and neither does the nettle library.
If QEMU is required to be used in a FIPS enabled host, then it must be
built with the libgcrypt library enabled, which will unconditionally
enforce FIPS compliance in any algorithm usage.
Thus there is no need to keep either the -enable-fips option in QEMU, or
QEMU's internal FIPS checking methods.