In https://bugzilla.redhat.com/show_bug.cgi?id=2082806 we've been
tracking an insidious qemu bug which intermittently prevents the
libguestfs appliance from starting. The symptoms are that SeaBIOS
starts and displays its messages, but the kernel isn't reached. We
found that the kernel does in fact start, but when it tries to set up
page tables and jump to protected mode it gets a triple fault which
causes the emulated CPU in qemu to reset (qemu exits).
This seems to only affect TCG (not KVM).
Yesterday I found that this is caused by using -cpu max which enables
the "la57" feature (5-level page tables[0]), and that we can make the
problem go away using -cpu max,la57=off. Note that I still don't
fully understand the qemu bug, so this is only a workaround.
I chose to disable 5-level page tables for both TCG and KVM, partly to
make the patch simpler, and partly because I guess it's not a feature
(ie. 57 bit linear addresses) that is useful for the libguestfs
appliance case, where we have limited physical memory and no need to
run any programs with huge address spaces.
I tested this by running both the direct & libvirt paths overnight. I
expect that this patch will fail with old qemu/libvirt which doesn't
understand the "la57" feature, but this is only intended as a
temporary workaround.
[0] Article about 5-level page tables as background:
https://lwn.net/Articles/717293/
Thanks: Laszlo Ersek
Fixes: https://answers.launchpad.net/ubuntu/+source/libguestfs/+question/701625
Acked-by: Laszlo Ersek <lersek@redhat.com>
(cherry picked from commit 59d7e6e017)
This was a feature that allowed you to add drives to the appliance
after launching it. It was complicated to implement, and only worked
for the libvirt backend (not "direct", which is the default backend).
It also turned out to be a bad idea. The original concept was that
appliance creation was slow, so to examine multiple guests you should
launch the handle once then hot-add the disks from each guest in turn
to manipulate them. However this is terrible from a security point of
view, especially for multi-tenant, because the drives from one guest
might compromise the appliance and thus the filesystems/drives from
subsequent guests.
It also turns out that hotplugging is very slow. Nowadays appliance
creation should be faster than hotplugging.
The main use case for this was virt-df, but virt-df no longer uses it
after we discovered the problems outlined above.
The 169.254.0.0/16 network specification (for the appliance) is currently
duplicated between the direct backend and the libvirt backend. In a
subsequent patch, we're going to need the network specification in yet
another spot; extract it now to the NETWORK_ADDRESS and NETWORK_PREFIX
macros (simply as strings).
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2034160
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211223103701.12702-3-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
The <qemu:commandline> trick we use for adding our virtio-net-pci device
in the libvirt backend can conflict with libvirtd's and QEMU's PCI address
assignment. Try to mitigate that by placing our device in slot 0x1e on the
root bus. In practice this could only conflict with a "dmi-to-pci-bridge"
device model, which libvirtd itself places in slot 0x1e. However, given
the XMLs we generate, and modern QEMU versions, libvirtd has no reason to
auto-add "dmi-to-pci-bridge". Refer to
<https://libvirt.org/formatdomain.html#controllers>.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2034160
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20211223103701.12702-2-lersek@redhat.com>
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Note this requires libvirt >= 7.1.0 which was only released in March 2021.
With an older libvirt you will see this error:
Original error from libvirt: unsupported configuration: Invalid mode attribute 'maximum' [code=67 int1=-1]
In theory we could check if this is supported by looking at the
libvirt capabilities and fall back, but this commit does not do that,
in the expectation that most people will be using the default backend
(direct) and on Fedora/RHEL we will add an explicit minimum version
dependency to the package.
qemu support has been around quite a bit longer (at least since 2017).
Fixes: commit 30f74f38bd
QEMU has a newish feature (from about 2017 / qemu 2.9) called -cpu max
which is supposed to select the best CPU, ideal for libguestfs.
After this change, on x86-64:
KVM TCG
Direct -cpu max -cpu max
(non-libvirt)
Libvirt <cpu mode="host-passthrough"> <cpu mode="host-model">
<model fallback="allow"/> <model fallback="allow"/>
</cpu> </cpu>
Thanks: Daniel Berrangé
Appliance device names are not reliable since the kernel no longer
enumerates virtio-scsi devices serially. Instead get the UUID of the
appliance and pass this as the parameter.
Note this requires supermin >= 5.1.18 (from around July 2017).
Nowadays there are hard drives and operating systems which support
"4K native" sector size. In this mode physical and logical block size
exposed to the operating system is equal to 4096 bytes.
GPT partition table (as a known example) being created in this mode will
place GPT header at LBA1 which is 4096 bytes. libguetfs is unable to
recognize partition table on such physical block devices or disk images.
The reason is that libguestfs appliance will look for a GPT header at
LBA1 which is seen at 512 byte offset.
In order to fix the issue we need a way to provide correct logical block
size for attached disks. Fortunately QEMU and libvirt already provides
a way to specify physical/logical block size per disk basis.
After discussion in a mailing list we agreed that physical block size is
rarely used and is not so important. Thus both physical and logical
block size will be set to the same value.
In this patch one more optional parameter 'blocksize' is added
to add_drive_opts API method. Valid values are 512 and 4096.
add_drive_scratch has the same optional parameter for a consistency and
testing purpose.
add-domain and add_libvirt_dom will pass logical_block_size value from
libvirt XML to add_drive_opts method.
Enhance the UEFI firmware lookup function with the information on the
libvirt firmware autoselection, allowing it to return a value to use for
the appliance.
At the moment no firmware is selected this way, so there is no behaviour
change.
Previously, is_custom_hv() used to compare the QEMU executable found
during configure to the hypervisor set to check whether it is a custom
one; however, the QEMU found at configure time can be different than
what libvirt was configured with.
This fixes the libvirt backend when libguestfs is configured with a
different QEMU, that now will be specified as emulator overriding the
libvirt one.
We've been carrying this exact patch in RHEL 7 for several years. It
reverts the change made in 2014 where we switched to using the virbr0
bridge for libguestfs networking instead of SLIRP. We thought SLIRP
was going to become unsupported in qemu, but recently there have been
more encouraging signs since it looks like SLIRP will be spun off as a
separate project, running as a modular process and properly secured
and supported.
This reverts commit 224de20b9a.
After the previous commit, wherever we had:
start_element ("foo") {
string ("bar");
} end_element ();
this can now be replaced by:
single_element ("foo", "bar");
In some places when generating XML output in C code we use some clever
macros:
start_element ("memory") {
attribute ("unit", "MiB");
string_format ("%d", g->memsize);
} end_element ();
This commit which is mostly refactoring moves the repeated definitions
of these macros into a common header file.
I also took this opportunity to change / clean up the macros:
- The macros are now documented properly.
- comment() and empty_element() macros are now available everywhere.
- Error handling has been made generic.
- Added do..while(0) around some of the macros to make them safe to
use in all contexts.
Commit aa9e0057b1 made the libvirt backend
use <shareable/> for the disk of the appliance, since at that time all
the instances were using the disk directly.
OTOH, commit 3ad44c8660 switched to
overlays for read-only disks, including the appliance, so effectively
protecting the appliance.
Because of this, the libvirt backend does not need <shareable/> anymore.
Thanks to: Daniel Berrange, Richard W.M. Jones, Peter Krempa.
We were dropping the add_drive copyonread flag when using the libvirt
backend. This resulted in significant performance degradation (2x-3x
slower) when running virt-v2v against VMware servers.
Thanks: Kun Wei.
This feature allows you to use different image formats for the fixed
appliance. The raw format is used by default.
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Only in end-user messages and documentation. This change was done
mostly mechanically using the Perl script attached below.
I also changed don't -> don’t etc and made some other simple fixes.
See also: https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
----------
#!/usr/bin/perl -w
use strict;
use Locale::PO;
my $re = qr{'([-\w%.,=?*/]+)'};
my %files = ();
foreach my $filename ("po/libguestfs.pot", "po-docs/libguestfs-docs.pot") {
my $poref = Locale::PO->load_file_asarray($filename);
foreach my $po (@$poref) {
if ($po->msgid =~ $re) {
my @refs = split /\s+/, $po->reference;
foreach my $ref (@refs) {
my ($file, $lineno) = split /:/, $ref, 2;
$file =~ s{^\.\./}{};
if (exists $files{$file}) {
push @{$files{$file}}, $lineno;
} else {
$files{$file} = [$lineno];
}
}
}
}
}
foreach my $file (sort keys %files) {
unless (-w $file) {
warn "warning: $file is probably generated\n"; # have to edit generator
next;
}
my @lines = sort { $a <=> $b } @{$files{$file}};
#print "editing $file at lines ", join (", ", @lines), " ...\n";
open FILE, "<$file" or die "$file: $!";
my @all = ();
push @all, $_ while <FILE>;
close FILE;
my $ext = $file;
$ext =~ s/^.*\.//;
foreach (@lines) {
# Don't mess with verbatim sections in POD files.
next if $ext eq "pod" && $all[$_-1] =~ m/^ /;
unless ($all[$_-1] =~ $re) {
# this can happen for multi-line strings, have to edit it
# by hand
warn "warning: $file:$_ does not contain expected content\n";
next;
}
$all[$_-1] =~ s/$re/‘$1’/g;
}
rename "$file", "$file.bak";
open FILE, ">$file" or die "$file: $!";
print FILE $_ for @all;
close FILE;
my $mode = (stat ("$file.bak"))[2];
chmod ($mode & 0777, "$file");
}