In a following commit, g->hv may now be NULL if the user does not
explicitly either call guestfs_set_hv or set LIBGUESTFS_HV.
There were multiple places in the code that assumed g->hv was not
NULL, so they were all revised so that now g->hv is only not NULL if
set by the user and (in the normal case) is NULL meaning "use the
default hypervisor".
For the direct backend we call get_default_hv_direct early on and set
data->qemu in the handle. We need to modify the KVM test so that the
current qemu binary is passed along.
We also have to change execve to execvpe since the path to qemu is not
absolute.
The string allocated by generic_qmp_test via safe_strdup is passed to
parse_has_kvm but never freed afterwards, leaking memory on every KVM
capability check. Use CLEANUP_FREE to ensure automatic cleanup.
Co-authored-by: Claude <noreply@anthropic.com>
Qemu will refuse to start on some (all?) architectures if firmware
files are missing, or for other reasons, and complain to stderr. Those
messages should be made visible to the user as part of the error message.
Dan mentioned that there is a special machine type ("none") we can use
when querying for KVM. It has no CPU, memory, etc and does not run,
but you can still enable KVM for it.
Note we have to remove the -cpu parameter, otherwise qemu prints this
error:
qemu-kvm: apic-id property was not initialized properly
Updates: commit 5da8102f5f
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
When checking for QMP properties, we run a qemu command and interact
with the QMP console. However we also start the guest running (there
is no actual guest in this situation). This occasionally causes this
line to be printed:
libguestfs: generic_qmp_test: 3: {"timestamp": {"seconds": 1768823946, "microseconds": 898287}, "event": "GUEST_PANICKED", "data": {"action": "poweroff", "info": {"core": 0, "psw-addr": 0, "reason": "disabled-wait", "psw-mask": 562956395872256, "type": "s390"}}}\r\n
which confuses our parser.
As there is no reason to start the non-existent guest, add the -S
option which causes qemu to start up in a paused state.
For some reason this only happens on s390x but I think it could happen
on all architectures, so it may be a timing issue or something
particular about s390x firmware.
Commit 669eda1e24 ("lib/launch-direct.c: Simplify test for KVM, remove
qemu caching") made it so that failure of generic_qmp_test() will
cause launch to fail. However we still used debug messages to print
the error, so unless you have debugging enabled you wouldn't see any
error message if this function fails.
Updates: commit 669eda1e24
On RHEL 10.2 aarch64 (only) we see:
$ echo '{ "execute": "qmp_capabilities" }' '{ "execute": "query-kvm" }' '{ "execute": "quit" }' | QEMU_AUDIO_DRV=none "/usr/libexec/qemu-kvm" -display none -machine "virt,accel=kvm:hvf:tcg" -qmp stdio
qemu-kvm: invalid accelerator hvf
qemu-kvm: falling back to KVM
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 9}, "package": "qemu-kvm-9.1.0-15.el10_0.4"}, "capabilities": ["oob"]}}
qemu-kvm: warning: CPU model cortex-a57-arm-cpu is deprecated -- use 'host' / 'max'
qemu-kvm: kvm_init_vcpu: kvm_arch_init_vcpu failed (0): Invalid argument
Unfortunately we cannot use guestfs_int_get_cpu_model (as that
requires us to already know if KVM is supported), so we just have to
guess that -cpu max will work, at least enough for QMP to work.
Fixes: https://issues.redhat.com/browse/RHEL-121076
Reported-by: Xiang Hua Chen
Previously we tested if KVM was available, and cached that for the
qemu binary. I think this was actually wrong. For example, if the
machine restarts, then the cache is still around, but KVM might be
enabled or disabled because of a new host kernel.
In any case, let's radically simplify this.
Test for KVM on each run. Consequently we can remove all the qemu
test caching stuff as it is no longer used anywhere.
I also tightened up the code that runs the QMP query-kvm command, so
now any unexpected output will cause a runtime failure. This command
ought to work, and if it breaks we ought to know about it and fix it.
Previously we ran 'qemu -device ?' to list devices, using the output
to test if qemu supported particular devices. This was extensively
used in the past, but after recent commits there are no further users
of this code, so we can remove it all completely.
Previously we parsed qemu -help output to get the version of qemu and
to test for other features. After prior commits, this is no longer
done, so we can remove the qemu -help invocation and associated code
completely.
This function is now only used in one place, to print the version of
qemu in direct mode, when debugging is enabled.
Remove this function and replace with a direct command invocation
('qemu --version'). We only need to run this command when debugging
is enabled, and we copy all of the output to the debug channel.
I have made the assumption here that qemu -version does not try to
create a display device. (The previous invocation of qemu -help
actually ran 'qemu -display none -help' indicating that this was not
always the case.)
This is actually an improvement on before, since now we get to see the
full output of 'qemu --version'. The new output looks like:
libguestfs: begin testing qemu features
libguestfs: command: run: /usr/bin/qemu-kvm
libguestfs: command: run: \ -version
libguestfs: qemu: QEMU emulator version 10.1.0 (qemu-10.1.0-8.fc44)
libguestfs: qemu: Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
We used to read the QMP schema so we could see if the binary supported
qemu mandatory locking. However after commit 47857751a7 ("lib:
direct: Remove test for qemu mandatory locking") this is dead code, so
remove it.
Updates: commit 47857751a7
We tested for QEMU >= 2.10 support for mandatory locking. I believe
this is for all practical purposes always enabled now (and qemu 2.10
is ancient history) so simply assume it's true always.
All recent compilers support this (except MS compilers which we don't
care about). Assume it is supported. We test it in ./configure and
hard fail if it doesn't work.
We still define HAVE_ATTRIBUTE_CLEANUP but you can now assume it is
always defined and don't have to check it.
This was only theoretically supported, via curl. It's unlikely that
it really worked as it was never tested.
If needed it's better to use nbdkit-curl-plugin instead (this applies
to http and ftp as well).
Run this command across the source:
perl -pi.bak -e 's/(20[012][0-9])-20[12][012]/$1-2023/g' `git ls-files`
and remove changes to po{,-docs}/*.po{,t} (these will be regenerated
later when we run 'make dist').
These were added for GCC 11. The problem has been fixed in GCC 12.
On macOS (clang) these produced errors like this:
tsk.c:75:32: error: unknown warning group '-Wanalyzer-file-leak', ignored [-Werror,-Wunknown-warning-option]
^
Schema parsing was failing with errors such as:
libguestfs: QMP parse error: '[' or '{' expected near end of file (ignored)
This happened because the QMP command was actually completely failing
and never printing a result at all. This happens because the qemu
audio driver can't be set up without a console. We can suppress this
by setting the environment variable QEMU_AUDIO_DRV=none, which is the
same thing that libvirt does, and also the same thing that we are
already doing when launching the real appliance subprocess.
See also: https://bugzilla.redhat.com/show_bug.cgi?id=1692047
When using the direct backend, you should see the result of testing
the qemu binary for the availability of KVM:
libguestfs: qemu KVM: enabled
Thanks: Andrea Bolognani.
GCC 8 thinks that the case drive_protocol_gluster may fall through, most
probably because the only code is a switch case that handles the
elements of an enum, and thus letting other fall through. In reality
this ought to not happen at all, so help GCC by break'ing the case,
which will then lead to the abort() at the end of
guestfs_int_drive_source_qemu_param.
While YAJL mostly works fine, it did not see any active development in
the last 3 years. OTOH, Jansson is another JSON C implementation, with
a very liberal license, and a much nicer API.
Hence, switch all of libguestfs from YAJL to Jansson:
- configure checks, and buildsystem in general
- packages pulled in the appliance
- actual implementations
- contrib scripts
- documentation
This also makes use of the better APIs available (e.g. json_object_get,
json_array_foreach, and json_object_foreach). This does not change the
API of our OCaml Yajl module.
QEMU >= 2.10 started to do mandatory locking. This checks the QMP
schema to see if we are using that version of qemu. (Note it is
sometimes disabled in downstream builds, and it was also enabled in
upstream prereleases with version 2.9.9x, so we cannot just check the
version number).
Rename the cache files like ‘qemu.stat’ etc so they include the qemu
binary "key" (ie. size and mtime) in the name. This allows a single
user to use multiple qemu binaries in parallel without conflicts.
This adds an extra test using QMP (the QEMU Monitor Protocol). This
allows us to get extra information about the qemu binary beyond what
is available from the version number or ‘qemu -help’.
The previous code duplicated a lot of common code for reading and
writing the cache file per data field. This change simply factors out
that common code. This makes it simpler to add new tests in future.
This is just refactoring, it should have no effect.
Rather unnecessarily this function returned the parsed qemu version.
This complicates further refactoring, so I have changed the function
not to return this, and instead there is a separate function you have
to call to get the version struct (‘guestfs_int_qemu_version’).
Apart from a tiny amount of extra copying this is simply refactoring
of the interface between the direct-mode backend and the qemu query
functions.
virtio-scsi has been supported in qemu since 2012, and it is superior
in every respect to virtio-blk. There's no reason to still be using
virtio-blk.
virtio-scsi support was initially added in 2012
(commit 0c0a7d0d86).
You can still use virtio-blk using the (deprecated) iface parameter,
but don't do that in new code.
Only in end-user messages and documentation. This change was done
mostly mechanically using the Perl script attached below.
I also changed don't -> don’t etc and made some other simple fixes.
See also: https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
----------
#!/usr/bin/perl -w
use strict;
use Locale::PO;
my $re = qr{'([-\w%.,=?*/]+)'};
my %files = ();
foreach my $filename ("po/libguestfs.pot", "po-docs/libguestfs-docs.pot") {
my $poref = Locale::PO->load_file_asarray($filename);
foreach my $po (@$poref) {
if ($po->msgid =~ $re) {
my @refs = split /\s+/, $po->reference;
foreach my $ref (@refs) {
my ($file, $lineno) = split /:/, $ref, 2;
$file =~ s{^\.\./}{};
if (exists $files{$file}) {
push @{$files{$file}}, $lineno;
} else {
$files{$file} = [$lineno];
}
}
}
}
}
foreach my $file (sort keys %files) {
unless (-w $file) {
warn "warning: $file is probably generated\n"; # have to edit generator
next;
}
my @lines = sort { $a <=> $b } @{$files{$file}};
#print "editing $file at lines ", join (", ", @lines), " ...\n";
open FILE, "<$file" or die "$file: $!";
my @all = ();
push @all, $_ while <FILE>;
close FILE;
my $ext = $file;
$ext =~ s/^.*\.//;
foreach (@lines) {
# Don't mess with verbatim sections in POD files.
next if $ext eq "pod" && $all[$_-1] =~ m/^ /;
unless ($all[$_-1] =~ $re) {
# this can happen for multi-line strings, have to edit it
# by hand
warn "warning: $file:$_ does not contain expected content\n";
next;
}
$all[$_-1] =~ s/$re/‘$1’/g;
}
rename "$file", "$file.bak";
open FILE, ">$file" or die "$file: $!";
print FILE $_ for @all;
close FILE;
my $mode = (stat ("$file.bak"))[2];
chmod ($mode & 0777, "$file");
}
Create own blocks for all the parts dealing with FILE*: this way there
is no need to recycle the same FILE* variable for all the operations,
and have each block its own variable automatically cleaned up.
This also fixes a potential undefined behaviour on error: POSIX says
that after a call fclose(), a FILE* cannot be used anymore, not even
on fclose() failure. The previous behaviour for fclose == -1 was to jump
to the error label, which would then try to call fclose() again (since
the FILE* pointer was still non-null).