mirror of
https://github.com/libguestfs/libguestfs.git
synced 2026-03-21 22:53:37 +00:00
docs: Move architecture and internals documentation to guestfs-internals(1).
@@ -20,6 +20,7 @@ include $(top_srcdir)/subdir-rules.mk
EXTRA_DIST = \
	guestfs-faq.pod \
	guestfs-hacking.pod \
	guestfs-internals.pod \
	guestfs-performance.pod \
	guestfs-recipes.pod \
	guestfs-release-notes.pod \
@@ -29,6 +30,7 @@ EXTRA_DIST = \
CLEANFILES = \
	stamp-guestfs-faq.pod \
	stamp-guestfs-hacking.pod \
	stamp-guestfs-internals.pod \
	stamp-guestfs-performance.pod \
	stamp-guestfs-recipes.pod \
	stamp-guestfs-release-notes.pod \
@@ -37,6 +39,7 @@ CLEANFILES = \
man_MANS = \
	guestfs-faq.1 \
	guestfs-hacking.1 \
	guestfs-internals.1 \
	guestfs-performance.1 \
	guestfs-recipes.1 \
	guestfs-release-notes.1 \
@@ -44,6 +47,7 @@ man_MANS = \
noinst_DATA = \
	$(top_builddir)/html/guestfs-faq.1.html \
	$(top_builddir)/html/guestfs-hacking.1.html \
	$(top_builddir)/html/guestfs-internals.1.html \
	$(top_builddir)/html/guestfs-performance.1.html \
	$(top_builddir)/html/guestfs-recipes.1.html \
	$(top_builddir)/html/guestfs-release-notes.1.html \
@@ -71,6 +75,17 @@ stamp-guestfs-hacking.pod: guestfs-hacking.pod
	  $<
	touch $@

guestfs-internals.1 $(top_builddir)/html/guestfs-internals.1.html: stamp-guestfs-internals.pod

stamp-guestfs-internals.pod: guestfs-internals.pod
	$(PODWRAPPER) \
	  --section 1 \
	  --man guestfs-internals.1 \
	  --html $(top_builddir)/html/guestfs-internals.1.html \
	  --license LGPLv2+ \
	  $<
	touch $@

guestfs-performance.1 $(top_builddir)/html/guestfs-performance.1.html: stamp-guestfs-performance.pod

stamp-guestfs-performance.pod: guestfs-performance.pod
@@ -990,7 +990,7 @@ F<examples/debug-logging.c> program in the libguestfs sources.
=head2 Digging deeper into the appliance boot process.

Enable debugging and then read this documentation on the appliance
-boot process: L<guestfs(3)/INTERNALS>.
+boot process: L<guestfs-internals(1)>.

=head2 libguestfs hangs or fails during run/launch.

@@ -1015,6 +1015,8 @@ useful debugging information from libvirtd in F</tmp/libvirtd.log>

=head1 DESIGN/INTERNALS OF LIBGUESTFS

See also L<guestfs-internals(1)>.

=head2 Why don't you do everything through the FUSE / filesystem
interface?

@@ -735,6 +735,7 @@ Create the branch in git:

L<guestfs(3)>,
L<guestfs-examples(3)>,
L<guestfs-internals(1)>,
L<guestfs-performance(1)>,
L<guestfs-release-notes(1)>,
L<guestfs-testing(1)>,
415
docs/guestfs-internals.pod
Normal file
@@ -0,0 +1,415 @@
=head1 NAME

guestfs-internals - architecture and internals of libguestfs

=head1 DESCRIPTION

This manual page is for hackers who want to understand how libguestfs
works internally. This is just a description of how libguestfs works
now, and it may change at any time in the future.

=head1 ARCHITECTURE

Internally, libguestfs is implemented by running an appliance (a
special type of small virtual machine) using L<qemu(1)>. Qemu runs as
a child process of the main program.

  ┌───────────────────┐
  │ main program      │
  │                   │
  │                   │           child process / appliance
  │                   │          ┌──────────────────────────┐
  │                   │          │ qemu                     │
  ├───────────────────┤   RPC    │      ┌─────────────────┐ │
  │ libguestfs  ◀╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍▶ guestfsd        │ │
  │                   │          │      ├─────────────────┤ │
  └───────────────────┘          │      │ Linux kernel    │ │
                                 │      └────────┬────────┘ │
                                 └───────────────│──────────┘
                                                 │
                                                 │ virtio-scsi
                                          ┌──────┴──────┐
                                          │  Device or  │
                                          │  disk image │
                                          └─────────────┘

The library, linked to the main program, creates the child process and
hence the appliance in the L<guestfs(3)/guestfs_launch> function.

Inside the appliance is a Linux kernel and a complete stack of
userspace tools (such as LVM and ext2 programs) and a small
controlling daemon called L<guestfsd(8)>. The library talks to
L<guestfsd(8)> using remote procedure calls (RPC). There is a mostly
one-to-one correspondence between libguestfs API calls and RPC calls
to the daemon. Lastly the disk image(s) are attached to the qemu
process which translates device access by the appliance's Linux kernel
into accesses to the image.

A common misunderstanding is that the appliance "is" the virtual
machine. Although the disk image you are attached to might also be
used by some virtual machine, libguestfs doesn't know or care about
this. (But you will care if both libguestfs's qemu process and your
virtual machine are trying to update the disk image at the same time,
since this usually results in massive disk corruption).

=head1 STATE MACHINE

libguestfs uses a state machine to model the child process:

                         |
          guestfs_create / guestfs_create_flags
                         |
                         |
                     ____V_____
                    /          \
                    |  CONFIG  |
                    \__________/
                       ^   ^  \
                       |    \  \ guestfs_launch
                       |    _\__V______
                       |   /           \
                       |   | LAUNCHING |
                       |   \___________/
                       |       /
                       |  guestfs_launch
                       |     /
                     __|____V
                    /        \
                    | READY  |
                    \________/

The normal transitions are (1) CONFIG (when the handle is created, but
there is no child process), (2) LAUNCHING (when the child process is
booting up), (3) READY (when the appliance is up and actions can be
issued to, and carried out by, the child process).

The guest may be killed by L<guestfs(3)/guestfs_kill_subprocess>, or
may die asynchronously at any time (eg. due to some internal error),
and that causes the state to transition back to CONFIG.

Configuration commands for qemu such as L<guestfs(3)/guestfs_set_path>
can only be issued when in the CONFIG state.

The API offers one call that goes from CONFIG through LAUNCHING to
READY. L<guestfs(3)/guestfs_launch> blocks until the child process is
READY to accept commands (or until some failure or timeout).
L<guestfs(3)/guestfs_launch> internally moves the state from CONFIG to
LAUNCHING while it is running.

API actions such as L<guestfs(3)/guestfs_mount> can only be issued
when in the READY state. These API calls block waiting for the command
to be carried out. There are no non-blocking versions, and no way to
issue more than one command per handle at the same time.

Finally, the child process sends asynchronous messages back to the
main program, such as kernel log messages. You can register a
callback to receive these messages.

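The state rules in this section can be summarized in a small model. This is only a sketch in Python, not the libguestfs implementation: the class C<HandleModel> and its method names are hypothetical, and the C<boot_ok> parameter merely stands in for whatever makes a real launch succeed or fail.

```python
from enum import Enum


class State(Enum):
    CONFIG = "config"
    LAUNCHING = "launching"
    READY = "ready"


class HandleModel:
    """Toy model of a handle's state (hypothetical, for illustration)."""

    def __init__(self):
        # Creating a handle leaves it in CONFIG with no child process.
        self.state = State.CONFIG
        self.path = None

    def set_path(self, path):
        # Configuration commands are only accepted in CONFIG.
        if self.state is not State.CONFIG:
            raise RuntimeError("configuration is only allowed in CONFIG")
        self.path = path

    def launch(self, boot_ok=True):
        # launch passes through LAUNCHING and returns once READY (or on failure).
        if self.state is not State.CONFIG:
            raise RuntimeError("launch is only allowed in CONFIG")
        self.state = State.LAUNCHING
        if not boot_ok:
            self.state = State.CONFIG
            raise RuntimeError("appliance failed to boot")
        self.state = State.READY

    def mount(self, device, mountpoint):
        # API actions are only accepted in READY.
        if self.state is not State.READY:
            raise RuntimeError("actions are only allowed in READY")
        return (device, mountpoint)

    def child_died(self):
        # Killing the child process, or its unexpected death, returns to CONFIG.
        self.state = State.CONFIG
```

In this model a call such as C<mount> made before C<launch> fails immediately, which mirrors the rule that API actions require the READY state.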
=head1 INTERNALS

=head2 APPLIANCE BOOT PROCESS

This process has evolved and continues to evolve. The description
here corresponds only to the current version of libguestfs and is
provided for information only.

In order to follow the stages involved below, enable libguestfs
debugging (set the environment variable C<LIBGUESTFS_DEBUG=1>).

=over 4

=item Create the appliance

C<supermin --build> is invoked to create the kernel, a small initrd
and the appliance.

The appliance is cached in F</var/tmp/.guestfs-E<lt>UIDE<gt>> (or in
another directory if C<LIBGUESTFS_CACHEDIR> or C<TMPDIR> is set).

For a complete description of how the appliance is created and cached,
read the L<supermin(1)> man page.

=item Start qemu and boot the kernel

qemu is invoked to boot the kernel.

=item Run the initrd

C<supermin --build> builds a small initrd. The initrd is not the
appliance. The purpose of the initrd is to load enough kernel modules
in order that the appliance itself can be mounted and started.

The initrd is a cpio archive called
F</var/tmp/.guestfs-E<lt>UIDE<gt>/appliance.d/initrd>.

When the initrd has started you will see messages showing that kernel
modules are being loaded, similar to this:

 supermin: ext2 mini initrd starting up
 supermin: mounting /sys
 supermin: internal insmod libcrc32c.ko
 supermin: internal insmod crc32c-intel.ko

=item Find and mount the appliance device

The appliance is a sparse file containing an ext2 filesystem which
contains a familiar (although reduced in size) Linux operating system.
It would normally be called
F</var/tmp/.guestfs-E<lt>UIDE<gt>/appliance.d/root>.

The regular disks being inspected by libguestfs are the first
devices exposed by qemu (eg. as F</dev/vda>).

The last disk added to qemu is the appliance itself (eg. F</dev/vdb>
if there was only one regular disk).

Thus the final job of the initrd is to locate the appliance disk,
mount it, switch root into the appliance, and run F</init> from the
appliance.

If this works successfully you will see messages such as:

 supermin: picked /sys/block/vdb/dev as root device
 supermin: creating /dev/root as block special 252:16
 supermin: mounting new root on /root
 supermin: chroot
 Starting /init script ...

Note that C<Starting /init script ...> indicates that the appliance's
init script is now running.

=item Initialize the appliance

The appliance now initializes itself. This involves starting
certain processes like C<udev>, possibly printing some debug
information, and finally running the daemon (C<guestfsd>).

=item The daemon

Finally the daemon (C<guestfsd>) runs inside the appliance. If it
runs you should see:

 verbose daemon enabled

The daemon expects to see a named virtio-serial port exposed by qemu
and connected on the other end to the library.

The daemon connects to this port (and hence to the library) and sends
a four byte message C<GUESTFS_LAUNCH_FLAG>, which initiates the
communication protocol (see below).

=back

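The cached appliance paths mentioned in the steps above can be derived with a few lines of Python. This is a sketch under stated assumptions, not libguestfs code: the function names are hypothetical, and the precedence used here (C<LIBGUESTFS_CACHEDIR> first, then C<TMPDIR>, then F</var/tmp>) is an assumption, since the text only says that either variable may change the directory.

```python
import os


def appliance_cache_dir(environ, uid):
    """Return the directory where the cached appliance is expected to live."""
    # Assumed precedence: LIBGUESTFS_CACHEDIR, then TMPDIR, then /var/tmp.
    base = environ.get("LIBGUESTFS_CACHEDIR") or environ.get("TMPDIR") or "/var/tmp"
    return os.path.join(base, ".guestfs-%d" % uid)


def appliance_files(cache_dir):
    """Paths of the initrd and root image named in the text above."""
    appliance_d = os.path.join(cache_dir, "appliance.d")
    return {
        "initrd": os.path.join(appliance_d, "initrd"),
        "root": os.path.join(appliance_d, "root"),
    }
```

For the current user this could be called as C<appliance_cache_dir(os.environ, os.getuid())>.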
=head2 COMMUNICATION PROTOCOL

Don't rely on using this protocol directly. This section documents
how it currently works, but it may change at any time.

The protocol used to talk between the library and the daemon running
inside the qemu virtual machine is a simple RPC mechanism built on top
of XDR (RFC 1014, RFC 1832, RFC 4506).

The detailed format of structures is in F<src/guestfs_protocol.x>
(note: this file is automatically generated).

There are two broad cases. Ordinary functions, which don't have any
C<FileIn> or C<FileOut> parameters, are handled with very simple
request/reply messages. Functions that have any C<FileIn> or
C<FileOut> parameters use the same request and reply messages, but
these may also be followed by files sent using a chunked encoding.

=head3 ORDINARY FUNCTIONS (NO FILEIN/FILEOUT PARAMS)

For ordinary functions, the request message is:

 total length (header + arguments,
      but not including the length word itself)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_args (encoded as XDR)

The total length field allows the daemon to allocate a fixed size
buffer into which it slurps the rest of the message. As a result, the
total length is limited to C<GUESTFS_MESSAGE_MAX> bytes (currently
4MB), which means the effective size of any request is limited to
somewhere under this size.

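The length-prefixed framing described above can be illustrated as follows. This is a minimal sketch, not the real wire code: it assumes the length word is a 4-byte big-endian unsigned integer (the way XDR encodes an unsigned int), it treats the header and arguments as opaque byte strings that have already been XDR-encoded, and the function names are hypothetical.

```python
import struct

GUESTFS_MESSAGE_MAX = 4 * 1024 * 1024  # "currently 4MB" according to the text


def frame_message(header, args=b""):
    """Prefix already-encoded header and args with the total length word."""
    payload = header + args
    if len(payload) > GUESTFS_MESSAGE_MAX:
        raise ValueError("message exceeds GUESTFS_MESSAGE_MAX")
    # The length word counts header + args but not itself.
    return struct.pack(">I", len(payload)) + payload


def split_message(data):
    """Return (payload, rest) for one framed message at the start of data."""
    if len(data) < 4:
        raise ValueError("truncated length word")
    (length,) = struct.unpack(">I", data[:4])
    if length > GUESTFS_MESSAGE_MAX:
        raise ValueError("length word exceeds GUESTFS_MESSAGE_MAX")
    if len(data) < 4 + length:
        raise ValueError("truncated message")
    return data[4:4 + length], data[4 + length:]
```

Because the receiver learns the payload size up front and rejects anything above the limit, it can read the rest of the message into a single bounded buffer.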
Note also that many functions don't take any arguments, in which case
the C<guestfs_I<foo>_args> is completely omitted.

The header contains the procedure number (C<guestfs_proc>) which is
how the receiver knows what type of args structure to expect, or none
at all.

For functions that take optional arguments, the optional arguments are
encoded in the C<guestfs_I<foo>_args> structure in the same way as
ordinary arguments. A bitmask in the header indicates which optional
arguments are meaningful. The bitmask is also checked to see if it
contains bits set which the daemon does not know about (eg. if more
optional arguments were added in a later version of the library), and
this causes the call to be rejected.

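The bitmask check for optional arguments can be sketched like this. The function name, the argument names in the usage note, and the convention that bit 0 corresponds to the first optional argument are all assumptions made for illustration; the text above only states that unknown bits cause the call to be rejected.

```python
def check_optargs(bitmask, known_optargs):
    """Return the names of the optional arguments selected by bitmask.

    known_optargs lists the optional argument names the receiver knows,
    in assumed bit order (bit 0 first). Unknown bits reject the call.
    """
    known_mask = (1 << len(known_optargs)) - 1
    unknown = bitmask & ~known_mask
    if unknown:
        raise ValueError("unknown optional argument bits: %#x" % unknown)
    return [name for bit, name in enumerate(known_optargs) if bitmask & (1 << bit)]
```

For example, with three known optional arguments, a bitmask of C<0b101> selects the first and third, while C<0b1000> is rejected because the receiver knows nothing about a fourth.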
The reply message for ordinary functions is:

 total length (header + ret,
      but not including the length word itself)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_ret (encoded as XDR)

As above the C<guestfs_I<foo>_ret> structure may be completely omitted
for functions that return no formal return values.

As above the total length of the reply is limited to
C<GUESTFS_MESSAGE_MAX>.

In the case of an error, a flag is set in the header, and the reply
message is slightly changed:

 total length (header + error,
      but not including the length word itself)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_message_error (encoded as XDR)

The C<guestfs_message_error> structure contains the error message as a
string.

=head3 FUNCTIONS THAT HAVE FILEIN PARAMETERS

A C<FileIn> parameter indicates that we transfer a file I<into> the
guest. The normal request message is sent (see above). However this
is followed by a sequence of file chunks.

 total length (header + arguments,
      but not including the length word itself,
      and not including the chunks)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_args (encoded as XDR)
 sequence of chunks for FileIn param #0
 sequence of chunks for FileIn param #1 etc.

The "sequence of chunks" is:

 length of chunk (not including length word itself)
 struct guestfs_chunk (encoded as XDR)
 length of chunk
 struct guestfs_chunk (encoded as XDR)
   ...
 length of chunk
 struct guestfs_chunk (with data.data_len == 0)

The final chunk has the C<data_len> field set to zero. Additionally a
flag is set in the final chunk to indicate either successful
completion or early cancellation.

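The chunk sequence can be sketched as a generator that reads a stream and emits length-prefixed chunks, ending with a zero-length chunk. Several details here are assumptions rather than facts from the text: the chunk layout (a 4-byte cancel flag followed by XDR-style variable-length opaque data), the C<CHUNK_SIZE> value (a placeholder for C<GUESTFS_MAX_CHUNK_SIZE>), and the function names.

```python
import io
import struct

CHUNK_SIZE = 8192  # placeholder; the real limit is GUESTFS_MAX_CHUNK_SIZE


def encode_chunk(data, cancel=False):
    """Encode one chunk as: length word, cancel flag, XDR-style opaque data."""
    padding = b"\0" * (-len(data) % 4)  # XDR pads opaque data to a 4-byte boundary
    body = struct.pack(">iI", 1 if cancel else 0, len(data)) + data + padding
    return struct.pack(">I", len(body)) + body


def send_file(stream):
    """Yield encoded chunks for stream, ending with a zero-length chunk."""
    while True:
        data = stream.read(CHUNK_SIZE)
        if not data:
            break
        yield encode_chunk(data)
    # Final chunk: data_len == 0, with the flag indicating success.
    yield encode_chunk(b"")
```

Since the sender never needs to know the total size in advance, the same loop works for regular files, pipes and sockets, for instance C<send_file(io.BytesIO(b"hello"))>.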
At time of writing there are no functions that have more than one
FileIn parameter. However this is (theoretically) supported, by
sending the sequence of chunks for each FileIn parameter one after
another (from left to right).

Both the library (sender) I<and> the daemon (receiver) may cancel the
transfer. The library does this by sending a chunk with a special
flag set to indicate cancellation. When the daemon sees this, it
cancels the whole RPC, does I<not> send any reply, and goes back to
reading the next request.

The daemon may also cancel. It does this by writing a special word
C<GUESTFS_CANCEL_FLAG> to the socket. The library listens for this
during the transfer, and if it gets it, it will cancel the transfer
(it sends a cancel chunk). The special word is chosen so that even if
cancellation happens right at the end of the transfer (after the
library has finished writing and has started listening for the reply),
the "spurious" cancel flag will not be confused with the reply
message.

This protocol allows the transfer of arbitrary sized files (no 32 bit
limit), and also files where the size is not known in advance
(eg. from pipes or sockets). However the chunks are rather small
(C<GUESTFS_MAX_CHUNK_SIZE>), so that neither the library nor the
daemon need to keep much in memory.

=head3 FUNCTIONS THAT HAVE FILEOUT PARAMETERS

The protocol for FileOut parameters is exactly the same as for FileIn
parameters, but with the roles of daemon and library reversed.

 total length (header + ret,
      but not including the length word itself,
      and not including the chunks)
 struct guestfs_message_header (encoded as XDR)
 struct guestfs_<foo>_ret (encoded as XDR)
 sequence of chunks for FileOut param #0
 sequence of chunks for FileOut param #1 etc.

=head3 INITIAL MESSAGE

When the daemon launches it sends an initial word
(C<GUESTFS_LAUNCH_FLAG>) which indicates that the guest and daemon are
alive. This is what L<guestfs(3)/guestfs_launch> waits for.

=head3 PROGRESS NOTIFICATION MESSAGES

The daemon may send progress notification messages at any time. These
are distinguished by the normal length word being replaced by
C<GUESTFS_PROGRESS_FLAG>, followed by a fixed size progress message.

The library turns them into progress callbacks (see
L<guestfs(3)/GUESTFS_EVENT_PROGRESS>) if there is a callback
registered, or discards them if not.

The daemon self-limits the frequency of progress messages it sends
(see C<daemon/proto.c:notify_progress>). Not all calls generate
progress messages.

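Because the special words take the place of the length word, a receiver can decide what follows by inspecting the first four bytes. The sketch below shows one way to do that. The numeric flag values are assumptions for the example and should be checked against the generated protocol file; the function name and the returned tuples are hypothetical.

```python
import struct

# Assumed values for illustration; the generated protocol file is authoritative.
GUESTFS_LAUNCH_FLAG = 0xF5F55FF5
GUESTFS_CANCEL_FLAG = 0xFFFFEEEE
GUESTFS_PROGRESS_FLAG = 0xFFFF5555
GUESTFS_MESSAGE_MAX = 4 * 1024 * 1024


def classify_word(word):
    """Classify the 4-byte big-endian word that starts each unit on the channel."""
    (value,) = struct.unpack(">I", word)
    if value == GUESTFS_LAUNCH_FLAG:
        return ("launch", None)
    if value == GUESTFS_CANCEL_FLAG:
        return ("cancel", None)
    if value == GUESTFS_PROGRESS_FLAG:
        return ("progress", None)
    if value > GUESTFS_MESSAGE_MAX:
        raise ValueError("length word exceeds GUESTFS_MESSAGE_MAX")
    # Anything else is an ordinary length word for a request or reply.
    return ("message", value)
```

This only works because every flag value is larger than C<GUESTFS_MESSAGE_MAX>, so a flag can never be mistaken for a valid length.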
=head2 FIXED APPLIANCE

When libguestfs (or libguestfs tools) are run, they search a path
looking for an appliance. The path is built into libguestfs, or can
be set using the C<LIBGUESTFS_PATH> environment variable.

Normally a supermin appliance is located on this path (see
L<supermin(1)/SUPERMIN APPLIANCE>). libguestfs reconstructs this
into a full appliance by running C<supermin --build>.

However, a simpler "fixed appliance" can also be used. libguestfs
detects this by looking for a directory on the path containing all
the following files:

=over 4

=item * F<kernel>

=item * F<initrd>

=item * F<root>

=item * F<README.fixed> (note that it B<must> be present as well)

=back

If the fixed appliance is found, libguestfs skips supermin entirely
and just runs the virtual machine (using qemu or the current backend,
see L<guestfs(3)/BACKEND>) with the kernel, initrd and root disk from
the fixed appliance.

Thus the fixed appliance can be used when a platform or a Linux
distribution does not support supermin. You build the fixed appliance
on a platform that does support supermin using
L<libguestfs-make-fixed-appliance(1)>, copy it over, and use that
to run libguestfs.

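The fixed appliance detection rule described in this section can be sketched as a search over a path. This is an illustration only: the function name is hypothetical, the path is assumed to be colon-separated, and the C<is_file> parameter exists purely so that the check can be exercised without touching the real filesystem.

```python
import os

FIXED_APPLIANCE_FILES = ("kernel", "initrd", "root", "README.fixed")


def find_fixed_appliance(search_path, is_file=os.path.isfile):
    """Return the first directory on search_path holding a fixed appliance."""
    for directory in search_path.split(":"):
        # All four files must be present, including README.fixed.
        if directory and all(
            is_file(os.path.join(directory, name))
            for name in FIXED_APPLIANCE_FILES
        ):
            return directory
    return None
```

A directory that contains only some of the files, for example F<kernel> and F<initrd> without F<README.fixed>, is skipped and the search moves on to the next path element.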
=head1 SEE ALSO

L<guestfs(3)>,
L<guestfs-hacking(1)>,
L<guestfs-examples(3)>,
L<libguestfs-test-tool(1)>,
L<libguestfs-make-fixed-appliance(1)>,
L<http://libguestfs.org/>.

=head1 AUTHORS

Richard W.M. Jones (C<rjones at redhat dot com>)

=head1 COPYRIGHT

Copyright (C) 2009-2015 Red Hat Inc.
@@ -570,6 +570,7 @@ L<supermin(1)>,
L<guestfish(1)>,
L<guestfs(3)>,
L<guestfs-examples(3)>,
L<guestfs-internals(1)>,
L<libguestfs-make-fixed-appliance(1)>,
L<stap(1)>,
L<qemu(1)>,