This change was done almost entirely automatically using the script
below. This uses the OCaml lexer to read the source files and extract
the strings and locations. Strings which are "candidates" (in this
case, longer than 3 lines) are replaced in the output with quoted
string literals.
Since the OCaml lexer is used, it already substitutes all escape
sequences correctly. I diffed the output of the generator and it is
identical after this change, except for UUIDs, which change because of
how Utils.stable_uuid is implemented.
Thanks: Nicolas Ojeda Bar
$ ocamlfind opt -package unix,compiler-libs.common find_strings.ml \
-o find_strings.opt -linkpkg
$ for f in $( git ls-files -- \*.ml ) ; do ./find_strings.opt $f ; done
open Printf
let read_whole_file path =
let buf = Buffer.create 16384 in
let chan = open_in path in
let maxlen = 16384 in
let b = Bytes.create maxlen in
let rec loop () =
let r = input chan b 0 maxlen in
if r > 0 then (
Buffer.add_substring buf (Bytes.to_string b) 0 r;
loop ()
)
in
loop ();
close_in chan;
Buffer.contents buf
let count_chars c str =
let count = ref 0 in
for i = 0 to String.length str - 1 do
if c = String.unsafe_get str i then incr count
done;
!count
let subs = ref []
let consider_string str loc =
let nr_lines = count_chars '\n' str in
if nr_lines > 3 then
subs := (str, loc) :: !subs
let () =
Lexer.init ();
let filename = Sys.argv.(1) in
let content = read_whole_file filename in
let lexbuf = Lexing.from_string content in
let rec loop () =
let token = Lexer.token lexbuf in
(match token with
| Parser.EOF -> ();
| STRING (s, loc, sopt) ->
consider_string s loc; (* sopt? *)
loop ();
| token ->
loop ();
)
in
loop ();
(* The list of subs is already reversed, which is convenient
* because we must the file substitutions in reverse order.
*)
let subs = !subs in
let new_content = ref content in
List.iter (
fun (str, loc) ->
let { Location.loc_start = { pos_cnum = p1 };
loc_end = { pos_cnum = p2 } } = loc in
let len = String.length !new_content in
let before = String.sub !new_content 0 (p1-1) in
let after = String.sub !new_content (p2+1) (len - p2 - 1) in
new_content := before ^ "{|" ^ str ^ "|}" ^ after
) subs;
let new_content = !new_content in
if content <> new_content then (
(* Update the file in place. *)
let new_filename = filename ^ ".new"
and backup_filename = filename ^ ".bak" in
let chan = open_out new_filename in
fprintf chan "%s" new_content;
close_out chan;
Unix.rename filename backup_filename;
Unix.rename new_filename filename
)
Run this command across the source:
perl -pi.bak -e 's/(20[012][0-9])-20[12][012]/$1-2023/g' `git ls-files`
and remove changes to po{,-docs}/*.po{,t} (these will be regenerated
later when we run 'make dist').
Previously we had lots of types like String, Device, StringList,
DeviceList, etc. where Device was just a String with magical
properties (but only inside the daemon), and DeviceList was just a
list of Device strings.
Replace these with some simple top-level types:
String
StringList
and move the magic into a subtype.
The change is mechanical, for example:
old ---> new
FileIn "filename" String (FileIn, "filename")
DeviceList "devices" StringList (Device, "devices")
Handling BufferIn is sufficiently different from a plain String
throughout all the bindings that it still uses a top-level type.
(Compare with FileIn/FileOut where the only difference is in the
protocol, but the bindings can uniformly treat it as a plain String.)
There is no semantic change, and the generated files are identical
except for a minor change in the (deprecated) Perl
%guestfs_introspection table.
Only in end-user messages and documentation. This change was done
mostly mechanically using the Perl script attached below.
I also changed don't -> don’t etc and made some other simple fixes.
See also: https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
----------
#!/usr/bin/perl -w
use strict;
use Locale::PO;
my $re = qr{'([-\w%.,=?*/]+)'};
my %files = ();
foreach my $filename ("po/libguestfs.pot", "po-docs/libguestfs-docs.pot") {
my $poref = Locale::PO->load_file_asarray($filename);
foreach my $po (@$poref) {
if ($po->msgid =~ $re) {
my @refs = split /\s+/, $po->reference;
foreach my $ref (@refs) {
my ($file, $lineno) = split /:/, $ref, 2;
$file =~ s{^\.\./}{};
if (exists $files{$file}) {
push @{$files{$file}}, $lineno;
} else {
$files{$file} = [$lineno];
}
}
}
}
}
foreach my $file (sort keys %files) {
unless (-w $file) {
warn "warning: $file is probably generated\n"; # have to edit generator
next;
}
my @lines = sort { $a <=> $b } @{$files{$file}};
#print "editing $file at lines ", join (", ", @lines), " ...\n";
open FILE, "<$file" or die "$file: $!";
my @all = ();
push @all, $_ while <FILE>;
close FILE;
my $ext = $file;
$ext =~ s/^.*\.//;
foreach (@lines) {
# Don't mess with verbatim sections in POD files.
next if $ext eq "pod" && $all[$_-1] =~ m/^ /;
unless ($all[$_-1] =~ $re) {
# this can happen for multi-line strings, have to edit it
# by hand
warn "warning: $file:$_ does not contain expected content\n";
next;
}
$all[$_-1] =~ s/$re/‘$1’/g;
}
rename "$file", "$file.bak";
open FILE, ">$file" or die "$file: $!";
print FILE $_ for @all;
close FILE;
my $mode = (stat ("$file.bak"))[2];
chmod ($mode & 0777, "$file");
}
Daemon 'proc_nr's have to be assigned monotonically and uniquely to
each daemon function. However in practice it can be difficult to work
out which is the next free proc_nr. Placing all of them into a single
table in a new file (proc_nr.ml) should make this easier.
Group the APIs logically and move them into new modules:
Actions_core:
Core APIs and anything that doesn't fit into another group, eg. launch.
(With some more effort this could be split further.)
Actions_augeas:
Augeas APIs, eg. aug-init.
Actions_debug:
Debug APIs.
Actions_hivex:
Hivex APIs, eg. hivex-open.
Actions_inspection:
Inspection APIs, eg. inspect-get-type.
Actions_properties:
Handle properties, eg. set-hv, get-hv.
Actions_tsk:
SleuthKit APIs, eg. filesystem-walk.
*_deprecated:
All of the above modules have deprecated variants, where we
place the deprecated actions.