This extra level of locking turned out to be unnecessary.
* guix/scripts/offload.scm (with-machine-lock): Remove.
(machine-lock-file): Remove.
(acquire-build-slot): Remove surrounding 'with-machine-lock'.
This lock was unnecessary and it led to a contention when many 'guix
offload' processes are polling for available machines.
* guix/scripts/offload.scm (machine-choice-lock-file): Remove.
(choose-build-machine): Remove surrounding 'with-file-lock (machine-lock-file)'.
This is a followup to ed7b44370f71126087eb953f36aad8dc4c44109f;
following that commit, 'guix offload test' and 'guix offload status'
would abort with a backtrace instead of clearly diagnosing a missing
'guix' command on the build machine.
* guix/scripts/offload.scm (assert-node-has-guix): Call 'leave' when
NODE is not an inferior. Remove 'catch' blocks for 'node-repl-error'.
(check-machine-availability): Invoke 'assert-node-has-guix' first.
(check-machine-status): Print a warning when 'remote-inferior' returns #f.
Using inferiors and thus 'guix repl' simplifies setup on build
machines (no need to worry about GUILE_LOAD_PATH etc.)
Furthermore, the 'guix repl -t machine' protocol running in a remote
pipe addresses several issues with the current implementation of nodes
and RREPLs in Guile-SSH: fewer round trips, doesn't leave a 'guile
--listen' process behind it, stateless (since a new process is started
each time), more efficient (the SSH channel can be reused), more
reliable (no 'pgrep', 'pkill', and shellology; see
<https://github.com/artyom-poptsov/guile-ssh/issues/11> as an example.)
* guix/ssh.scm (inferior-remote-eval): New procedure.
(send-files): Use it instead of 'make-node' and 'node-eval'.
* guix/scripts/offload.scm (node-guile-version): New procedure.
(node-free-disk-space, transfer-and-offload, node-load)
(choose-build-machine, assert-node-has-guix): Use 'remote-inferior'
instead of 'make-node' and 'inferior-eval' instead of 'node-eval'.
(assert-node-can-import, assert-node-can-export): Likewise, and add
'session' parameter.
(check-machine-availability): Likewise, and add calls to
'close-inferior' and 'disconnect!'.
(check-machine-status): Likewise.
* doc/guix.texi (Daemon Offload Setup): Remove bit related to 'guile' in
$PATH and $GUILE_LOAD_PATH; mention 'guix' alone.
Fixes a regression introduced in
bbe66a530a whereby the actual
load (non-normalized) would be displayed.
* guix/scripts/offload.scm (check-machine-status): Add call to
'normalized-load'.
Previously, if a remote build would fail due to lack of disk space, this
would be considered a permanent failure and thus cached as a build
failure if the local daemon runs with '--cache-failures'.
* guix/scripts/offload.scm (transfer-and-offload): Upon
'nix-protocol-error?' call 'node-free-disk-space' and return 1 instead
of 100 if the result if lower than 10 MiB.
Fixes <https://bugs.gnu.org/33378>.
* guix/scripts/offload.scm (node-free-disk-space): New procedure.
(%minimum-disk-space): New variable.
(choose-build-machine): Call 'node-free-disk-space' and take it into
account in addition to LOAD.
(check-machine-status): Display the free disk space.
* guix/scripts/offload.scm (machine-load): Remove.
(node-load, normalized-load): New procedures.
(choose-build-machine): Call 'open-ssh-session' and 'make-node' from
here; pass the node to 'node-load'.
(check-machine-status): Use 'node-load' instead of 'machine-load'. Call
'disconnect!' on SESSION.
* guix/scripts/offload.scm (call-with-timeout): New procedure.
(with-timeout): New macro.
(process-request): Use it around 'transfer-and-offload' call.
Previously we were looking at the load of the past 5 minutes, which
means that, after a build, we could end up waiting for 5 minutes for
that metric to be low enough.
* guix/scripts/offload.scm (machine-load): Compute RAW based on ONE, not
FIVE.
This fixes a regression in 'retrieve-files*' introduced in
896fec476f, whereby (guix scripts offload)
would not read the initial sexp now sent by the remote host via
'store-export-channel'. This would effectively prevent file retrieval
entirely when offloading.
* guix/ssh.scm (retrieve-files*): New procedure, like former
'retrieve-files' but with an extra #:import parameter.
(retrieve-files): Rewrite in terms of 'retrieve-files*'.
(file-retrieval-port): Make private.
* guix/scripts/offload.scm (transfer-and-offload): Pass #:import to
'retrieve-files*'.
(retrieve-files*): Remove.
* guix/scripts/offload.scm (check-machine-status): New procedure.
(guix-offload): Call it when the argument is "status".
* doc/guix.texi (Daemon Offload Setup): Document it.
* guix/scripts/offload.scm (build-machines): Comment out
'(set! %fresh-auto-compile #t)' since with Guile 2.2.3 it could lead to
an actual rebuild of everything that gets loaded from there on. See
<https://bugs.gnu.org/29226>.
* guix/ui.scm (load*): Likewise.
Previously we would call 'machine-load' once per machine, which was very
costly when there were many machines. Now we arrange to call it only
once on average (when all the machines have the same 'speed' value).
* guix/scripts/offload.scm (random-seed, shuffle): New procedures.
(choose-build-machine)[machines+slots+loads]: Rename to...
[machines+slots]: ... this. Remove load from the tuples therein.
[undecorate]: Adjust accordingly.
[machine-less-loaded-or-faster?]: Remove.
[machine-faster?]: New procedure.
Sort MACHINES+SLOTS according to 'machine-faster?'. Call
'machine-load?' as the last thing.
The '%slots' list could grow indefinitely; in practice though,
guix-daemon is likely to restart 'guix offload' often enough.
* guix/scripts/offload.scm (%slots): Remove.
(choose-build-machine): Don't 'set!' %SLOTS. Return the acquired slot
as a second value.
(process-request): Adjust accordingly. Release the returned slot after
'transfer-and-offload'.
This fixes a memory leak that can be seen by running:
(map (lambda _ (machine-load m)) (iota 1000))
* guix/scripts/offload.scm (machine-load): Add call to 'disconnect!'.
This avoids the open/fstat/close syscalls upon a cache hit that we had
with the previous idiom:
(call-with-input-file file read-derivation)
where caching happened in 'read-derivation' itself.
* guix/derivations.scm (%read-derivation): Rename to...
(read-derivation): ... this.
(read-derivation-from-file): New procedure.
(derivation-prerequisites, substitution-oracle)
(derivation-prerequisites-to-build):
(derivation-path->output-path, derivation-path->output-paths):
(derivation-path->base16-hash, map-derivation): Use
'read-derivation-from-file' instead of (call-with-input-file …
read-derivation).
* guix/grafts.scm (item->deriver): Likewise.
* guix/scripts/build.scm (log-url, options->things-to-build): Likewise.
* guix/scripts/graph.scm (file->derivation): Remove.
(derivation-dependencies, %derivation-node-type): Use
'read-derivation-from-file' instead.
* guix/scripts/offload.scm (guix-offload): Likewise.
* guix/scripts/perform-download.scm (guix-perform-download): Likewise.
* guix/scripts/publish.scm (load-derivation): Remove.
(narinfo-string): Use 'read-derivation-from-file'.
This allows 'guix publish' threads as well as 'guix substitute' and
'guix offload' processes to be properly labeled in 'top', 'pstree', etc.
* guix/workers.scm (worker-thunk): Add #:thread-name parameter and honor it.
(make-pool): Likewise.
* guix/scripts/publish.scm (http-write): Add calls to 'set-thread-name'
in bodies of 'call-with-new-thread'.
(guix-publish): Call 'set-thread-name'. Pass #:thread-name to 'make-pool'.
* guix/scripts/offload.scm (guix-offload): Call 'set-thread-name'.
* guix/scripts/substitute.scm (guix-substitute): Likewise.
* guix/scripts/offload.scm (connect-to-remote-daemon)
(store-import-channel, store-export-channel, send-files)
(retrieve-files): Move to (guix ssh).
(nonce): Add optional 'name' parameter and use it.
(retrieve-files*): New procedure.
(transfer-and-offload): Use it instead of 'retrieve-files', and add
first parameter to 'send-files'.
(assert-node-can-import): Likewise.
(assert-node-can-export): Use 'retrieve-files' instead of
'store-export-channel'.
* guix/ssh.scm: New file.
* configure.ac: Use 'GUIX_CHECK_GUILE_SSH' and define 'HAVE_GUILE_SSH'
Automake conditional.
* Makefile.am (MODULES) [HAVE_GUILE_SSH]: Add guix/ssh.scm.
* guix/scripts/offload.scm (check-machine-availability): Add 'pred'
parameter and honor it.
(guix-offload): for the "test" sub-command, accept an extra 'regexp'
parameter. Pass a second argument to 'check-machine-availability'.
This fixes a regression introduced in
21531add32 whereby the build log would no
longer be sent to FD 4, thereby leading the daemon to not see the build
log.
* guix/scripts/offload.scm (transfer-and-offload): Parameterize
CURRENT-BUILD-OUTPUT-PORT.
* guix/scripts/offload.scm (assert-node-repl, assert-node-has-guix)
(nonce, assert-node-can-import, assert-node-can-export)
(check-machine-availability): New procedures.
(%random-state): New variable.
(guix-offload): Add case for "test".
* doc/guix.texi (Daemon Offload Setup): Document it. Remove obsolete
bit about remote invocation of 'guix build'.
This fixes a longstanding issue where 'choose-build-machine' would make
on average O(N log(N)) calls to 'machine-load', plus an extra call for
the selected machine, instead of N calls.
* guix/scripts/offload.scm (machine-load): Add comment.
(machine-power-factor, machine-less-loaded-or-faster?): Remove.
(choose-build-machine)[machines+slots]: Rename to...
[machines+slots+loads]: ... this.
[undecorate]: Adjust accordingly.
[machine-less-loaded-or-faster?]: New procedure.
Remove extra 'machine-load' call in body.
* guix/scripts/offload.scm (<build-machine>)[daemon-socket]: New field.
(connect-to-remote-daemon): New procedure.
(%gc-root-file, register-gc-root, remove-gc-roots, offload): Remove.
(transfer-and-offload): Rewrite using 'connect-to-remote-daemon' and
RPCs over SSH.
(store-import-channel, store-export-channel): New procedures.
(send-files, retrieve-files): Rewrite using these.
* guix/scripts/offload.scm (<build-machine>)[ssh-options]: Remove.
[host-key, host-key-type]: New fields.
(%lsh-command, %lshg-command, user-lsh-private-key): Remove.
(user-openssh-private-key, private-key-from-file*): New procedures.
(host-key->type+key, open-ssh-session): New procedures.
(remote-pipe): Remove 'mode' parameter. Rewrite in terms of
'open-ssh-session' etc. Update users.
(send-files)[missing-files]: Rewrite using the bidirectional channel
port.
Remove call to 'call-with-compressed-output-port'.
(retrieve-files): Remove call to 'call-with-decompressed-port'.
(machine-load): Remove exit status logic.
* doc/guix.texi (Requirements): Mention Guile-SSH.
(Daemon Offload Setup): Document 'host-key' and 'private-key'. Show the
default value on each @item line.
* m4/guix.m4 (GUIX_CHECK_GUILE_SSH): New macro.
* config-daemon.ac: Use 'GUIX_CHECK_GUILE_SSH'. Set
'HAVE_DAEMON_OFFLOAD_HOOK' as a function of that.
Suggested by Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>.
* guix/scripts/offload.scm (register-gc-root)[script]: Replace
'false-if-exception' with a finer-grain 'system-error handler.
Provide the name of MACHINE in 'leave' error message.