Fetch FreeBSD ports with parallel connection support and connection pipelining
Latest commit: b82269e533 by Tobias Kortkamp (Add tag,release,publish Makefile), 1 month ago

README.adoc

Fetch FreeBSD ports with parallel connection support and connection pipelining.

Caution
This is an experiment. Use at your own risk.

Parfetch is a glue application between libcurl and the ports framework; libcurl does all the heavy lifting. Parfetch is activated by enabling an overlay and replaces the default do-fetch and checksum targets.

A statically linked binary is available in the overlay to make it easy to use with Poudriere(-devel) as well.

Why?

The ports framework does not make use of modern features like HTTP pipelining. All USES=cargo ports fetch most of their distfiles from a single host (https://crates.io), so they could benefit greatly from it. Ports with hundreds of distfiles call fetch(1) hundreds of times and open hundreds of connections, one after the other.

As an example, here is a basic time comparison for devel/tokei: instead of opening more than 150 connections to https://crates.io sequentially, Parfetch opens a single connection to it and fetches all crates almost immediately.

Even ports like x11-toolkits/wlroots with only a handful of distfiles can benefit from it.

Default fetch

$ time make -C devel/tokei distclean checksum
       43.86 real         3.52 user         5.04 sys
$ time make -C x11-toolkits/wlroots distclean checksum
       12.28 real         1.25 user         1.81 sys

Parfetch

$ time make -C devel/tokei OVERLAYS=/usr/local/share/parfetch/overlay distclean checksum
        3.24 real         1.24 user         1.22 sys
$ time make -C x11-toolkits/wlroots OVERLAYS=/usr/local/share/parfetch/overlay distclean checksum
        2.79 real         0.85 user         0.85 sys
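
To put the two sets of timings side by side, the following sketch divides the wall-clock ("real") seconds of the default fetch by those of Parfetch. The numbers are copied from the runs above; the `speedup` helper is a hypothetical name introduced only for this illustration, and the ratios describe these particular runs, not a general guarantee.

```shell
# Hypothetical helper: print how many times faster the Parfetch run was.
# $1 = port origin, $2 = default fetch "real" seconds, $3 = Parfetch "real" seconds
speedup() {
    awk -v port="$1" -v base="$2" -v par="$3" \
        'BEGIN { printf "%s: %.1fx faster\n", port, base / par }'
}

# Wall-clock times taken from the measurements above.
speedup devel/tokei 43.86 3.24
speedup x11-toolkits/wlroots 12.28 2.79
```

For these runs this prints roughly 13.5x for devel/tokei and 4.4x for x11-toolkits/wlroots.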

Configure Parfetch

Build

$ ./configure && ninja

Install

$ ninja install

Local ports setup

Enable the overlay in /etc/make.conf:

OVERLAYS+=	/usr/local/share/parfetch/overlay

Poudriere setup

Warning
This requires a Poudriere version with overlay support, such as poudriere-devel.

Enable

Make the overlay available to Poudriere:

$ poudriere ports -c -p parfetch -m null -M /usr/local/share/parfetch/overlay

Usage

Build devel/tokei and use Parfetch to fetch distfiles:

$ poudriere bulk -O parfetch devel/tokei

Parfetch options

Options can be set in make.conf.

PARFETCH_MAKESUM_EPHEMERAL

When defined during makesum, distinfo is created or updated but no distfiles are saved to disk. The files are still downloaded in full so that they can be checksummed, but DISTDIR is left untouched.

PARFETCH_MAKESUM_KEEP_TIMESTAMP

When defined during makesum, retain the previous TIMESTAMP in distinfo. This can be useful when refreshing patches that have no code changes and thus do not warrant a TIMESTAMP bump.

PARFETCH_MAX_HOST_CONNECTIONS

This sets the maximum number of simultaneous open connections to a single host.

Default is 1.

PARFETCH_MAX_TOTAL_CONNECTIONS

This sets the global connection limit. Parfetch will not use more than this number of connections.

Default is 4.
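
As a rough sketch of how the connection limits might be combined with the overlay setting, the snippet below writes a sample make.conf fragment to a scratch file so it can be reviewed before anything is copied into /etc/make.conf. The values 2 and 8 are illustrative assumptions, not recommendations from the project, and the scratch file is only a stand-in for the real configuration file.

```shell
# Write a sample fragment to a temporary file instead of touching /etc/make.conf.
conf="$(mktemp)"

cat > "$conf" <<'EOF'
OVERLAYS+=	/usr/local/share/parfetch/overlay
PARFETCH_MAX_HOST_CONNECTIONS=	2
PARFETCH_MAX_TOTAL_CONNECTIONS=	8
EOF

# Show the fragment for review.
cat "$conf"
```

If the fragment looks right, its lines can be merged into /etc/make.conf by hand. Leaving both PARFETCH_MAX_* variables unset keeps the defaults of 1 and 4.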