_POSIX_C_SOURCE is necessary for getopt(3), and _DEFAULT_SOURCE is
necessary for the d_type enumerations in struct dirent.
Since we are here, use the C11 standard explicitly.
These fixes only allow us to build with glibc>=2.28, which is when the
<threads.h> header was added [1], necessary to access the thread_local
macro.
[1] https://stackoverflow.com/a/22875599
Now a parent can store its own fd, which will live on for as long as the
parent does, allowing us to use relative *at functions (that might be
faster on the kernel side?) and store smaller path buffers.
Since we were playing with soft limit detection, also make the EMFILE
operations conditional on limited_fds, avoiding potentially expensive
mutex operations when possible.
When checking the soft limit, we use 2 for the number of standard
file descriptors, because we closed stdin in main().
Making it a thread_local variable that can be used by any function means
we can set it from recurse_into_parents with ease, increasing the
likelihood it's set when allocating a new struct.
We also add a debugging print to the malloc branch in task allocation;
this allows us to compare how many allocations use p_old and how many
don't. Again with the linux kernel tree, ~50% of allocations now use
p_old.
This allows us to save some malloc()/free() calls.
The structs can leak on program exit, but an external vector to store
them could fix this.
As it stands, this isn't really useful: in multiple tests with the linux
kernel tree, the p_old branch was taken between 0 and 3 times in a full
run.
- other threads need to see the correct value of p->files, so use
'release' in the parent cleanup section
- recurse_into_parents needs to see the correct value of p->files, so
use 'acquire' there
- also add comments explaining each branch in the parent cleanup section
This allows us to get a clean valgrind pass.
valgrind -s --soname-synonyms=somalloc=NONE --leak-check=full erm -r u-boot-2021.10/
==24385== Memcheck, a memory error detector
==24385== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==24385== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==24385== Command: erm -r u-boot-2021.10/
==24385==
==24385==
==24385== HEAP SUMMARY:
==24385== in use at exit: 0 bytes in 0 blocks
==24385== total heap usage: 4,047 allocs, 4,047 frees, 3,850,073 bytes allocated
==24385==
==24385== All heap blocks were freed -- no leaks are possible
==24385==
==24385== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
- Rrror out clearly on most allocation failures (the others will simply
segfault). Avoid getting into weird program conditions in case some
operations fail.
- Improve organization of main(), no function pointer usage should be
necessary.
- Make main thread also a worker thread: avoid leaving a useless thread
around simply waiting for others to complete. Also means one less
thread to launch.
This is usually not being run as root so it isn't a security
vulnerability, but in the interest of security, we should open the
directory using open() with the appropriate flags to avoid following a
symlink erroneously.
Inspired by [1].
[1] https://blog.rust-lang.org/2022/01/20/cve-2022-21658.html
There's no occasion when queue_remove won't succeed or call exit(), as
handled by the while loop above it, so we can also switch it to a void
function.
Instead of always adding files to the queue, we can try to remove them
in the readdir loop. This allows us to:
- add fewer items to the queue
- skip allocating and copying the path, since with the dir stream open
we can use unlinkat(2)
- allocate parent task lazily, since it might not be needed
- stop using a recursive mutex, which can be slightly more expensive
Doing this in a naive way lead to a slow down, since we were holding the
queue mutex during the entirety of the operation. Instead, it was
necessary to change the loop structure a lot in order to be able to add
items to the queue without knowing the number of entries in the
directory. It could have been calculated with a readdir(3) loop +
rewinddir(3), but that would have added a lot of syscalls. In order to
work around that, we changed the purpose of the atomic int in struct
task.
Now, removed_count holds how many entries from the directory were
removed and a flag in its most significant byte to signal whether we are
still adding entries to the queue that refer to it or not. This flag,
plus some fancy atomic operations, allow us to control whether the
directory cleanup should happen in the thread that was adding its
entries to the queue or in the thread that removes the last item from
the queue.
We consider it safe to use the most significant bit of the unsigned int
as a flag because scandir(3) returns a signed int for the number of
entries in a directory.
- we were allocating plen+nlen+1 and accessing plen+nlen+1; the correct
allocation size should have been plen+nlen+2, because it needed to fit
the null byte and the slash
- printing buf after it's been added to queue gets into a race condition
where it can be freed before it's printed
ENFILE is for system wide file descriptors, we need to deal with EMFILE
instead. Use a cond var to signal that a directory stream has been
closed, which means we can open another one.
Use PTHREAD_STACK_MIN where available.
This required some changes to work on musl:
- the removal of an intermediary buffer with PATH_MAX bytes. This was an
optimization, since it avoided usage of a *printf function and halved
the amount of copying being done
- stop using scandir, since qsort uses a lot of stack. This is partially
a performance improvement: avoided calling qsort multiple times and
allocating the result list, but it locks the queue mutex for a longer
time
- thread stack size: (8K - 1024) so the libc TLS can fit in 8K, and we
allocate only two memory pages (on systems with 4K pages, at least),
plus one guard page. Will probably error out on glibc...
- mutex type: recursive so we can lock the queue entirely for some
operations; isn't used yet.
- also fix threadings bugs: threads could leak (if the join loop exited
due to some error) or non existent threads could be joined (we add the
created struct member to avoid this)
Can delete a sequence of files of directories in the command line, no
recursion yet.
Assumes remove(3) is fast and reasonable:
unlink(); if (errno==EISDIR) rmdir();