Switching to a shared semaphore allows multi-build operations (compiler tests, package tests, etc.) to use the expected degree of parallelism efficiently.
While refactoring the job runner, the time complexity was also reduced from O(n^2) to O(n+m) (where n is the number of jobs, and m is the number of dependencies).