Erlang NIF with timeslice reductions


Recently, I put together an Erlang asynchronous port driver named keccakf1600 which implements the SHA-3 algorithms used in another one of my projects, jose.

See version 1.0.2 of keccakf1600 for the original port driver implementation.

When interfacing native C code with the Erlang VM, you essentially have three options to choose from:

  1. Port Driver — a shared library linked with driver_entry (I/O heavy operations are typically best suited for this type)
  2. NIF — a shared library linked with ERL_NIF_INIT (fast synchronous operations are typically best suited for this type)
  3. Port — an external program which typically communicates with the Erlang VM over stdin and stdout

My goal was to have a fast and asynchronous way to call blocking functions without keeping the Erlang VM schedulers from carrying out their work. The original plan was to use driver_async combined with ready_async to perform the blocking operations on “a thread separate from the emulator thread.” I used the ei library to communicate between the Erlang VM and the port driver written in C.

Having accomplished my goal, I was curious how my implementation would stack up against the native Erlang crypto library, so I ran a simple benchmark against the equivalent SHA-2 algorithms.

The results were not terribly impressive:

The two main concerns I had with the results were:

  1. Was the SHA-3 implementation I used (based on ed448goldilocks) really 5-7 times slower than the SHA-2 algorithms?
  2. Why was there so much more variance among the SHA-3 algorithms than among the SHA-2 algorithms?

Concern #1 was ruled out by directly testing the C version of the algorithms: for small message sizes, they were typically within 1-2 μs of each other.
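A direct C comparison of this kind can be done with a very small timing harness. The following is only a sketch of the idea, not the harness actually used: usec_per_call, the hash_fn signature, and the xor_fold stand-in are hypothetical names, and xor_fold is a placeholder for whichever real hash function would be measured.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for a one-shot hash function under test. */
typedef void (*hash_fn)(const unsigned char *in, size_t len, unsigned char *out);

/* Placeholder "hash" so the sketch is self-contained: XORs all input
 * bytes into out[0]. Swap in the real function to be measured. */
static void xor_fold(const unsigned char *in, size_t len, unsigned char *out)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++)
        acc ^= in[i];
    out[0] = acc;
}

/* Average CPU time per call, in microseconds, over iters calls.
 * Returns 0.0 when iters is 0 to avoid dividing by zero. */
static double usec_per_call(hash_fn fn, const unsigned char *in, size_t len,
                            unsigned char *out, unsigned long iters)
{
    if (iters == 0)
        return 0.0;
    clock_t t0 = clock();
    for (unsigned long i = 0; i < iters; i++)
        fn(in, len, out);
    clock_t t1 = clock();
    return (double)(t1 - t0) * 1e6 / (double)CLOCKS_PER_SEC / (double)iters;
}
```

Running the same harness over two implementations with identical inputs and iteration counts gives per-call figures that can be compared directly, without any Erlang VM overhead in the way.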

Concern #2 required more research, which eventually led me to the bitwise project by Steve Vinoski. The project explores strategies for keeping a synchronous NIF from blocking the scheduler, such as keeping track of reductions during a given timeslice. It also explores strategies using the experimental dirty NIF feature.

I highly recommend reading the two presentations from the bitwise project: vinoski-opt-native-code.pdf and vinoski-schedulers.pdf.

After experimenting with the two options, I decided to use enif_consume_timeslice combined with enif_schedule_nif to yield control back to the Erlang VM on larger inputs, so that a single long call does not block the schedulers.
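To make the pattern concrete, here is a minimal sketch of how enif_consume_timeslice and enif_schedule_nif can fit together. It is not the keccakf1600 code: a plain byte sum stands in for the hashing work, and the names sum_slice, sum_cont, sum_bytes, slice_sketch, SLICE_BYTES, SLICE_PERCENT, and the WITH_ERL_NIF guard are all hypothetical. The chunk size and the percentage reported per chunk are guesses that would need to be tuned by measurement.

```c
#include <stddef.h>

#define SLICE_BYTES   4096 /* hypothetical chunk size */
#define SLICE_PERCENT 10   /* hypothetical cost of one chunk, in % of a timeslice */

/* Plain C work step: consume at most SLICE_BYTES starting at *offset,
 * updating *offset and *acc. Returns 1 once all input is consumed. */
static int sum_slice(const unsigned char *data, size_t len,
                     size_t *offset, unsigned long *acc)
{
    if (*offset > len)
        *offset = len;
    size_t remaining = len - *offset;
    size_t n = remaining < SLICE_BYTES ? remaining : SLICE_BYTES;
    for (size_t i = 0; i < n; i++)
        *acc += data[*offset + i];
    *offset += n;
    return *offset == len;
}

#ifdef WITH_ERL_NIF /* hypothetical guard: define it when building the NIF */
#include <erl_nif.h>

/* Continuation NIF: argv = [Binary, Offset, Acc]. */
static ERL_NIF_TERM sum_cont(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    ErlNifBinary bin;
    unsigned long offset, acc;
    if (argc != 3 || !enif_inspect_binary(env, argv[0], &bin) ||
        !enif_get_ulong(env, argv[1], &offset) ||
        !enif_get_ulong(env, argv[2], &acc))
        return enif_make_badarg(env);

    size_t off = offset;
    for (;;) {
        if (sum_slice(bin.data, bin.size, &off, &acc))
            return enif_make_ulong(env, acc); /* finished: return the result */
        /* Tell the VM how much of the timeslice this chunk used. A nonzero
         * return means the timeslice is exhausted, so reschedule the rest
         * of the work and return to the scheduler. */
        if (enif_consume_timeslice(env, SLICE_PERCENT)) {
            ERL_NIF_TERM next[3] = { argv[0],
                                     enif_make_ulong(env, (unsigned long)off),
                                     enif_make_ulong(env, acc) };
            return enif_schedule_nif(env, "sum_cont", 0, sum_cont, 3, next);
        }
    }
}

/* Exported entry point: sum_bytes(Binary). */
static ERL_NIF_TERM sum_bytes(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    if (argc != 1)
        return enif_make_badarg(env);
    ERL_NIF_TERM start[3] = { argv[0], enif_make_ulong(env, 0),
                              enif_make_ulong(env, 0) };
    return sum_cont(env, 3, start);
}

static ErlNifFunc nif_funcs[] = { {"sum_bytes", 1, sum_bytes} };
ERL_NIF_INIT(slice_sketch, nif_funcs, NULL, NULL, NULL, NULL)
#endif
```

Small inputs finish inside the first call and never touch enif_schedule_nif, so they pay almost nothing for the bookkeeping. Larger inputs carry their progress (offset and accumulator) in the arguments of the rescheduled call, which avoids keeping state in a separate resource object. Building the NIF part would need WITH_ERL_NIF defined and the Erlang include directory on the include path, plus a matching slice_sketch Erlang module that loads the library with erlang:load_nif/2.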

I rewrote the port driver as a NIF, released it as version 2.0.0 of keccakf1600, and ran the same benchmark again:

These results are much more consistent and closer to my original expectations. I plan on refactoring the erlang-libsodium project using the same technique.