Benchmarks¶

Settings¶

Machine¶

Ubuntu 22.04 instance on GitHub Actions
- The benchmarking suits are implemented as GitHub Actions workflows.
- 4 processors, 16 GB RAM, 14 GB storage (SSD)
- According to profiling with pyperf, each processor operates at a frequency between 2.4 and 3.3 GHz.
Tuning by the folloiwing settings:
- All tests in a benchmarking suite are executed within the same GitHub Actions job. For more details, refer to Rodríguez-Guerra (2021).
- CPU pinning to isolate the benchmarking process.
- See the documentation of pyperf for more details on the following settings:
  - Stop irqbalance.
  - Set /proc/irq/default_smp_affinity to 3 (CPU 0 and 1), where the benchmarking processes are pinned to CPU 2 and 3.
  - Set /proc/sys/kernel/perf_event_max_sample_rate to 1.
  - /proc/sys/kernel/randomize_va_space = 2 (default)

Software¶

Python environment:
- CPython 3.12.5 (64-bit)
Hash libraries:
- mmh3 5.0.0-dev
- python-xxhash 3.5.0
- hashlib (Standard library)
  - md5 is tested for lambda x: hashlib.md5(x).digest(), and so is sha1. therefore, the results for these functions include the overhead of creating the hash object and a function call.
Benchmarking library:
- pyperf 2.7.0
  - Used the bench_time_func interface to eliminate the overhead of the function call.
  - Processed time are measured by time.perf_counter() in nanoseconds.

Method¶

A benchmarking test is performed for each specified byte size, which is derived from the Fibonacci sequence.
For each input size, the test generates a set of 10 bytes instances, where each instance’s size is pseudo-randomly selected from the range [ceil(input * 0.9), floor(input * 1.1)].
- This randomization is crucial as it increases the difficulty of branch predictions, creating a more realistic scenario. For further details, see xxHash: Performance comparison.
This inner loop of 10 iterations is repeated for a certain number of cycles, referred to as the outer loop, which is auto-calibrated by pyperf.
To avoid the overhead during the loop, iterators are pre-generated using itertools.repeat() outside the loop. See Peters (2002) and the real code of the timeit module in the Python Standard Library.
The final result are measured using the median, as it is more robust than the mean, especially on untuned or unstable environments such as GitHub Actions. For more details, see pyperf: Analyze benchmark results.

Results¶

The resulting graphs are plotted using the pandas and matplotlib libraries.

Comparison of Version Improvements¶

JSON files containing the benchmark results are available at: hajimes/mmh3-benchmarks/results_basic-hash/2024-09-17_6bb9987

Latency for hash() in version 4.1.0 and 5.0.0. — Figure 1: Latency for `mmh3.hash()` in version 4.1.0 and 5.0.0. Smaller is better.¶

Comparison of Hash Functions Across Libraries¶

JSON files containing the benchmark results are available at: hajimes/mmh3-benchmarks/results/2024-09-17_30da46e

In the following graphs:

mmh32_32 refers to mmh3.mmh3_32_digest(). 32-bit output using 32-bit arithmetic. Developed in 2011.
mmh3_128 refers to mmh3.mmh3_x64_128_digest(). 128-bit output using 64-bit arithmetic. Developed in 2011.
xxh_32 refers to xxhash.xxh32_digest(). 32-bit output using 32-bit arithmetic. Developed in 2014.
xxh_64 refers to xxhash.xxh64_digest(). 64-bit output using 64-bit arithmetic. Developed in 2014.
xxh3_64 refers to xxhash.xxh3_64_digest(). 64-bit output using vectorized arithmetic. Developed in 2020.
xxh3_128 refers to xxhash.xxh3_64_digest(). 128-bit output using vectorized arithmetic. Developed in 2020.
md5 refers to hashlib.md5(). 128-bit output using a cryptogprahic algorithm. Developed in 1992.
sha1 refers to hashlib.sha1(). 160-bit output using a cryptogprahic algorithm. Developed in 1995.

Figure 2: Latency for small data. Smaller is better.¶

Figure 3: Latency for large data. Smaller is better.¶

The following graphs show the throughput, measured as the size of hash output generated per second by each function.

Figure 4: Throughput for small data. Larger is better.¶

Figure 5: Throughput for large data. Larger is better. The y-axis is logscale.¶

Concluding Remarks¶

Version 5.0.0 of the mmh3 library has improved the performance of the hash() function and other new functions by adopting METH_FASTCALL. This enhancement reduces the overhead of function calls. For data sizes between 1–2 KB (such as 48x48 favicons), performance has improved by 10%–20%. For smaller data (~500 bytes, like 16x16 favicons), performance increases by approximately 30%. However, the performance gain from this revision remains constant, meaning the relative improvement diminishes as data size increases.

When comparing hash functions across libraries, mmh3 5.0.0 is the most performant for small data sizes, while the xxh3 families in xxhash 3.5.0 excel with larger data. This is largely due to the new version of mmh3 utilizing METH_FASTCALL, which reduces the overhead of function calls. However, xxhash may adopt the same interface in the future, potentially making this advantage temporary. To further improve mmh3 performance, the core algorithm itself would need an overhaul.

Overall, these benchmarking results serve as a useful reference when selecting a hash function for your application, and they provide a solid foundation for future performance enhancements to our library.

References¶

Python Standard Library. timeit.
pyperf: Analyze benchmark results.
pyperf: Tune the system for benchmarks.
pyperf issues #1: Use a better measures than average and standard deviation #1.
pyperf issues #75: Reconsidering min()?.
pytest-benchmark: Usage.
xxHash: Performance comparison.
xxHash benchmark program.
Manuel Bernhardt. 2023. On pinning and isolating CPU cores.
Micha Gorelick and Ian Ozsvald. 2020. High Performance Python: Practical Performant Programming for Humans, 2nd ed. O’Reilly Media. ISBN: 978-1492055020. Chapter 2.
Tim Peters. 2002. Chapter 17. Algorithms: Introduction in Python Cookbook, 3rd ed. O’Reilly Media. ISBN: 978-0596001674.
Jaime Rodríguez-Guerra. 2021. Is GitHub Actions suitable for running benchmarks?.
Victor Stinner. 2016. My journey to stable benchmark, part 1 (system).
Victor Stinner. 2016. My journey to stable benchmark, part 3 (average).