Contributing
Thank you for your interest in contributing to the mmh3 project. We
appreciate your support and look forward to your contributions.
Please read the README to
get an overview of the mmh3 project, and follow our
Code of Conduct (ACM Code of Ethics and Professional
Conduct).
Submitting issues
We welcome your contributions, whether it’s submitting a bug report or suggesting a new feature through the issue tracker.
Before creating a new issue, please check the Known Issues section in README to see if the problem has already been noted.
Project structure
As of version 5.0.0-dev, the project layout is structured as follows:
- src/mmh3
  - mmh3module.c: the main file that serves as the interface between Python and the MurmurHash3 C implementations.
  - murmurhash3.c: implementations of the MurmurHash3 family. Auto-generated from Austin Appleby’s original code. DO NOT edit this file manually. See README in the util directory for details.
  - murmurhash3.h: headers and macros for MurmurHash3. Auto-generated by util/refresh.py. DO NOT edit this file manually.
  - hashlib.h: taken from CPython’s code base.
- util
  - refresh.py: script that generates src/mmh3/murmurhash3.c and src/mmh3/murmurhash3.h from the original MurmurHash3 C++ code. Edit this file to modify the contents of these files.
- benchmark
  - benchmark.py: script to run benchmarks.
  - plot_graph.py: script to plot benchmark results.
- docs: project documentation directory
- .github/workflows: GitHub Actions workflows
Project setup
Run:
git clone https://github.com/hajimes/mmh3.git
This project uses tox to automate testing and other tasks. You can install
tox by running:
pipx install tox
In addition, npx (included with npm >= 5.2.0) is required within the tox
environments to run linters.
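Before running tox for the first time, it may be worth confirming that both prerequisites are reachable from your shell. The following is a minimal sketch using only the Python standard library; the function name missing_tools is hypothetical and is not part of the mmh3 project.

```python
import shutil


def missing_tools(tools=("tox", "npx")):
    """Return the names in `tools` that cannot be found on PATH.

    Hypothetical helper for illustration only; it is not shipped with mmh3.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("tox and npx are both available.")
```

An empty result only means the executables were found; it does not check their versions.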
Testing and linting
Before submitting your changes, make sure to run the project’s tests to ensure everything is working as expected.
To run all tests, use the following command:
tox
During development, you can run the tests for a specific environment by specifying the environment name. For example, to run tests for a specific version of Python (e.g., Python 3.12), use:
tox -e py312
For type checking, run:
tox -e type
To run linters with automated formatting, use:
tox -e lint
(Optional) Testing on s390x
If you have modified the code in a way that may cause endianness issues, you may want to test locally on s390x, the only big-endian platform officially supported by Python.
Emulating a big-endian s390x with QEMU by Simon Willison is a good introduction to Docker/QEMU settings for emulating s390x.
If the above does not work, you may also want to try the following:
docker run --rm --privileged tonistiigi/binfmt --install all
docker buildx create --name mybuilder --use
docker run -it multiarch/ubuntu-core:s390x-focal /bin/bash
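Once inside an emulated environment (or on any other machine), a quick way to confirm which byte order Python sees is to compare the native packing of an integer with the explicit little- and big-endian packings. The sketch below uses only the standard library; describe_byte_order is a hypothetical name chosen for illustration.

```python
import struct
import sys


def describe_byte_order(value=0x01020304):
    """Pack a 32-bit unsigned integer in native, little- and big-endian order.

    Hypothetical helper for illustration; not part of the mmh3 project.
    """
    return {
        "byteorder": sys.byteorder,  # reported by the interpreter
        "native": struct.pack("=I", value).hex(),
        "little": struct.pack("<I", value).hex(),
        "big": struct.pack(">I", value).hex(),
    }


if __name__ == "__main__":
    for key, val in describe_byte_order().items():
        print(f"{key}: {val}")
```

On a big-endian host the "native" entry should match "big"; on a little-endian host it should match "little". Code that reads multi-byte values directly from a buffer is where the two cases tend to diverge.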
Pull request
Once you’ve pushed your changes to your fork, you can create a pull request (PR) on the main project repository. Please provide a clear and detailed description of your changes in the PR, and reference any related issues.
util directory
Algorithm implementations used by the mmh3 module
The util directory contains the tooling that generates the C files used by
the mmh3 module from the SMHasher C++ project by Austin Appleby.
The idea of the subproject directory loosely follows the
hashlib implementation of CPython.
Updating mmh3 core C code
Run tox -e build_cfiles. This will fetch Appleby’s original SMHasher project
as a git submodule and then generate PEP 7-compliant C code from the original
project.
To perform further edits, add transformation code to the refresh.py script,
instead of editing murmurhash3.* files manually.
Then, run tox -e build_cfiles again to update the murmurhash3.* files.
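To give a sense of what a source-to-source transformation of this kind can look like, here is a minimal sketch of a text-level pass that applies two PEP 7 whitespace rules (4-space indents with no tabs, and no trailing whitespace). It is an assumption-laden illustration: the function name normalize_whitespace is hypothetical, and the sketch does not reflect how refresh.py is actually organized.

```python
def normalize_whitespace(source):
    """Expand tabs to 4 spaces and strip trailing whitespace on every line.

    Hypothetical transformation for illustration only; refresh.py may
    structure its transformations quite differently.
    """
    lines = [line.expandtabs(4).rstrip() for line in source.splitlines()]
    # Always end the output with exactly one newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "static int f(void) {\t\n\treturn 1;  \n}\n"
    print(normalize_whitespace(sample), end="")
```

Keeping each transformation as a small function from string to string makes it easy to rerun the generation step and get the same output every time.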
Local files
- ./util/README.md
- ./util/refresh.py
- ./util/FILE_HEADER
Generated files
- ./src/mmh3/murmurhash3.c
- ./src/mmh3/murmurhash3.h
Benchmarking
To run benchmarks locally, try the following command:
tox -e benchmark -- -o OUTPUT_FILE \
--test-hash HASH_NAME --test-buffer-size-max HASH_SIZE
where OUTPUT_FILE is the output file name (JSON-formatted), HASH_NAME is
the name of the hash, and HASH_SIZE is the maximum buffer size to be tested
in bytes.
For example,
mkdir -p _results
tox -e benchmark -- -o _results/mmh3_128.json \
--test-hash mmh3_128 --test-buffer-size-max 262144
As of version 4.2.0, the following hash function identifiers are available for
benchmarking: mmh3_32, mmh3_128, xxh_32, xxh_64, xxh3_64, xxh3_128,
pymmh3_32, pymmh3_128, md5, and sha1.
The owner of the repository can run the benchmark on GitHub Actions by using
the workflow defined in .github/workflows/benchmark.yml.
After obtaining the benchmark results, you can plot graphs using plot_graph.py.
The following is an example of how to run the script:
tox -e plot -- --output-dir docs/_static RESULT_DIR/*.json
where RESULT_DIR is the directory containing the benchmark results.
The names of the JSON files should be in the format HASH_IDENTIFIER.json, e.g.,
mmh3_128.json.
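Because the file name carries the hash identifier, it can be handy to check a results directory before plotting. The following is a minimal sketch, assuming one JSON file per hash; results_by_hash is a hypothetical helper and is not part of plot_graph.py.

```python
from pathlib import Path


def results_by_hash(result_dir):
    """Map each hash identifier (the file stem) to its JSON result file.

    Hypothetical helper for illustration; plot_graph.py may read the
    directory differently.
    """
    return {path.stem: path for path in sorted(Path(result_dir).glob("*.json"))}


if __name__ == "__main__":
    # "_results" matches the directory used in the benchmark example above.
    for name, path in results_by_hash("_results").items():
        print(f"{name}: {path}")
```

A file named mmh3_128.json would appear under the key mmh3_128; files with other extensions are ignored.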
Documentation
Project documentation files are mainly written in Markdown and are
located in the docs directory. The documentation is automatically built and
hosted on Read the Docs.
To build the documentation locally, use the following command:
tox -e docs
To check the result of the built documentation, open
docs/_build/html/index.html in your browser.