FastHistograms
Documentation for FastHistograms.
FastHistograms.FastHistograms — Module
FastHistograms declares and implements a minimal histogram interface with a focus on speed.
julia> using FastHistograms, Random
# Create a 2D histogram for 8-bit integer data.
julia> h = create_fast_histogram(
           # Use fixed-width bins with an optimized bin search algorithm (Arithmetic)
           # for fixed-width bins.
           FastHistograms.FixedWidth(),
           FastHistograms.Arithmetic(),
           # Don't use any parallelization because our data are small.
           FastHistograms.NoParallelization(),
           [(0x00, 0xff, 4), (0x00, 0xff, 4)],
       );
# Create two random images to compute the joint histogram for
julia> img1 = rand(0x00:0xff, 32, 32);
julia> img2 = rand(0x00:0xff, 32, 32);
# Compute the histogram bin counts
julia> increment_bins!(h, img1, img2)
# Get the bin counts
julia> counts(h)
4×4 Matrix{Int64}:
 61  64  67  64
 65  59  72  65
 61  66  71  61
 53  67  63  65

FastHistograms.Arithmetic — Type
Basic arithmetic to determine the bin to update, compatible only with the FixedWidth bin type.
Requires these functions to be defined:
binmin(hist, axis)::Int
    Returns the value of the lowest bin edge for the axis. The implementation should use @propagate_inbounds for good performance.
norm(hist, axis)::Float32
    Returns the inverse of the size of the bin range for the axis (1 / (last_bin - first_bin)). The implementation should use @propagate_inbounds for good performance.
nbins(hist, axis)::Int
    Returns the number of bins for the axis. The implementation should use @propagate_inbounds for good performance.
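To make the role of binmin, norm, and nbins concrete, here is a minimal, self-contained sketch of an arithmetic bin lookup. The function name arithmetic_bin and its argument names are hypothetical, and the exact edge and rounding conventions used by FastHistograms may differ; the clamping follows the documented behavior that out-of-range values are counted in the closest bin.

```julia
# Hypothetical helper, not part of FastHistograms: maps a value to a 1-based bin index.
function arithmetic_bin(x::Real, first_edge::Real, last_edge::Real, nbins::Int)
    # Inverse of the bin range, mirroring the documented norm(hist, axis).
    inv_range = 1.0f0 / (Float32(last_edge) - Float32(first_edge))
    # Scale the offset from the lowest edge (binmin) to a fractional bin position.
    pos = (Float32(x) - Float32(first_edge)) * inv_range * nbins
    # Convert to a 1-based index and clamp out-of-range values to the closest bin.
    return clamp(floor(Int, pos) + 1, 1, nbins)
end

# Four bins over the 8-bit range, as in the example at the top of this page.
bins = [arithmetic_bin(x, 0x00, 0xff, 4) for x in (0x00, 0x3f, 0x40, 0xff)]
```

No search is needed: the index comes from one subtraction, one multiplication, and a clamp, which is why this approach only applies when all bins share the same width.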
FastHistograms.BinSearchAlgorithm — Type
A trait for the ways the bin search step can be implemented.
Histograms that operate on real-valued data must implement the following functions, in addition to any trait-specific functions:
get_weights(hist)::AbstractArray{Int,N}
    Returns the weights (i.e. counts) array for an N-dimensional histogram.
Histograms that operate on text data must implement the following functions, in addition to any trait-specific functions:
get_table(hist)::AbstractDict{String,Int}
    Returns the table for the histogram.
FastHistograms.BinType — Type
A trait for the type of bins a histogram may have.
FastHistograms.BinarySearch — Type
Uses binary search to find the bin to update. Meant to be used with the VariableWidth bin type.
Requires these functions to be defined:
bin_edges(hist, axis)::Vector{Int}
    Returns a sorted vector of the bin edges for the axis. The implementation should use @propagate_inbounds for good performance.
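As an illustration of how a sorted edge vector can drive the lookup, the sketch below uses Base's searchsortedlast, which performs a binary search. The helper name binary_search_bin is hypothetical and this is not necessarily how FastHistograms implements the step; the clamp again assumes out-of-range values go to the closest bin.

```julia
# Hypothetical helper, not part of FastHistograms.
function binary_search_bin(x::Real, edges::AbstractVector{<:Real})
    # n + 1 sorted edges delimit n bins.
    nbins = length(edges) - 1
    # Index of the last edge <= x; 0 when x lies below every edge.
    idx = searchsortedlast(edges, x)
    # Clamp so values below the first edge or at/above the last edge land in the closest bin.
    return clamp(idx, 1, nbins)
end

edges = [0, 10, 100, 1000]
bins = [binary_search_bin(x, edges) for x in (-5, 0, 9, 10, 999, 1000)]
```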
FastHistograms.FixedWidth — Type
Each bin has the same predetermined width.
FastHistograms.HashFunction — Type
Uses a hash function to find the bin to update. Compatible only with the UnboundedWidth bin type.
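For text data the bins are not known in advance, so a hash table keyed by the token can play the role of the weights array. The following is a minimal sketch of that idea using a plain Dict{String,Int}; the helper name count_tokens! is hypothetical and is not part of the FastHistograms API.

```julia
# Hypothetical helper, not part of FastHistograms.
function count_tokens!(table::AbstractDict{String,Int}, tokens)
    for token in tokens
        # The hash lookup finds the bin for this token, creating it on first sight.
        table[token] = get(table, token, 0) + 1
    end
    return table
end

table = count_tokens!(Dict{String,Int}(), ["a", "b", "a", "c", "b", "a"])
```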
FastHistograms.HistogramParallelization — Type
A trait for the ways the bin search and bin update steps can be parallelized.
FastHistograms.NoParallelization — Type
Neither threading nor vectorization.
FastHistograms.PrivateThreads — Type
Each thread updates its own private bin data structure; the private structures are reduced (combined) after the updates.
Requires these functions to be defined for real-valued histograms:
get_subweights(hist)::AbstractArray{Int,N+1}
    Returns the subweights (i.e. partial counts) array for an N-dimensional histogram; it has one more dimension than the weights array.
Requires these functions to be defined for text histograms:
get_subtable(hist)::AbstractVector{AbstractDict{String,Int}}
    Returns a vector of independent tables.
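The private-then-reduce pattern can be sketched for a 1D histogram as follows. This is an illustrative sketch, not the FastHistograms implementation: the function name private_threads_histogram is hypothetical, the input is assumed to already contain valid 1-based bin indices, and the work is split into one chunk per available thread so that each chunk writes only to its own column of the subweights array.

```julia
# Hypothetical helper, not part of FastHistograms.
function private_threads_histogram(data::AbstractVector{<:Integer}, nbins::Int)
    # Split the indices into contiguous chunks, roughly one per available thread.
    chunk_len = max(1, cld(length(data), Threads.nthreads()))
    chunks = collect(Iterators.partition(eachindex(data), chunk_len))
    # One private column of counts per chunk (the extra dimension of the subweights).
    subweights = zeros(Int, nbins, length(chunks))
    Threads.@threads for c in eachindex(chunks)
        for i in chunks[c]
            # Assumption: data[i] is already a valid bin index in 1:nbins.
            subweights[data[i], c] += 1
        end
    end
    # Reduce the private counts into the final weights vector.
    return vec(sum(subweights; dims=2))
end

weights = private_threads_histogram([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], 4)
```

Because no two chunks share a column, the update loop needs no locks or atomics; the cost is the extra memory for the private copies and the final reduction.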
FastHistograms.SIMD — Type
SIMD vectorization.
Requires these functions to be defined for real-valued histograms:
get_subweights(hist)::AbstractArray{Int,N+1}
    Returns the subweights (i.e. partial counts) array for an N-dimensional histogram; it has one more dimension than the weights array.
Requires these functions to be defined for text histograms:
get_subtable(hist)::AbstractVector{AbstractDict{String,Int}}
    Returns a vector of independent tables.
FastHistograms.UnboundedWidth — Type
Bin widths are not known before computing the histogram (e.g. for text data). Only 1D histograms are supported.
FastHistograms.VariableWidth — Type
Bins have possibly different predetermined widths.
FastHistograms.bin_search — Method
bin_search(h, axis, data)
Returns the index of the bin to increment.
FastHistograms.counts — Method
counts(h)
Returns the bin counts of the histogram h. All histograms must implement this.
FastHistograms.create_fast_histogram — Function
create_fast_histogram(
    ::BinType,
    ::BinSearchAlgorithm,
    ::HistogramParallelization,
    args...
)
Creates a histogram with the given BinType, BinSearchAlgorithm, and HistogramParallelization traits. Methods of this function will also require additional arguments (here args...) that depend on the combination of traits selected.
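The way a trait argument selects a method, and with it the shape of the trailing arguments, can be illustrated with plain multiple dispatch. Every name in this sketch (DemoBinType, DemoFixedWidth, DemoVariableWidth, describe_histogram) is a hypothetical stand-in chosen for the example and is not one of the FastHistograms types or functions.

```julia
# All names below are hypothetical stand-ins, not the FastHistograms types.
abstract type DemoBinType end
struct DemoFixedWidth <: DemoBinType end
struct DemoVariableWidth <: DemoBinType end

# One method per trait; the trailing argument differs by trait.
describe_histogram(::DemoFixedWidth, axes_data::AbstractVector{<:Tuple}) =
    "fixed-width, $(length(axes_data)) axes"
describe_histogram(::DemoVariableWidth, edges::AbstractVector{<:AbstractVector}) =
    "variable-width, $(length(edges)) axes"

# The trait instance passed first decides which method runs.
a = describe_histogram(DemoFixedWidth(), [(0, 255, 4), (0, 255, 4)])
b = describe_histogram(DemoVariableWidth(), [[0, 10, 100]])
```

Since the trait types are empty structs, the choice is resolved at compile time and adds no runtime cost to the selected method.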
FastHistograms.create_fast_histogram — Method
create_fast_histogram(::UnboundedWidth, ::HashFunction, ::P) where {P<:HistogramParallelization}
Creates a histogram for 1D text data. P can be any parallelization scheme.
FastHistograms.create_fast_histogram — Method
create_fast_histogram(
    ::VariableWidth,
    ::BinarySearch,
    ::P,
    edges::AbstractVector{<:AbstractVector}, # Vector of edges, one edge vector per dimension
) where {P<:HistogramParallelization}
Creates a histogram with variable-width bins (i.e. bins of possibly different widths). P can be any parallelization scheme. edges defines the bin edges for each axis of the histogram. Provide one element for each dimension. Each element has the form (first edge, second edge, ..., nth edge).
FastHistograms.create_fast_histogram — Method
create_fast_histogram(
    ::FixedWidth,
    ::S,
    ::P,
    axes_data::AbstractVector{Tuple{E,E,Int}}, # first, last, nbins
) where {E<:Real,S<:BinSearchAlgorithm,P<:HistogramParallelization}
Creates a histogram with fixed-width bins. S and P can be any bin search algorithm or parallelization scheme, respectively. axes_data defines the range of each axis of the histogram. Provide one element for each dimension. Each element has the form (first_bin, last_bin, nbins).
FastHistograms.get_subweights — Function
get_subweights(h)
Returns the subweights array. All histograms implementing SIMD and PrivateThreads parallelization must implement this.
FastHistograms.get_weights — Function
get_weights(h)
Returns the weights array. All histograms must implement this.
FastHistograms.increment_bins! — Method
increment_bins!(h, data1, data2)
Increments the bin counts for a 2D histogram h using the data data1 and data2. Elements of the data that fall outside the range of the histogram's bins are NOT filtered out; they are counted as members of the closest bin.
FastHistograms.increment_bins! — Method
increment_bins!(h, data)
Increments the bin counts for a 1D histogram h using the data. Elements of the data that fall outside the range of the histogram's bins are NOT filtered out; they are counted as members of the closest bin.
FastHistograms.zero! — Method
zero!(h)
Sets all bin counts of the histogram h to zero. All histograms must implement this.
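The increment and reset semantics described above can be sketched on a bare counts matrix. The helpers bin_index and increment_joint! are hypothetical names used only for this illustration, both axes are assumed to share the same range, and the package's own methods operate on histogram objects rather than raw arrays. The point of the sketch is that out-of-range values end up in the closest edge bin, so every input pair is counted.

```julia
# Hypothetical helpers, not part of FastHistograms.
bin_index(x, lo, hi, nbins) =
    clamp(floor(Int, (x - lo) / (hi - lo) * nbins) + 1, 1, nbins)

function increment_joint!(weights::AbstractMatrix{Int}, data1, data2, lo, hi)
    n1, n2 = size(weights)
    for (x, y) in zip(data1, data2)
        # Out-of-range values are clamped to the closest bin instead of being dropped.
        weights[bin_index(x, lo, hi, n1), bin_index(y, lo, hi, n2)] += 1
    end
    return weights
end

weights = zeros(Int, 2, 2)
# -5.0 and 7.0 lie outside [0, 1] and are counted in the first and last bin, respectively.
increment_joint!(weights, [-5.0, 0.2, 0.9, 7.0], [0.1, 0.1, 0.8, 0.8], 0.0, 1.0)
counts_before_reset = copy(weights)
total = sum(weights)

# Resetting the counts in place, analogous to what zero!(h) is documented to do.
fill!(weights, 0)
```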