FastHistograms
Documentation for FastHistograms.
FastHistograms.FastHistograms
FastHistograms.Arithmetic
FastHistograms.BinSearchAlgorithm
FastHistograms.BinType
FastHistograms.BinarySearch
FastHistograms.FixedWidth
FastHistograms.HashFunction
FastHistograms.HistogramParallelization
FastHistograms.NoParallelization
FastHistograms.PrivateThreads
FastHistograms.SIMD
FastHistograms.UnboundedWidth
FastHistograms.VariableWidth
FastHistograms.bin_search
FastHistograms.counts
FastHistograms.create_fast_histogram
FastHistograms.create_fast_histogram
FastHistograms.create_fast_histogram
FastHistograms.create_fast_histogram
FastHistograms.get_subweights
FastHistograms.get_weights
FastHistograms.increment_bins!
FastHistograms.increment_bins!
FastHistograms.zero!
FastHistograms.FastHistograms
— Module
FastHistograms declares and implements a minimal histogram interface with a focus on speed.
julia> using FastHistograms, Random
# Create a 2D histogram for 8-bit integer data.
julia> h = create_fast_histogram(
           # Use fixed-width bins with an optimized bin search algorithm (Arithmetic)
           # for fixed-width bins.
           FastHistograms.FixedWidth(),
           FastHistograms.Arithmetic(),
           # Don't use any parallelization because our data are small.
           FastHistograms.NoParallelization(),
           [(0x00, 0xff, 4), (0x00, 0xff, 4)],
       );
# Create two random images to compute the joint histogram for
julia> img1 = rand(0x00:0xff, 32, 32);
julia> img2 = rand(0x00:0xff, 32, 32);
# Compute the histogram bin counts
julia> increment_bins!(h, img1, img2)
# Get the bin counts
julia> counts(h)
4×4 Matrix{Int64}:
61 64 67 64
65 59 72 65
61 66 71 61
53 67 63 65
FastHistograms.Arithmetic
— Type
Basic arithmetic to determine the bin to update, compatible only with the FixedWidth bin type.
Requires these functions to be defined:
binmin(hist, axis)::Int
Returns the value of the lowest bin edge for the axis. The implementation should use @propagate_inbounds for good performance.
norm(hist, axis)::Float32
Returns the inverse of the size of the bin range for the axis (1 / (last_bin - first_bin)). The implementation should use @propagate_inbounds for good performance.
nbins(hist, axis)::Int
Returns the number of bins for the axis. The implementation should use @propagate_inbounds for good performance.
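As a rough illustration of how an arithmetic bin search can work, the sketch below maps a value to a 1-based bin index from the three quantities described above. The helper arithmetic_bin and its argument names are hypothetical and are not part of FastHistograms; the clamping step is an assumption that mirrors the documented "closest bin" behavior of increment_bins!.

```julia
# Hypothetical helper, not part of FastHistograms: map a value to a 1-based bin index.
# `first_edge` plays the role of binmin, `inv_range` of norm, and `n` of nbins.
function arithmetic_bin(x, first_edge, inv_range, n)
    # Scale the offset from the lowest edge into [0, n], then shift to 1-based indexing.
    i = floor(Int, (x - first_edge) * inv_range * n) + 1
    # Clamp so values at or beyond the outer edges land in the closest bin.
    return clamp(i, 1, n)
end

first_edge, last_edge, n = 0, 255, 4
inv_range = 1.0f0 / (last_edge - first_edge)

bins = [arithmetic_bin(x, first_edge, inv_range, n) for x in (0, 100, 200, 255, 300)]
```

No search over the bin edges is needed: one subtraction, two multiplications, and a clamp give the index, which is why this approach only applies when all bins have the same width.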
FastHistograms.BinSearchAlgorithm
— Type
A trait for the ways the bin search step can be implemented.
Histograms that operate on real-valued data must implement the following functions, in addition to any trait-specific functions:
get_weights(hist)::AbstractArray{Int,N}
Returns the weights (i.e. counts) array for an N-dimensional histogram.
Histograms that operate on text data must implement the following functions, in addition to any trait-specific functions:
get_table(hist)::AbstractDict{String,Int}
Returns the table for the histogram.
FastHistograms.BinType
— Type
A trait for the type of bins a histogram may have.
FastHistograms.BinarySearch
— Type
Uses binary search to find the bin to update. Meant to be used with the VariableWidth bin type.
Requires these functions to be defined:
bin_edges(hist, axis)::Vector{Int}
Returns a sorted vector of the bin edges for the axis. The implementation should use @propagate_inbounds for good performance.
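A minimal sketch of a binary search over sorted bin edges, using Julia's built-in searchsortedlast. The helper binary_search_bin is hypothetical and is not the package's implementation; it assumes that n bins are described by n + 1 sorted edges and that out-of-range values go to the closest bin.

```julia
# Hypothetical helper, not part of FastHistograms.
# `edges` is a sorted vector of n + 1 bin edges describing n bins (an assumption).
function binary_search_bin(x, edges)
    # searchsortedlast returns the index of the last edge <= x (0 if x is below all edges).
    i = searchsortedlast(edges, x)
    # Clamp so out-of-range values fall into the closest bin.
    return clamp(i, 1, length(edges) - 1)
end

edges = [0, 10, 100, 255]   # three bins of different widths
bin_ids = [binary_search_bin(x, edges) for x in (-3, 5, 10, 99, 255, 400)]
```

The lookup costs O(log n) comparisons per value, which is the price paid for allowing bins of different widths.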
FastHistograms.FixedWidth
— Type
Each bin has the same predetermined width.
FastHistograms.HashFunction
— Type
Uses a hash function to find the bin to update. Compatible only with the UnboundedWidth bin type.
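To make the idea concrete, the sketch below counts text tokens with a Dict{String,Int}, which hashes each token to find the entry to update. The function count_tokens is hypothetical and only illustrates the technique; it is not the package's implementation.

```julia
# Hypothetical sketch, not the FastHistograms implementation.
function count_tokens(tokens)
    table = Dict{String,Int}()
    for t in tokens
        # The Dict hashes the token to locate its entry; unseen tokens start at 0.
        table[t] = get(table, t, 0) + 1
    end
    return table
end

table = count_tokens(["a", "b", "a", "c", "a"])
```

Because new keys are added as they appear, the set of bins does not have to be known before the data are processed.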
FastHistograms.HistogramParallelization
— Type
A trait for the ways the bin search and bin update steps can be parallelized.
FastHistograms.NoParallelization
— Type
No threading or vectorization.
FastHistograms.PrivateThreads
— Type
Threads that have private bin data structures that are reduced after their private updates.
Requires these functions to be defined for real-valued histograms:
get_subweights(hist)::AbstractArray{Int,N+1}
Returns the weights (i.e. counts) array for an N-dimensional histogram.
Requires these functions to be defined for text histograms:
get_subtable(hist)::AbstractVector{AbstractDict{String,Int}}
Returns a vector of independent tables.
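The private-then-reduce pattern can be sketched in plain Julia as follows. The function threaded_counts, the chunking scheme, and the layout of the private counts (one column per chunk) are assumptions made for illustration, not the package's implementation.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical sketch, not the FastHistograms implementation.
# `bins` holds precomputed 1-based bin indices in 1:n.
function threaded_counts(bins::AbstractVector{Int}, n::Int)
    nchunks = nthreads()
    # One private column of counts per chunk, so no two tasks update the same element.
    sub = zeros(Int, n, nchunks)
    chunk_len = cld(length(bins), nchunks)
    @threads for c in 1:nchunks
        lo = (c - 1) * chunk_len + 1
        hi = min(c * chunk_len, length(bins))
        for k in lo:hi
            sub[bins[k], c] += 1
        end
    end
    # Reduce the private columns into the final counts.
    return vec(sum(sub; dims=2))
end

result = threaded_counts([1, 2, 2, 3, 3, 3, 4, 1], 4)
```

Indexing the private storage by chunk number rather than by thread id keeps the sketch correct even if tasks migrate between threads.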
FastHistograms.SIMD
— Type
SIMD vectorization.
Requires these functions to be defined for real-valued histograms:
get_subweights(hist)::AbstractArray{Int,N+1}
Returns the weights (i.e. counts) array for an N-dimensional histogram.
Requires these functions to be defined for text histograms:
get_subtable(hist)::AbstractVector{AbstractDict{String,Int}}
Returns a vector of independent tables.
FastHistograms.UnboundedWidth
— Type
Bin widths are not known before computing the histogram (e.g. text data). Only 1D histograms are supported.
FastHistograms.VariableWidth
— Type
Bins have possibly different predetermined widths.
FastHistograms.bin_search
— Method
bin_search(h, axis, data)
Returns the index of the bin to increment.
FastHistograms.counts
— Method
counts(h)
Returns the bin counts of the histogram h. All histograms must implement this.
FastHistograms.create_fast_histogram
— Function
create_fast_histogram(
    ::BinType,
    ::BinSearchAlgorithm,
    ::HistogramParallelization,
    args...
)
Creates a histogram with the given BinType, BinSearchAlgorithm, and HistogramParallelization traits. Methods of this function will also require additional arguments (here args...) that depend on the combination of traits selected.
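The following sketch shows the general trait-dispatch pattern in Julia, where the trait argument selects the method and the remaining arguments differ from method to method. Every name in it (DemoBinType, DemoFixed, DemoVariable, make_edges) is hypothetical and only mirrors the shape of the interface described above.

```julia
# All names below are hypothetical; they only mirror the shape of the interface.
abstract type DemoBinType end
struct DemoFixed <: DemoBinType end
struct DemoVariable <: DemoBinType end

# One method per trait; the extra arguments depend on the trait that is chosen.
make_edges(::DemoFixed, lo, hi, nbins) = collect(range(lo, hi; length=nbins + 1))
make_edges(::DemoVariable, edges) = sort(edges)

fixed_edges = make_edges(DemoFixed(), 0.0, 1.0, 4)
variable_edges = make_edges(DemoVariable(), [0.0, 0.5, 0.1, 1.0])
```

Since the trait types are empty structs, the method is selected at compile time and the dispatch itself adds no runtime cost.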
FastHistograms.create_fast_histogram
— Method
create_fast_histogram(::UnboundedWidth, ::HashFunction, ::P) where {P<:HistogramParallelization}
Creates a histogram for 1D text data. P can be any parallelization scheme.
FastHistograms.create_fast_histogram
— Method
create_fast_histogram(
    ::VariableWidth,
    ::BinarySearch,
    ::P,
    edges::AbstractVector{<:AbstractVector}, # Vector of edges, one edge vector per dimension
) where {P<:HistogramParallelization}
Creates a histogram with variable-width bins (i.e. bins of possibly different widths). P can be any parallelization scheme. The edges define the bin edges for each axis of the histogram. Provide one element for each dimension. Each element has the form (first edge, second edge, ..., nth edge).
FastHistograms.create_fast_histogram
— Method
create_fast_histogram(
    ::FixedWidth,
    ::S,
    ::P,
    axes_data::AbstractVector{Tuple{E,E,Int}}, # first, last, nbins
) where {E<:Real,S<:BinSearchAlgorithm,P<:HistogramParallelization}
Creates a histogram with fixed-width bins. S and P can be any bin search algorithm or parallelization scheme, respectively. The axes_data define the range of each axis of the histogram. Provide one element for each dimension. Each element has the form (first_bin, last_bin, nbins).
FastHistograms.get_subweights
— Function
get_subweights(h)
Returns the subweights array. All histograms implementing SIMD and PrivateThreads parallelization must implement this.
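One way to picture the relationship between an (N+1)-dimensional subweights array and the N-dimensional weights array is a sum over the extra dimension. The sketch below assumes, purely for illustration, that the private copies are stacked along the last axis; the actual layout used by the package is not stated here.

```julia
# Hypothetical layout: a 2D histogram (N = 2) with 3 private copies along the last axis.
subweights = zeros(Int, 4, 4, 3)
subweights[1, 2, 1] += 5   # updates made through the first private copy
subweights[1, 2, 3] += 2   # updates made through the third private copy
subweights[4, 4, 2] += 1   # update made through the second private copy

# Sum over the extra dimension and drop it to recover an N-dimensional weights array.
weights = dropdims(sum(subweights; dims=3); dims=3)
```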
FastHistograms.get_weights
— Function
get_weights(h)
Returns the weights array. All histograms must implement this.
FastHistograms.increment_bins!
— Method
increment_bins!(h, data1, data2)
Increments the bin counts for a 2D histogram h using the data data1 and data2. Elements of data1 and data2 that are outside the range of the histogram's bins will NOT be filtered out; they will be counted as members of the closest bin.
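The effect of not filtering out-of-range values can be seen in a small plain-Julia sketch of a joint histogram. The function joint_counts is hypothetical and is not the package's implementation; it assumes both axes share one range and use fixed-width bins.

```julia
# Hypothetical sketch, not the FastHistograms implementation.
# Both axes share the range [lo, hi] and use n fixed-width bins.
function joint_counts(data1, data2, lo, hi, n)
    size(data1) == size(data2) || throw(DimensionMismatch("inputs must have the same size"))
    weights = zeros(Int, n, n)
    # Clamp indices so out-of-range values count toward the closest edge bin.
    bin(x) = clamp(floor(Int, (x - lo) / (hi - lo) * n) + 1, 1, n)
    for (a, b) in zip(data1, data2)
        weights[bin(a), bin(b)] += 1
    end
    return weights
end

# -10, 300, and 999 lie outside [0, 255] but are still counted.
w = joint_counts([-10, 0, 128, 300], [0, 255, 64, 999], 0, 255, 2)
```

Every input pair contributes exactly one count, so the counts always sum to the number of elements; callers who want out-of-range values excluded would need to filter the data before calling.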
FastHistograms.increment_bins!
— Method
increment_bins!(h, data)
Increments the bin counts for a 1D histogram h using the data. Elements of data that are outside the range of the histogram's bins will NOT be filtered out; they will be counted as members of the closest bin.
FastHistograms.zero!
— Method
zero!(h)
Sets all bin counts of the histogram h to zero. All histograms must implement this.