FastHistograms

Documentation for FastHistograms.

FastHistograms.FastHistograms - Module

FastHistograms declares and implements a minimal histogram interface with a focus on speed.

julia> using FastHistograms, Random

# Create a 2D histogram for 8-bit integer data.
julia> h = create_fast_histogram(
    # Use fixed-width bins with an optimized bin search algorithm (Arithmetic)
    #  for fixed-width bins.
    FastHistograms.FixedWidth(),
    FastHistograms.Arithmetic(),
    # Don't use any parallelization because our data are small.
    FastHistograms.NoParallelization(),
    [(0x00, 0xff, 4), (0x00, 0xff, 4)],
);

# Create two random images to compute the joint histogram for
julia> img1 = rand(0x00:0xff, 32, 32);

julia> img2 = rand(0x00:0xff, 32, 32);

# Compute the histogram bin counts
julia> increment_bins!(h, img1, img2)

# Get the bin counts
julia> counts(h)
4×4 Matrix{Int64}:
 61  64  67  64
 65  59  72  65
 61  66  71  61
 53  67  63  65
FastHistograms.Arithmetic - Type

Uses basic arithmetic to determine the bin to update. Compatible only with the FixedWidth bin type.

Requires these functions to be defined:

  • binmin(hist, axis)::Int Returns the value of the lowest bin edge for the axis. The implementation should use @propagate_inbounds for good performance.
  • norm(hist, axis)::Float32 Returns the inverse of the size of the bin range for the axis (1 / (last_bin - first_bin)). The implementation should use @propagate_inbounds for good performance.
  • nbins(hist, axis)::Int Returns the number of bins for the axis. The implementation should use @propagate_inbounds for good performance.
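
As a rough illustration of how these three quantities could combine into a bin index, here is a minimal standalone sketch. It does not call FastHistograms: `arithmetic_bin` is a hypothetical helper, and the exact formula and rounding used by the package may differ.

```julia
# Hypothetical helper, not part of FastHistograms: maps a value to a 1-based
# bin index using only arithmetic (no search).
#   lo    - value of the lowest bin edge (the role of `binmin`)
#   inv   - 1 / (last_bin - first_bin)   (the role of `norm`)
#   nbins - number of bins on the axis   (the role of `nbins`)
function arithmetic_bin(x, lo, inv, nbins)
    # Scale x into [0, nbins), then shift to 1-based indexing.
    i = floor(Int, (x - lo) * inv * nbins) + 1
    # Assumption: out-of-range values are assigned to the closest bin.
    return clamp(i, 1, nbins)
end

lo, hi, nbins = 0.0, 256.0, 4
inv = 1 / (hi - lo)

arithmetic_bin(10.0, lo, inv, nbins)   # 1
arithmetic_bin(200.0, lo, inv, nbins)  # 4 (200 / 64 = 3.125)
arithmetic_bin(300.0, lo, inv, nbins)  # 4 (above the range, clamped)
```

Because the index comes from one multiply and one floor, the cost per element does not depend on the number of bins, which is why this approach only applies when all bins have the same width.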
FastHistograms.BinSearchAlgorithm - Type

A trait for the ways the bin search step can be implemented.

Histograms that operate on real-valued data must implement the following functions, in addition to any trait-specific functions:

  • get_weights(hist)::AbstractArray{Int,N} Returns the weights (i.e. counts) array for an N-dimensional histogram.

Histograms that operate on text data must implement the following functions, in addition to any trait-specific functions:

  • get_table(hist)::AbstractDict{String,Int} Returns the table for the histogram.
FastHistograms.BinarySearch - Type

Uses binary search to find the bin to update. Meant to be used with the VariableWidth bin type.

Requires these functions to be defined:

  • bin_edges(hist, axis)::Vector{Int} Returns a sorted vector of the bin edges for the axis. The implementation should use @propagate_inbounds for good performance.
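
The lookup described here can be sketched with Julia's built-in `searchsortedlast`, which performs a binary search on a sorted vector. The helper name `binary_search_bin` is hypothetical, and the clamping of out-of-range values is an assumption rather than the package's confirmed implementation.

```julia
# Hypothetical helper, not part of FastHistograms: finds the 1-based bin index
# of x given a sorted vector of bin edges, using binary search.
function binary_search_bin(x, edges::AbstractVector)
    nbins = length(edges) - 1
    # searchsortedlast returns the index of the last edge <= x (0 if none),
    # using a binary search on the sorted vector.
    i = searchsortedlast(edges, x)
    # Assumption: out-of-range values are assigned to the closest bin.
    return clamp(i, 1, nbins)
end

edges = [0, 10, 100, 1000]      # three bins of different widths

binary_search_bin(5, edges)     # 1
binary_search_bin(10, edges)    # 2 (an edge belongs to the bin on its right)
binary_search_bin(999, edges)   # 3
binary_search_bin(5000, edges)  # 3 (clamped)
```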
FastHistograms.PrivateThreads - Type

Each thread updates its own private bin data structure. The private structures are reduced (combined) after all private updates are done.

Requires these functions to be defined for real-valued histograms:

  • get_subweights(hist)::AbstractArray{Int,N+1} Returns the subweights array for an N-dimensional histogram. It has one more dimension than the weights (i.e. counts) array.

Requires these functions to be defined for text histograms:

  • get_subtable(hist)::AbstractVector{AbstractDict{String,Int}} Returns a vector of independent tables.
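
The idea of private bin data followed by a reduction can be sketched as below. This is a standalone illustration under stated assumptions, not the FastHistograms implementation: `private_counts` is a hypothetical function, the data are assumed to already be bin indices, and the work is split into one chunk per thread.

```julia
using Base.Threads: @threads, nthreads

# Hypothetical standalone sketch, not the FastHistograms implementation.
# Counts values in 1:nbins using one private column of counts per chunk,
# then reduces the columns into the final weights.
function private_counts(data::Vector{Int}, nbins::Int)
    nchunks = nthreads()
    # Sub-weights: one extra dimension (the chunk index) on top of the
    # 1D weights, mirroring the N+1 dimensions mentioned above.
    subweights = zeros(Int, nbins, nchunks)
    chunk_len = cld(length(data), nchunks)
    @threads for c in 1:nchunks
        lo = (c - 1) * chunk_len + 1
        hi = min(c * chunk_len, length(data))
        for k in lo:hi
            # Each chunk writes only to its own column, so no locking is needed.
            subweights[clamp(data[k], 1, nbins), c] += 1
        end
    end
    # Reduction step: sum the private columns into a single weights vector.
    return vec(sum(subweights; dims=2))
end

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
private_counts(data, 4)   # [1, 2, 3, 4]
```

Indexing the private storage by chunk rather than by thread ID keeps the sketch correct even if the scheduler moves a task between threads.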
FastHistograms.SIMD - Type

SIMD vectorization.

Requires these functions to be defined for real-valued histograms:

  • get_subweights(hist)::AbstractArray{Int,N+1} Returns the subweights array for an N-dimensional histogram. It has one more dimension than the weights (i.e. counts) array.

Requires these functions to be defined for text histograms:

  • get_subtable(hist)::AbstractVector{AbstractDict{String,Int}} Returns a vector of independent tables.
FastHistograms.create_fast_histogram - Function
create_fast_histogram(
    ::BinType,
    ::BinSearchAlgorithm,
    ::HistogramParallelization,
    args...
)

Creates a histogram with the given BinType, BinSearchAlgorithm, and HistogramParallelization traits. Methods of this function will also require additional arguments (here args...) that depend on the combination of traits selected.
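
The way trait arguments select a method, and the way each method can demand its own trailing arguments, can be sketched with plain multiple dispatch. Every name prefixed with `Demo`, and the function `describe_axes`, is a hypothetical stand-in and not part of FastHistograms.

```julia
# Hypothetical stand-ins for the trait types; names prefixed with Demo are
# illustrative and are not part of FastHistograms.
abstract type DemoBinType end
struct DemoFixedWidth <: DemoBinType end
struct DemoVariableWidth <: DemoBinType end

# Each trait gets its own method, which is free to require different
# trailing arguments.
describe_axes(::DemoFixedWidth, axes_data::AbstractVector{<:Tuple}) =
    "fixed-width, $(length(axes_data)) axes"
describe_axes(::DemoVariableWidth, edges::AbstractVector{<:AbstractVector}) =
    "variable-width, $(length(edges)) axes"

describe_axes(DemoFixedWidth(), [(0, 255, 4), (0, 255, 4)])  # "fixed-width, 2 axes"
describe_axes(DemoVariableWidth(), [[0, 10, 100]])           # "variable-width, 1 axes"
```

Since the trait values are empty structs, the method is chosen at compile time and the traits themselves add no runtime cost.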

FastHistograms.create_fast_histogram - Method
create_fast_histogram(::UnboundedWidth, ::HashFunction, ::P) where {P<:HistogramParallelization}

Creates a histogram for 1D text data. P can be any parallelization scheme.
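
A text histogram of this kind can be pictured as a table from strings to counts, with a new entry created the first time a string is seen. The sketch below is standalone and hypothetical (`count_words!` is not a FastHistograms function); it also shows how independent tables could be merged, using `mergewith`, which requires Julia 1.5 or later.

```julia
# Hypothetical standalone sketch, not the FastHistograms implementation:
# a text histogram stored as a table mapping each string to its count.
function count_words!(table::AbstractDict{String,Int}, words)
    for w in words
        # get returns the current count, or 0 for a string not seen before.
        table[w] = get(table, w, 0) + 1
    end
    return table
end

table = count_words!(Dict{String,Int}(), ["a", "b", "a", "c", "a"])
table["a"]   # 3

# Independent tables (for example, one per thread) can be merged by summing
# the counts of matching keys.
other = count_words!(Dict{String,Int}(), ["b", "d"])
merged = mergewith(+, table, other)
merged["b"]  # 2
```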

FastHistograms.create_fast_histogram - Method
create_fast_histogram(
    ::VariableWidth,
    ::BinarySearch,
    ::P,
    edges::AbstractVector{<:AbstractVector}, # Vector of edges, one edge vector per dimension
) where {P<:HistogramParallelization}

Creates a histogram with variable-width bins (i.e. bins of possibly different widths). P can be any parallelization scheme. The edges define the bin edges for each axis of the histogram. Provide one element for each dimension. Each element is a vector of edges of the form [first_edge, second_edge, ..., nth_edge].

FastHistograms.create_fast_histogram - Method
create_fast_histogram(
    ::FixedWidth,
    ::S,
    ::P,
    axes_data::AbstractVector{Tuple{E,E,Int}}, # first, last, nbins
) where {E<:Real,S<:BinSearchAlgorithm,P<:HistogramParallelization}

Creates a histogram with fixed-width bins. S and P can be any bin search algorithm and parallelization scheme, respectively. axes_data defines the range of each axis of the histogram. Provide one element for each dimension. Each element has the form (first_bin, last_bin, nbins).

FastHistograms.get_subweights - Function
get_subweights(h)

Returns the subweights array. Every histogram that implements SIMD or PrivateThreads parallelization must implement this.

FastHistograms.increment_bins! - Method
increment_bins!(h, data1, data2)

Increments the bin counts for a 2D histogram h using data1 and data2. Elements of data1 or data2 that fall outside the range of the histogram's bins are NOT filtered out; they are counted as members of the closest bin.
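
The closest-bin behavior for out-of-range values can be sketched with `clamp`. The function `joint_counts` below is a hypothetical standalone illustration, not the FastHistograms implementation; in particular, mapping data1 to rows and data2 to columns is an assumption.

```julia
# Hypothetical standalone sketch, not the FastHistograms implementation.
# Accumulates a joint (2D) histogram of two equally sized arrays with
# nbins fixed-width bins per axis over the range lo to hi.
function joint_counts(data1, data2, lo, hi, nbins)
    size(data1) == size(data2) ||
        throw(DimensionMismatch("inputs must have the same size"))
    weights = zeros(Int, nbins, nbins)
    scale = nbins / (hi - lo)
    for (x, y) in zip(data1, data2)
        # clamp keeps out-of-range values: they land in the closest bin
        # instead of being dropped.
        i = clamp(floor(Int, (x - lo) * scale) + 1, 1, nbins)
        j = clamp(floor(Int, (y - lo) * scale) + 1, 1, nbins)
        weights[i, j] += 1
    end
    return weights
end

a = [0.5, 1.5, 2.5, 9.0]    # 9.0 lies above the range and goes to the last bin
b = [0.5, 0.5, 2.5, -4.0]   # -4.0 lies below the range and goes to the first bin
w = joint_counts(a, b, 0.0, 3.0, 3)
sum(w) == length(a)         # true: no element was filtered out
```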

FastHistograms.increment_bins! - Method
increment_bins!(h, data)

Increments the bin counts for a 1D histogram h using data. Elements of data that fall outside the range of the histogram's bins are NOT filtered out; they are counted as members of the closest bin.
