Supported and tested quantizers

BaseQuantizer

class quantizer.quantizers.BaseQuantizer[source]

Bases: Module

Base quantizer

Defines the behavior all quantizers should follow.

build(var_name=None, use_variables=False)[source]
property non_trainable_variables

Sequence of non-trainable variables owned by this module and its submodules.

Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don’t expect the return value to change.

Returns:

A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).

property trainable_variables

Sequence of trainable variables owned by this module and its submodules.

Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don’t expect the return value to change.

Returns:

A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).

update_qnoise_factor(qnoise_factor)[source]

Update qnoise_factor.

property variables

Sequence of variables owned by this module and its submodules.

Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don’t expect the return value to change.

Returns:

A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).

quantized_bits

class quantizer.quantizers.quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=True, alpha=1, use_stochastic_rounding=False, scale_axis=None, qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False, elements_per_scale=None, min_po2_exponent=None, max_po2_exponent=None)[source]

Bases: BaseQuantizer

Legacy quantizer: quantizes a number to a given number of bits.

In general, we want to use a quantization function like:

  a = (pow(2, bits) - 1) / (max(x) - min(x))
  b = -min(x) * a

in the equation:

  xq = a * x + b

This requires a multiplication, which is undesirable. So we enforce the weights to be between -1 and 1 (max(x) = 1 and min(x) = -1) and separate the sign from the rest of the number, making the function symmetric. This results in the following approximation:

  1. max(x) = +1, min(x) = -1

  2. max(x) = -min(x)

  a = pow(2, bits - 1)
  b = 0

Finally, remember that to represent the number with its sign, the representable integer range is -pow(2, bits-1) to pow(2, bits-1) - 1.

The symmetric and keep_negative flags let us generate numbers that are symmetric (the same number of negative and positive representations) and, with keep_negative=False, numbers that are positive only.
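
As a rough illustration, here is a minimal NumPy sketch of the idealized mapping above. This is a simplification, not the library's implementation: quantize_symmetric is a hypothetical helper, and it ignores alpha, keep_negative=False, stochastic rounding, and qnoise_factor.

  import numpy as np

  # Idealized symmetric fixed-point mapping: with inputs constrained to
  # [-1, 1], a = pow(2, bits - 1) and b = 0, so quantization reduces to
  # rounding x * a to an integer and rescaling.
  def quantize_symmetric(x, bits=8):
      a = 2.0 ** (bits - 1)
      q = np.round(x * a)                 # xq = a * x (b = 0)
      q = np.clip(q, -(a - 1), a - 1)     # symmetric integer range
      return q / a                        # back to the [-1, 1] scale

  print(quantize_symmetric(np.array([-1.0, -0.4, 0.0, 0.7, 1.0]), bits=4))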

Note

The behavior of quantized_bits is different from Catapult HLS ac_fixed or Vivado HLS ap_fixed. For ac_fixed<word_length, integer_length, signed>, when signed = true, it is equivalent to quantized_bits(word_length, integer_length - 1, keep_negative=True).
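
To make the note concrete, here is a small self-checking sketch in pure Python. The step and range formulas are inferred from the equivalence stated above together with the data-type step 2 ** (integer - bits + keep_negative); treat them as assumptions, not library output.

  # ac_fixed<W, I, true> (W total bits, I integer bits including the sign)
  # covers [-2**(I-1), 2**(I-1) - lsb] in steps of lsb = 2**(I - W);
  # quantized_bits(W, I - 1, keep_negative=True) has step
  # 2**((I - 1) - W + 1) = 2**(I - W) and the same range.
  W, I = 8, 3
  lsb = 2.0 ** (I - W)
  ac_range = (-2.0 ** (I - 1), 2.0 ** (I - 1) - lsb)
  q_step = 2.0 ** ((I - 1) - W + 1)
  q_range = (-(2.0 ** (W - 1)) * q_step, (2.0 ** (W - 1) - 1) * q_step)
  assert q_range == ac_range and q_step == lsb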

bits

number of bits to perform quantization.

integer

number of bits to the left of the decimal point.

symmetric

if true, we will have the same number of values for positive and negative numbers.

alpha

a tensor or None, the scaling factor per channel. If None, the scaling factor is 1 for all channels.

keep_negative

if true, we do not clip negative numbers.

use_stochastic_rounding

if true, we perform stochastic rounding.

scale_axis

int or List[int]. The axis or axes to calculate the scale from.

qnoise_factor

float. A scalar from 0 to 1 representing the level of quantization noise to add. The output is computed as the weighted sum (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x.
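
For illustration, a minimal sketch of this blend in plain NumPy (blend is a hypothetical helper, not the quantizer's internal code):

  import numpy as np

  # Interpolate between the unquantized and quantized tensors;
  # qnoise_factor=1.0 gives a fully quantized output.
  def blend(unquantized_x, quantized_x, qnoise_factor):
      return (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x

  x, xq = np.array([0.3, 1.2]), np.array([0.25, 1.25])
  print(blend(x, xq, qnoise_factor=0.5))   # -> [0.275 1.225]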

var_name

String or None. A variable name shared between the tf.Variables created in the build function. If None, it is generated automatically.

use_ste

Bool. Whether to use the “straight-through estimator” (STE) method.

use_variables

Bool. Whether to make the quantizer variables dynamic tf.Variables.

elements_per_scale

if set to an int or List[int], multiple scales are created across scale_axis, where elements_per_scale is the number of elements/values associated with each separate scale value. Only supported when using “auto_po2”.

min_po2_exponent

if set while using “auto_po2”, it represents the minimum allowed power of two exponent.

max_po2_exponent

if set while using “auto_po2”, it represents the maximum allowed power of two exponent.

Returns:

Function that computes fixed-point quantization with bits.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get maximum value that quantized_bits class can represent.

min()[source]

Get minimum value that quantized_bits class can represent.

range()[source]

Returns a list of all values that quantized_bits can represent ordered by their binary representation ascending.

quantized_relu

class quantizer.quantizers.quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0, use_stochastic_rounding=False, relu_upper_bound=None, is_quantized_clip=True, qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False)[source]

Bases: BaseQuantizer

Computes a quantized relu with a given number of bits.

Modified from:

[https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow]

Assume h(x) = +1 if p <= sigmoid(x) and -1 otherwise, where p is sampled from a uniform distribution U[0, 1]. The expected value of h(x) is then:

  E[h(x)] = (+1) * P(p <= sigmoid(x)) + (-1) * P(p > sigmoid(x))
          = P(p <= sigmoid(x)) - (1 - P(p <= sigmoid(x)))
          = 2 * P(p <= sigmoid(x)) - 1
          = 2 * sigmoid(x) - 1
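
A quick Monte Carlo check of this derivation (illustrative only, not part of the library):

  import numpy as np

  # With p ~ U[0, 1] and h(x) = +1 if p <= sigmoid(x) else -1,
  # the sample mean of h(x) should approach 2 * sigmoid(x) - 1.
  rng = np.random.default_rng(0)
  x = 0.7
  sig = 1.0 / (1.0 + np.exp(-x))
  p = rng.uniform(size=1_000_000)
  h = np.where(p <= sig, 1.0, -1.0)
  print(h.mean(), 2.0 * sig - 1.0)   # the two values agree closely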

If use_sigmoid is 0, we just keep the positive numbers up to 2**integer * (1 - 2**(-bits)) instead of normalizing them, which is easier to implement in hardware.
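
For example, that ceiling evaluates as follows for bits=8 and integer=3:

  # Largest positive value kept when use_sigmoid is 0:
  # 2**integer * (1 - 2**(-bits)).
  bits, integer = 8, 3
  print(2**integer * (1 - 2**(-bits)))   # -> 7.96875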

bits

number of bits to perform quantization.

integer

number of bits to the left of the decimal point.

use_sigmoid

if true, we apply sigmoid to input to normalize it.

negative_slope

slope when activation < 0; needs to be a power of 2.

use_stochastic_rounding

if true, we perform stochastic rounding.

relu_upper_bound

A float representing an upper bound on the unquantized relu. If None, relu is applied without an upper bound when “is_quantized_clip” is set to false (it is true by default). Note: the quantized relu derives an upper bound from the quantization parameters (bits and integer), so it is important to set relu_upper_bound consistently with those parameters. “is_quantized_clip” takes precedence over “relu_upper_bound” for backward compatibility.

is_quantized_clip

A boolean representing whether the inputs are clipped to the maximum value represented by the quantization parameters. This parameter is deprecated, and the default is set to True for backwards compatibility. Users are encouraged to use “relu_upper_bound” instead.

qnoise_factor

float. A scalar from 0 to 1 representing the level of quantization noise to add. The output is computed as the weighted sum (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x.

var_name

String or None. A variable name shared between the tf.Variables created in the build function. If None, it is generated automatically.

use_ste

Bool. Whether to use the “straight-through estimator” (STE) method.

use_variables

Bool. Whether to make the quantizer variables dynamic tf.Variables.

Returns:

Function that performs relu + quantization to bits >= 0.

classmethod from_config(config)[source]
get_config()[source]
max()[source]

Get the maximum value that quantized_relu can represent.

min()[source]

Get the minimum value that quantized_relu can represent.

range()[source]

Returns a list of all values that quantized_relu can represent, ordered by their binary representation ascending.

quantized_linear

class quantizer.quantizers.quantized_linear(bits=8, integer=0, symmetric=1, keep_negative=True, alpha=1, use_stochastic_rounding=False, scale_axis=None, qnoise_factor=1.0, var_name=None, use_variables=False)[source]

Bases: BaseQuantizer

Linear quantization with fixed number of bits.

This quantizer maps inputs to the nearest value of a fixed number of outputs that are evenly spaced, with possible scaling and stochastic rounding. This is an updated version of the legacy quantized_bits.

The core computation, sketched in code after the two lists below, is:
  1. Divide the tensor by a quantization scale

  2. Clip the tensor to a specified range

  3. Round to the nearest integer

  4. Multiply the rounded result by the quantization scale

The clip range in step 2 is determined by:
  • The number of bits we have to represent the number

  • Whether we want to have a symmetric range or not

  • Whether we want to keep negative numbers or not
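
Here is a minimal NumPy sketch of the four steps above, assuming a signed symmetric clip range and a given quantization_scale. linear_quantize is a hypothetical helper, not the library API; the real quantizer also handles alpha, keep_negative=False, stochastic rounding, qnoise_factor, and the straight-through estimator for gradients.

  import numpy as np

  def linear_quantize(x, quantization_scale, bits=8):
      clip_max = 2.0 ** (bits - 1) - 1      # symmetric signed integer range
      clip_min = -clip_max
      scaled = x / quantization_scale       # 1. divide by the scale
      clipped = np.clip(scaled, clip_min, clip_max)   # 2. clip
      rounded = np.round(clipped)           # 3. round to nearest integer
      return rounded * quantization_scale   # 4. multiply back by the scale

  x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
  print(linear_quantize(x, quantization_scale=1.0))   # -> [0. 0. 1. 2. 2.]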

The quantization scale is defined by either the quantizer parameters or the data passed to the __call__ method. See documentation for the alpha parameter to find out more.

For backprop purposes, the quantizer uses the straight-through estimator for the rounding step (https://arxiv.org/pdf/1903.05662.pdf). Thus the gradient of the __call__ method is 1 on the interval [quantization_scale * clip_min, quantization_scale * clip_max] and 0 elsewhere.

The quantizer also supports a number of other optional features:

  • Stochastic rounding (see the use_stochastic_rounding parameter)

  • Quantization noise (see the qnoise_factor parameter)

Notes on the various “scales” in quantized_linear:

  • The quantization scale is the scale used in the core computation (see above). You can access it via the quantization_scale attribute.

  • The data type scale is determined by the type of data stored on hardware on a small device running a true quantized model. It is the quantization scale needed to represent bits bits, integer of which are integer bits, with one bit reserved for the sign if keep_negative is True. It can be calculated as 2 ** (integer - bits + keep_negative). You can access it via the data_type_scale attribute.

  • The scale attribute stores the quotient of the quantization scale and the data type scale. This is also the scale that can be directly specified by the user, via the alpha parameter.

These three quantities are related by the equation scale = quantization_scale / data_type_scale.
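
A small sketch of these relations, with values chosen purely for illustration:

  # data_type_scale = 2 ** (integer - bits + keep_negative), and
  # scale = quantization_scale / data_type_scale.
  bits, integer, keep_negative = 8, 3, True
  data_type_scale = 2.0 ** (integer - bits + keep_negative)   # 2**-4 = 0.0625
  quantization_scale = 0.125   # hypothetical value, e.g. from alpha="auto_po2"
  scale = quantization_scale / data_type_scale
  print(data_type_scale, scale)   # -> 0.0625 2.0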

See the diagram below of scale usage in a quantized conv layer:

  +------------------------------------------------------------+
  |                                                            |
  |  data_type_scale  ---------------->  stored_weights        |
  |  (determines decimal point)               |                |
  |                                           V                |
  |                                        conv op             |
  |                                           |                |
  |                                           V                |
  |                                       accumulator          |
  |                                           |                |
  |  determines quantization                  V                |
  |  range and precision  -------->  quantization_scale        |
  |  (per channel)                            |                |
  |                                           V                |
  |                                       activation           |
  |                                                            |
  +------------------------------------------------------------+

TODO: The only fundamentally necessary scale is the quantization scale. We should consider removing the data type scale and scale attributes, but know that this will require rewriting much of how qtools and HLS4ML use these scale attributes.

Note on binary quantization (bits=1):

The core computation is modified here when keep_negative is True to perform a scaled sign function. This is needed because the core computation as defined above requires that 0 be mapped to 0, which does not allow us to keep both positive and negative outputs for binary quantization. Special shifting operations are used to achieve this.
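
As a rough sketch of the modified behavior (an idealization; binary_quantize is a hypothetical helper, and the library's shifting details differ):

  import numpy as np

  # Scaled sign function for bits=1 with keep_negative=True: both signs
  # survive, unlike plain round-to-nearest, which would map small |x| to 0.
  def binary_quantize(x, quantization_scale=1.0):
      return np.where(x >= 0, 1.0, -1.0) * quantization_scale

  print(binary_quantize(np.array([-0.3, 0.0, 0.7])))   # -> [-1.  1.  1.]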

Example usage:

# 8-bit quantization with 3 integer bits
>>> q = quantized_linear(8, 3)
>>> x = tf.constant([0.0, 0.5, 1.0, 1.5, 2.0])
>>> q(x).numpy()
array([0., 0., 1., 2., 2.], dtype=float32)

# 2-bit quantization with "auto" and tensor alphas
>>> q_auto = quantized_linear(2, alpha="auto")
>>> x = tf.constant([0.0, 0.5, 1.0, 1.5, 2.0])
>>> q_auto(x).numpy()
array([0., 0., 0., 2., 2.], dtype=float32)
>>> q_auto.scale.numpy()
array([4.], dtype=float32)
>>> q_auto.quantization_scale.numpy()
array([2.], dtype=float32)
>>> q_fixed = quantized_linear(2, alpha=q_auto.scale)
>>> q_fixed(x).numpy()
array([0., 0., 0., 2., 2.], dtype=float32)

Parameters:
  • bits (int) – Number of bits to represent the number. Defaults to 8.

  • integer (int) – Number of bits to the left of the decimal point, used for data_type_scale. Defaults to 0.

  • symmetric (bool) – If true, we will have the same number of values for positive and negative numbers. Defaults to True.

  • alpha (str, Tensor, None) –

    Instructions for determining the quantization scale. Defaults to None.

    • If None: the quantization scale is the data type scale, determined by integer, bits, and keep_negative.

    • If “auto”, the quantization scale is calculated as the minimum floating point scale per-channel that does not clip the max of x.

    • If “auto_po2”, the quantization scale is chosen as the power of two per-channel that minimizes squared error between the quantized x and the original x.

    • If Tensor: The quantization scale is the Tensor passed in multiplied by the data type scale.

  • keep_negative (bool) – If false, we clip negative numbers. Defaults to True.

  • use_stochastic_rounding (bool) – If true, we perform stochastic rounding (https://arxiv.org/pdf/1502.02551.pdf); a sketch follows this parameter list.

  • scale_axis (int, None) – Which axis to calculate the scale from. If None, we perform per-channel scaling based on the image data format. Note that each entry of a rank-1 tensor is considered its own channel by default. See _get_scaling_axis for more details. Defaults to None.

  • qnoise_factor (float) – A scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x. Defaults to 1.0, which means that the result is fully quantized.

  • use_variables (bool) – If true, we use tf.Variables to store certain parameters. See the BaseQuantizer implementation for more details. Defaults to False. If set to True, be sure to use the special attribute update methods detailed in the BaseQuantizer.

  • var_name (str or None) – A variable name shared between the tf.Variables created on initialization, if use_variables is true. If None, the variable names are generated automatically based on the parameter names along with a uid. Defaults to None.
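
As referenced in the use_stochastic_rounding entry above, here is a minimal sketch of unbiased stochastic rounding (illustrative only; stochastic_round is a hypothetical helper, not the library's implementation):

  import numpy as np

  # Round up with probability equal to the fractional part, so the
  # expected value of the rounded number equals the input.
  def stochastic_round(x, rng=None):
      rng = rng or np.random.default_rng(0)
      floor = np.floor(x)
      frac = x - floor
      return floor + (rng.uniform(size=np.shape(x)) < frac)

  print(stochastic_round(np.array([0.25, 0.5, 1.75])))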

Returns:

Function that computes linear quantization.

Return type:

function

Raises:

ValueError

  • If bits is not positive, or is too small to represent integer.

  • If integer is negative.

  • If alpha is a string but not one of (“auto”, “auto_po2”).

ALPHA_STRING_OPTIONS = ('auto', 'auto_po2')
property auto_alpha

Returns true if using a data-dependent alpha

property bits
property data_type_scale

Quantization scale for the data type

property default_quantization_scale

Calculate and set the default quantization_scale.

classmethod from_config(config)[source]
get_clip_bounds()[source]

Get bounds of clip range

get_config()[source]
property integer
property keep_negative
max()[source]

Get maximum value that quantized_linear class can represent.

min()[source]

Get minimum value that quantized_linear class can represent.

range()[source]

Returns a list of all values that quantized_linear can represent, ordered by their binary representation ascending.

property scale
property scale_axis
property use_sign_function

Return true if using sign function for quantization

property use_stochastic_rounding
property use_variables