Supported and tested quantizers
BaseQuantizer
- class quantizer.quantizers.BaseQuantizer[source]
Bases:
Module
Base quantizer
Defines behavior all quantizers should follow.
- property non_trainable_variables
Sequence of non-trainable variables owned by this module and its submodules.
Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don’t expect the return value to change.
- Returns:
A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).
- property trainable_variables
Sequence of trainable variables owned by this module and its submodules.
Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don’t expect the return value to change.
- Returns:
A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).
- property variables
Sequence of variables owned by this module and its submodules.
Note: this method uses reflection to find variables on the current instance and submodules. For performance reasons you may wish to cache the result of calling this method if you don’t expect the return value to change.
- Returns:
A sequence of variables for the current module (sorted by attribute name) followed by variables from all submodules recursively (breadth first).
quantized_bits
- class quantizer.quantizers.quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=True, alpha=1, use_stochastic_rounding=False, scale_axis=None, qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False, elements_per_scale=None, min_po2_exponent=None, max_po2_exponent=None)[source]
Bases:
BaseQuantizer
Legacy quantizer: Quantizes the number to a number of bits.
In general, we want to use a quantization function like:
a = (pow(2, bits) - 1 - 0) / (max(x) - min(x))
b = -min(x) * a

in the equation:

xq = a * x + b

This requires multiplication, which is undesirable. So we enforce the weights to be between -1 and 1 (max(x) = 1 and min(x) = -1) and separate the sign from the rest of the number as we make this function symmetric, resulting in the following approximation:

max(x) = +1, min(x) = -1
max(x) = -min(x)

a = pow(2, bits-1)
b = 0

Finally, remember that to represent the number with a sign, the representable range is -pow(2, bits-1) to pow(2, bits-1) - 1.
The symmetric and keep_negative attributes allow us to generate numbers that are symmetric (the same number of negative and positive representations) and numbers that are positive only, respectively.
Note
The behavior of quantized_bits differs from Catapult HLS ac_fixed and Vivado HLS ap_fixed. For ac_fixed<word_length, integer_length, signed>, when signed = true, it is equivalent to quantized_bits(word_length, integer_length-1, keep_negative=True).
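The symmetric fixed-point mapping derived above can be sketched in NumPy. This is a simplified illustration under the stated assumptions (the function name and clipping details are ours, not the library implementation):

```python
import numpy as np

def quantized_bits_sketch(x, bits=8, integer=0, keep_negative=True):
    # Hypothetical sketch of the symmetric fixed-point mapping above.
    unsigned_bits = bits - (1 if keep_negative else 0)
    levels = 2.0 ** unsigned_bits   # a = pow(2, bits-1) when signed
    max_mag = 2.0 ** integer        # magnitude covered by the integer bits
    p = x * levels / max_mag
    lo = -levels if keep_negative else 0.0
    # Round to the nearest level, clip to the representable range,
    # then map back to the original value range.
    return max_mag * np.clip(np.round(p), lo, levels - 1) / levels
```

For example, with bits=3, integer=0, keep_negative=True the step is 0.25 and values saturate at [-1.0, 0.75].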
- bits
number of bits to perform quantization.
- integer
number of bits to the left of the decimal point.
- symmetric
if true, we will have the same number of values for positive and negative numbers.
- alpha
a tensor or None, the scaling factor per channel. If None, the scaling factor is 1 for all channels.
- keep_negative
if true, we do not clip negative numbers.
- use_stochastic_rounding
if true, we perform stochastic rounding.
- scale_axis
int or List[int] which axis/axes to calculate scale from.
- qnoise_factor
float. A scalar from 0 to 1 representing the level of quantization noise to add. It controls how much quantization noise reaches the output via the weighted sum (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x.
- var_name
String or None. A variable name shared between the tf.Variables created in the build function. If None, it is generated automatically.
- use_ste
Bool. Whether to use “straight-through estimator” (STE) method or not.
- use_variables
Bool. Whether to make the quantizer variables to be dynamic tf.Variables or not.
- elements_per_scale
if set to an int or List[int], multiple scale values are created along scale_axis, where elements_per_scale is the number of elements/values that share each separate scale value. Only supported when using "auto_po2".
- min_po2_exponent
if set while using “auto_po2”, it represents the minimum allowed power of two exponent.
- max_po2_exponent
if set while using “auto_po2”, it represents the maximum allowed power of two exponent.
- Returns:
Function that computes fixed-point quantization with bits.
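The qnoise_factor blending described in the attributes above amounts to a simple weighted sum. A minimal sketch (apply_qnoise is a hypothetical helper name, not part of the library):

```python
def apply_qnoise(x, x_quantized, qnoise_factor=1.0):
    # qnoise_factor = 1.0 -> fully quantized output;
    # qnoise_factor = 0.0 -> unquantized pass-through.
    return (1.0 - qnoise_factor) * x + qnoise_factor * x_quantized
```

Intermediate values interpolate between the unquantized and quantized tensors, which can soften the training signal when annealing quantization.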
quantized_relu
- class quantizer.quantizers.quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0, use_stochastic_rounding=False, relu_upper_bound=None, is_quantized_clip=True, qnoise_factor=1.0, var_name=None, use_ste=True, use_variables=False)[source]
Bases:
BaseQuantizer
Computes a quantized relu to a number of bits.
Modified from:
[https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow]
Assume h(x) = +1 with p = sigmoid(x), -1 otherwise. The expected value of h(x) is:

E[h(x)] = +1 * P(p <= sigmoid(x)) - 1 * P(p > sigmoid(x))
        = +1 * P(p <= sigmoid(x)) - 1 * (1 - P(p <= sigmoid(x)))
        = 2 * P(p <= sigmoid(x)) - 1
        = 2 * sigmoid(x) - 1, if p is sampled from a uniform distribution U[0,1]
If use_sigmoid is 0, we just keep the positive numbers up to 2**integer * (1 - 2**(-bits)) instead of normalizing them, which is easier to implement in hardware.
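The use_sigmoid=0 path described above can be sketched in NumPy, assuming the same fixed-point step as quantized_bits but with a clip floor of zero (an illustration under those assumptions, not the library implementation):

```python
import numpy as np

def quantized_relu_sketch(x, bits=8, integer=0):
    # Keep positive values only, saturating at 2**integer * (1 - 2**-bits).
    levels = 2.0 ** bits
    max_mag = 2.0 ** integer
    p = x * levels / max_mag
    # Negative inputs round/clip to 0; positives saturate one step below max_mag.
    return max_mag * np.clip(np.round(p), 0, levels - 1) / levels
```

With bits=2, integer=0 the outputs are {0, 0.25, 0.5, 0.75}, and the largest value is 2**0 * (1 - 2**-2) = 0.75.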
- bits
number of bits to perform quantization.
- integer
number of bits to the left of the decimal point.
- use_sigmoid
if true, we apply sigmoid to input to normalize it.
- negative_slope
slope when activation < 0, needs to be power of 2.
- use_stochastic_rounding
if true, we perform stochastic rounding.
- relu_upper_bound
A float representing an upper bound of the unquantized relu. If None, relu is applied without an upper bound when is_quantized_clip is set to False (it is True by default). Note: the quantized relu uses the quantization parameters (bits and integer) as an upper bound, so it is important to set relu_upper_bound consistently with those parameters. is_quantized_clip has precedence over relu_upper_bound for backward compatibility.
- is_quantized_clip
A boolean representing whether the inputs are clipped to the maximum value represented by the quantization parameters. This parameter is deprecated, and the default is set to True for backwards compatibility. Users are encouraged to use “relu_upper_bound” instead.
- qnoise_factor
float. A scalar from 0 to 1 representing the level of quantization noise to add. It controls how much quantization noise reaches the output via the weighted sum (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x.
- var_name
String or None. A variable name shared between the tf.Variables created in the build function. If None, it is generated automatically.
- use_ste
Bool. Whether to use “straight-through estimator” (STE) method or not.
- use_variables
Bool. Whether to make the quantizer variables to be dynamic tf.Variables or not.
- Returns:
Function that performs relu + quantization to bits >= 0.
quantized_linear
- class quantizer.quantizers.quantized_linear(bits=8, integer=0, symmetric=1, keep_negative=True, alpha=1, use_stochastic_rounding=False, scale_axis=None, qnoise_factor=1.0, var_name=None, use_variables=False)[source]
Bases:
BaseQuantizer
Linear quantization with fixed number of bits.
This quantizer maps inputs to the nearest value of a fixed number of outputs that are evenly spaced, with possible scaling and stochastic rounding. This is an updated version of the legacy quantized_bits.
- The core computation is:
1. Divide the tensor by the quantization scale
2. Clip the tensor to a specified range
3. Round to the nearest integer
4. Multiply the rounded result by the quantization scale
- The clip range is determined by:
1. The number of bits we have to represent the number
2. Whether we want a symmetric range or not
3. Whether we want to keep negative numbers or not
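The four core steps can be sketched directly in NumPy (quantized_linear_core is a hypothetical helper; the scale and clip bounds are taken as given):

```python
import numpy as np

def quantized_linear_core(x, quantization_scale, clip_min, clip_max):
    # 1) divide by the scale, 2) clip to the representable integer range,
    # 3) round to the nearest integer, 4) multiply the scale back in.
    scaled = np.clip(x / quantization_scale, clip_min, clip_max)
    return quantization_scale * np.round(scaled)
```

For example, with scale 0.5 and an 8-value signed range [-8, 7], an input of 1.3 maps to 1.5 and a large input saturates at 3.5.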
The quantization scale is defined by either the quantizer parameters or the data passed to the __call__ method. See documentation for the alpha parameter to find out more.
For backprop purposes, the quantizer uses the straight-through estimator for the rounding step (https://arxiv.org/pdf/1903.05662.pdf). Thus the gradient of the __call__ method is 1 on the interval [quantization_scale * clip_min, quantization_scale * clip_max] and 0 elsewhere.
The quantizer also supports a number of other optional features:
- Stochastic rounding (see the use_stochastic_rounding parameter)
- Quantization noise (see the qnoise_factor parameter)
Notes on the various “scales” in quantized_linear:
The quantization scale is the scale used in the core computation (see above). You can access it via the quantization_scale attribute.
The data type scale is the scale determined by the type of data stored on hardware in a small device running a true quantized model. It is the quantization scale needed to represent bits bits, integer of which are integer bits, with one bit reserved for the sign if keep_negative is True. It can be calculated as 2 ** (integer - bits + keep_negative). You can access it via the data_type_scale attribute.
The scale attribute stores the quotient of the quantization scale and the data type scale. This is also the scale that can be directly specified by the user, via the alpha parameter.
These three quantities are related by the equation scale = quantization_scale / data_type_scale.
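The data-type-scale formula and the relation scale = quantization_scale / data_type_scale can be checked with a small sketch (helper names are ours, for illustration):

```python
def data_type_scale(bits=8, integer=0, keep_negative=True):
    # Smallest representable step of the fixed-point type:
    # 2 ** (integer - bits + keep_negative), as stated above.
    return 2.0 ** (integer - bits + int(keep_negative))

def scale_from(quantization_scale, bits=8, integer=0, keep_negative=True):
    # scale = quantization_scale / data_type_scale
    return quantization_scale / data_type_scale(bits, integer, keep_negative)
```

With the defaults (bits=8, integer=0, keep_negative=True) the data type scale is 2**-7, so a quantization scale of 2**-7 gives scale = 1.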
See the diagram below of scale usage in a quantized conv layer.

+------------------------------------------------------------+
| data_type_scale  --------------->  stored_weights          |
| (determines decimal point)               |                 |
|                                          V                 |
|                                       conv op              |
|                                          |                 |
|                                          V                 |
|                                     accumulator            |
|                                          |                 |
| determines quantization                  V                 |
| range and precision  ----------->  quantization_scale      |
| (per channel)                            |                 |
|                                          V                 |
|                                      activation            |
+------------------------------------------------------------+
# TODO: The only fundamentally necessary scale is the quantization scale.
# We should consider removing the data type scale and scale attributes,
# but know that this will require rewriting much of how qtools and HLS4ML
# use these scale attributes.
- Note on binary quantization (bits=1):
The core computation is modified here when keep_negative is True to perform a scaled sign function. This is needed because the core computation as defined above requires that 0 be mapped to 0, which does not allow us to keep both positive and negative outputs for binary quantization. Special shifting operations are used to achieve this.
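A simplified sketch of the scaled sign behavior for bits=1 with keep_negative=True; it omits the shifting operations mentioned above and simply assigns zero inputs to the positive output:

```python
import numpy as np

def binary_quantize_sketch(x, quantization_scale=1.0):
    # Scaled sign function: both positive and negative outputs survive,
    # unlike the plain round-and-clip core computation at bits=1.
    return quantization_scale * np.where(x >= 0, 1.0, -1.0)
```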
Example usage:
# 8-bit quantization with 3 integer bits
>>> q = quantized_linear(8, 3)
>>> x = tf.constant([0.0, 0.5, 1.0, 1.5, 2.0])
>>> q(x).numpy()
array([0., 0., 1., 2., 2.], dtype=float32)

# 2-bit quantization with "auto" and tensor alphas
>>> q_auto = quantized_linear(2, alpha="auto")
>>> x = tf.constant([0.0, 0.5, 1.0, 1.5, 2.0])
>>> q_auto(x).numpy()
array([0., 0., 0., 2., 2.], dtype=float32)
>>> q_auto.scale.numpy()
array([4.], dtype=float32)
>>> q_auto.quantization_scale.numpy()
array([2.], dtype=float32)
>>> q_fixed = quantized_linear(2, alpha=q_auto.scale)
>>> q_fixed(x).numpy()
array([0., 0., 0., 2., 2.], dtype=float32)
- Parameters:
bits (int) – Number of bits to represent the number. Defaults to 8.
integer (int) – Number of bits to the left of the decimal point, used for data_type_scale. Defaults to 0.
symmetric (bool) – If true, we will have the same number of values for positive and negative numbers. Defaults to True.
alpha (str, Tensor, None) –
Instructions for determining the quantization scale. Defaults to None.
If None: the quantization scale is the data type scale, determined by integer, bits, and keep_negative.
If “auto”, the quantization scale is calculated as the minimum floating point scale per-channel that does not clip the max of x.
If “auto_po2”, the quantization scale is chosen as the power of two per-channel that minimizes squared error between the quantized x and the original x.
If Tensor: The quantization scale is the Tensor passed in multiplied by the data type scale.
keep_negative (bool) – If false, we clip negative numbers. Defaults to True.
use_stochastic_rounding (bool) – If true, we perform stochastic rounding (https://arxiv.org/pdf/1502.02551.pdf).
scale_axis (int, None) – Which axis to calculate scale from. If None, we perform per-channel scaling based off of the image data format. Note that each entry of a rank-1 tensor is considered its own channel by default. See _get_scaling_axis for more details. Defaults to None.
qnoise_factor (float) – A scalar from 0 to 1 that represents the level of quantization noise to add. This controls the amount of the quantization noise to add to the outputs by changing the weighted sum of (1 - qnoise_factor) * unquantized_x + qnoise_factor * quantized_x. Defaults to 1.0, which means that the result is fully quantized.
use_variables (bool) – If true, we use tf.Variables to store certain parameters. See the BaseQuantizer implementation for more details. Defaults to False. If set to True, be sure to use the special attribute update methods detailed in the BaseQuantizer.
var_name (str or None) – A variable name shared between the tf.Variables created on initialization, if use_variables is true. If None, variable names are generated automatically from the parameter names along with a uid. Defaults to None.
- Returns:
Function that computes linear quantization.
- Return type:
function
- Raises:
ValueError –
- If bits is not positive, or is too small to represent integer.
- If integer is negative.
- If alpha is a string but not one of ("auto", "auto_po2").
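How the "auto" and "auto_po2" alpha options pick a per-channel scale can be approximated as follows. This is a deliberate simplification: the real "auto_po2" searches for the power-of-two scale minimizing squared error, whereas this sketch just rounds the log2 of the "auto" scale:

```python
import numpy as np

def auto_scale_sketch(x, bits=2, keep_negative=True, axis=0):
    # "auto": smallest per-channel floating-point scale that does not
    # clip max(|x|), given the integer clip range of the quantizer.
    clip_max = 2.0 ** (bits - int(keep_negative)) - 1
    return np.max(np.abs(x), axis=axis, keepdims=True) / clip_max

def auto_po2_scale_sketch(x, bits=2, keep_negative=True, axis=0):
    # Round the "auto" scale to the nearest power of two (simplified).
    return 2.0 ** np.round(np.log2(auto_scale_sketch(x, bits, keep_negative, axis)))
```

For a channel whose max magnitude exactly fills the clip range, both sketches return 1.0; otherwise the po2 variant snaps to the nearest power of two.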
- ALPHA_STRING_OPTIONS = ('auto', 'auto_po2')
- property auto_alpha
Returns true if using a data-dependent alpha
- property bits
- property data_type_scale
Quantization scale for the data type
- property default_quantization_scale
Calculate and set quantization_scale default
- property integer
- property keep_negative
- property scale
- property scale_axis
- property use_sign_function
Return true if using sign function for quantization
- property use_stochastic_rounding
- property use_variables