Quantizer (Automatic Quantization)

Description

netspresso.quantizer.quantizer.Quantizer

Bases: NetsPressoBase

automatic_quantization(input_model_path, output_dir, dataset_path, weight_precision=QuantizationPrecision.INT8, activation_precision=QuantizationPrecision.INT8, metric=SimilarityMetric.SNR, threshold=0, input_layers=None, wait_until_done=True, sleep_interval=30)

Apply automatic quantization to a model, specifying the precision for weights and activations.

This method quantizes layers in the model based on the specified precision levels for weights and activations, while evaluating the quality of quantization using the defined metric. Only layers that meet the specified quality threshold are quantized; layers that do not meet this threshold remain unquantized to preserve model accuracy.
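The selection rule described above can be illustrated with a minimal sketch. It is not NetsPresso's implementation: the helper names (`snr_db`, `fake_quantize_int8`, `select_layers`), the symmetric per-tensor INT8 scheme, and the reading of "meets the threshold" as "SNR greater than or equal to the threshold" are all assumptions made for illustration.

```python
import numpy as np


def snr_db(reference, quantized):
    """Signal-to-noise ratio in dB between a float tensor and its quantized copy."""
    reference = reference.astype(np.float64)
    noise_power = np.sum((reference - quantized) ** 2)
    if noise_power == 0.0:
        return float("inf")  # quantization introduced no error
    return 10.0 * np.log10(np.sum(reference ** 2) / noise_power)


def fake_quantize_int8(x):
    """Symmetric per-tensor INT8 quantize/dequantize round trip (assumed scheme)."""
    max_abs = np.max(np.abs(x))
    if max_abs == 0.0:
        return x.copy()
    scale = max_abs / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale


def select_layers(layer_outputs, threshold):
    """Return the names of layers whose SNR after quantization is >= threshold."""
    selected = []
    for name, out in layer_outputs.items():
        if snr_db(out, fake_quantize_int8(out)) >= threshold:
            selected.append(name)
    return selected


# Hypothetical layer outputs: one evenly spread, one dominated by small values
# that collapse onto a coarse grid because a single larger value sets the scale.
layers = {
    "smooth": np.linspace(-1.0, 1.0, 1001),
    "outlier": np.concatenate([np.full(10000, 0.01), [1.0]]),
}
kept = select_layers(layers, threshold=30)
```

In this sketch, layers left out of `kept` would stay in floating point, which mirrors the documented behavior of leaving below-threshold layers unquantized.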

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| input_model_path | str | The file path where the model is located. | required |
| output_dir | str | The local folder path to save the quantized model. | required |
| dataset_path | str | Path to the dataset. Useful for certain quantizations. | required |
| weight_precision | QuantizationPrecision | Weight precision. | INT8 |
| activation_precision | QuantizationPrecision | Activation precision. | INT8 |
| metric | SimilarityMetric | Quantization quality metric. | SNR |
| threshold | Union[float, int] | Quality threshold for quantization. Layers that do not meet this threshold based on the metric are not quantized. | 0 |
| input_layers | List[InputShape] | Target input shape for quantization (e.g., dynamic batch to static batch). | None |
| wait_until_done | bool | If True, wait for the quantization result before returning. If False, request the quantization and return immediately. | True |
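The blocking behavior controlled by `wait_until_done` can be pictured as a polling loop. The sketch below is a guess at the general pattern, not the library's code: `wait_for_result`, `fetch_status`, and the status strings are hypothetical, and it assumes that `sleep_interval` (present in the signature but not described above) is the pause in seconds between status checks.

```python
import time


def wait_for_result(fetch_status, wait_until_done=True, sleep_interval=30, sleep=time.sleep):
    """Poll a hypothetical status callable until the job leaves "IN_PROGRESS".

    fetch_status stands in for whatever call reports the job state; the
    status strings used here are illustrative only.
    """
    status = fetch_status()
    if not wait_until_done:
        # Return right away; the caller checks the result later.
        return status
    while status == "IN_PROGRESS":
        sleep(sleep_interval)  # assumed meaning of sleep_interval: seconds between checks
        status = fetch_status()
    return status
```

Passing `sleep` as a parameter is only a convenience so the loop can be exercised without real delays.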

Raises:

| Type | Description |
| --- | --- |
| e | If an error occurs during the model quantization. |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| QuantizerMetadata | QuantizerMetadata | Quantization metadata. |

Examples

from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision


netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")

quantizer = netspresso.quantizer()
quantization_result = quantizer.automatic_quantization(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
    threshold=0,
)
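The example above passes a `.npy` file as `dataset_path`. As a rough sketch of how such a file might be produced with NumPy: the helper name `build_calibration_file`, the `(N, C, H, W)` layout, the `float32` dtype, and the sample count are assumptions, since the expected dataset format is not specified here. Random values are used only as placeholders; real calibration data would be preprocessed samples that match the model's input.

```python
import numpy as np


def build_calibration_file(path, num_samples=8, shape=(3, 128, 128), seed=0):
    """Save a stacked array of placeholder samples to a .npy file.

    The (num_samples, channels, height, width) layout is an assumption;
    check the format your model and the quantizer actually expect.
    """
    rng = np.random.default_rng(seed)
    samples = rng.random((num_samples, *shape), dtype=np.float32)
    np.save(path, samples)
    return samples.shape
```

The resulting path could then be supplied as `dataset_path`, provided the layout matches what the model expects.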