
Quantizer (Custom Precision Quantization by Layer Name)

Description

netspresso.quantizer.quantizer.Quantizer

Bases: NetsPressoBase

custom_precision_quantization_by_layer_name(input_model_path, output_dir, dataset_path, precision_by_layer_name, default_weight_precision=QuantizationPrecision.INT8, default_activation_precision=QuantizationPrecision.INT8, metric=SimilarityMetric.SNR, input_layers=None, wait_until_done=True, sleep_interval=30)

Apply custom precision quantization to a model, specifying precision for each layer name.

This function gives precise control over the quantization process by letting the user specify a quantization precision (e.g., INT8, FP16) for each named layer in the model. The precision_by_layer_name parameter is a list in which each item sets the target precision for a specific layer name, enabling customized quantization that can improve model performance or compatibility.

Users can target specific layers to be quantized to lower precision for optimized model size and performance while keeping critical layers at higher precision for accuracy. Layers not explicitly listed in precision_by_layer_name will use default_weight_precision and default_activation_precision.
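As an illustration of the shape of precision_by_layer_name, here is a minimal sketch. It uses a hypothetical stand-in dataclass rather than netspresso's own PrecisionByLayer class (which is imported from the package); the fields follow the documentation above.

```python
from dataclasses import dataclass


# Hypothetical stand-in for netspresso's PrecisionByLayer, for illustration
# only. Fields follow the documentation: `name` is the layer name and
# `precision` is the target precision for that layer.
@dataclass
class PrecisionByLayer:
    name: str
    precision: str  # stand-in for QuantizationPrecision, e.g. "INT8" / "FP16"


# Keep one accuracy-sensitive layer at FP16; every layer not listed here
# falls back to default_weight_precision / default_activation_precision.
precision_by_layer_name = [
    PrecisionByLayer(
        name="/backbone/conv_first/block/act/Mul_output_0",
        precision="FP16",
    ),
]
```

In practice the list is usually produced from a recommendation result (as in the example at the end of this page) rather than written by hand.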

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `input_model_path` | `str` | The file path where the model is located. | required |
| `output_dir` | `str` | The local folder path in which to save the quantized model. | required |
| `dataset_path` | `str` | Path to the dataset, used for calibration in certain quantization modes. | required |
| `precision_by_layer_name` | `List[PrecisionByLayer]` | List of `PrecisionByLayer` objects specifying the desired precision for each layer name in the model. Each entry includes `name` (`str`), the layer name (e.g., `/backbone/conv_first/block/act/Mul_output_0`), and `precision` (`QuantizationPrecision`), the quantization precision level. | required |
| `default_weight_precision` | `QuantizationPrecision` | Weight precision for layers not listed in `precision_by_layer_name`. | `INT8` |
| `default_activation_precision` | `QuantizationPrecision` | Activation precision for layers not listed in `precision_by_layer_name`. | `INT8` |
| `metric` | `SimilarityMetric` | Metric used to measure quantization quality. | `SNR` |
| `input_layers` | `List[InputShape]` | Target input shape for quantization (e.g., converting a dynamic batch to a static batch). | `None` |
| `wait_until_done` | `bool` | If True, wait for the quantization result before returning. If False, request quantization and return immediately. | `True` |
| `sleep_interval` | `int` | Interval in seconds between status checks when `wait_until_done` is True. | `30` |
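The polling behavior implied by wait_until_done and sleep_interval can be sketched as follows. This is an illustrative loop, not the library's actual implementation; the status values and the check_status callable are assumptions made for the sketch.

```python
import time


def wait_for_completion(check_status, sleep_interval=30, max_checks=10):
    """Poll check_status() until it returns a terminal status.

    check_status is an assumed callable returning a status string;
    "IN_PROGRESS" is an assumed in-progress marker for illustration.
    """
    for _ in range(max_checks):
        status = check_status()
        if status != "IN_PROGRESS":
            return status
        time.sleep(sleep_interval)
    return "TIMEOUT"


# Usage with a stubbed status source standing in for the server:
statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
result = wait_for_completion(lambda: next(statuses), sleep_interval=0)
```

With wait_until_done=False, the method instead returns immediately after submitting the request, and the caller is responsible for checking the result later.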

Raises:

| Type | Description |
|------|-------------|
| `Exception` | If an error occurs during model quantization. |

Returns:

| Name | Type | Description |
|------|------|-------------|
| `QuantizerMetadata` | `QuantizerMetadata` | Quantization metadata containing the status, output paths, etc. |

Examples

```python
from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision

netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")

quantizer = netspresso.quantizer()

recommendation_metadata = quantizer.get_recommendation_precision(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/automatic_quantization",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    weight_precision=QuantizationPrecision.INT8,
    activation_precision=QuantizationPrecision.INT8,
    threshold=0,
)
recommendation_precisions = quantizer.load_recommendation_precision_result(
    recommendation_metadata.recommendation_result_path
)

quantization_result = quantizer.custom_precision_quantization_by_layer_name(
    input_model_path="./examples/sample_models/test.onnx",
    output_dir="./outputs/quantized/custom_precision_quantization_by_layer_name",
    dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
    precision_by_layer_name=recommendation_precisions.layers,
)
```