Quantizer (Custom Precision Quantization by Layer Name)¶
Description¶
netspresso.quantizer.quantizer.Quantizer¶
Bases: NetsPressoBase
custom_precision_quantization_by_layer_name(input_model_path, output_dir, dataset_path, precision_by_layer_name, default_weight_precision=QuantizationPrecision.INT8, default_activation_precision=QuantizationPrecision.INT8, metric=SimilarityMetric.SNR, input_layers=None, wait_until_done=True, sleep_interval=30)¶
Apply custom precision quantization to a model by specifying a precision for each layer name.
This method gives fine-grained control over the quantization process: the user assigns a quantization precision (e.g., INT8, FP16) to individual named layers in the model. The precision_by_layer_name parameter is a list in which each item specifies the target precision for a particular layer name, enabling customized quantization that can improve model performance or compatibility.
Users can quantize selected layers to a lower precision to reduce model size and improve speed while keeping accuracy-critical layers at a higher precision. Layers not explicitly listed in precision_by_layer_name fall back to default_weight_precision and default_activation_precision.
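For illustration, a precision_by_layer_name list can also be built by hand, as in the minimal sketch below. The import path, the PrecisionByLayer field names (name, precision), the FP16 enum member, and the layer names are assumptions for illustration only; in the typical workflow the list is produced by load_recommendation_precision_result (see Examples below).
from netspresso.enums import QuantizationPrecision
from netspresso.quantizer.schemas import PrecisionByLayer  # import path is an assumption; check your installed package
# Keep an accuracy-sensitive layer at FP16 and quantize another to INT8.
# Layer names are hypothetical placeholders; use names from your own model graph.
precision_by_layer_name = [
    PrecisionByLayer(name="/backbone/conv1/Conv", precision=QuantizationPrecision.FP16),
    PrecisionByLayer(name="/head/conv_out/Conv", precision=QuantizationPrecision.INT8),
]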
Parameters:

Name | Type | Description | Default |
---|---|---|---|
input_model_path | str | The file path where the model is located. | required |
output_dir | str | The local folder path to save the quantized model. | required |
dataset_path | str | Path to the calibration dataset. Useful for certain quantization methods. | required |
precision_by_layer_name | List[PrecisionByLayer] | List of PrecisionByLayer entries specifying the target precision for each layer name. | required |
default_weight_precision | QuantizationPrecision | Weight precision applied to layers not listed in precision_by_layer_name. | INT8 |
default_activation_precision | QuantizationPrecision | Activation precision applied to layers not listed in precision_by_layer_name. | INT8 |
metric | SimilarityMetric | Similarity metric used to evaluate quantization quality. | SNR |
input_layers | List[InputShape] | Target input shape for quantization (e.g., converting a dynamic batch to a static batch). | None |
wait_until_done | bool | If True, wait for the quantization result before returning. If False, submit the quantization request and return immediately. | True |
sleep_interval | int | Interval in seconds between status checks while wait_until_done is True. | 30 |
Raises:

Type | Description |
---|---|
Exception | If an error occurs during model quantization. |
Returns:

Name | Type | Description |
---|---|---|
QuantizerMetadata | QuantizerMetadata | Quantization metadata containing the status, output paths, and related information. |
Examples¶
from netspresso import NetsPresso
from netspresso.enums import QuantizationPrecision
netspresso = NetsPresso(email="YOUR_EMAIL", password="YOUR_PASSWORD")
quantizer = netspresso.quantizer()
recommendation_metadata = quantizer.get_recommendation_precision(
input_model_path="./examples/sample_models/test.onnx",
output_dir="./outputs/quantized/automatic_quantization",
dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
weight_precision=QuantizationPrecision.INT8,
activation_precision=QuantizationPrecision.INT8,
threshold=0,
)
recommendation_precisions = quantizer.load_recommendation_precision_result(recommendation_metadata.recommendation_result_path)
quantization_result = quantizer.custom_precision_quantization_by_layer_name(
input_model_path="./examples/sample_models/test.onnx",
output_dir="./outputs/quantized/custom_precision_quantization_by_layer_name",
dataset_path="./examples/sample_datasets/pickle_calibration_dataset_128x128.npy",
precision_by_layer_name=recommendation_precisions.layers,
)
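Once the call returns, the QuantizerMetadata result can be inspected to verify the outcome. A minimal sketch follows; the attribute names used here are assumptions and may differ in the installed version.
# Attribute names below are assumptions; check QuantizerMetadata in your installed version.
print(quantization_result.status)      # overall quantization status
print(quantization_result.output_dir)  # folder containing the quantized model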