PoolFormer

We provide an implementation and pretrained weights for the PoolFormer models.

Paper: PoolFormer: MetaFormer is Actually What You Need for Vision. [arXiv:2111.11418].

The original PyTorch code and weights are from the poolformer repository.

The following models are available.

poolformer_s12
poolformer_s24
poolformer_s36
poolformer_m36
poolformer_m48

class PoolFormerConfig(name='', url='', nb_classes=1000, in_channels=3, input_size=(224, 224), embed_dim=(64, 128, 320, 512), nb_blocks=(2, 2, 6, 2), mlp_ratio=(4.0, 4.0, 4.0, 4.0), drop_rate=0.0, drop_path_rate=0.0, norm_layer='group_norm_1grp', act_layer='gelu', init_scale=1e-05, crop_pct=0.95, interpolation='bicubic', mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225), first_conv='patch_embed/proj', classifier='head')

Configuration class for PoolFormer models.

Parameters:

name (str) – Name of the model.
url (str) – URL for pretrained weights.
nb_classes (int) – Number of classes for classification head.
in_channels (int) – Number of input image channels.
input_size (Tuple[int, int]) – Input image size (height, width).
embed_dim (Tuple) – Feature dimensions at each stage.
nb_blocks (Tuple) – Number of blocks at each stage.
mlp_ratio (Tuple) – Ratio of the MLP hidden dimension to the embedding dimension at each stage.
drop_rate (float) – Dropout rate.
drop_path_rate (float) – Dropout rate for stochastic depth.
norm_layer (str) – Normalization layer. See norm_layer_factory() for possible values.
act_layer (str) – Activation function. See act_layer_factory() for possible values.
init_scale (float) – Initial value for layer scale weights.
crop_pct (float) – Crop percentage for ImageNet evaluation.
interpolation (str) – Interpolation method for ImageNet evaluation.
mean (Tuple[float, float, float]) – Defines preprocessing function. If x is an image with pixel values in (0, 1), the preprocessing function is (x - mean) / std.
std (Tuple[float, float, float]) – Defines the preprocessing function; see mean.
first_conv (str) – Name of first convolutional layer. Used by create_model() to adapt the number of input channels when loading pretrained weights.
classifier (str) – Name of classifier layer. Used by create_model() to adapt the classifier when loading pretrained weights.
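
The per-stage tuples and the preprocessing formula above can be illustrated with a minimal sketch in plain Python. It only uses the default values from the signature. The helper names `mlp_hidden_dims` and `normalize_pixel` are hypothetical and are not part of the library, and rounding the hidden width with `int()` is an assumption.

```python
# Default values copied from the PoolFormerConfig signature.
EMBED_DIM = (64, 128, 320, 512)
MLP_RATIO = (4.0, 4.0, 4.0, 4.0)
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def mlp_hidden_dims(embed_dim, mlp_ratio):
    """Hypothetical helper: MLP hidden width per stage, embed_dim * mlp_ratio."""
    # int() rounding is an assumption made for this sketch.
    return tuple(int(d * r) for d, r in zip(embed_dim, mlp_ratio))


def normalize_pixel(pixel, mean=MEAN, std=STD):
    """Hypothetical helper: apply (x - mean) / std per channel to one RGB pixel.

    The pixel values are expected to lie in (0, 1).
    """
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))


hidden = mlp_hidden_dims(EMBED_DIM, MLP_RATIO)  # (256, 512, 1280, 2048)
centered = normalize_pixel(MEAN)                # (0.0, 0.0, 0.0)
```

With the defaults, a pixel equal to the channel means maps to zero in every channel, and each stage's MLP is four times as wide as its embedding.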

class PoolFormer(*args, **kwargs)

Class implementing a PoolFormer network.

Paper: PoolFormer: MetaFormer is Actually What You Need for Vision. [arXiv:2111.11418].

Parameters:

cfg (PoolFormerConfig) – Configuration class for the model.
**kwargs – Arguments are passed to tf.keras.Model.

call(x, training=False, return_features=False)

Forward pass through the full model.

Parameters:

x – Input to the model.
training (bool) – Whether the model runs in training mode (True) or inference mode (False).
return_features (bool) – If True, we return the model output together with a dictionary of intermediate features.

Returns:

If return_features=True, we return a tuple (y, features), where y is the model output and features is a dictionary with intermediate features.

If return_features=False, we return only y.
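
The two return shapes can be handled as in the following sketch. `StubModel` is a hypothetical stand-in that only mimics the documented return contract of `call`; it is not the PoolFormer implementation, and its feature names and arithmetic are made up for illustration.

```python
class StubModel:
    """Hypothetical stand-in mimicking the documented return contract of call()."""

    # Made-up feature names; the real ones come from the feature_names property.
    feature_names = ["stage_0", "stage_1", "stage_2"]

    def call(self, x, training=False, return_features=False):
        features = {}
        y = x
        for name in self.feature_names:
            y = [2.0 * v for v in y]  # placeholder for a real network stage
            features[name] = y
        # Documented contract: (y, features) if return_features, else only y.
        return (y, features) if return_features else y


model = StubModel()

# return_features=False: only the model output is returned.
y_only = model.call([1.0, 2.0])

# return_features=True: unpack the tuple into output and feature dictionary.
y, features = model.call([1.0, 2.0], return_features=True)
selected = {name: features[name] for name in model.feature_names[:2]}
```

Unpacking the tuple only when `return_features=True` keeps downstream code from mistaking the feature dictionary for part of the output.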

property dummy_inputs: Tensor: Returns a tensor of the correct shape for inference.

property feature_names: List[str]: Names of the features returned when calling call with return_features=True.

forward_features(x, training=False, return_features=False)

Forward pass through the model, excluding the classifier layer. This function is useful if the model is used as a backbone for downstream tasks such as object detection.

Parameters:

x – Input to the model.
training (bool) – Whether the model runs in training mode (True) or inference mode (False).
return_features (bool) – If True, we return the model output together with a dictionary of intermediate features.

Returns:

If return_features=True, we return a tuple (y, features), where y is the model output and features is a dictionary with intermediate features.

If return_features=False, we return only y.