NormalizingFeaturizer

src.geostat.model.NormalizingFeaturizer

NormalizingFeaturizer class for producing normalized feature matrices (F matrix) with an intercept.

The NormalizingFeaturizer applies a specified featurization function to raw location data. At construction it computes and stores the mean and standard deviation of each feature from the original data. When called, it normalizes the features with those stored parameters and prepends an intercept feature (a column of ones) to the matrix.

Parameters:

  • featurization (Callable) –

    A function or callable that defines how the input location data should be featurized.

  • locs (array-like or Tensor) –

    The location data used to calculate the normalization parameters (the mean and standard deviation of each feature). The same parameters are later applied when featurizing new data.

Examples:

Creating a NormalizingFeaturizer using a custom featurization function and location data:

import tensorflow as tf
from geostat.model import NormalizingFeaturizer

# Define a simple featurization function
def custom_featurizer(x, y):
    return x, y, x * y

# Sample location data
locs = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# Create the NormalizingFeaturizer
norm_featurizer = NormalizingFeaturizer(custom_featurizer, locs)

Using the NormalizingFeaturizer to featurize new location data:

new_locs = tf.constant([[7.0, 8.0], [9.0, 10.0]])
F_matrix = norm_featurizer(new_locs)
print(F_matrix) # F_matrix will contain normalized features with an additional intercept column
# tf.Tensor(
# [[1.        2.4494898 2.4494898 3.5676992]
#  [1.        3.6742349 3.6742349 6.50242  ]], shape=(2, 4), dtype=float32)

Notes:

  • The normalization parameters (unnorm_mean and unnorm_std) are calculated based on the initial locs data provided during initialization.
  • The __call__ method applies the normalization and adds an intercept feature when used to featurize new location data.
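
The two notes above can be checked outside TensorFlow. The following is a minimal NumPy sketch, not the library's implementation: `featurize` is a hypothetical stand-in for the library's Featurizer, assumed to pass each coordinate column to the function and stack the returned features as columns. Under that assumption it should reproduce the values printed in the example.

```python
import numpy as np

def custom_featurizer(x, y):
    # Same illustrative features as the example above: x, y, and their product
    return x, y, x * y

def featurize(locs):
    # Hypothetical stand-in for the library's Featurizer (an assumption):
    # pass each coordinate column to the function, stack results as columns
    locs = np.asarray(locs, dtype=float)
    return np.stack(custom_featurizer(*locs.T), axis=1)

locs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
new_locs = np.array([[7.0, 8.0], [9.0, 10.0]])

# Statistics come from the initial locations only
F_fit = featurize(locs)
mean = F_fit.mean(axis=0)
std = F_fit.std(axis=0)  # population std (ddof=0), as tf.math.reduce_std computes

# New locations are normalized with the stored statistics,
# then an intercept column of ones is prepended
F_norm = (featurize(new_locs) - mean) / std
ones = np.ones((new_locs.shape[0], 1))
F_matrix = np.concatenate([ones, F_norm], axis=1)
print(F_matrix)
```

Because the statistics are fixed at initialization, new locations outside the range of the initial data produce normalized values well above 1, as in the example.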

Source code in src/geostat/model.py
class NormalizingFeaturizer:
    """
    NormalizingFeaturizer class for producing normalized feature matrices (F matrix) with an intercept.

    The `NormalizingFeaturizer` applies a specified featurization function to raw location data.
    At construction it computes and stores the mean and standard deviation of each feature from
    the original data. When called, it normalizes the features with those stored parameters and
    prepends an intercept feature (a column of ones) to the matrix.

    Parameters:
        featurization (Callable):
            A function or callable that defines how the input location data should be featurized.
        locs (array-like or Tensor):
            The location data used to calculate the normalization parameters (the mean and
            standard deviation of each feature). The same parameters are later applied when
            featurizing new data.

    Examples:
        Creating a `NormalizingFeaturizer` using a custom featurization function and location data:

        ```python
        import tensorflow as tf
        from geostat.model import NormalizingFeaturizer

        # Define a simple featurization function
        def custom_featurizer(x, y):
            return x, y, x * y

        # Sample location data
        locs = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

        # Create the NormalizingFeaturizer
        norm_featurizer = NormalizingFeaturizer(custom_featurizer, locs)
        ```

        Using the `NormalizingFeaturizer` to featurize new location data:

        ```python
        new_locs = tf.constant([[7.0, 8.0], [9.0, 10.0]])
        F_matrix = norm_featurizer(new_locs)
        print(F_matrix) # F_matrix will contain normalized features with an additional intercept column
        # tf.Tensor(
        # [[1.        2.4494898 2.4494898 3.5676992]
        #  [1.        3.6742349 3.6742349 6.50242  ]], shape=(2, 4), dtype=float32)
        ```

    Notes:
        - The normalization parameters (`unnorm_mean` and `unnorm_std`) are calculated based on the
          initial `locs` data provided during initialization.
        - The `__call__` method applies the normalization and adds an intercept feature when used
          to featurize new location data.
    """

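    def __init__(self, featurization, locs):
        self.unnorm_featurizer = Featurizer(featurization)
        # Featurize the reference locations and store per-feature statistics.
        F_unnorm = self.unnorm_featurizer(locs)
        self.unnorm_mean = tf.reduce_mean(F_unnorm, axis=0)
        self.unnorm_std = tf.math.reduce_std(F_unnorm, axis=0)

    def __call__(self, locs):
        # Intercept column: one row of 1.0 per location.
        ones = tf.ones([tf.shape(locs)[0], 1], dtype=tf.float32)
        F_unnorm = self.unnorm_featurizer(locs)
        # Normalize with the statistics stored at initialization.
        F_norm = (F_unnorm - self.unnorm_mean) / self.unnorm_std
        # Prepend the intercept column to the normalized features.
        return tf.concat([ones, F_norm], axis=1)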
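
One edge case follows from the division by `unnorm_std` in `__call__`: a feature that is constant over the initial `locs` has zero standard deviation, so its normalized values become nan or inf. The NumPy sketch below illustrates this with made-up feature values. The `safe_std` guard is a hypothetical workaround and is not part of the library.

```python
import numpy as np

# Made-up features from the fitting locations; the second column is constant
F_fit = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
mean = F_fit.mean(axis=0)
std = F_fit.std(axis=0)  # second entry is exactly 0.0

F_new = np.array([[7.0, 5.0], [9.0, 6.0]])

# Unguarded division: 0/0 gives nan, nonzero/0 gives inf
with np.errstate(divide="ignore", invalid="ignore"):
    unguarded = (F_new - mean) / std

# Hypothetical guard (an assumption, not library behavior):
# treat a zero standard deviation as 1 so the feature is only centered
safe_std = np.where(std == 0.0, 1.0, std)
guarded = (F_new - mean) / safe_std

print(unguarded)
print(guarded)
```

An alternative is to avoid constant features in the featurization function altogether, since the intercept column already covers a constant term.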