Featurizer

src.geostat.model.Featurizer

Featurizer class for producing feature matrices (F matrix) from location data.

The Featurizer applies a specified featurization function to the input location data and generates the corresponding feature matrix. If no featurization function is provided, it produces an empty feature matrix of shape (N, 0), where N is the number of locations.

Parameters:

  • featurization (Callable or None) –

    A function that takes in the individual components of location data (one argument per coordinate column) and returns the features. If set to None, the featurizer produces an empty feature matrix with one row per location and zero columns.
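
To illustrate what "individual components" means, here is a minimal NumPy sketch of the unpacking step. It is an analogue for illustration, not the library's code: `split_components` and `radial` are hypothetical names, and NumPy stands in for TensorFlow. The point is that the function receives one 1-D array per coordinate column, so its arity must match the number of columns in the location data.

```python
import numpy as np

def split_components(locs):
    """Split an (N, D) array into D one-dimensional arrays, one per coordinate."""
    locs = np.asarray(locs, dtype=np.float32)
    return [locs[:, i] for i in range(locs.shape[1])]

def radial(x, y):
    # Hypothetical featurization: distance of each location from the origin.
    return np.sqrt(x**2 + y**2)

# Two coordinate columns match the two arguments of `radial`.
locs_2d = [[3.0, 4.0], [6.0, 8.0]]
x, y = split_components(locs_2d)
dist = radial(x, y)  # one value per location

# Three coordinate columns do not match a two-argument function.
locs_3d = [[1.0, 2.0, 3.0]]
try:
    radial(*split_components(locs_3d))
    arity_ok = True
except TypeError:
    arity_ok = False  # three components were passed to a two-argument function
```

A featurization function written for 2-D locations therefore needs a third parameter (or `*args`) before it can be used with 3-D locations.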

Examples:

Creating a Featurizer using a custom featurization function:

import tensorflow as tf
from geostat.model import Featurizer

# Define a custom featurization function
def simple_featurizer(x, y):
    return x, y, x * y

# Initialize the Featurizer
featurizer = Featurizer(simple_featurizer)

Using the Featurizer to transform location data:

locs = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
F_matrix = featurizer(locs)
print(F_matrix) # F_matrix will contain the features: (x, y, x*y) for each location
# tf.Tensor(
# [[ 1.  2.  2.]
#  [ 3.  4. 12.]
#  [ 5.  6. 30.]], shape=(3, 3), dtype=float32)

Handling the case where no featurization is provided:

featurizer_no_feat = Featurizer(None)
F_matrix = featurizer_no_feat(locs)
print(F_matrix) # Since no featurization function is provided, F_matrix will have shape (3, 0)
# tf.Tensor([], shape=(3, 0), dtype=float32)

Notes:

  • The __call__ method is used to apply the featurization to input location data.
  • If featurization returns a tuple, it is assumed to represent multiple features, which will be stacked to form the feature matrix.
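
The tuple handling described in the notes can be sketched in NumPy as follows. This is an illustrative analogue under stated assumptions, not the library's implementation: `stack_features` and `trend` are hypothetical names, and only the tuple case is covered. It shows that each returned feature is broadcast to one value per location before stacking, so a scalar such as a constant intercept becomes a full column.

```python
import numpy as np

def stack_features(featurization, locs):
    """Broadcast each returned feature to length N, then stack the features as columns."""
    locs = np.asarray(locs, dtype=np.float32)
    n = locs.shape[0]
    # Iterating over locs.T yields one 1-D array per coordinate column.
    feats = featurization(*locs.T)
    if not isinstance(feats, tuple):
        raise TypeError("this sketch only covers the tuple case")
    if len(feats) == 0:
        # An empty tuple gives a matrix with N rows and no columns.
        return np.ones((n, 0), dtype=np.float32)
    cols = [np.broadcast_to(np.asarray(f, dtype=np.float32), (n,)) for f in feats]
    return np.stack(cols, axis=1)

def trend(x, y):
    # Hypothetical featurization: constant intercept plus linear terms.
    return 1.0, x, y

F = stack_features(trend, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(F.shape)  # (3, 3)
```

Here the first column of `F` is all ones, because the scalar `1.0` is repeated once per location.
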
Source code in src/geostat/model.py
class Featurizer:
    """
    Featurizer class for producing feature matrices (F matrix) from location data.

    The `Featurizer` applies a specified featurization function to the input location data
    and generates the corresponding feature matrix. If no featurization function is provided,
    it produces an empty feature matrix of shape `(N, 0)`, where `N` is the number of locations.

    Parameters:
        featurization (Callable or None):
            A function that takes in the individual components of location data and returns the features.
            If set to `None`, the featurizer produces an empty feature matrix with one row per location and zero columns.

    Examples:
        Creating a `Featurizer` using a custom featurization function:

        ```python
        import tensorflow as tf
        from geostat.model import Featurizer

        # Define a custom featurization function
        def simple_featurizer(x, y):
            return x, y, x * y

        # Initialize the Featurizer
        featurizer = Featurizer(simple_featurizer)
        ```

        Using the `Featurizer` to transform location data:

        ```python
        locs = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        F_matrix = featurizer(locs)
        print(F_matrix) # F_matrix will contain the features: (x, y, x*y) for each location
        # tf.Tensor(
        # [[ 1.  2.  2.]
        #  [ 3.  4. 12.]
        #  [ 5.  6. 30.]], shape=(3, 3), dtype=float32)
        ```

        Handling the case where no featurization is provided:

        ```python
        featurizer_no_feat = Featurizer(None)
        F_matrix = featurizer_no_feat(locs)
        print(F_matrix) # Since no featurization function is provided, F_matrix will have shape (3, 0)
        # tf.Tensor([], shape=(3, 0), dtype=float32)
        ```

    Notes:
        - The `__call__` method is used to apply the featurization to input location data.
        - If `featurization` returns a tuple, it is assumed to represent multiple features,
          which will be stacked to form the feature matrix.
    """

    def __init__(self, featurization):
        self.featurization = featurization

    def __call__(self, locs):
        locs = tf.cast(locs, tf.float32)
        if self.featurization is None: # No features.
            return tf.ones([tf.shape(locs)[0], 0], dtype=tf.float32)

        feats = self.featurization(*tf.unstack(locs, axis=1))
        if isinstance(feats, tuple): # Zero, one, or many features.
            if len(feats) == 0:
                return tf.ones([tf.shape(locs)[0], 0], dtype=tf.float32)
            else:
                # Broadcast each feature (scalars included) to one value per location, then stack as columns.
                feats = [tf.broadcast_to(tf.cast(f, tf.float32), [tf.shape(locs)[0]]) for f in feats]
                return tf.stack(feats, axis=1)
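        else: # One feature.
            # NOTE: `e` is not defined in this excerpt; it is presumably a helper defined elsewhere in model.py.
            return e(feats)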