Normalization

Summary

Overview

The normalization step is a clean-up phase aimed at preparing data for integration into the Product Information Management (PIM) system. It ensures that product data is correctly formatted, validated, and ready for further processing.

During normalization, four main operations are carried out:

  1. Type Checking: Ensures that attributes have the correct data type. For example, if a field is expected to be a number but the value is a string ("19"), it will be converted to a number. If the value is non-convertible (e.g., "AA"), an error is raised.
  2. Requirement Level Verification: Validates that mandatory fields are filled in according to their requirement level. If a field marked as "required" is empty, an error is raised.
  3. Transformations: Executes predefined data transformations to modify or enhance field values.
  4. Rule and Warning Validation: Ensures that fields comply with specific business rules and warnings, raising alerts or errors where necessary.

This step helps ensure the integrity and quality of data before it moves on to subsequent processes.
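The first two operations can be pictured with a minimal Python sketch. It is an illustration only, not the product's implementation: the names check_type, check_required and NormalizationError are hypothetical, and the treatment of None and empty strings as "empty" is an assumption.

```python
from typing import Any


class NormalizationError(Exception):
    """Hypothetical error type raised when a value cannot be normalized."""


def check_type(value: Any, expected: type) -> Any:
    """Return the value coerced to the expected type, or raise if it cannot be converted."""
    if isinstance(value, expected):
        return value
    try:
        return expected(value)  # e.g. int("19") gives 19
    except (TypeError, ValueError) as exc:
        raise NormalizationError(
            f"{value!r} is not a valid {expected.__name__}"
        ) from exc


def check_required(value: Any, requirement_level: str) -> None:
    """Raise if a field marked "required" is empty (None or empty string, by assumption)."""
    if requirement_level == "required" and value in (None, ""):
        raise NormalizationError("required field is empty")
```

With this sketch, check_type("19", int) returns the number 19, while check_type("AA", int) raises NormalizationError, mirroring the example given for type checking.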

Use Cases

  1. Basic data normalization: The primary use of the normalization step is to make sure that product data has the expected types, complies with the configured rules (what must or must not be filled in), and meets its requirement levels. This prepares the data for subsequent steps such as categorization or export.
  2. Error and warning handling: The system automatically flags errors or warnings for missing required fields or incorrect data types, allowing users to quickly address and fix these issues.
  3. Transformation execution: When users update fields, the system runs transformation logic automatically, ensuring that data follows any necessary modifications or formatting rules.
  4. Selective normalization: Administrators can decide which fields to include in or exclude from the normalization process, using the include_fields and exclude_fields parameters, depending on the context or the specific data requirements for each product group.

Interface

The user interface for the normalization step provides three key tabs:

  • Rows to Check: Displays all rows containing errors or warnings that need attention.
  • Rows Checked by You: Shows rows that have been validated and corrected by the user.
  • Rows Checked by the AI: Displays rows with no issues detected.

When an error is detected, a counter in the first column will indicate how many fields have issues. A warning or error icon appears next to the affected fields.

When modifying a row:

  • By default, only the fields with errors are displayed. Users can choose to display all fields by clicking on the "Show all the fields in the row" button.
  • If a field with a transformation rule is updated, all transformations for that field will automatically re-execute. An information icon alerts the user when this occurs.
  • Warnings can be bypassed by clicking on the "Ignore Warnings" button if only warnings are present (i.e., no critical errors).

More information is available in our dedicated guide.

Configuration

Always refer to the API docs. They are kept up to date and remain the source of truth, so it is recommended to double-check the information presented here against them.

The normalization step can be configured using the following parameters:

You can choose to run this step in manual mode or in fast mode:

  • In fast mode, if the normalization step finishes without any errors or warnings, the user is automatically sent to the next step.
  • In manual mode, the user has to review the outcome of the normalization step, regardless of whether it produced errors or warnings.
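The difference between the two modes can be summed up in a few lines of Python. This is a sketch under assumptions: the function name next_action and the return values "advance" and "review" are hypothetical labels, not values from the API.

```python
def next_action(mode: str, error_count: int, warning_count: int) -> str:
    """Decide what happens once the normalization step has finished."""
    # Fast mode only skips the review when nothing at all was flagged.
    if mode == "fast" and error_count == 0 and warning_count == 0:
        return "advance"
    # Manual mode, or any error or warning, sends the user to the review screen.
    return "review"
```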

Field Inclusion/Exclusion

  • exclude_fields and include_fields: Specify which fields should be excluded from or included in the normalization process. Each accepts a list of objects with:
    • name: The name of the field. If null, all fields are selected.
    • groups[]: A list of groups. If null, all groups are selected. [null] selects fields in the null group (which applies to all products).
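As an illustration, the selector objects described above could be written as follows and serialized to JSON. The surrounding dictionary shape, the "motors" group and the "color" field are assumptions made for the example; check the API docs for the exact payload.

```python
import json

# Hypothetical step configuration; only the name/groups selector shape comes from the text above.
step_config = {
    "include_fields": [
        {"name": None, "groups": ["motors"]},  # every field in the (hypothetical) "motors" group
        {"name": "color", "groups": None},     # the "color" field, in all groups
    ],
    "exclude_fields": [
        {"name": "rpm", "groups": [None]},     # the "rpm" field, in the null group only
    ],
}

# Python's None is serialized as JSON null.
payload = json.dumps(step_config, indent=2)
```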

Fields Configuration

  • fields: Overrides for specific fields, allowing rules to be removed for this normalization step only. Fields and groups can only be overridden if they already exist in the project; no new fields can be added. Partial overrides are not allowed, so all rules of an overridden field must be specified again.

Additional Parameters

  • exclude_fields/include_fields logic: If both parameters are used, the system first includes the specified fields in include_fields and then excludes fields listed in exclude_fields from the included fields.
    • groups: null includes fields from all groups.
    • groups: [null] includes only fields from the null group. The null group is the default group, whose fields apply to all products.
    • groups: [] includes no fields (equivalent to not specifying the field).
  • keep_excluded_fields: Default is true. This determines whether to retain fields even if they are excluded from the normalization process. For example, if a field such as "rpm" is filled in for a t-shirt but the attribute is inactive, it will still be retained if this option is set to true. Otherwise, it will be discarded.
  • max_requirement_level: Limits the highest requirement level for fields. For instance, if a field is marked as "required," but the max_requirement_level is set to "optional" for this step, the field becomes optional, and no error will be raised if it is not filled.
  • sources: Allows fields to be displayed (but not edited) in the side panel to help users fill in error-prone fields.
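The include-then-exclude logic described above can be sketched in Python as follows. This is a simplified model, not the actual implementation: each field is assumed to belong to a single group (None standing for the null group), the function names are hypothetical, and treating a missing include_fields as "include everything" is an assumption.

```python
from typing import List, Optional, Tuple

Field = Tuple[str, Optional[str]]  # (field name, group); group None means the null group


def matches(field: Field, selector: dict) -> bool:
    """True if a selector object ({"name": ..., "groups": ...}) selects the field."""
    name, group = field
    sel_name = selector.get("name")
    sel_groups = selector.get("groups")
    name_ok = sel_name is None or sel_name == name        # name null: any field
    group_ok = sel_groups is None or group in sel_groups  # null: all groups, []: none
    return name_ok and group_ok


def fields_to_normalize(
    fields: List[Field],
    include_fields: Optional[List[dict]] = None,
    exclude_fields: Optional[List[dict]] = None,
) -> List[Field]:
    """Apply include_fields first, then remove anything matched by exclude_fields."""
    if include_fields is None:
        included = list(fields)  # assumption: no include list means every field
    else:
        included = [f for f in fields if any(matches(f, s) for s in include_fields)]
    excluded = exclude_fields or []
    return [f for f in included if not any(matches(f, s) for s in excluded)]
```

For example, including everything with {"name": None, "groups": None} and excluding {"name": "rpm", "groups": None} leaves every field except "rpm" in the normalization process.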

Transformations

The normalization step supports various transformations that can be applied to field values. These transformations help clean and standardize data before further processing.

Refer to the following article to see which transformations are available: Transformation types

 

Rules & Warnings

The normalization step supports various field rules and warnings.

  • A rule is a hard requirement. Fields that fail it block the job, and users will not be able to go beyond the normalization step.
  • A warning is a soft requirement. Users can acknowledge that the field does not pass the configured warning and then go to the next step.
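The distinction between rules and warnings can be illustrated with a short Python sketch. The Issue class, the severity labels and the can_proceed function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    field: str
    severity: str  # "error" for a failed rule, "warning" for a failed warning
    message: str


def can_proceed(issues: List[Issue], warnings_ignored: bool = False) -> bool:
    """Return True if the user may go beyond the normalization step."""
    # A failed rule always blocks the job.
    if any(issue.severity == "error" for issue in issues):
        return False
    # Warnings block only until the user acknowledges them.
    has_warnings = any(issue.severity == "warning" for issue in issues)
    return warnings_ignored or not has_warnings
```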

Refer to the following article to see which rules and warnings are available: Rule & warning types

 

Limitations and Known Issues

  1. Partial overrides not supported: The system does not support partial overrides for fields or groups. If you need to override rules for a field, you must re-specify all the field rules instead of modifying only specific aspects.
  2. Exclusion behavior: By default, excluded fields are retained in the data (keep_excluded_fields defaults to true). This may lead to retaining irrelevant or unwanted data unless keep_excluded_fields is set to false.
  3. Requirement level override: When max_requirement_level is used to override field requirements, fields marked as "required" may no longer raise an error if left empty, which can potentially lead to incomplete data if not properly managed.
  4. Transformation execution: When a user modifies a field, the associated transformations run automatically. This can sometimes lead to unexpected results if the user is unaware of the transformation rules applied to the field.
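The requirement level override in item 3 behaves like a cap, which the following Python sketch illustrates. It assumes only the two levels named in this article, ordered from lowest to highest; the real list of levels may be longer, and effective_level is a hypothetical helper name.

```python
from typing import Optional

# Assumed ordering, lowest to highest; only these two levels are named in this article.
LEVELS = ["optional", "required"]


def effective_level(field_level: str, max_requirement_level: Optional[str] = None) -> str:
    """Cap a field's requirement level at max_requirement_level for this step."""
    if max_requirement_level is None:
        return field_level
    # Keep whichever of the two levels ranks lower in LEVELS.
    return min(field_level, max_requirement_level, key=LEVELS.index)
```

Under these assumptions, a field marked "required" is treated as "optional" when max_requirement_level is "optional", so leaving it empty raises no error. This is why the override should be used with care.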