Match History

Summary

Overview

The module is used to accomplish three main tasks:

  1. Filter out products, either automatically or manually from the job
  2. Enrich the products by joining them with a database (to create new attributes or override existing attributes)
  3. Import products into a Reference Base (usually for filtering down the line)

The filtered-out products are stored in a data frame called "matched", which an output format can target if we need to export them.

Note that for filtering or enriching, the products will only be matched on non-empty cells. Two empty values are counted as different in this module (this is due to technical limitations and not business reasons).
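
The rule above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the module's actual code: the helper names is_empty and rows_match are hypothetical, and treating None and blank strings as "empty" is an assumption.

```python
def is_empty(value):
    """Assumption of this sketch: None and blank strings count as empty cells."""
    return value is None or str(value).strip() == ""


def rows_match(product, record, keys):
    """Return True only if every key is filled on both sides and equal."""
    for key in keys:
        left, right = product.get(key), record.get(key)
        if is_empty(left) or is_empty(right):
            return False  # an empty cell never matches, even another empty cell
        if left != right:
            return False
    return True
```

With this sketch, rows_match({"sku": "100"}, {"sku": "100"}, ["sku"]) is a match, while two rows whose "sku" is empty are not.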

 

Reference Base

What we call a Reference Base is an object inside SDM that stores dynamic data. A Reference Base has :

  • a name to identify it
  • a set of unique columns used to uniquely identify each entry in the base
  • a set of columns used to store any kind of data (it's JSON)
  • some rows where the unique columns must be filled
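
To make this structure concrete, here is a minimal in-memory sketch. The class ReferenceBase and its add_row method are hypothetical and do not reflect how SDM actually stores a base; replacing a row that has the same unique values is also an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceBase:
    """Hypothetical in-memory model of a Reference Base (not the SDM implementation)."""
    name: str
    unique_columns: list
    columns: list
    rows: dict = field(default_factory=dict)

    def add_row(self, row):
        # The unique columns identify the entry, so they must all be filled.
        key = tuple(row.get(c) for c in self.unique_columns)
        if any(v is None or str(v).strip() == "" for v in key):
            raise ValueError("unique columns must be filled")
        # Assumption: a row with the same unique values replaces the previous one.
        self.rows[key] = {c: row.get(c) for c in self.columns}
        return key


base = ReferenceBase("SKU List", ["supplier name", "supplier reference"], ["sku"])
base.add_row({"supplier name": "Acme", "supplier reference": "A-1", "sku": "100"})
```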

Interface

The UI is based on the table layout with 4 filters :

  • Forced kept rows : rows that were automatically filtered out but that the user decided to keep
  • Originally kept rows : rows marked as kept automatically
  • Originally skipped rows : rows filtered out automatically
  • Force skipped rows : rows that were automatically kept but that the user decided to filter out
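
The four filters boil down to two pieces of information per row: the automatic decision and whether the user toggled it. A small sketch, in which the function name and the two boolean flags are hypothetical and not part of the SDM data model:

```python
def row_filter(auto_kept: bool, toggled: bool) -> str:
    """Map the automatic decision and the user toggle to one of the four UI filters."""
    if auto_kept:
        return "Force skipped rows" if toggled else "Originally kept rows"
    return "Forced kept rows" if toggled else "Originally skipped rows"
```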

The only actions available in this step are toggling a single product or toggling products in bulk (based on a filter).

Configuration

This page gives an overview of the configuration. The API documentation is always kept up to date and should be considered the primary source of truth, so it is recommended to verify the information presented here against it.

 

The configuration of the module is split into three parts matching its three features :

  • filters: list of filters to apply to incoming data. Each filter has the following params
    • mode : (blacklist or whitelist) whether we want to filter out products that match the base or products that don't match.
    • reference_base : (optional) name of the Reference Base used in the filter. The base has to belong to the same organization. Use this for a dynamic filter; the matching is done on the unique_columns of the Reference Base. Alternatively, for a static filter, we can set up a list of records directly using :
    • record_list : list of key/values representing records, for example [{"sku": "100"}, {"sku": "200"}]
    • unique_keys : keys that are used to match each product with the record list, for example ["sku"]

Each filter needs either reference_base or both record_list and unique_keys to be filled.

  • filter_mode : ("and" or "or") how to combine the different filters.
  • enrichments : list of enrichments to carry out.
    • columns: list of attributes to fill. The attribute needs to be present in the Reference Base / record list
    • override : boolean to decide whether we should override non-empty values in the product if there is a conflict with the matched record
    • reference_base / record_list / unique_columns have the same definition as for filters.
  • imports : list of Reference Bases that we want to update. Keep in mind that the import happens at the end of the step.
    • reference_base : the reference base to fill
    • which: can have one of the following values:
      • kept : only import products that were not filtered out
      • dropped : only import products that were filtered out
      • all : import all products
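
As an illustration of the filter parameters, here is a sketch of a single static filter (record_list and unique_keys) applied to a few products. The functions matches_any and apply_filter are hypothetical, as is the sample data. The sketch only mirrors the behaviour described on this page: it does not model filter_mode, and it assumes that an empty cell never matches.

```python
def is_empty(value):
    """Assumption of this sketch: None and blank strings count as empty cells."""
    return value is None or str(value).strip() == ""


def matches_any(product, records, keys):
    """True if the product matches at least one record on all keys (non-empty only)."""
    for record in records:
        if all(
            not is_empty(product.get(k)) and product.get(k) == record.get(k)
            for k in keys
        ):
            return True
    return False


def apply_filter(products, filter_config):
    """Split products into (kept, dropped) for one static filter."""
    keys = filter_config["unique_keys"]
    records = filter_config["record_list"]
    kept, dropped = [], []
    for product in products:
        matched = matches_any(product, records, keys)
        # blacklist filters out matching products; whitelist filters out the others
        drop = matched if filter_config["mode"] == "blacklist" else not matched
        (dropped if drop else kept).append(product)
    return kept, dropped


static_filter = {
    "mode": "blacklist",
    "record_list": [{"sku": "100"}, {"sku": "200"}],
    "unique_keys": ["sku"],
}
products = [{"sku": "100"}, {"sku": "300"}, {"sku": ""}]
kept, dropped = apply_filter(products, static_filter)
```

In this sketch, the product with SKU "100" is dropped by the blacklist, while the product with an empty SKU is kept because it cannot match any record. Switching the mode to "whitelist" would keep only SKU "100".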

Examples & Recipes

I want to filter out products previously imported

This is a common use case for this module : the customer only wants to use SDM to create products, not to update them, so we need to filter out existing products.

To do that, we will use, say, the SKU to identify products (we could also use a combination of columns such as supplier name and supplier reference).

The first step is to create a Reference Base that will store the known products. In this case we can just set unique_columns to ["sku"] and name the Reference Base "known_products".

Then we will create a MatchHistory step and configure it as follows :

{
    "filters": [
        {
            "mode": "blacklist",
            "reference_base": "known_products"
        }
    ],
    "filter_mode": "and",
    "imports": [
        {
            "reference_base": "known_products",
            "which": "kept"
        }
    ]
}

What this means is that when we arrive at this step, we remove all products that are in the "known_products" base (based on the SKU) and, when the step is finalized, we update the "known_products" base with the products kept (for use in future jobs).
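
The effect over successive jobs can be simulated with a plain set standing in for the "known_products" base. This is a sketch under stated assumptions rather than the module's implementation: run_job and the sample SKUs are hypothetical, and products without a SKU are assumed to be kept but not imported.

```python
def run_job(products, known_skus):
    """Blacklist on SKU, then import the kept products once the step is finalized."""
    kept, dropped = [], []
    for product in products:
        sku = product.get("sku")
        if sku and sku in known_skus:
            dropped.append(product)  # already known: filtered out
        else:
            kept.append(product)
    # The import happens at the end of the step ("which": "kept").
    known_skus.update(p["sku"] for p in kept if p.get("sku"))
    return kept, dropped


known_products = set()
first_kept, first_dropped = run_job([{"sku": "100"}, {"sku": "200"}], known_products)
second_kept, second_dropped = run_job([{"sku": "200"}, {"sku": "300"}], known_products)
```

In the first job nothing is known yet, so both products are kept and then imported. In the second job, SKU "200" is filtered out and only SKU "300" is kept.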

Typically we would place this step as soon as possible, i.e. just after the mapping to ensure a customer won't have to work on products that will be filtered out.

My product SKUs are generated by my ERP system

This scenario is observed with some customers: the SKU is not generated by the PIM, but by the ERP. Before creating products in the PIM, the SKU needs to be filled, but it is not available to the supplier. We would handle this by using two workflows: one to update a Reference Base with the list of SKUs, and the other (the main one) to actually work on the product and add the SKU.

The first step would be to identify a set of attributes that uniquely describe a product AND are available from the supplier file. Usually this is a combination of "supplier name" and "supplier reference".

We will first create a Reference Base with the unique_columns "supplier name" and "supplier reference" and the column "sku". Let's name it "SKU List".

The second step is to create a dedicated project to feed this Reference Base. The project will only have three attributes ("supplier name", "supplier reference", "sku") and two steps :

  1. A Mapping
  2. A MatchHistory with the following configuration
{
    "imports": [
        {
            "reference_base": "SKU List",
            "which": "kept"
        }
    ]
}

Both steps in this project can be marked as Fast as they will not need any human input. Usually, this workflow will be fed automatically via an ERP sending data to SDM via FTP.

The last step to achieve the desired outcome is to set a MatchHistory step in our main project with the following configuration

{
    "filters": [
        {
            "reference_base": "SKU List",
            "mode": "whitelist"
        }
    ],
    "filter_mode": "and",
    "enrichments": [
        {
            "reference_base": "SKU List",
            "columns": ["sku"],
            "override": true
        }
    ]
}

That way, incoming products will be filtered out if we don't recognize the combination of "supplier name" and "supplier reference", and we will add the matching SKUs to the products we do recognize.
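
The combined whitelist and enrichment can be sketched as follows. A dictionary keyed by ("supplier name", "supplier reference") stands in for the "SKU List" base. The function match_and_enrich and the sample values are hypothetical, and the sketch assumes that a product with an empty supplier name or supplier reference is never recognized.

```python
def match_and_enrich(products, sku_list, override=True):
    """Keep only recognized products and fill their "sku" from the matched record."""
    kept, dropped = [], []
    for product in products:
        key = (product.get("supplier name"), product.get("supplier reference"))
        # Empty unique values never match (assumption consistent with this page).
        record = sku_list.get(key) if all(key) else None
        if record is None:
            dropped.append(product)  # whitelist: unknown products are filtered out
            continue
        enriched = dict(product)
        # override=True replaces an existing SKU; otherwise only empty SKUs are filled.
        if override or not enriched.get("sku"):
            enriched["sku"] = record["sku"]
        kept.append(enriched)
    return kept, dropped


sku_list = {("Acme", "A-1"): {"sku": "100"}}
products = [
    {"supplier name": "Acme", "supplier reference": "A-1", "sku": ""},
    {"supplier name": "Acme", "supplier reference": "B-9", "sku": ""},
]
kept, dropped = match_and_enrich(products, sku_list)
```

Here the first product is kept and receives SKU "100", while the second is filtered out because its supplier reference is not in the list.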