Classification & Hierarchies

Overview

The Classification module in Supplier Data Manager (SDM) automates product classification within a predefined hierarchical structure. It can use AI to categorize products automatically, reducing manual effort. When AI is not enabled, all product rows are displayed for manual classification.

The module is configured via the SDM admin panel. Configuration covers field definitions, hierarchy setup, and AI model behavior.

Use cases

The SDM Classification module covers three main scenarios:

  1. AI-assisted classification: The module uses AI to suggest and assign categories to products based on their attributes. Products are grouped by the specified fields (for example, product_code and color) before classification, so size variations of the same product are grouped and classified together.
  2. Manual validation: After AI classification, users can verify and correct classifications through the SDM UI.
  3. Hierarchical storage: Categorized data can be stored across multiple hierarchy levels and written to specific output fields for downstream processes such as ERP export.
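
The grouping step described in the first scenario can be sketched in a few lines of Python. This is an illustration only, not SDM's implementation: the function name group_rows and the sample rows are hypothetical, and the sketch assumes that an empty list of grouping fields means every row is classified on its own.

```python
from collections import defaultdict


def group_rows(rows, groupby):
    """Return lists of row indices that share the same values for the groupby fields.

    Hypothetical helper: with an empty groupby list, each row forms its own group.
    """
    if not groupby:
        return [[index] for index in range(len(rows))]
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        # Rows with identical values in every groupby field share one key.
        key = tuple(row.get(field) for field in groupby)
        groups[key].append(index)
    return list(groups.values())


rows = [
    {"product_code": "TS-01", "color": "red", "size": "S"},
    {"product_code": "TS-01", "color": "red", "size": "M"},
    {"product_code": "TS-01", "color": "blue", "size": "S"},
]
grouped = group_rows(rows, ["product_code", "color"])
print(grouped)  # the two red size variants end up in the same group
```

Classifying one representative per group and copying the result to the other members is what keeps size variants consistent.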

Interface

The SDM Classification module UI is divided into three tabs:

  • Rows to Check: Products that need classification or manual verification.
  • Rows Checked by You: Products that have been manually validated by a user.
  • Rows Checked by the AI: Products categorized by the AI without human intervention.

When a user validates a classification, the internal dataframe is updated (internal columns use the _UNiFAi suffix), and the product moves from Rows to Check to Rows Checked by You.

Classification module frontend interface showing the three tabs: Rows to Check, Rows Checked by You, and Rows Checked by the AI

The screenshot above shows the SDM Classification module interface with the three classification tabs visible. The red tab contains rows awaiting manual review, the green tab shows rows validated by the user, and the blue tab shows rows automatically classified by the AI.

The automated_audit_rate parameter (see Business configuration below) controls the proportion of AI-classified rows that is surfaced to users for review.
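
To make the effect of the audit rate concrete, the sketch below picks a share of AI-classified rows for review. How SDM actually selects audited rows is not documented here, so uniform random sampling, the function name select_for_audit, and the seed parameter are all assumptions made for illustration.

```python
import random


def select_for_audit(row_ids, automated_audit_rate, seed=None):
    """Pick a share of AI-classified row IDs for manual review.

    Assumption: rows are drawn uniformly at random; SDM may use another strategy.
    """
    if not 0.0 <= automated_audit_rate <= 1.0:
        raise ValueError("automated_audit_rate must be between 0 and 1")
    row_ids = list(row_ids)
    sample_size = round(len(row_ids) * automated_audit_rate)
    rng = random.Random(seed)  # seeded only to make the illustration repeatable
    return sorted(rng.sample(row_ids, sample_size))


ai_checked = list(range(100))
audited = select_for_audit(ai_checked, 0.1, seed=1)
print(len(audited))  # 10 of the 100 rows would be shown to a user
```

A rate of 0 selects nothing and a rate of 1 selects every row, matching the two extremes of the parameter.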

Configuration

The SDM API documentation is the authoritative source for classification configuration. Cross-check any details here against the API docs, which are always up to date.

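Classification in SDM requires two configuration sections:

  • Business configuration (Params): grouping, categorisation fields, and output mapping.
  • AI configuration (Model config): the AI backend and per-field model behavior.

The Hierarchy configuration must also be set up before classification can run.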

Business configuration (Params)

The params object stores the business-level configuration for the SDM Classification module. All fields below are set in the SDM admin panel.

  • groupby — List of fields whose values must match for products to share a category. For example, products with the same CODE_MODELE and CODE_COULEUR are grouped together, so size variants of the same product all receive the same category. Defaults to [] (no grouping).

  • automated_audit_rate — A float between 0 and 1 defining the proportion of high-confidence AI predictions that are surfaced to the user for review. A value of 0 means no audit; a value of 1 means all AI-classified rows are reviewed. Defaults to 0.0.

  • categorisation_fields — List of one or more category fields to classify. Fields are not bound to a specific type, and a product can belong to more than one category (see multiple). Each entry has the following fields:

    • name — Internal identifier for the category field (used as a column name in the internal dataframe).
    • label — Localizable display label shown in the SDM frontend.
    • hierarchy — The name of the hierarchy to use for this categorisation field. SDM looks up the hierarchy by name (not by ID).
    • replace_existing — When set to true (the default), the AI classifies all products. When set to false, the AI skips products that already have a value in this field in the source file — only products with missing category data are classified. Use this option when the source file already contains reliable category or family information.
    • output — Describes how classified categories are written to output columns. Defined as a list of output level objects:
      • name — The output column name. Set to null to discard that hierarchy level.
      • label — Display label for the output column, visible in the SDM frontend.
      • required — Defaults to true. If true and a hierarchy level is missing for a product, an error is raised. If false, the output column is left empty when the level is absent.
      • nb_levels — The number of hierarchy levels to include in this output column. Accepts a positive integer or null. When null on the first output entry, SDM uses it as a placeholder and counts the remaining levels for subsequent entries. When nb_levels > 1, a separator must be provided.
      • separator — Joins multiple hierarchy levels into a single string. Required when nb_levels > 1.
    • check_representation_conflicts — Defaults to true. Validates that the output configuration maps to an unambiguous hierarchy path. For example, if two hierarchy branches both contain a leaf node called ONE (e.g., A/AA/ONE and B/BB/ONE) and the output only writes the leaf, SDM raises an error at configuration save time because the output column would be ambiguous in downstream systems such as an ERP.
    • multiple — When true, allows a product to be assigned to more than one category.
    • multiple_separator — The string used to join multiple category values in the output when multiple is true.
    • allow_no_category — Defaults to false. When true, products with no matching category can advance to the next step without an error.
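
The ambiguity that check_representation_conflicts guards against can be illustrated with a short Python sketch. It is not SDM's validation code: the function name, the representation of hierarchy paths as lists of node names, and the kept_levels argument (positions of the levels that are written to output) are assumptions chosen for the example.

```python
from collections import defaultdict


def find_representation_conflicts(paths, kept_levels):
    """Return output values that more than one hierarchy path would produce.

    paths: full hierarchy paths, each a list of node names.
    kept_levels: indices of the levels written to output (negative indices allowed).
    """
    seen = defaultdict(list)
    for path in paths:
        written = tuple(path[level] for level in kept_levels)
        seen[written].append("/".join(path))
    # Only values reachable from several distinct paths are ambiguous.
    return {value: sources for value, sources in seen.items() if len(sources) > 1}


paths = [["A", "AA", "ONE"], ["B", "BB", "ONE"], ["A", "AA", "TWO"]]

# Writing only the leaf makes "ONE" ambiguous.
print(find_representation_conflicts(paths, kept_levels=[-1]))

# Writing the root together with the leaf removes the ambiguity.
print(find_representation_conflicts(paths, kept_levels=[0, -1]))
```

A non-empty result corresponds to the situation in which SDM rejects the configuration.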

Output configuration example

Example of an output configuration with three levels for a FEDAS category code hierarchy

The screenshot above shows an example category list used for the output configuration example below.

Given a category list Code_FEDAS_UNIFAI: [["2", "200", "20012", "200124", "HO"]] and the following output configuration:

{
  "output": [
    {
      "name": null,
      "label": {},
      "required": true,
      "nb_levels": 1
    },
    {
      "name": "Code FEDAS",
      "label": {},
      "required": true,
      "nb_levels": 3,
      "separator": "/"
    },
    {
      "name": "Code Genre",
      "label": {},
      "required": false,
      "nb_levels": 1
    }
  ]
}

SDM discards the first level ("2"), joins the next three levels ("200", "20012", "200124") with the separator ("/" is used here for illustration) and writes the result to the Code FEDAS column, and writes "HO" to the Code Genre column. Because Code Genre has required set to false, its column would be left empty if "HO" were missing; with required set to true, an error would be raised instead.
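
The mapping from a category path to output columns can be traced with the following sketch. It is a simplified reading of the rules above, not SDM's code: the function name apply_output_config is hypothetical, the "/" separator on Code FEDAS is an illustrative value, and entries with nb_levels set to null are not handled.

```python
def apply_output_config(category_path, output):
    """Split a category path across output columns (integer nb_levels only)."""
    result = {}
    position = 0
    for level in output:
        count = level["nb_levels"]
        chunk = category_path[position:position + count]
        position += count
        if level["name"] is None:
            continue  # a null name discards these hierarchy levels
        if len(chunk) < count:
            if level.get("required", True):
                raise ValueError(f"missing hierarchy level for {level['name']!r}")
            result[level["name"]] = ""  # optional column stays empty
            continue
        result[level["name"]] = (level.get("separator") or "").join(chunk)
    return result


output = [
    {"name": None, "label": {}, "required": True, "nb_levels": 1},
    # The "/" separator is an assumed value for this illustration.
    {"name": "Code FEDAS", "label": {}, "required": True, "nb_levels": 3, "separator": "/"},
    {"name": "Code Genre", "label": {}, "required": False, "nb_levels": 1},
]

print(apply_output_config(["2", "200", "20012", "200124", "HO"], output))
# {'Code FEDAS': '200/20012/200124', 'Code Genre': 'HO'}
```

Dropping "HO" from the path leaves Code Genre empty because that entry is not required; marking it required would raise an error instead.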

AI configuration (Model config)

The model_config object controls AI behavior for the SDM Classification module. See the API documentation for the full schema.

  • sources — List of field names used as input for AI prediction. Defaults to [].
  • use_model — Selects the AI backend. Accepted values: null (no AI), coreai_api (production AI), demo (simulation mode).
  • model_results — Stores simulated results when use_model is demo. Defaults to [].
  • additional_config — Optional dictionary for extra model configuration. Defaults to {}.
  • model_field_mappings — List of per-field AI configurations used with the coreai_api backend. Each entry has the following fields:

    • field_name — The source field this mapping applies to.
    • model_type — Whether to use a pre-trained model (trained) or a zero-shot model (zero_shot). The zero-shot model uses AI-powered classification without requiring training data. See the AI Classification guide for more on the zero-shot approach.
    • ai_provider — The AI provider for the zero-shot model. Accepted values: openai, dummy. Defaults to openai.
    • model_capability — The model capability tier to use with the zero-shot model. Accepted values:
      • cost_effective (default) — Optimized for cost and speed.
      • most_capable — Higher accuracy, higher cost.
      • most_capable_gpt_5_1 — Uses the GPT-5.1 model tier.
      • ensemble_experimental — Experimental ensemble approach.
    • categorisation_language — The locale used to interpret product data during AI classification. Optional; defaults to fr-FR. Set this to match the language of the source product data for best results.
    • custom_prompt — Additional instructions for the zero-shot AI model to refine its classification behavior. Optional. Use this to provide category-specific context, examples, or rules that the generic AI model would not know. Only available when model_type is zero_shot.
    • confidence_status — Determines how AI-classified results are presented in the SDM UI:
      • automated (default) — AI predictions with high confidence are validated automatically and appear in the Rows Checked by the AI tab. Only unmatched rows appear in Rows to Check.
      • to_check — All AI predictions require manual review; all rows appear in Rows to Check.
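
The effect of confidence_status on the three UI tabs can be summarised as a small decision function. This is a sketch of the behavior described above, not SDM code: the function name and the boolean high_confidence flag are assumptions, since the way SDM decides that a prediction is high confidence is not covered here.

```python
def assign_tab(confidence_status, high_confidence, validated_by_user=False):
    """Return the UI tab a row would appear in, following the rules described above."""
    if confidence_status not in ("automated", "to_check"):
        raise ValueError(f"unknown confidence_status: {confidence_status!r}")
    if validated_by_user:
        return "Rows Checked by You"
    if confidence_status == "automated" and high_confidence:
        return "Rows Checked by the AI"
    # to_check mode, or no high-confidence prediction: manual review is needed.
    return "Rows to Check"


print(assign_tab("automated", high_confidence=True))   # Rows Checked by the AI
print(assign_tab("to_check", high_confidence=True))    # Rows to Check
```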

AI configuration example

{
  "model_field_mappings": [
    {
      "field_name": "categories",
      "model_type": "zero_shot",
      "ai_provider": "openai",
      "model_capability": "cost_effective",
      "categorisation_language": "en-US",
      "confidence_status": "to_check"
    }
  ]
}
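
Before saving a mapping, it can help to check it against the accepted values and defaults listed above. The Python sketch below does that locally. It is not part of SDM: the function name is hypothetical, and treating field_name and model_type as mandatory is an assumption, so the API documentation remains the reference for the real schema.

```python
ACCEPTED = {
    "model_type": {"trained", "zero_shot"},
    "ai_provider": {"openai", "dummy"},
    "model_capability": {
        "cost_effective", "most_capable", "most_capable_gpt_5_1", "ensemble_experimental",
    },
    "confidence_status": {"automated", "to_check"},
}
DEFAULTS = {
    "ai_provider": "openai",
    "model_capability": "cost_effective",
    "categorisation_language": "fr-FR",
    "confidence_status": "automated",
}


def resolve_mapping(mapping):
    """Apply documented defaults and reject values outside the documented lists."""
    resolved = {**DEFAULTS, **mapping}
    errors = []
    if not resolved.get("field_name"):
        errors.append("field_name is missing")  # assumed to be mandatory
    for key, allowed in ACCEPTED.items():
        if resolved.get(key) not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}")
    if resolved.get("custom_prompt") and resolved.get("model_type") != "zero_shot":
        errors.append("custom_prompt is only available with model_type zero_shot")
    if errors:
        raise ValueError("; ".join(errors))
    return resolved


resolved = resolve_mapping({"field_name": "categories", "model_type": "zero_shot"})
print(resolved["categorisation_language"])  # fr-FR, the documented default
```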

Hierarchy configuration

The SDM Hierarchy feature represents category trees. It uses a recursive model to organize data into a multi-level structure. Hierarchies are referenced by name in the categorisation_fields configuration.

Each node in a hierarchy is defined as follows:

  • Name — A unique identifier for the node, equivalent to a category code.
  • Label — The display name for the node. Can be localized to support multiple languages.
  • Selectable — Controls whether the node can be selected as a classification target. Leaf nodes (nodes without children) are selectable by default. Parent nodes can be made selectable but are not by default.
  • Children — Sub-nodes under a parent node. Hierarchies are recursive, so any node can itself have children.

Hierarchy configuration screen in the SDM admin panel showing a multi-level category tree

The screenshot above shows the SDM hierarchy configuration in the admin panel, displaying a multi-level category tree with expandable parent nodes and selectable leaf nodes.
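
The recursive node model can be pictured with a small Python data structure. The class and function names below are hypothetical and only mirror the four node properties listed above; using None for selectable to mean "apply the default" is a modelling choice for this sketch, not SDM's storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str                                   # unique identifier, e.g. a category code
    label: dict = field(default_factory=dict)   # localized display names
    selectable: Optional[bool] = None           # None means: use the default rule
    children: list = field(default_factory=list)

    def is_selectable(self):
        # Default rule: leaves are selectable, parents are not.
        if self.selectable is not None:
            return self.selectable
        return not self.children


def selectable_paths(node, prefix=()):
    """Yield the full path of every node that can be chosen as a category."""
    path = prefix + (node.name,)
    if node.is_selectable():
        yield list(path)
    for child in node.children:
        yield from selectable_paths(child, path)


tree = Node("A", children=[
    Node("AA", selectable=True, children=[Node("ONE"), Node("TWO")]),
])
print(list(selectable_paths(tree)))
# [['A', 'AA'], ['A', 'AA', 'ONE'], ['A', 'AA', 'TWO']]
```

Here the parent AA is explicitly made selectable, while the root A keeps the default and cannot be chosen.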

Limitations and known issues

  1. Maximum number of products (rows) per job: 30,000.
  2. Recommended maximum number of attributes as AI sources: 50. Using more than 50 attributes may reduce classification accuracy.
  3. Optimal batch size: For best performance, send fewer than 200 product rows with fewer than 20 fields per job.
  4. Editing categories outside of classification: It is not currently possible to edit a category value outside of the SDM Classification module itself.
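
The numeric limits above lend themselves to a simple pre-flight check before a job is submitted. The sketch below is a client-side convenience under stated assumptions: the function name and the split into errors and warnings are hypothetical, and SDM's own enforcement may differ.

```python
MAX_ROWS_PER_JOB = 30_000
RECOMMENDED_MAX_SOURCES = 50
OPTIMAL_MAX_ROWS = 200      # best performance is below this row count
OPTIMAL_MAX_FIELDS = 20     # and below this field count


def check_job_size(row_count, field_count, source_count):
    """Return (errors, warnings) for a planned classification job."""
    errors, warnings = [], []
    if row_count > MAX_ROWS_PER_JOB:
        errors.append(f"{row_count} rows exceeds the limit of {MAX_ROWS_PER_JOB}")
    if source_count > RECOMMENDED_MAX_SOURCES:
        warnings.append("more than 50 AI sources may reduce classification accuracy")
    if row_count >= OPTIMAL_MAX_ROWS or field_count >= OPTIMAL_MAX_FIELDS:
        warnings.append("job is larger than the optimal batch size")
    return errors, warnings


print(check_job_size(row_count=150, field_count=10, source_count=5))  # ([], [])
```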