Expand your Catalog with the Data Architect Agent

Overview

Already have a data model in your PIM and need to onboard a new product line? The ‘Catalog Expansion’ mode lets the Data Architect Agent build on top of your existing catalog structure, keeping your current families, attributes, reference entities, and asset families intact while suggesting new ones tailored to your uploaded data.

What Catalog Expansion does:

Generate entities for new product lines (families, attributes, reference entities, asset families).
Reuse and extend your current structure (e.g., apply existing modeling patterns/attributes when creating new families).
Keep your current model safe: existing entities aren’t modified or deleted.

Starting from scratch with a brand-new PIM?

If you are starting from scratch with an empty PIM, we recommend using the dedicated ‘Create’ mode to let the Data Architect Agent design your initial catalog structure. This mode generates your very first set of families, attributes, and reference entities based on your product data, providing a clean foundation for your PIM.
Check out our dedicated Help Center article on the ‘Create’ mode for empty PIMs.

When to use ‘Create New Model’ versus ‘Expand Existing Model’:

Feature	Create New Model	Expand Existing Model
Ideal for…	New projects with empty PIMs	• Adding new ranges or product lines to an existing PIM structure • Building a new data model iteratively with an MVP approach • For partners, building pre-sales demos without erasing the current instance
PIM context	Ignores any existing attributes/families	Analyzes your current PIM structure first
Requirement	Requires resetting your existing instance	No instance reset needed
Data source	Starts from a blank slate	Uses your existing data as a blueprint
End goal	Defining a brand-new data architecture	Iterative growth without breaking consistency

Start by creating your data model

Upload your product data directly from your ERP or e-commerce platform. The Data Architect Agent (DAA) analyzes your files to build a custom model aligned with Akeneo’s industry best practices.

With Catalog Expansion, you can scale your catalog with confidence. The DAA treats your current PIM as the 'source of truth', ensuring your existing structure remains untouched while proposing complementary additions to support your new product lines.

How does it work?

Check that you have full permission access to the Data Architect Agent under System > Roles > Administrator (or your own role) > Data Architect Agent.

Then, click Settings, scroll to the bottom of the page to find the Data Architect Agent under the Automation section, and select ‘Expand PIM structure’.

Name your data model and select your locales

Start by giving your data model a name, then define the locales the Data Architect Agent will use to generate your catalog structure. This configuration ensures that all generated families, attributes, and options are properly labeled in the right languages, ready for your team to use and consistent with your PIM setup.

You can choose several locales for generating entity labels, but only one for the technical codes.

Locales to generate entities labels

This setting determines the labels for your generated entities. You can select multiple locales (e.g., English, French, German, etc.) to generate labels in several languages simultaneously. These labels will appear in the corresponding languages for your families, attributes, attribute options, reference entities and assets.

Locale used for technical codes

This setting identifies the primary language used to generate the unique technical codes for your entities. Codes are typically generated in a single language (often English) to maintain a clean and predictable naming convention for your database and API integrations.

Fill in the ‘Advanced Prompt’ text box to guide the AI model and get the most out of it.

The Advanced Prompt field allows you to provide additional context to guide the AI when generating your data model. While the Data Architect Agent can build a structure without it, adding a prompt helps the AI better understand your business specificities and produce more accurate, tailored results.

You will find examples of the kind of information that helps the AI model produce the best version of your data model in the section ‘What kind of context should I give to the AI model?’.

Upload your files

Then, upload the product data file(s) that have been extracted from your ERP or your e-commerce platform. Below is an example of the kind of data structure we expect to receive:

Please note that we recommend the following structure for files to get the best results:

Format: XLSX or CSV files containing only product information (title, description, any attributes you already have, etc.).
Data must start on the first row (do not begin with an empty row).
One product per row / One attribute per column.
Names of the attributes in the first row. Ensure your column headers match the attribute names.
Attribute values start from the second row.
Max size of imported files: 120 MB per file.
Max number of imported files: 10.
Total cell count must not exceed 25,000,000 across all uploaded files (calculated as rows x columns per file). When uploading your files in the DAA, you will see an automatic calculation of your current cell count.
Max total number of columns across all uploaded files: 2,000.
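
If you want to check these limits before uploading, the sketch below shows one way to do it locally. It is only an illustration and not part of the DAA: it assumes CSV files with a header row, the function names `count_cells` and `check_upload_limits` are hypothetical, and counting the header row in the cell total and treating 1 MB as 1024 x 1024 bytes are both assumptions. The cell count shown in the DAA upload screen remains the reference.

```python
import csv
import os

# Limits quoted from the list above.
MAX_FILE_BYTES = 120 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_FILES = 10
MAX_TOTAL_CELLS = 25_000_000
MAX_TOTAL_COLUMNS = 2000


def count_cells(path):
    """Return (rows, columns) for a CSV file; columns come from the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = 1 + sum(1 for _ in reader)  # assumption: the header row is counted
    return rows, len(header)


def check_upload_limits(paths):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files exceeds the limit of {MAX_FILES}")
    total_cells = 0
    total_columns = 0
    for path in paths:
        if os.path.getsize(path) > MAX_FILE_BYTES:
            problems.append(f"{path} is larger than 120 MB")
        rows, columns = count_cells(path)
        total_cells += rows * columns  # rows x columns per file
        total_columns += columns
    if total_cells > MAX_TOTAL_CELLS:
        problems.append(f"{total_cells} cells exceeds {MAX_TOTAL_CELLS}")
    if total_columns > MAX_TOTAL_COLUMNS:
        problems.append(f"{total_columns} columns exceeds {MAX_TOTAL_COLUMNS}")
    return problems
```

Calling `check_upload_limits(["products.csv"])` on a hypothetical file returns an empty list when the file fits within all four limits.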

Please note that the model usually samples these files to build the data model. Your file does not need to be exhaustive: a few hundred products are more than enough.

We recommend uploading a UTF-8 encoded file to prevent special characters (like accents or currency symbols) from being corrupted during import.

We also recommend removing unused or empty columns from your file before uploading. Each column triggers additional processing work, so removing columns that won't become PIM attributes (such as internal IDs, warehouse codes, or columns filled mostly with empty values) will improve both the speed and the quality of your generated data model.
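
As an illustration of the two recommendations above, here is a minimal sketch that drops mostly empty columns and writes the result as UTF-8. It is not an Akeneo tool: the function names are hypothetical and the 10% fill threshold is an arbitrary example, so adjust it to your own data.

```python
import csv


def drop_sparse_columns(rows, min_fill_ratio=0.1):
    """Drop columns whose share of non-empty values is below min_fill_ratio.

    `rows` is a list of lists whose first item is the header row.
    The 0.1 default is an arbitrary illustration, not an Akeneo recommendation.
    """
    header, data = rows[0], rows[1:]
    if not data:
        return rows  # nothing to measure, leave the file untouched
    keep = []
    for index in range(len(header)):
        filled = sum(1 for row in data if index < len(row) and row[index].strip())
        if filled / len(data) >= min_fill_ratio:
            keep.append(index)
    return [[row[i] if i < len(row) else "" for i in keep] for row in rows]


def write_utf8_csv(rows, path):
    """Write rows to a UTF-8 encoded CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

Columns such as internal IDs are usually fully populated, so a fill-ratio check will not catch them; those still need to be removed by hand.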

Why Product Data is required

We do not recommend uploading non-product data sheets (such as lists of technical specifications or structure-only sheets).

The DAA is designed to analyze the link between your products and their characteristics. Without product information, the AI cannot accurately determine your family structures or attribute options.

Family Composition

With the new Catalog Expansion, we’ve introduced several family modeling strategies to ensure the DAA aligns with your specific business logic.
When applying your model, you can now choose between three levels of granularity to ensure these new product lines will be structured exactly how you want:

Consolidated (fewer, broader families)
Balanced
or Granular (highly specific, detailed families)

Once you have selected your Family Composition strategy, the Data Architect Agent will begin generating your data model.

Depending on your file size and the complexity of your catalog, generating a data model typically takes several minutes. In rare cases involving very large files or high system load, processing may take up to several hours. A waiting message "Model generation in progress" will be displayed on your screen, and the page will automatically refresh once the model is ready.

A confirmation email is also sent when generation is complete.

Key takeaways from Catalog Expansion

The DAA follows a strict "non-destructive" policy: it will not modify or delete your existing attributes or families. When you click ‘Apply’, the agent only creates the new entities proposed in your current DAA model.

Conflict resolution:
- If the DAA suggests an attribute code that already exists in your PIM, the PIM version always remains the source of truth and nothing is overwritten automatically. Conflicts between your live PIM structure and the current DAA model are listed under the ‘Modeling’ tab. You can ‘Resolve’ each one by choosing among three options:
  - Rename the attribute label in the DAA model
  - Remove the attribute from the DAA model
  - Use the attribute from the existing PIM structure
Reference entities & Assets: Existing asset families and reference entities are preserved. The DAA only generates new ones if it’s necessary for the new product line.
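
To make the non-destructive behavior concrete, the sketch below separates proposed attribute codes into those that would be created and those that conflict with the existing PIM. It is a simplified illustration only: `plan_apply` is a hypothetical name, codes are compared as exact strings (an assumption), and the real conflict detection happens in the ‘Modeling’ tab.

```python
def plan_apply(pim_codes, proposed_codes):
    """Split proposed attribute codes into (to_create, conflicts).

    Codes already present in the PIM are never overwritten; they are
    returned as conflicts that need a manual resolution.
    Assumption: codes are compared as exact strings.
    """
    existing = set(pim_codes)
    to_create = [code for code in proposed_codes if code not in existing]
    conflicts = [code for code in proposed_codes if code in existing]
    return to_create, conflicts


# Hypothetical codes, for illustration only.
to_create, conflicts = plan_apply(["sku", "color"], ["color", "wattage"])
print(to_create)   # ['wattage']
print(conflicts)   # ['color']
```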

Handling families & validation

Existing families: If the DAA suggests adding new attributes to a family you already have, those attributes will be added to that family as soon as the model is applied.
Validation rules: The Data Architect Agent only runs validation checks on the newly suggested entities. We assume your existing PIM data is already valid, so the DAA won't flag errors on your pre-existing attributes.

Explore and refine

After processing, the tool creates a data model with families, attributes, reference entities and options. You can explore and make adjustments to every entity generated (families, attributes, reference entities, codes and labels etc.).

The data model generated by the DAA is a starting point, not a final output. We strongly recommend refining it with your business and catalog expertise to ensure it accurately reflects your needs before proceeding.

Reviewing suggestions

To review existing entities, use the ‘Family Assignment’ tool, accessible from the ‘Configuration’ tab. You then have two choices: show only the new attributes from the DAA model, or show a consolidated view of both the new attributes from the DAA model and the existing PIM structure.

Source File

The 'Source File' tab shows how the columns of your input files relate to the data model generated by the DAA, to help you understand the DAA's modeling decisions.

Audit

The ‘Audit’ tab is designed to give you a comprehensive overview of the data model generated in the DAA. This feature breaks down key metrics, including attribute distribution and properties, and gives you a detailed view of both shared and family-specific attributes.

NEW - Collaboration

The DAA supports collaborative work on a data model: you can now edit and review data models created by other users in the same instance.

To help you keep track of changes, any modifications made by other users are clearly highlighted in the History tab of your data model with a visual indicator, so you can quickly identify what has been updated without having to review the entire model.

A visual indicator keeps you informed of how many changes other users have made to the data model. You can review these changes at any time and restore a previous version if needed.

Key capabilities and support

We currently support 15 types of attributes within the Data Architect Agent. The model is capable of generating these attributes, whether as suggestions or during the re-processing of the input file.

Attribute type	Status
Asset collection	Supported
Date	Supported
File	Supported
Identifier	Supported
Image	Supported
Metric	Supported
Multi select	Supported
Number	Supported
Price	Supported
Product link	Not supported
Reference entity multiple link	Supported
Reference entity single link	Supported
Simple select	Supported
Table	Not supported
Text	Supported
Text Area	Supported
Yes/No (boolean)	Supported

As of today, the Data Architect Agent does not support the following entities:

Product models
Categories
Channels
Attribute groups
Association types
Groups
Workflows
Rules

Learn more at Akeneo Akademy

Everything you need to know about the Data Architect Agent in a dedicated practice environment.

Product previews

The Product previews tab gives you a first look at your products, showing you how they'll appear in the PIM.

We generate a sample set of products from your uploaded file, based on the families and attributes that have been suggested. You can easily navigate through your products by using the family filter or the search bar.

The attribute preview is designed to give you a quick projection of your data, by creating a sample set of your products.

Please note that:

The preview currently supports text, text area, simple select, and multi select attributes. Other attribute types may not be displayed as expected in the preview.
The product preview does not guarantee that every attribute will have a value, nor that every line in your uploaded file will necessarily result in a product in the preview tab.
The product preview may take a few seconds to load.

Importing the data model

When you are satisfied with your data structure, you can import your model into your PIM to create your entities, using the ‘Import your data model’ button on the top right of the screen.

With the 'Catalog Expansion' mode, you don't need to reset your instance to apply the model.

You have three choices when importing the data model:

Importing the model only: This will import the families, attributes, reference entities, and assets, without any products.
Importing the model and a product sample: This will import the families, attributes, reference entities, assets, and a sample of your products (approx. 20 products for the moment). This provides users with an immediate, visualized first version of their catalog structure, accelerating feedback and project iteration.
Importing the model and creating a Tailored Import profile: This will import the families, attributes, reference entities, and assets. It also creates a Tailored Import profile, which gives you complete visibility into how your source file columns map to the generated PIM attributes. You can then refine and save this mapping for maximum accuracy and future product imports.

On this screen, confirm your choice to initiate the data model import and profile creation.

If you select the third option, a modal appears. Once the processes finish, you can access the 'Tailored Imports' tab in the Data Architect Agent. You then have two actions to choose from: ‘View Import Structure’ or ‘Launch Import’.

If you select ‘View Import Structure’, you will be redirected to your Tailored Import Profile to configure your Product Mapping:

On this page, you can review one last time how the mapping has been done by checking the columns of your input file (the Source) and the attributes in your PIM (the Target). You can also review and/or add operations to improve the data quality and minimize errors during import.

Conditional settings: depending on the target you selected, additional parameters may be required.
- For example, for scopable and/or localizable attributes, you must select the channel and/or locale into which you want to import your data.
Advanced mapping operations: you can use operations to transform the data and make the import process easier.

Advanced Mapping operations

To explore the advanced mapping operations offered by the Tailored Import, check out our Help Center article on the ‘Tailored Import’.

Please note that adding the operations in the correct order is essential. For instance, if you want to import the column "Main color", which contains multiple values in each cell (e.g. "Black, Crow, Charcoal, Obsidian"), from your Excel spreadsheet into a single multi-select attribute "Color" in the PIM, you need to add the operations in this order: first Split, then Replacement.
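
The sketch below illustrates why the order matters. It does not reproduce the Tailored Import engine: `split_values` and `replace_values` are hypothetical helpers, the replacement table is invented for the example, and it assumes that a replacement matches whole values rather than substrings.

```python
def split_values(cell, separator=","):
    """Split a cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


def replace_values(values, replacements):
    """Replace each whole value that has an entry in the table; keep the others."""
    return [replacements.get(value, value) for value in values]


# Hypothetical mapping from source values to option codes.
REPLACEMENTS = {
    "Black": "black",
    "Crow": "crow",
    "Charcoal": "charcoal",
    "Obsidian": "obsidian",
}

cell = "Black, Crow, Charcoal, Obsidian"

# Correct order: Split first, then Replacement on each value.
print(replace_values(split_values(cell), REPLACEMENTS))
# ['black', 'crow', 'charcoal', 'obsidian']

# Wrong order: Replacement sees the whole cell, finds no match, changes nothing.
print(replace_values([cell], REPLACEMENTS))
# ['Black, Crow, Charcoal, Obsidian']
```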

Asset Mapping:

If your uploaded file contains media links, you can use the new Asset Mapping to convert those links into PIM 'Assets'. Asset Mapping is divided into two panels.

Asset Mapping Structure (Left panel):
- In the left panel, use the ‘Asset mapping structure’ to select the Asset Attribute (the target) that will link the product to the corresponding assets.
- You can select the same Asset Attribute multiple times, enabling a single product attribute to link to various assets.
  
  Scopable and localizable attributes are fully supported in the Asset Mapping structure.

Asset Configuration (Right panel):
- Asset code configuration: Define the strategy for how the final asset code will be generated. You can choose between two strategies.
  - Product identifier + suffix: The asset code will use the product’s main identifier, or the import ID if the main identifier is empty. A suffix is optional.
  - Column value: The asset code will be the value of the selected column.
- Attribute mappings: Map the asset family's codes to specific columns in your source file. If you already have a column with the asset code in your uploaded file(s), this will ensure that assets are created with the necessary information.
  
  If you need to map several URLs from your input files in a single asset family, you can create different mappings for the same asset family.
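
The two code strategies can be sketched as follows. This is an illustration under stated assumptions, not the actual import logic: the function names are hypothetical, and joining the suffix by plain concatenation is an assumption.

```python
def asset_code_from_identifier(product_identifier, import_id, suffix=""):
    """'Product identifier + suffix' strategy.

    Falls back to the import ID when the main identifier is empty.
    Assumption: the optional suffix is appended by plain concatenation.
    """
    base = product_identifier or import_id
    return f"{base}{suffix}"


def asset_code_from_column(row, column):
    """'Column value' strategy: the code is the value of the selected column."""
    return row[column]


# Hypothetical values, for illustration only.
print(asset_code_from_identifier("SKU123", "row_7", "_front"))  # SKU123_front
print(asset_code_from_identifier("", "row_7"))                  # row_7
print(asset_code_from_column({"image_code": "lamp_01"}, "image_code"))  # lamp_01
```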

Reference Entities Records Mapping

You can now define how new Reference Entities records will be created and linked to your products during the import process.

First, select the relevant Reference Entity attribute, then specify how the source file data should populate each record field.

Record Mapping Structure (Left panel):
- In the left panel, use the ‘Record mapping structure’ to select the Reference Entity Attribute (the target) that will link the product to the corresponding records.
- You can select the same Reference Entity Attribute multiple times, enabling a single product attribute to link to various records.
  
  Scopable and localizable attributes are fully supported in the Reference Entities Records Mapping structure.

Record Configuration (Right panel):
- Record code configuration: Define the strategy for how the final record code will be generated. You can choose between two strategies.
  - Product identifier + suffix: The record code will use the product’s main identifier, or the import ID if the main identifier is empty. A suffix is optional.
  - Column value: The record code will be the value of the selected column.
- Attribute mappings: Map the record attribute's codes to specific columns in your source file. If you already have a column with the record code in your uploaded file(s), this will ensure that records are created with the necessary information.

NEW - Automatic Family Categorization from AI

Automatic Family Categorization from AI: If your file lacks a dedicated column for PIM Family assignment, you can use this operation to let the AI determine the correct family. The AI will analyze the entire row of data to decide where the product belongs.

If you want to deepen your knowledge of ‘Tailored Imports’, we also recommend checking our dedicated Help Center article.

Next action to import your products: launching the ‘Quick Import’

Before proceeding, we strongly recommend reviewing the structure of the generated mapping. For each PIM attribute (Target), you can select one or multiple sources from your input file, depending on the attribute type.

At this stage, your data model structure (Families, Attributes, Reference Entities and Assets) has already been created in your PIM, but you still need to select the ‘Quick Import’ button (top right of your screen) to import your products.

You then have two choices, either:

Using the current file provided during the DAA modeling process
Or uploading a new file, which will be imported using the saved structure. This option supports both testing with smaller datasets to validate the structure and scaling up to import larger datasets (for example, larger than the file originally used in the DAA).

The import is finished! All of your products have been uploaded to your PIM! Now is the perfect time to explore your new product listings and start refining your data model. Get started!

We highly recommend reviewing your data model with a Professional Services consultant or Partner before applying the model to your PIM, to make sure it fits your business and future catalog evolution.

Additional capabilities

Data Architect Chat

Get instant guidance on your Akeneo PIM data modeling. Use the chat to ask questions about best practices and receive advice on your attribute definitions.

Please note: This experimental tool provides guidance only and does not have the permissions to modify your live taxonomy or data model.

Access & Permissions: To ensure a consistent experience, only the original creator of the data model can interact with the chat. All other users can view the conversation in read-only mode.

Conversation Context: Each chat is unique to its specific data model, allowing you to keep your modeling discussions organized. While the full context of your conversation is preserved indefinitely, the chat interface will display the last 100 messages in your history for optimal performance.

Download

In the top right corner of the DAA, you'll see additional actions, including the ability to download your files.

You have several options to choose from, depending on your needs.

The full Data Model in XLSX format: this option provides a single Excel file containing a comprehensive overview of the data model that has been generated in the Data Architect Agent.

The file contains dedicated sheets for Attributes, Families, Family variants, Attribute groups, Asset families, Reference entities, and Select options.

We have also included sheets for Attribute definitions, Metric Family codes, and Metric units. These are intended as technical guides for your modeling and are not part of the importable structure.

This export is best used in collaboration with a Partner or Professional Services consultant when first implementing a PIM, to carry the changes over into the PIM.

You can also download individual CSV files for the following entities:
- Attributes
- Attribute options
- Families
- Family Variants

Specifications

Depending on several factors, generating a data model can take anywhere from minutes to hours. A waiting message “Model generation in progress” will be displayed on your screen, and the page will be automatically refreshed to show the model generated. A confirmation email is also sent when the model has been generated.
The LLM used for data model generation is Gemini 2.5 Flash. Data sent to Google is neither stored on their servers nor used to train their models.

What kind of context should I give to the AI model?

It is important to note that custom instructions always take priority over default behavior. This means that if you write something specific, even if it breaks the usual PIM rules, the model will follow it.

The goal is not necessarily to be exhaustive, but to remove uncertainty and make the output fit better with your customer’s business, their tools, and their way of modeling data.

If you don't give any instructions in the ‘Business & Data Modeling Context’, the AI will still work, as:
- It understands how the PIM works
- It applies best practices by default
- It reads and understands the structure of your input product file
- It produces a valid, coherent taxonomy using standard logic

Keep in mind that prompts need to be simple and easy to understand.

To learn more about prompting, check out our dedicated Akademy course. Additionally, your Professional Services consultant or Partner can help you create an accurate prompt for your need.

Examples of Custom Instructions you can use (with rationale)

There is no fixed structure that we particularly recommend, but below are example prompts you can use to make the model behave the way you want: to improve consistency, resolve ambiguities, and align the results with the logic of your catalog, your tools, or your business.

Use the variant_axis column to define variant products.

Rationale:

The model can infer good variant axes from your product data — for example, if it sees multiple products with identical names but different colors, it may suggest color as a variant axis. But if you want to enforce a specific structure or match another system (like an ERP or spreadsheet you’re importing from), you can specify it directly. This is especially helpful when your file contains edge cases that could mislead the model, like inconsistent naming or overlapping dimensions.

Treat the details field as a select-type attribute.

Rationale:

If the details field contains recurring values, these could be modeled as options in a simple select attribute. But if the field has many different values and few of them recur across products, the model might otherwise treat it as free text. Adding this instruction helps resolve ambiguity and structure your data better.

I want the data model to include the following families: spare_parts, retail_accessories, e_bike_components.

Rationale:

The model usually creates families by analyzing product clusters, but some families might not be created the way you want, for example if the model chooses a more granular or more general grouping. Listing the families here ensures they appear as-is. This is also helpful if some products that belong to these families are missing from your file (e.g. partial exports or subset samples). It also helps resolve granularity mismatches (e.g. if you want “retail_accessories” to be a consistent macro-family, even if the model would have split it into “bottle_holders”, “lights”, etc.).

I plan to add a new outdoor_gear product line to my catalog in the future, so create a family for it that should mirror retail_accessories.

Rationale:

This helps to future-proof the structure. The model won’t generate structure for products that aren’t in your file, but if you know you’ll add a product line (like “outdoor_gear”) you can mention it. You can also give the model a reference: saying it should “mirror retail_accessories” tells the model to use the same kind of attribute set, variant logic, etc. This is useful when retail_accessories is well structured in your file and you want the new family to inherit its logic.

Use ALL CAPS for attribute labels. Limit them to 120 characters.

Rationale:

This enforces internal style conventions. While best practices usually lean towards sentence casing (i.e. capitalizing only the first letter of the first word in a sentence and any proper nouns), you can override this to fit other conventions. The model will mostly follow the constraint (especially for casing), but character limits may not always be perfectly respected. Still, setting the expectation ensures labels are short and clean in most cases.

Attribute codes should include the unit at the end (e.g. weight_g, length_mm) and the label should match: “Weight (g)”.

Rationale:

In PIM best practices, including units in codes or labels is discouraged as units are usually handled using the ‘Metrics’ attribute type. But if your systems (e.g. Excel exports, ERP schemas) require it for clarity or consistency, you can enforce it using custom instructions. The AI will follow this logic even if it wouldn’t propose it by default.

There should be an “Unknown” option for the material attribute.

Rationale:

The model usually expects you to leave an attribute empty when a product has no value for it. If your system requires placeholder values for completeness, or if blank fields break exports or validation rules, you can enforce fallbacks.

Example of a full Custom Instruction

I’m a B2B distributor. Use the product_type column to define families and variant_axis for variant grouping. Attribute codes should always end in a unit (_g, _mm), and the labels should reflect that (e.g. “Weight (g)”). Group all technical attributes under a tech prefix. Do not introduce any new attributes that aren't already in the file. When material is missing, add an “Unknown” option.