Accelerate data initialization with the Data Architect Agent

Overview

The Data Architect Agent (DAA) helps new customers rapidly build their product data structure in Akeneo PIM.

Upload a product extract from your ERP or e-commerce platform, and the AI will analyze your catalog to automatically generate families, attributes, reference entities and asset families, following Akeneo best practices from the start.

Already have a data model in your PIM and need to onboard a new product line?

The ‘Catalog Expansion’ mode lets the Data Architect Agent build on top of your existing catalog structure, keeping your current families, attributes, reference entities, and asset families intact while suggesting new ones tailored to your uploaded data.
Check out our dedicated Help Center article on the ‘Catalog Expansion’ mode.

We release new improvements regularly!

To find out more about all the new improvements we've released on the Data Architect Agent, check out our change log here.

Start by creating your data model

Upload your product catalog in a flat file from an e-commerce platform or ERP system. The AI uses this data to generate your custom model, based on Akeneo’s good practices and key concepts.

How does it work?

Check that you have full access permissions for the Data Architect Agent under System > Roles > Administrator (or your own role) > Data Architect Agent.

Already have a data model in your PIM and need to onboard a new product line?

Check out our Help Center article about the ‘Catalog Expansion’ mode, which lets the Data Architect Agent build on top of your existing catalog structure, keeping your current families, attributes, reference entities, and asset families intact while suggesting new ones tailored to your uploaded data.

Then, click Settings to find the Data Architect Agent under the Automation section, at the very bottom of the page.

Name your data model and select your locales

Start by giving your data model a name, then define the locales the Data Architect Agent will use to generate your catalog structure. This configuration ensures that all generated families, attributes, and options are properly labeled in the right languages, ready for your team to use and consistent with your PIM setup.

You can choose several locales for generating entity labels, but only one for the technical codes.

Locales to generate entities labels

This setting determines the labels for your generated entities. You can select multiple locales (e.g., English, French, German) to generate labels in several languages simultaneously. These labels will appear in the corresponding languages for your families, attributes, attribute options, reference entities and assets.

Locale used for technical codes

This setting identifies the primary language used to generate the unique technical codes for your entities. Codes are typically generated in a single language (often English) to maintain a clean and predictable naming convention for your database and API integrations.

Fill in the ‘Advanced Prompt’ text box to guide the AI model and get the most out of it.

The Advanced Prompt field allows you to provide additional context to guide the AI when generating your data model. While the Data Architect Agent can build a structure without it, adding a prompt helps the AI better understand your business specificities and produce more accurate, tailored results.

We have added some examples of the kind of information that helps the AI model create the best version of your data model, under the section ‘What kind of context should I give to the AI model?’

Upload your files

Then, upload the product data file(s) that have been extracted from your ERP or your e-commerce platform. Below is an example of the kind of data structure we expect to receive:

Please note that we recommend the following structure for files to get the best results:

Format: XLSX or CSV files containing product information only (title, description, any attributes you already have, etc.).
Data must start on the first row (do not begin with an empty row).
One product per row / One attribute per column.
Names of the attributes in the first row. Ensure your column headers match the attribute names.
Attribute values start from the second row.
Max size of imported files: 120 MB per file.
Max number of imported files: 10.
Total cell count must not exceed 25,000,000 across all uploaded files (calculated as rows x columns per file). When uploading your files in the DAA, you will see an automatic calculation of your current cell count.
Max total number of columns across all uploaded files: 2,000.
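
The limits above can be checked locally before uploading. The following is a minimal sketch for CSV files only, not part of the DAA itself: the function names `csv_shape` and `check_upload` are hypothetical, and it assumes that the header row counts toward the cell total and that 1 MB means 1024 x 1024 bytes. The cell count displayed in the DAA remains the reference.

```python
import csv
from pathlib import Path

# Limits quoted in the list above.
MAX_FILES = 10
MAX_FILE_BYTES = 120 * 1024 * 1024  # assumption: 1 MB counted as 1024 * 1024 bytes
MAX_TOTAL_CELLS = 25_000_000
MAX_TOTAL_COLUMNS = 2000


def csv_shape(path, delimiter=","):
    """Return (row_count, header) for a CSV file; the header row is counted as a row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])
        row_count = 1 + sum(1 for _ in reader)
    return row_count, header


def check_upload(paths, delimiter=","):
    """Return a list of problems; an empty list means no documented limit is exceeded."""
    problems = []
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files selected, limit is {MAX_FILES}")
    total_cells = 0
    total_columns = 0
    for path in map(Path, paths):
        if path.stat().st_size > MAX_FILE_BYTES:
            problems.append(f"{path.name}: larger than 120 MB")
        row_count, header = csv_shape(path, delimiter)
        # Data must start on the first row, so a blank header row is flagged.
        if not any(cell.strip() for cell in header):
            problems.append(f"{path.name}: first row is empty")
        total_cells += row_count * len(header)  # rows x columns per file
        total_columns += len(header)
    if total_cells > MAX_TOTAL_CELLS:
        problems.append(f"{total_cells} cells in total, limit is {MAX_TOTAL_CELLS}")
    if total_columns > MAX_TOTAL_COLUMNS:
        problems.append(f"{total_columns} columns in total, limit is {MAX_TOTAL_COLUMNS}")
    return problems
```

Calling `check_upload(["products.csv"])` on your own export returns an empty list when nothing stands out. XLSX files would need a spreadsheet library and are left out of this sketch.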

Please note that the model usually samples these files to build the data model. Your file does not need to be exhaustive: a few hundred products are more than enough.

We recommend uploading a UTF-8 encoded file to prevent special characters (like accents or currency symbols) from being corrupted during import.
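
If your ERP exports CSV files in a legacy encoding, one way to convert them is sketched below. The function name `reencode_to_utf8` is hypothetical, and the default source encoding `cp1252` (common for Windows exports) is only an assumption: use whatever encoding your system actually produces.

```python
def reencode_to_utf8(source_path, target_path, source_encoding="cp1252"):
    """Copy a text file, decoding it from `source_encoding` and writing it as UTF-8."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(source_path, "r", encoding=source_encoding, newline="") as src, \
         open(target_path, "w", encoding="utf-8", newline="") as dst:
        # Read in chunks so that large exports do not have to fit in memory.
        for chunk in iter(lambda: src.read(1024 * 1024), ""):
            dst.write(chunk)
```

For example, `reencode_to_utf8("export.csv", "export_utf8.csv")` writes a UTF-8 copy next to the original. This applies to CSV files only.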

We also recommend removing unused or empty columns from your file before uploading. Each column triggers additional processing work, so removing columns that won't become PIM attributes (such as internal IDs, warehouse codes, or columns filled mostly with empty values) will improve both the speed and the quality of your generated data model.
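
To spot candidate columns for removal, a short script can list the columns that are mostly blank. This is a minimal sketch for UTF-8 CSV files: the function name `mostly_empty_columns` is hypothetical and the 90% threshold is an arbitrary assumption, not a value used by the DAA.

```python
import csv


def mostly_empty_columns(path, threshold=0.9, delimiter=","):
    """Return the headers whose share of blank values is at least `threshold`."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        headers = reader.fieldnames or []
        blank = {header: 0 for header in headers}
        total = 0
        for row in reader:
            total += 1
            for header in headers:
                # Missing cells (short rows) come back as None and count as blank.
                if not (row.get(header) or "").strip():
                    blank[header] += 1
    if total == 0:
        return []
    return [header for header in headers if blank[header] / total >= threshold]
```

Review the returned list by hand before deleting anything: a sparsely filled column can still be a legitimate attribute for a small family.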

Why Product Data is required

We do not recommend uploading non-product data sheets (such as lists of technical specifications or structure-only sheets).

The DAA is designed to analyze the link between your products and their characteristics. Without product information, the AI cannot accurately determine your family structures or attribute options.

Contextual follow-up questions

After uploading your file(s) and clicking 'Start', the Data Architect Agent will ask you a short series of follow-up questions before generating your data model.

These questions are generated by the AI based on the content of your uploaded file. Their purpose is to clarify any ambiguities the AI has detected in your data, so it can generate the most accurate data model possible and reduce interpretation errors.

What to expect

You can expect to answer between 4 and 10 questions, depending on the content and complexity of your file. For each question, you can either select one of the proposed options or use the free-text field to provide more specific context.

Here is an example of the kind of question you might be asked: “Beyond your existing columns (once parsed and structured), how much should AI suggest additional attributes?”

Can I skip the questions?

Answering the follow-up questions is optional and you can skip them to proceed directly to model generation. However, we strongly recommend taking the time to answer them. The more context you provide, the more accurate your generated data model will be, and the less manual refinement you will need to do afterwards.

Once you have answered the questions (or chosen to skip), the Data Architect Agent will begin generating your data model.

Depending on your file size and the complexity of your catalog, generating a data model typically takes several minutes. In rare cases involving very large files or high system load, processing may take up to several hours. A waiting message "Model generation in progress" will be displayed on your screen, and the page will automatically refresh once the model is ready.

A confirmation email is also sent when generation is complete.

Explore and refine

After processing, the tool creates a data model with families, attributes, reference entities and options. You can explore and adjust every generated entity (families, attributes, reference entities, codes, labels, etc.).

The data model generated by the DAA is a starting point, not a final output. We strongly recommend refining it with your business and catalog expertise to ensure it accurately reflects your needs before proceeding.

We have implemented a few improvements to ease the refinement of your data model.

Attribute Assignment

The ‘Attribute Assignment’ tab lets you make quick modifications to the generated data model by easily assigning attributes to your families.

Source File

The 'Source File' tab provides greater visibility into how the columns of your input files relate to the data model generated by the DAA, helping you understand the DAA's modeling decisions.

Audit

The ‘Audit’ tab is designed to give you a comprehensive overview of the data model generated in the DAA. This feature breaks down key metrics, including attribute distribution and properties, and gives you a detailed view of both shared and family-specific attributes.

NEW - Collaboration

The DAA supports collaborative work on a data model: you can now edit and review data models created by other users in the same instance.

To help you keep track of changes, any modifications made by other users are clearly highlighted in the History tab of your data model with a visual indicator, so you can quickly identify what has been updated without having to review the entire model.

A visual indicator keeps you informed of how many changes other users have made to the data model. You can review these changes at any time and restore a previous version if needed.

Key concepts

Families

A family is a defined set of attributes that products automatically inherit when assigned to this family. While a product can only be part of one family (or none, if it's a unique item without default attributes), this structure helps to manage and track a product's data completeness.

Find out more with our dedicated Help Center article about Families.

Family Variants

The model is able to generate family variants based on your input file. Family variants help manage products with different variations (e.g., a couch that comes in different colors and sizes).

Find out more with our dedicated Help Center article about Family Variants.

Attributes

The model is able to generate attributes based on your input file. Attributes help by providing specific characteristics and details for each product, enabling richer descriptions, improved searchability, and better organization of your product information within the PIM system.

Find out more with our dedicated Help Center article about Attributes.

Reference Entities

A reference entity allows you to manage common product information (like brands or materials) with their own dedicated attributes. Reference entities help to centralize enrichment: rich content such as images or detailed descriptions is added in one place and automatically applies to every linked product.

Find out more with our dedicated Help Center article about Reference Entities.

Assets

The Data Architect Agent identifies image and file URLs within your input file and creates dedicated asset collections for them. This allows you to visualize all product-related information during the data initialization phase with the DAA.
We don't support media files in the asset collections, but you will be able to update your files after applying the model in the Asset Manager.

Find out more with our dedicated Help Center article about Assets.

We currently support 15 types of attributes within the Data Architect Agent. The model is capable of generating these attributes, whether as suggestions or during the re-processing of the input file.

Attribute type	Status
Asset collection	Supported
Date	Supported
File	Supported
Identifier	Supported
Image	Supported
Metric	Supported
Multi select	Supported
Number	Supported
Price	Supported
Product link	Not supported
Reference entity multiple link	Supported
Reference entity single link	Supported
Simple select	Supported
Table	Not supported
Text	Supported
Text Area	Supported
Yes/No (boolean)	Supported

As of today, we don’t support the following entities in the Data Architect Agent.

Product models
Categories
Channels
Attribute groups
Association types
Groups
Workflows
Rules

Learn more at Akeneo Akademy

Everything you need to know about the Data Architect Agent in a dedicated practice environment.

Take the course

Issues

We've introduced a new Issues tab to help you quickly identify and address problems within your catalog's structure. This dedicated tab centralizes all identified issues, making them easier to manage.

For each issue, you'll see details about the affected entity and a clear error message. To resolve an issue, simply click the ‘EDIT’ button. This will automatically take you to the specific location of the problem, allowing you to identify and correct it.

Product previews

The Product previews tab gives you a first look at your products, showing you how they'll appear in the PIM.

We generate a sample set of products from your uploaded file, based on the families and attributes that have been suggested. You can easily navigate through your products by using the family filter or the search bar.

The attribute preview is designed to give you a quick projection of your data, by creating a sample set of your products.

Please note that:

The preview currently supports text, text area, simple select, and multi select attributes. Other attribute types may not be displayed as expected in the preview.
The product preview does not guarantee that every attribute will have a value, nor that every line in your uploaded file will necessarily result in a product in the preview tab.
The product preview may take a few seconds to load.

Importing the data model

When you are satisfied with your data structure, you can import your model into your PIM to create your entities, using the ‘Import your data model’ button on the top right of the screen.

What should I do if my instance is not empty?

If your instance is a sandbox and already contains entities, you can use the 'reset' functionality to remove all previous entities.
Please note that this feature is only available on sandbox instances, when you apply a new data model to an environment that already contains entities (e.g., attributes, families). It is intended to make data initialization easier.

You have three choices when importing the data model:

Importing the model only: This will import the families, attributes, reference entities, and assets, without any products.
Importing the model and a product sample: This will import the families, attributes, reference entities, assets, and a sample of your products (approx. 20 products for the moment). This provides users with an immediate, visualized first version of their catalog structure, accelerating feedback and project iteration.
Importing the model and creating a Tailored Import profile: This will import the families, attributes, reference entities, and assets. It also creates a Tailored Import profile, which gives you complete visibility into how your source file columns map to the generated PIM attributes. You can then refine and save this mapping for maximum accuracy and future product imports.

On this screen, confirm your choice to initiate the data model import and profile creation.

If you select the third option, a modal window will let you access the 'Tailored Imports' tab in the Data Architect Agent once the processes finish. You then have two actions to choose from: ‘View Import Structure’ or ‘Launch Import’.

If you select ‘View Import Structure’, you will be redirected to your Tailored Import Profile to configure your Product Mapping:

On this page, you can review one last time how the mapping has been done by checking the columns of your input file (the Source) and the attributes in your PIM (the Target). You can also review and/or add operations to improve the data quality and minimize errors during import.

Conditional settings: depending on the target you selected, other parameters may need to be set. For example, for scopable and/or localizable attributes, you must select the channel and/or locale into which you want to import your data.
Advanced mapping operations: you can use operations to transform the data in order to make the import process easier.

Advanced Mapping operations

To explore the advanced mapping operations offered by the Tailored Import, check out our Help Center article on the ‘Tailored Import’.

Please note that adding the operations in the correct order is essential. For instance, if you want to import the column "Main color" that contains multiple values in each cell (e.g. "Black, Crow, Charcoal, Obsidian") from your Excel spreadsheet into one multi-select attribute "Color" in our PIM, you need to add the operations in that order: first Split, then Replacement.
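
The effect of the operation order can be illustrated with a small sketch. It does not call Tailored Imports: `split_values` and `replace_values` are hypothetical stand-ins for the Split and Replacement operations, the option codes in `OPTION_CODES` are made up for the example, and the sketch assumes that a replacement only matches a whole value.

```python
# Hypothetical mapping from source values to "Color" option codes.
OPTION_CODES = {
    "Black": "black",
    "Crow": "crow",
    "Charcoal": "charcoal",
    "Obsidian": "obsidian",
}


def split_values(cell, separator=","):
    """Stand-in for Split: cut one cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


def replace_values(values, replacements):
    """Stand-in for Replacement: swap each value that matches a key exactly."""
    return [replacements.get(value, value) for value in values]


cell = "Black, Crow, Charcoal, Obsidian"

# Split first, then Replacement: every value is matched individually.
right_order = replace_values(split_values(cell), OPTION_CODES)

# Replacement first: the whole cell matches no key, so nothing is replaced.
wrong_order = split_values(replace_values([cell], OPTION_CODES)[0])
```

Here `right_order` holds the four option codes, while `wrong_order` still holds the raw source values, which would not match the options of the multi-select attribute.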

Asset Mapping:

If your uploaded file contained media links, you can use our powerful new Asset Mapping to convert those entities into PIM 'Assets'. This new addition is divided into two panels.

Asset Mapping Structure (Left panel):
- In the left panel, the ‘Asset mapping structure’ selects the Asset Attribute (the target) that will link the product to the corresponding assets.
- You can select the same Asset Attribute multiple times, enabling a single product attribute to link to various assets.
  
  Scopable and localized attributes are fully supported in the Asset Mapping structure.

Asset Configuration (Right panel):
- Asset code configuration: Define the strategy for how the final asset code will be generated. You can choose between two strategies.
  - Product identifier + suffix : The asset code will use the product’s main identifier, or the import ID if the main identifier is empty. A suffix is optional.
  - Column value : The asset code will be the value of the column selected.
- Attribute mappings: Map the asset family's codes to specific columns in your source file. If you already have a column with the asset code in your uploaded file(s), this will ensure that assets are created with the necessary information.
  
  If you need to map several URLs from your input files in a single asset family, you can create different mappings for the same asset family.
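
The two asset code strategies can be pictured as follows. This is only a sketch of the behavior described above: the function names are hypothetical, and it assumes the suffix is simply appended to the identifier, which may differ from how the PIM actually joins them.

```python
def asset_code_from_identifier(product_identifier, import_id, suffix=""):
    """'Product identifier + suffix': fall back to the import ID when the identifier is empty."""
    base = product_identifier or import_id
    return f"{base}{suffix}"  # assumption: plain concatenation


def asset_code_from_column(row, column):
    """'Column value': the asset code is the value of the selected column."""
    return row[column]
```

For example, `asset_code_from_identifier("SKU123", "import_7", "_main")` gives `SKU123_main`, and the same call with an empty identifier gives `import_7_main`. The record code strategies for Reference Entities follow the same two patterns.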

Reference Entities Records Mapping

You can now define how new Reference Entities records will be created and linked to your products during the import process.

First, select the relevant Reference Entity attribute, then specify how the source file data should populate each record field.

Record Mapping Structure (Left panel):
- In the left panel, the ‘Record mapping structure’ selects the Reference Entity Attribute (the target) that will link the product to the corresponding records.
- You can select the same Reference Entity Attribute multiple times, enabling a single product attribute to link to various records.
  
  Scopable and localized attributes are fully supported in the Reference Entities Records Mapping structure.

Record Configuration (Right panel):
- Record code configuration: Define the strategy for how the final record code will be generated. You can choose between two strategies.
  - Product identifier + suffix : The record code will use the product’s main identifier, or the import ID if the main identifier is empty. A suffix is optional.
  - Column value : The record code will be the value of the column selected.
- Attribute mappings: Map the record attribute's codes to specific columns in your source file. If you already have a column with the record code in your uploaded file(s), this will ensure that records are created with the necessary information.

NEW - Automatic Family Categorization from AI

Automatic Family Categorization from AI: If your file lacks a dedicated column for PIM Family assignment, you can use this operation to let the AI determine the correct family. The AI will analyze the entire row of data to decide where the product belongs.

If you want to deepen your knowledge of ‘Tailored Imports’, we also recommend checking our dedicated Help Center article here.

Next action to import your products: launching the ‘Quick Import’

Before proceeding, we strongly recommend reviewing the structure of the generated mapping. For each PIM attribute (Target), you can select one or multiple sources from your input file, depending on the attribute type.

At this stage, your data model structure (Families, Attributes, Reference Entities and Assets) has already been created in your PIM, but you need to click the ‘Quick Import’ button (top right of your screen) to import your products.

You then have two choices:

Use the current file provided during the DAA modeling process.
Upload a new file, which will be imported using the saved structure. This option supports both testing with smaller datasets to validate the structure and scaling up to import larger datasets (for example, larger than the DAA's original file).

The import is finished! All of your products have been uploaded to your PIM! Now is the perfect time to explore your new product listings and start refining your data model. Get started!

We highly recommend reviewing your data model with a Professional Services consultant or Partner before applying the model to your PIM, to make sure it fits your business and your future catalog evolution.

Additional capabilities

Data Architect Chat

Get instant guidance on your Akeneo PIM data modeling. Use the chat to ask questions about best practices and receive advice on your attribute definitions.

Please note: This experimental tool provides guidance only and does not have the permissions to modify your live taxonomy or data model.

Access & Permissions: To ensure a consistent experience, only the original creator of the data model can interact with the chat. All other users can view the conversation in read-only mode.

Conversation Context: Each chat is unique to its specific data model, allowing you to keep your modeling discussions organized. While the full context of your conversation is preserved indefinitely, the chat interface will display the last 100 messages in your history for optimal performance.

Download

In the top right corner of the DAA, you'll see additional actions, including the ability to download your files.

You have several options to choose from, depending on your needs.

The full Data Model in XLSX format: this option provides a single Excel file containing a comprehensive overview of the data model that has been generated in the Data Architect Agent.

The file contains dedicated sheets for Attributes, Families, Family variants, Attribute groups, Asset families, Reference entities, and Select options.

We have also included sheets for Attribute definitions, Metric Family codes, and Metric units. These are intended as technical guides for your modeling and are not part of the importable structure.

This export is best used in collaboration with a Partner or Professional Services consultant when first implementing a PIM, to carry those changes over into the PIM.

If you would like to re-export these files, you can download individual CSV files for the following entities:
- Attributes
- Attribute Options
- Families
- Family Variants

Specifications

Depending on several factors, generating a data model can take anywhere from minutes to hours. A waiting message “Model generation in progress” will be displayed on your screen, and the page will be automatically refreshed to show the model generated. A confirmation email is also sent when the model has been generated.
The LLM used for data model generation is Gemini 2.5 Flash. Data sent to Google is neither stored on their servers nor used to train their models.

What kind of context should I give to the AI model?

It is important to note that custom instructions always take priority over default behavior. This means that if you write something specific, even if it breaks usual PIM rules, the model will follow it.

The goal is not necessarily to be exhaustive, but to remove uncertainty and make the output better fit your customer’s business, tools, and way of modeling data.

If you don't give any instructions in the ‘Business & Data Modeling Context’, the AI will still work, as:
- It understands how the PIM works
- It applies best practices by default
- It reads and understands the structure of your input product file
- It produces a valid, coherent taxonomy using standard logic

Keep in mind that prompts need to be simple and easy to understand.

To learn more about prompting, check out our dedicated Akademy course. Additionally, your Professional Services consultant or Partner can help you create an accurate prompt for your need.

Examples of Custom Instructions you can use (with rationale)

There’s no fixed structure that we would particularly recommend, but here are example prompts you can use to make the model behave exactly how you want: to improve consistency, resolve ambiguities, and align the results with the logic of your catalog, your tools, or your business.

Use the variant_axis column to define variant products.

Rationale:

The model can infer good variant axes from your product data — for example, if it sees multiple products with identical names but different colors, it may suggest color as a variant axis. But if you want to enforce a specific structure or match another system (like an ERP or spreadsheet you’re importing from), you can specify it directly. This is especially helpful when your file contains edge cases that could mislead the model, like inconsistent naming or overlapping dimensions.
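
To get a feel for which columns could act as variant axes in your own file, the idea described above (same name, different values) can be sketched in a few lines. This is an illustration only, not the DAA's actual algorithm: the function name `candidate_variant_axes` and the column names `name` and `sku` are hypothetical.

```python
from collections import defaultdict


def candidate_variant_axes(rows, name_column="name", ignore=("sku",)):
    """Rank columns by how often they differ between products sharing the same name."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[name_column]].append(row)

    differing = defaultdict(int)
    for members in groups.values():
        if len(members) < 2:
            continue  # a single product cannot reveal a variation
        for column in members[0]:
            if column == name_column or column in ignore:
                continue
            if len({member.get(column) for member in members}) > 1:
                differing[column] += 1

    # Most frequently differing columns first, ties broken alphabetically.
    return sorted(differing, key=lambda column: (-differing[column], column))
```

Run on rows read with `csv.DictReader`, the function returns columns such as `color` when identically named products only differ by color. If the result does not match the structure you want, that is a good reason to name the variant axis explicitly in your prompt.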

Treat the details field as a select-type attribute.

Rationale:

If the details field contains recurring values these could be modeled as options in a simple select attribute. But if the field has a lot of different values and not many of them are recurring for different products, the model might otherwise treat it as free text. Adding this instruction helps resolve ambiguity and structure your data better.
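
Before adding this instruction, you can measure how much the values of a column actually recur. The sketch below is a hypothetical helper, not something the DAA exposes, and how high the ratio should be before a select attribute makes sense is left to your judgment.

```python
from collections import Counter


def recurrence_ratio(values):
    """Share of non-empty values that appear more than once in the column."""
    filled = [value.strip() for value in values if value and value.strip()]
    if not filled:
        return 0.0
    counts = Counter(filled)
    recurring = sum(count for count in counts.values() if count > 1)
    return recurring / len(filled)
```

A ratio close to 1 means most products share a small set of values, which suits a simple select attribute. A ratio close to 0 suggests the column behaves like free text.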

I want the data model to include the following families: spare_parts, retail_accessories, e_bike_components.

Rationale:

The model usually creates families by analyzing product clusters, but some families might not come out the way you want, for example if the model chooses a more granular or more general grouping. Listing the families here ensures they appear as-is. This is also helpful if some products that belong to these families are missing from your file (e.g. partial exports or subset samples). It also helps resolve granularity mismatches (e.g. if you want “retail_accessories” to be a consistent macro-family, even if the model would have split it into “bottle_holders”, “lights”, etc.).

I plan to add a new outdoor_gear product line to my catalog in the future, so create a family for it that should mirror retail_accessories.

Rationale:

This helps to future-proof the structure. The model won’t generate structure for products that aren’t in your file, but if you know you’ll add a product line (like “outdoor_gear”) you can mention it. You can also give the model a reference: saying it should “mirror retail_accessories” tells the model to use the same kind of attribute set, variant logic, etc. This is useful when retail_accessories is well structured in your file and you want the new family to inherit its logic.

Use ALL CAPS for attribute labels. Limit them to 120 characters.

Rationale:

This enforces internal style conventions. While best practices would usually lean towards sentence casing (i.e. capitalizing only the first letter of the first word in a sentence and any proper nouns), you can override this to fit other conventions. The model will mostly follow the constraint (especially for casing), but character limits may not always be perfectly respected. Still, setting the expectation ensures labels are short and clean in most cases.

Attribute codes should include the unit at the end (e.g. weight_g, length_mm) and the label should match: “Weight (g)”.

Rationale:

In PIM best practices, including units in codes or labels is discouraged as units are usually handled using the ‘Metrics’ attribute type. But if your systems (e.g. Excel exports, ERP schemas) require it for clarity or consistency, you can enforce it using custom instructions. The AI will follow this logic even if it wouldn’t propose it by default.

There should be an “Unknown” option for the material attribute.

Rationale:

The model usually expects you to leave an attribute empty when a product has no value for it. If your system requires placeholder values for completeness, or if blank fields break exports or validation rules, you can enforce fallbacks.

Example of a full Custom Instruction

I’m a B2B distributor. Use the product_type column to define families and variant_axis for variant grouping. Attribute codes should always end in a unit (_g, _mm), and the labels should reflect that (e.g. “Weight (g)”). Group all technical attributes under a tech prefix. Do not introduce any new attributes that aren't already in the file. When material is missing, add an “Unknown” option.

Attribute suggestions

The Data Architect Agent (DAA) collects attribute suggestions coming from any source type (PX Insights, Activation, etc.). These proposals appear as Attribute Suggestions, which you can review and act on at your own pace.

Where to find Attribute Suggestions

Open See PIM Audit > Attribute Suggestions tab. The list displays all suggestions generated for your catalog, along with their current status.

Each suggestion moves through the following states:

Generating - The DAA is processing and creating the suggestion. You cannot access it yet.
Ready to review - The suggestion is complete and waiting for your decision.
Applied - You approved the suggestion and the attribute has been created in your catalog.
Declined - You rejected the suggestion. No attribute was created. The rejection reason is also displayed.
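
The lifecycle above can be summarized as a small state machine. The sketch below is illustrative only: the names `SuggestionStatus` and `can_transition` are hypothetical, and it assumes that Applied and Declined are final states, which this article does not state explicitly.

```python
from enum import Enum


class SuggestionStatus(Enum):
    GENERATING = "Generating"
    READY_TO_REVIEW = "Ready to review"
    APPLIED = "Applied"
    DECLINED = "Declined"


# Assumed transitions: generation finishes, then the reviewer approves or rejects.
ALLOWED_TRANSITIONS = {
    SuggestionStatus.GENERATING: {SuggestionStatus.READY_TO_REVIEW},
    SuggestionStatus.READY_TO_REVIEW: {SuggestionStatus.APPLIED, SuggestionStatus.DECLINED},
    SuggestionStatus.APPLIED: set(),
    SuggestionStatus.DECLINED: set(),
}


def can_transition(current, target):
    """Return True when moving from `current` to `target` fits the assumed lifecycle."""
    return target in ALLOWED_TRANSITIONS[current]
```

An integration that tracks suggestions could use such a table to ignore out-of-order updates, for instance a suggestion reported as Applied while it is still Generating.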

The list refreshes automatically while suggestions are generating, so there's no need to reload the page.

Reviewing a suggestion

Click any row with a Ready to review status to open the suggestion details. You'll see:

Attribute properties: code, type, label, attribute group, and whether the attribute is scopable or localizable. When possible, properties are pre-filled by an AI agent and can be edited at this stage.
AI Reasoning: an explanation from the DAA describing why this attribute was suggested.
Additional comments: any context provided at the time the suggestion was submitted.

Approving a suggestion

Before confirming, you can update the prefilled attribute properties. The code must be unique within your catalog — a validation check runs automatically.
Once ready, click Create. The attribute is immediately added to your catalog, and the suggestion status changes to Applied. The person who submitted the suggestion is notified by email and PIM notification.

Updating the suggested attribute type

Changing the attribute type will remove any configuration automatically suggested by the AI, such as attribute options or the localizable and scopable properties.

Declining a suggestion

If the suggestion doesn't fit your catalog structure, click Reject and explain why you declined it, to give more context to the person who submitted the suggestion. The suggestion is marked as Declined, and no attribute is created. The submitter receives an email notification with the rejection reason.

Suggestion sources

The source is visible directly in the suggestions list.

Suggestions can originate from two Akeneo native sources:

Activation Mapping — based on your existing catalog structure and mapping configurations.
PX Insights – AI Discoverability — based on AI-driven discoverability analysis of your product data.

By leveraging our API endpoint, any solution can push attribute creation suggestions to the Data Architect Agent.

To find all the attribute suggestions you've made as a user, check out this documentation. It will guide you through the process.

Change log of our latest improvements

We release new improvements regularly!

To find out more about all the new improvements we've released on the Data Architect Agent, check out our change log here.
