Accelerate data initialization with the Data Architect Agent

Summary

The Data Architect Agent (DAA) empowers new customers to quickly establish their product data structure in Akeneo PIM.

Using AI, it automatically generates an accurate data model based on your catalog extract, streamlining PIM initialization and reducing the time to go live.

 

Creating your product data structure in the PIM can be broken down into three steps:

Import your products

Upload your product catalog as a flat file from an e-commerce platform or ERP system. The AI uses this data to generate your custom model, based on Akeneo’s best practices and key concepts.

How does it work?

  • Check that your role has full permissions for the Data Architect Agent under System > Roles > Administrator (or your own role) > Data Architect Agent.

 

  • Then, click Settings and find the Data Architect Agent in the Automation section, at the very bottom of the page.

 

  • Fill in the Context text box to guide the AI model and get the most out of it.

 

The Context box helps the AI model generate the most accurate version of your catalog according to your business requirements.

We have added some examples of the kind of information that helps the AI model create the best version of your data in the section ‘What kind of context should I give to the AI model?’ below.

 

 

  • Then, upload the product data file(s) extracted from your ERP or your e-commerce platform. The accepted formats are XLS, XLSX and CSV, and each file can weigh up to 20 MB.

 

Please note that we recommend the following structure for files:

  • Format: XLS, XLSX or CSV files containing product information (title, description, any attributes you already have, etc.), and nothing else. Your file does not need to be exhaustive: a few hundred products are more than enough.
  • One product per row
  • One attribute per column
  • Names of the attributes in the first row
  • Attribute values start from the second row
  • Max size of imported files: 20 MB per file
  • Max number of imported files: 10

Please note that the model usually samples these files to build the data model.
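
If you want to check a CSV extract against the recommended structure before uploading it, a small script can help. The sketch below is only an illustration, not part of the Data Architect Agent: the function names are hypothetical, it assumes a comma-separated, UTF-8 encoded file, and it counts 1 MB as 1024 × 1024 bytes.

```python
import csv
import io

MAX_FILE_BYTES = 20 * 1024 * 1024  # 20 MB per file (assumption: 1 MB = 1024 * 1024 bytes)
MAX_FILES = 10                     # maximum number of imported files


def check_catalog_csv(raw: bytes) -> list[str]:
    """Return the problems found in one CSV extract (an empty list means it looks fine)."""
    problems = []
    if len(raw) > MAX_FILE_BYTES:
        problems.append("file is larger than 20 MB")
    # Assumption: comma-separated and UTF-8 encoded (a leading BOM is tolerated).
    text = raw.decode("utf-8-sig")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if len(rows) < 2:
        problems.append("expected a header row plus at least one product row")
        return problems
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("first row contains an empty attribute name")
    if len(set(header)) != len(header):
        problems.append("first row contains duplicate attribute names")
    # One product per row: every row should have one cell per attribute column.
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {line_number} has {len(row)} cells, expected {len(header)}")
    return problems


def check_upload(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Check every file of an upload, plus the limit on the number of files."""
    report = {name: check_catalog_csv(raw) for name, raw in files.items()}
    if len(files) > MAX_FILES:
        report["(upload)"] = [f"{len(files)} files selected, maximum is {MAX_FILES}"]
    return report


sample = b"sku,title,color\nA1,Couch,blue\nA2,Couch,red\n"
print(check_catalog_csv(sample))  # prints []
```

An empty list means the file follows the layout described above; it does not guarantee the quality of the generated model.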

 

 

Explore and refine

After processing, the tool creates a data model with families, attributes, and options. You can explore and adjust every entity generated (families, attributes, codes, labels, etc.).

 

Key concepts

  • Families

A family is a defined set of attributes that products automatically inherit when assigned to this family. While a product can only be part of one family (or none, if it's a unique item without default attributes), this structure helps to manage and track a product's data completeness.

Find out more with our dedicated Help Center article about Families.

  • Family Variants

The model is able to generate family variants based on your input file. Family variants help you manage products that come in several variations (e.g. a couch that comes in different colors and sizes).

Find out more with our dedicated Help Center article about Family Variants.

  • Attributes

The model is able to generate attributes based on your input file. Attributes help by providing specific characteristics and details for each product, enabling richer descriptions, improved searchability, and better organization of your product information within the PIM system.

Find out more with our dedicated Help Center article about Attributes.

 

We currently support 12 types of attributes within the Data Architect Agent. The model is capable of generating these attributes, whether as suggestions or during the re-processing of the input file.

 

 

Attribute type                     Status
Asset collection                   Not supported
Date                               Supported
File                               Supported
Identifier                         Supported
Image                              Supported
Metric                             Supported
Multi select                       Supported
Number                             Supported
Price                              Supported
Product link                       Not supported
Reference entity multiple links    Not supported
Reference entity single link       Not supported
Simple select                      Supported
Table                              Not supported
Text                               Supported
Text Area                          Supported
Yes/No (boolean)                   Supported
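
If you plan to mirror an existing schema, you may want to check in advance which of its attribute types the Data Architect Agent can generate. The sketch below simply encodes the table above as a Python set. The lower-cased names and the helper function are illustrative choices, not an Akeneo API.

```python
# Attribute types listed as "Supported" in the table above (lower-cased for comparison).
SUPPORTED_TYPES = {
    "date", "file", "identifier", "image", "metric", "multi select",
    "number", "price", "simple select", "text", "text area", "yes/no",
}


def unsupported_types(types: list[str]) -> list[str]:
    """Return the types from `types` that are not in the supported list."""
    return [t for t in types if t.strip().lower() not in SUPPORTED_TYPES]


print(len(SUPPORTED_TYPES))                           # prints 12
print(unsupported_types(["Text", "Table", "Price"]))  # prints ['Table']
```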

 

As of today, the Data Architect Agent does not support the following entities:

  • Product models
  • Categories
  • Channels
  • Attribute groups
  • Association types
  • Groups
  • Workflows
  • Rules
 

 

Issues

We've introduced a new Issues tab to help you quickly identify and address problems within your catalog's structure. This dedicated tab centralizes all identified issues, making them easier to manage.

For each issue, you'll see details about the affected entity and a clear error message. To resolve an issue, simply click the ‘EDIT’ button. This will automatically take you to the specific location of the problem, allowing you to identify and correct it.

 

Product previews

The Product previews tab gives you a first look at your products, showing you how they'll appear in the PIM.

We generate a sample set of products from your uploaded file, based on the families and attributes that have been suggested. You can easily navigate through your products by using the family filter or the search bar.

The preview is designed to give you a quick projection of your data, not an exhaustive view of your catalog.

Please note that:

  • The preview currently supports text, text area, simple select, and multi select attributes. Other attribute types may not be displayed as expected in the preview.
  • The product preview does not guarantee that every attribute will have a value, nor that every line in your uploaded file will necessarily result in a product in the preview tab.
  • The product preview may take a few seconds to load.

 

We will soon release a way to import these products directly into the PIM, to help you with your catalog initialization.

 

 

History

You can find a version history in the last tab of the feature. This allows you to restore a previous version of your data model if needed. The history displays a maximum of 25 revisions.

 

Additional capabilities

  • Download

In the top right corner of the DAA, you'll see additional actions, including the ability to download your files. 

You can download separate CSV files for:

  • Families
  • Family variants
  • Attributes
  • Attribute options

 

  • Cancel and reset

You also have the option to cancel and reset your model.

 

Specifications

  • Depending on several factors, generating a data model can take anywhere from minutes to hours. A waiting message “Model generation in progress” will be displayed on your screen, and the page will be automatically refreshed to show the model generated. A confirmation email is also sent when the model has been generated.
  • Family and attribute codes are always generated in English, while labels are generated in each of the locales selected on the first page when initializing the model. If you deselect English, your codes will still be created in English, but labels will only be generated in the other locales you chose, not in English.
  • The LLM used for data model generation is Gemini 2.5 Flash. Data sent to Google is neither stored on their servers nor used to train their models.
     

Limitations

  • It's important to note that the data model you're working on is only stored in your browser. Accessing a data model generated with the Data Architect Agent on another user’s browser is not possible for the moment. We recommend downloading your files regularly to ensure they're safely stored.
  • Multiple users in the same PIM environment can generate and store separate data models.
     

Please note that if you clear your browser’s data, you might lose the generated data model.

 

 

Implementing the model

When you are satisfied with your data structure, you can implement your model in your PIM to create your entities, using the ‘Implement’ button at the top right of the screen.

 

We highly recommend reviewing your data model with a Professional Services consultant or Partner before applying it to your PIM, to make sure it fits your business and the future evolution of your catalog.

 

 

What kind of context should I give to the AI model?

 

It is important to note that custom instructions always take priority over the default behavior. This means that if you write something specific, even if it breaks the usual PIM rules, the model will follow it.


The goal is not necessarily to be exhaustive, but to remove uncertainty and make the output fit better with your business, your tools and your way of modeling data.

 

If you don't give any instructions in the ‘Context’ box, the AI will still work, because:
- It understands how the PIM works
- It applies best practices by default
- It reads and understands the structure of your input product file
- It produces a valid, coherent taxonomy using standard logic

We highly recommend creating custom instructions to customize your data model and ensure the most accurate responses, especially in cases where:
- Your product data file could be ambiguous (i.e. someone might hesitate when interpreting it)
- Your product data file could be missing some key information
- You want to mirror existing systems (ERP, DAM, legacy schemas)
- You have strong rules about your data format (naming, grouping, tone and so on)
 

Keep in mind that prompts need to be simple and easy to understand.

 

To learn more about prompting, check out our dedicated Akademy course. Additionally, your Professional Services consultant or Partner can help you create an accurate prompt for your needs.

 

 

Examples of Custom Instructions you can use (with rationale)

There is no fixed structure that we particularly recommend, but below are example prompts you can use to make the model behave the way you want: to improve consistency, resolve ambiguities, and align the results with the logic of your catalog, your tools or your business.


Use the variant_axis column to define variant products.

Rationale:

The model can infer good variant axes from your product data — for example, if it sees multiple products with identical names but different colors, it may suggest color as a variant axis. But if you want to enforce a specific structure or match another system (like an ERP or spreadsheet you’re importing from), you can specify it directly. This is especially helpful when your file contains edge cases that could mislead the model, like inconsistent naming or overlapping dimensions.
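
To get a feel for the kind of inference described here, the sketch below groups rows that share the same name and lists the columns whose values differ within each group. It only illustrates the idea and is not the logic the Data Architect Agent actually runs. The column names (sku, name, color, size) and the function name are hypothetical.

```python
from collections import defaultdict


def candidate_variant_axes(products, name_key="name", ignore=("sku",)):
    """For each name shared by several rows, list the columns whose values differ."""
    groups = defaultdict(list)
    for product in products:
        groups[product[name_key]].append(product)

    axes = {}
    for name, rows in groups.items():
        if len(rows) < 2:
            continue  # a single row cannot reveal a variation
        axes[name] = [
            column
            for column in rows[0]
            if column != name_key
            and column not in ignore  # identifiers always differ, so they are skipped
            and len({row.get(column) for row in rows}) > 1
        ]
    return axes


products = [
    {"sku": "C1", "name": "Couch", "color": "blue", "size": "L"},
    {"sku": "C2", "name": "Couch", "color": "red", "size": "L"},
    {"sku": "T1", "name": "Table", "color": "oak", "size": "M"},
]
print(candidate_variant_axes(products))  # prints {'Couch': ['color']}
```

If a check like this returns unexpected columns for your file (for example because names are inconsistent), that is a good sign that an explicit instruction about variant axes is worth adding.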


Treat the details field as a select-type attribute.

Rationale:

If the details field contains recurring values, these could be modeled as options in a simple select attribute. But if the field has many different values and few of them recur across products, the model might otherwise treat it as free text. Adding this instruction helps resolve ambiguity and structure your data better.
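
To judge whether a column is ambiguous in this sense, you can measure how often its values recur. The sketch below is a rough heuristic for your own analysis, not the rule the AI model applies. The function name and the sample values are hypothetical.

```python
from collections import Counter


def recurring_share(values):
    """Share of non-empty cells whose value appears on more than one product."""
    filled = [v.strip() for v in values if v and v.strip()]
    if not filled:
        return 0.0
    counts = Counter(filled)
    recurring = sum(n for n in counts.values() if n > 1)
    return recurring / len(filled)


details = ["cotton", "cotton", "wool", "cotton", "linen"]
print(recurring_share(details))  # prints 0.6
```

A high share suggests the column behaves like a list of options. A low share suggests free text, so an explicit instruction matters most when you still want a select attribute.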


Do not reuse attributes from the input file, derive new ones from product descriptions and titles.

Rationale:

This is useful when your current attributes are legacy, incomplete, or don’t reflect your ideal structure. You might want to completely restructure your taxonomy and generate a cleaner model. Once the data model is created, you will have access to the ‘AI-Enhanced Enrichment’ feature to extract structured values from unstructured text, so even if the file only has names or short descriptions, it can infer attributes like material, compatibility, or power_source. This is helpful when migrating from another system or modernizing a messy dataset.


I want the data model to include the following families: spare_parts, retail_accessories, e_bike_components.

Rationale:

The model usually creates families by analyzing product clusters, but some families might not be created as you want them, for example if the model chooses a more granular or more general grouping. Listing the families here ensures they appear as-is. This is also helpful if some products that belong to these families are missing from your file (e.g. partial exports or subset samples). It also helps resolve granularity mismatches (e.g. if you want “retail_accessories” to be a consistent macro-family, even if the model would have split it into “bottle_holders”, “lights”, etc.).


I plan to add a new outdoor_gear product line to my catalog in the future, so create a family for it that should mirror retail_accessories.

Rationale:

This helps to future-proof the structure. The model won’t generate structure for products that aren’t in your file, but if you know you’ll add a product line (like “outdoor_gear”) you can mention it. You can also give the model a reference: saying it should “mirror retail_accessories” tells the model to use the same kind of attribute set, variant logic, etc. This is useful when retail_accessories is well structured in your file and you want the new family to inherit its logic.


Use ALL CAPS for attribute labels. Limit them to 120 characters.

Rationale:

This enforces internal style conventions. While best practices usually lean towards sentence casing (i.e. capitalizing only the first letter of the first word in a sentence and any proper nouns), you can override this to fit other conventions. The model will mostly follow the constraint (especially for casing), but character limits may not always be perfectly respected; still, setting the expectation ensures labels are short and clean in most cases.
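
Because character limits may not always be respected, it can be worth checking the generated labels yourself, for example after downloading the attributes file. The sketch below checks a single label against the two rules from this example. The function name is hypothetical, and reading the labels from your file is left out because the column names are not documented here.

```python
def label_issues(label: str, max_length: int = 120) -> list[str]:
    """Return the style rules that a label breaks (an empty list means it complies)."""
    issues = []
    if label != label.upper():
        issues.append("not ALL CAPS")
    if len(label) > max_length:
        issues.append(f"{len(label)} characters, limit is {max_length}")
    return issues


print(label_issues("WEIGHT (G)"))  # prints []
print(label_issues("Weight (g)"))  # prints ['not ALL CAPS']
```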


Attribute codes should include the unit at the end (e.g. weight_g, length_mm) and the label should match: “Weight (g)”.

Rationale:

In PIM best practices, including units in codes or labels is discouraged, as units are usually handled by the ‘Metric’ attribute type. But if your systems (e.g. Excel exports, ERP schemas) require it for clarity or consistency, you can enforce it using custom instructions. The AI will follow this logic even if it wouldn’t propose it by default.
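
To make the convention concrete, the sketch below derives a code and a matching label from an attribute name and a unit, following the pattern in this example (weight_g and “Weight (g)”). It is only an illustration of the naming rule; the function name is hypothetical and the Data Architect Agent does not expose such a function.

```python
def code_and_label(name: str, unit: str) -> tuple[str, str]:
    """Build a unit-suffixed attribute code and its matching label."""
    clean = name.strip()
    code = f"{clean.lower().replace(' ', '_')}_{unit}"
    label = f"{clean.capitalize()} ({unit})"
    return code, label


print(code_and_label("weight", "g"))       # prints ('weight_g', 'Weight (g)')
print(code_and_label("Net weight", "kg"))  # prints ('net_weight_kg', 'Net weight (kg)')
```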


This catalog is used for B2B procurement workflows. Focus on structured, functional attributes, not lifestyle content.

Rationale:

The model cannot always infer that you’re a B2B manufacturer or retailer from the product file. If you want the taxonomy to prioritize specs, filtering attributes, and procurement-relevant data (over images, descriptions, etc.), we recommend mentioning it. This shifts the AI’s focus away from ‘marketing output’ towards more structured and filterable fields. If you have a more precise idea of what the model should do with this information, it’s better to write it down than to give only the general “I do B2B”.


There should be an “Unknown” option for the material attribute.

Rationale:

The model usually expects you to leave an attribute empty when a product has no value for it. If your system requires placeholder values for completeness, or if blank fields break exports or validation rules, you can enforce fallbacks.


Do not generate marketing descriptions. Only structure the technical attributes.

Rationale:

You may want a clean catalog setup without narrative content - especially when prepping for B2B, integrations, or technical platforms where descriptions aren’t needed. This tells the model to skip all rich text fields.


This sheet contains metadata (like family names, attributes, variant logic), not product data. Do not treat it like a product catalog.

Rationale:

Sometimes you’re not sending products, but the structure itself. Since the model assumes you’re providing a catalog, you should clarify that the sheet defines the schema, not product rows. If only one file contains the structure, specify which.


Split the specs column using the format “X: Y; A: B” to create multiple attributes.

Rationale:

This enables the model to extract structured fields from messy or semi-structured text. If your file has multiple values in a single cell, this lets you break them out cleanly. The model might already do that by default, but writing it in the custom instructions helps ensure it will follow this guideline.
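
To see what such a split produces, the sketch below parses one cell written in the “X: Y; A: B” format into separate attribute names and values. It is a plain illustration of the format, not the model's own parsing. The function name and the sample cell are hypothetical.

```python
def split_specs(cell: str) -> dict[str, str]:
    """Split a cell like 'X: Y; A: B' into {'X': 'Y', 'A': 'B'}."""
    attributes = {}
    for pair in cell.split(";"):
        if ":" not in pair:
            continue  # skip empty or malformed fragments
        key, value = pair.split(":", 1)
        key = key.strip()
        if key:
            attributes[key] = value.strip()
    return attributes


print(split_specs("Voltage: 36 V; Material: aluminium; "))
# prints {'Voltage': '36 V', 'Material': 'aluminium'}
```

Running a check like this on a few cells of your specs column shows whether the format is consistent enough for the instruction to work as intended.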


Example of a full Custom Instruction

I’m a B2B distributor. Use the product_type column to define families and variant_axis for variant grouping. Attribute codes should always end in a unit (_g, _mm), and the labels should reflect that (e.g. “Weight (g)”). Group all technical attributes under a tech prefix. Do not introduce any new attributes that aren't already in the file. When material is missing, add an “Unknown” option.

 
