As the development and deployment of general-purpose AI (GPAI) systems accelerates, regulatory frameworks across the globe are catching up, demanding greater transparency, accountability, and governance from AI providers. Among the most significant developments in this area is the European Commission’s adoption of a standardized GPAI Training Transparency Template, introduced as part of the EU’s broader strategy to operationalize the EU AI Act.
This new template is not just another bureaucratic form. It’s a signal. For companies building or integrating GPAI models, it marks a clear shift toward deeper scrutiny of the data, processes, and decisions that shape these powerful systems. At the core of this initiative is a simple but critical expectation: know what went into training your AI, and be prepared to explain it.
In this post, we’ll break down what the template requires, who it applies to, and how your company can begin preparing, especially if you’re already working with GPAI systems or planning to do so.
Whether you’re an AI developer, legal advisor, or compliance officer, understanding these new transparency obligations is now essential for operating responsibly and legally in the European market.
What Is the GPAI Training Transparency Template?
The GPAI Training Transparency Template was introduced by the European Commission as part of its efforts to implement the EU AI Act, which outlines obligations for providers of general-purpose AI models.
This template serves as a standardized reporting framework designed to increase transparency around the development and training of GPAI systems. Its core purpose is to ensure that both regulators and downstream deployers can better understand the origins, limitations, and potential risks associated with these models.
The template requires providers to disclose detailed information about the data used for training, including the types and sources of datasets, any applied data curation or filtering methods, and whether copyrighted materials were involved.
It also covers essential aspects of data governance, such as quality assurance processes, risk mitigation measures, and documentation of model iterations. By formalizing these disclosures, the template aims to foster responsible AI development while easing the compliance burden through a uniform, predictable approach.
When Does the Transparency Obligation Come Into Effect?
As of August 2, 2025, the obligation to publish a public summary of the training content for general-purpose AI models is officially in force. This means that any GPAI model placed on the EU market from this date onward must be accompanied by a completed transparency template, in line with the requirements of the EU AI Act.
For models that were placed on the market before August 2, 2025, the rules provide a transitional period: providers must ensure that the required summaries are published no later than August 2, 2027. This gives companies a two-year window to retrospectively document training data for legacy models still in use or being made available.
In cases where it is genuinely not possible to recover all the required information (for example, due to technical limitations or disproportionate effort), providers are not exempt. However, they are allowed to explicitly indicate and justify any missing details in their published summaries. This must be done transparently and in good faith, ensuring that the summary remains as complete and informative as reasonably possible under the circumstances.
Who Needs to Pay Attention – And Why?
The obligation to use the GPAI Training Transparency Template is not optional – it is legally mandated under Article 53(1)(d) of the EU AI Act. According to the European Commission’s latest guidance, any provider of general-purpose AI models that places such models on the EU market must publish a summary of the training content using the official template. This includes not only commercial developers but also those releasing models under free and open-source licenses, as well as providers whose models present systemic risks due to their scale or capabilities.
This requirement affects three main categories of stakeholders:
- Developers of GPAI models, whether based in the EU or abroad, who train and place such models on the Union market. They bear primary responsibility for creating and publishing the required summaries.
- Companies that integrate GPAI into their services, even if they are not the original developers, must understand the transparency obligations associated with the models they use. For some downstream users, due diligence and documentation may be needed to confirm compliance, especially if the GPAI model forms part of a high-risk AI system.
- AI-as-a-Service providers, offering GPAI models via APIs or platforms, also fall under this umbrella. If they act as providers in the legal sense, releasing or making models accessible on the EU market, they, too, must adhere to these requirements.
The Commission emphasizes that the transparency template is intended to be simple, consistent, and effective, helping providers meet their obligations without excessive administrative burden.
However, the stakes remain high: non-compliance can lead to regulatory penalties, reputational damage, and commercial restrictions. Moreover, the extraterritorial reach of the AI Act means these obligations extend to any company worldwide offering GPAI models in the EU.
In short, if your business builds, integrates, or distributes GPAI models in any way within the European market, you need to comply. Now is the time to ensure that your documentation, internal processes, and governance tools are aligned with this new regulatory baseline.
What Happens When Models Are Modified or Fine-Tuned?
In many cases, general-purpose AI models are not used in their original form. They are adapted, fine-tuned, or further trained by downstream developers for specific applications. When such a modification is significant enough that the modifying party qualifies as the new provider under the EU AI Act, that entity inherits the responsibility for publishing a transparency summary.
However, there is no requirement to re-document the entire training history of the base model. Instead, the new provider must disclose only the data used in the additional training or fine-tuning process. The summary should also clearly state the name and version of the modified model to ensure traceability and avoid confusion with the original system.
In situations where multiple versions or variants of a model are built using the same additional training data, they may be covered by a single shared summary, as long as the information is identical across all variants. In such cases, the summary must explicitly list which models and versions it applies to.
If different modifications rely on distinct datasets, then separate summaries are required. Each of these must include a direct reference to the original model and its published summary. This layered approach helps preserve transparency across the full model lifecycle, from foundational training to downstream fine-tuning, without imposing unnecessary duplication of effort.
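Purely as an illustration, the layered relationship described above can be sketched as a machine-readable record. The field names, model names, and URL below are our own assumptions; the official template is a structured document, not a code schema:

```python
# Illustrative sketch only: field names are assumptions, not the EU's schema.

def build_finetune_summary(base_model, base_summary_url, variants, finetune_data):
    """Build one shared summary record for fine-tuned variants that were
    all trained on the same additional data."""
    return {
        # Explicitly list every model/version this single summary covers.
        "covered_models": variants,
        # Direct reference back to the original model and its published summary.
        "base_model": base_model,
        "base_model_summary": base_summary_url,
        # Only the *additional* training data must be disclosed here,
        # not the base model's full training history.
        "additional_training_data": finetune_data,
    }

summary = build_finetune_summary(
    base_model="example-base-v1",                              # hypothetical
    base_summary_url="https://example.com/summaries/base-v1",  # hypothetical
    variants=["example-chat-v1.0", "example-chat-v1.1"],
    finetune_data=["licensed instruction dataset (hypothetical)"],
)
```

Modifications built on different datasets would each get their own record via a separate call, each still pointing back to the base model's summary.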
Key Obligations for Companies
The GPAI Training Transparency Template sets out a clear and standardized framework for disclosing how general-purpose AI models are trained. It is structured around three main categories of information, all of which providers must address when making their summaries public. These categories establish the minimum level of transparency expected under the EU AI Act and form the basis of a model provider’s legal compliance obligations.
- The first section, General Information, focuses on identifying the provider and the model, along with describing the types of data used for training, whether text, images, audio, or video, and estimating the volume per data type.
- The second section, a List of Data Sources, requires a breakdown of the origins of training content, including public and private datasets, web-scraped material, synthetic data, and any data derived from user interactions. For scraped content, providers must go further by disclosing, among other things, the crawling tools used, collection periods, the types of content gathered, and the most frequently targeted domains. SMEs have a reduced disclosure threshold, but must still provide meaningful insight into their practices.
- The third section addresses Relevant Data Processing Aspects, requiring transparency around issues that directly affect third-party rights. This includes identifying whether copyrighted material was used and explaining how lawful text and data mining practices were followed under EU copyright law.
Providers must also confirm if data from their own user interactions was included in training and describe the types of services or platforms where that data was collected, without disclosing personal information. Additionally, providers are expected to report any measures taken to detect and remove illegal content, as well as any technical or organizational steps implemented to mitigate training-related risks.
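To make the three-part structure concrete, here is a hedged sketch of how a provider might organize this information internally before completing the official template. Every field name and placeholder value is an illustrative assumption, not the Commission's actual form:

```python
# Internal working record mirroring the template's three sections.
# All keys and values are illustrative assumptions, not the official schema.
transparency_summary = {
    "general_information": {
        "provider": "Example AI GmbH",      # hypothetical provider
        "model": "example-model-v1",        # hypothetical model name/version
        # Estimated volume per data type (modality).
        "modalities": {"text": "~2 TB", "images": "~500 GB"},
    },
    "data_sources": {
        "public_datasets": ["(named datasets)"],
        "private_datasets": ["(licensed or proprietary sources)"],
        "web_scraped": {
            "crawler": "(crawling tool used)",
            "collection_period": "(collection timeframe)",
            "top_domains": ["(most frequently targeted domains)"],
        },
        "synthetic_data": False,
        "user_interaction_data": False,
    },
    "data_processing": {
        "copyrighted_material_used": True,
        "lawful_tdm_practices": "(how EU copyright TDM rules were followed)",
        "illegal_content_measures": "(detection and removal measures)",
        "risk_mitigations": "(technical/organizational steps)",
    },
}
```

Keeping a structured record like this up to date makes it far easier to transfer the required details into the published summary at release time.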
These documentation and transparency duties are not just about satisfying regulatory checklists. They closely align with broader AI governance principles of accountability, explainability, and fairness by forcing providers to trace and disclose the building blocks of their systems.
Where and When Must the Transparency Summary Be Published?
The obligation to disclose the training summary is not only about what companies must publish but also about when and where they must do so. According to the European Commission's requirements, the transparency summary must be made publicly available no later than the moment a GPAI model is placed on the EU market. In other words, the summary is not a follow-up document; it is a precondition for market access.
To meet this requirement, the summary must be published on the provider’s official website, in a prominent and easily accessible location. It should clearly indicate which model or model version it refers to, avoiding any ambiguity for regulators, downstream users, or the general public. This level of clarity is crucial for traceability and legal accountability.
In addition to being hosted on the provider’s website, the summary must also be published alongside the model wherever it is distributed, including public platforms such as open-source repositories, developer hubs, or AI model marketplaces. The goal is to ensure that anyone accessing the model, whether for integration, evaluation, or research, can easily access the corresponding transparency information at the point of use.
Do Transparency Summaries Need to Be Updated?
Yes! Publishing a transparency summary is not a one-off exercise. If a general-purpose AI model is further trained on new datasets after its initial release, the provider is required to update the published summary to reflect the changes. This ensures that the disclosed information remains accurate and relevant over time, particularly when additional training could affect the model’s capabilities, risks, or legal implications.
The rules set out two key triggers for updating the summary:
- Time-based updates: At a minimum, providers must review and, if necessary, update the summary every six months.
- Material updates: If further training introduces significant new data or changes that materially alter the contents of the summary, an update must be made immediately, without waiting for the six-month interval.
Each revised summary must clearly indicate the date of the update and detail the new training data or changes that prompted it. In addition, the updated version must be published at the same time as the modified model, both on the provider’s official website and across any public distribution channels where the updated model is made available.
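The two triggers above can be expressed as a simple decision rule. The sketch below is our own interpretation of the six-month and material-change conditions, not official guidance, and the exact day count for "six months" is an assumption:

```python
from datetime import date, timedelta

# Roughly six months; the regulation speaks of a six-month review cycle,
# so the exact day count here is an assumption for illustration.
REVIEW_INTERVAL = timedelta(days=182)

def summary_update_due(last_published: date, today: date,
                       material_change: bool) -> bool:
    """Return True if the transparency summary must be updated.

    A material change (significant new training data that alters the
    summary's contents) triggers an immediate update; otherwise the
    summary must be reviewed at least every six months.
    """
    if material_change:
        return True  # update immediately, without waiting for the interval
    return today - last_published >= REVIEW_INTERVAL

# Example: five months after publication with no material change,
# no update is yet required.
summary_update_due(date(2025, 8, 2), date(2026, 1, 2), material_change=False)
```

Wiring a check like this into a release pipeline helps ensure that a modified model and its updated summary are published together, as the rules require.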
Common Mistakes and How to Avoid Them
While the GPAI Training Transparency Template provides a structured pathway to compliance, many providers risk falling short by overlooking key details or underestimating the effort required. Below are some of the most common mistakes and how to proactively avoid them.
Vague or Incomplete Disclosures
Statements like “We use publicly available data” may sound sufficient, but they fall well short of the required standard. The template demands specificity: not just the type of data, but its origin, method of collection, and, where applicable, the legal basis for use (e.g., under copyright exceptions or licensing agreements). To comply, companies must clearly identify datasets, describe selection criteria, and provide details such as scraping tools used, collection timeframes, and top data sources.
Leaving Legal and Governance Teams Out of the Loop
AI development is often driven by data science and engineering teams, but compliance is a cross-functional effort. Delayed involvement of legal, privacy, and data governance professionals can lead to gaps in documentation, missed risks (especially around intellectual property or personal data), and non-compliance with overlapping legal obligations. Best practice is to involve relevant teams early in the model lifecycle, ensuring that transparency is built into design and documentation, not retrofitted later.
Treating Documentation as a One-Time Task
Training summaries are living documents, not static records. Failing to update them when models are retrained or fine-tuned is a common oversight that can result in outdated or misleading disclosures. To stay compliant, providers should implement version control and documentation workflows that track when and how models are updated, and ensure that summaries are refreshed accordingly, either on a six-month cycle or as soon as materially significant changes occur.
How Whisperly Can Support Your Compliance Journey
Complying with the EU AI Act’s transparency obligations isn’t just about checking boxes—it requires coordinated documentation, cross-team collaboration, and clear versioning as models evolve. That’s where Whisperly comes in.
Whisperly is built specifically to help AI developers, legal teams, and compliance officers document and manage the lifecycle of general-purpose AI models. With Whisperly, you can:
- Easily capture and organize training data records, including sources, modalities, filtering processes, and copyright considerations.
- Automatically generate structured transparency summaries that align with the EU’s official GPAI Training Transparency Template, ready for publication across all required channels.
- Track updates and model versions through built-in version control, ensuring summaries stay up to date with evolving systems.
- Centralize compliance activities across teams, giving legal, technical, and governance stakeholders a single source of truth.
Whether you’re fine-tuning models in-house or deploying open-source GPAI solutions at scale, Whisperly helps you move from manual spreadsheets and last-minute disclosures to a streamlined, audit-ready compliance workflow.