The Dimodelo Data Warehouse Studio code generation process

This post discusses in detail how Dimodelo Data Warehouse Studio generates code to create and manage a Data Warehouse, including the creation and management of Data Warehouse databases, tables and ETL procedures. While Dimodelo Data Warehouse Studio ships with a set of predefined Generation Templates and Generation Providers, it is also extensible, allowing end users to create their own Generation Templates and Generation Providers.

Generation Components

Several components are involved in the Code Generation process.

  • A Generation Service which drives the generation process.
  • A Dimodelo_Meta_Data.xml file: a de-normalized view of the Data Warehouse design Meta Data stored in a Dimodelo Data Warehouse Studio project. Dimodelo_Meta_Data.xml makes creating Generation Templates far simpler.
  • Generation Providers. Generation Providers are used by the Generation Service to do the actual transformation of Dimodelo Data Warehouse Studio Meta Data into Code. Dimodelo Data Warehouse Studio provides an API for Generation Providers, so Data Warehouse professionals can create their own Generation Providers.
  • Generation Templates. Generation Templates are passed to the Generation Provider along with Dimodelo_Meta_Data.xml. The Generation Provider knows how to execute the Template to produce the intended Code. For example, an XSLT Generation Provider is passed an XSLT Generation Template and executes it, passing Dimodelo_Meta_Data.xml as the input, to produce the output code file. Templates exist for each type of code object you wish to produce: SSIS Extract packages, SSIS Dimension Transform packages, Table DDL, etc.

The Generation Process

1. The User initiates the Generation process

A user can execute the generation process by selecting the Generation option on the Dimodelo Menu of Dimodelo Data Warehouse Studio.

2. Generate Dimodelo_Meta_Data.xml

Before generation begins, the Generation Service refreshes the Dimodelo_Meta_Data.xml file so that it contains the latest project Meta Data.

3. Load Generation Templates

The Generation Service first retrieves the list of Generation Templates for the project. Each project has its own set of templates, which are usually stored in the ProjectTemplates folder of the project directory; this location can be changed using the ‘Generation Template path’ setting in the project Config file. Each Generation Template is described by a simple manifest file (.mnf extension) in the same directory. For example:

<?xml version="1.0"?>
<Generation_Template_Manifest>
  <Output_Relative_Location>SSIS_Project</Output_Relative_Location>
  <Template_Display_Name>Extract Procedures</Template_Display_Name>
  <Generation_Engine_Name>XSLT Generation Engine</Generation_Engine_Name>
  <Generate_For_Each>Staging/*.sg</Generate_For_Each>
  <Generation_Result_File_Name_Pattern>%docName%.dtsx</Generation_Result_File_Name_Pattern>
  <Template_Relative_Location>Extract_SSIS.xslt</Template_Relative_Location>
  <OperatesOn>Staging</OperatesOn>
  <Generates_For_Collection>
    <Generates_For_TargetType>Staging</Generates_For_TargetType>
  </Generates_For_Collection>
</Generation_Template_Manifest>

The manifest, amongst other things, contains the path to the actual Generation Template and designates which Generation Provider should be used to execute it.
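To make the manifest's role concrete, here is a minimal Python sketch of how a service might discover the .mnf files in a template folder and read the fields described above. It is not Dimodelo's implementation (the product is a Visual Studio extension); the `TemplateManifest` class and the function names are hypothetical, and resolving `Template_Relative_Location` against the manifest's own folder is an assumption based on templates and manifests sharing a directory. Values are stripped of surrounding whitespace so that an engine name written with padding still matches a provider name.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TemplateManifest:
    """Hypothetical holder for the fields of a Generation Template manifest."""
    display_name: str
    engine_name: str              # which Generation Provider executes the template
    generate_for_each: str        # file pattern, e.g. "Staging/*.sg"
    result_name_pattern: str      # e.g. "%docName%.dtsx"
    template_path: Path           # resolved path to the template (e.g. an .xslt file)
    output_relative_location: str


def _text(root: ET.Element, tag: str) -> str:
    """Return the whitespace-stripped text of a required child element."""
    node = root.find(tag)
    if node is None or node.text is None:
        raise ValueError(f"manifest is missing <{tag}>")
    return node.text.strip()


def parse_template_manifest(mnf_path: Path) -> TemplateManifest:
    root = ET.parse(mnf_path).getroot()
    return TemplateManifest(
        display_name=_text(root, "Template_Display_Name"),
        engine_name=_text(root, "Generation_Engine_Name"),
        generate_for_each=_text(root, "Generate_For_Each"),
        result_name_pattern=_text(root, "Generation_Result_File_Name_Pattern"),
        # Assumption: the template path is relative to the manifest's folder.
        template_path=mnf_path.parent / _text(root, "Template_Relative_Location"),
        output_relative_location=_text(root, "Output_Relative_Location"),
    )


def load_template_manifests(template_dir: Path) -> list[TemplateManifest]:
    """Parse every .mnf file in the template folder (e.g. ProjectTemplates)."""
    return [parse_template_manifest(p) for p in sorted(template_dir.glob("*.mnf"))]
```

A standard XML parser rejects a manifest whose tags are not properly closed, such as `<Generate_For_Each>Staging/*.sg<Generate_For_Each>`; `ET.parse` raises `ET.ParseError` in that case, so it is worth validating manifests before a generation run.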

4. Call Generation Providers

The Generation Service calls the appropriate Generation Provider for each Generation Template, passing the Template, Template Manifest, Dimodelo_Meta_Data.xml and the active Project Configuration file. Each target environment will have its own Config file.

The Generation Service looks for Generation Providers in the directory specified by the ‘Generation Provider Path’ config variable. Each Generation Provider is described by a simple manifest file in that directory. For example:

<?xml version="1.0" encoding="utf-8" ?>
<Generation_Engine_Manifest>
 <Generation_Engine_Name>XSLT Generation Engine</Generation_Engine_Name>
 <Generation_Engine_Class>com.dimodelo.generate.XSLTGenerationEngineProvider</Generation_Engine_Class>
 <Generation_Engine_Assembly>XSLTGenerationEngineProvider.dll</Generation_Engine_Assembly>
</Generation_Engine_Manifest>

The manifest contains the Generation Provider name, its class, and the DLL that contains the class. The service uses this information to invoke the provider.
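The lookup from template to provider can be sketched the same way. This hypothetical Python version indexes every provider manifest in the provider folder by `Generation_Engine_Name` and resolves the name that a template manifest asks for. It assumes provider manifests also use the .mnf extension (the post does not say) and stops at returning the class and assembly names; actually loading `XSLTGenerationEngineProvider.dll` and instantiating the class is .NET-specific and not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProviderManifest:
    engine_name: str   # <Generation_Engine_Name>
    class_name: str    # <Generation_Engine_Class>
    assembly: Path     # <Generation_Engine_Assembly>, resolved against the provider folder


def load_provider_registry(provider_dir: Path) -> dict[str, ProviderManifest]:
    """Index each provider manifest in the 'Generation Provider Path' folder by engine name."""
    registry: dict[str, ProviderManifest] = {}
    for path in sorted(provider_dir.glob("*.mnf")):  # assumption: .mnf extension
        root = ET.parse(path).getroot()
        if root.tag != "Generation_Engine_Manifest":
            continue  # skip files that are not provider manifests
        name = root.findtext("Generation_Engine_Name", "").strip()
        registry[name] = ProviderManifest(
            engine_name=name,
            class_name=root.findtext("Generation_Engine_Class", "").strip(),
            assembly=provider_dir / root.findtext("Generation_Engine_Assembly", "").strip(),
        )
    return registry


def resolve_provider(registry: dict[str, ProviderManifest], engine_name: str) -> ProviderManifest:
    """Find the provider named by a template manifest's <Generation_Engine_Name>."""
    try:
        return registry[engine_name.strip()]
    except KeyError:
        raise LookupError(f"no Generation Provider named {engine_name!r}") from None
```

Keying the registry on the trimmed engine name is what ties the two manifests together, so a typo or stray whitespace in either file surfaces as a clear `LookupError` rather than a silent skip.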

5. Generate Code

The internal workings of each Generation Provider are specific to that provider. A provider must implement the prescribed Generation Provider interface, but beyond that it is unconstrained; it simply needs to know how to execute the templates it is given. Currently there is an SSIS Package Generation Provider, a generic XSLT Generation Provider, and an SSIS Project Generation Provider.
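The post does not publish the Generation Provider interface, so the following is only a hypothetical Python rendering of its shape, based on what the Generation Service passes to a provider (the Template, the Template Manifest, Dimodelo_Meta_Data.xml and the active Config) and what it expects back (an array of document names and output strings). `GenerationProvider`, `GenerationResult` and `EchoProvider` are illustrative names, not Dimodelo's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GenerationResult:
    document_name: str  # output file name, built from the name pattern
    content: str        # generated code, e.g. DDL text or .dtsx XML


class GenerationProvider(ABC):
    """Hypothetical stand-in for the Generation Provider interface.

    The service relies only on this method; how the template is turned
    into code is entirely up to the concrete provider.
    """

    @abstractmethod
    def generate(
        self,
        template_path: Path,
        manifest: dict[str, str],
        metadata_path: Path,
        config: dict[str, str],
    ) -> list[GenerationResult]:
        """Execute the template against Dimodelo_Meta_Data.xml and return the outputs."""


class EchoProvider(GenerationProvider):
    """Toy provider: emits one file whose content names the template it was given."""

    def generate(self, template_path, manifest, metadata_path, config):
        return [GenerationResult("echo.txt", f"template={template_path.name}")]
```

Because `GenerationProvider` is an abstract base class, a provider that forgets to implement `generate` fails when it is constructed rather than midway through a generation run.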

A typical provider process flow is as follows:

Retrieve each of the files in the project that match the file pattern in the <Generate_For_Each> tag of the Template Manifest.

For each matching file

Execute the Template, passing an identifier (usually the Unique Id of the Staging Table, Dimension or Fact).

Add the output to a Generation Result array. The array contains an output document name and an output string which contains the result of the transformation. The document name is defined by the document name pattern in the <Generation_Result_File_Name_Pattern> tag of the Template Manifest.

End For

Return the Result array to the Generation Service.
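The loop above can be sketched in Python as follows. The sketch is hypothetical: project files are modelled as a dictionary from project-relative path to the object's unique id, the template is any callable that takes that id, and `%docName%` is assumed to expand to the matched file's name without its extension; the real substitution rules are not described in the post.

```python
import fnmatch
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


@dataclass(frozen=True)
class GenerationResult:
    document_name: str
    content: str


def run_provider(
    project_files: dict[str, str],
    generate_for_each: str,
    name_pattern: str,
    execute_template: Callable[[str], str],
) -> list[GenerationResult]:
    """Sketch of the typical provider loop.

    project_files maps a project-relative path (e.g. "Staging/Customer.sg")
    to the unique id of the object that file describes.
    """
    results: list[GenerationResult] = []
    for rel_path, unique_id in sorted(project_files.items()):
        # Keep only files matching <Generate_For_Each>, e.g. "Staging/*.sg".
        if not fnmatch.fnmatchcase(rel_path, generate_for_each):
            continue
        # Execute the template for this object, identified by its unique id.
        content = execute_template(unique_id)
        # Name the output from <Generation_Result_File_Name_Pattern>
        # (assumption: %docName% is the file name without extension).
        doc_name = name_pattern.replace("%docName%", PurePosixPath(rel_path).stem)
        results.append(GenerationResult(doc_name, content))
    # Hand the whole array back to the Generation Service.
    return results
```

Note that `fnmatch` treats `/` like any other character, so `*` in the pattern can also match across folder boundaries.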

6. Save the Resulting Code

The Generation Service takes the Result array returned by the Generation Provider and saves each output item to a file named with the document name provided in the array. To determine the directory in which to write the file, the Generation Service combines the ‘Generation Output Path’ configuration value from the project Config file with the <Output_Relative_Location> of the Template manifest.
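A minimal sketch of this save step, assuming the results are already in memory and that existing files are simply overwritten (the post does not say how conflicts are handled). `save_results` is a hypothetical name; the path is built in the order described: Generation Output Path, then Output_Relative_Location, then the document name.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class GenerationResult:
    document_name: str
    content: str


def save_results(
    results: list[GenerationResult],
    generation_output_path: Path,
    output_relative_location: str,
) -> list[Path]:
    """Write each result to <Generation Output Path>/<Output_Relative_Location>/<document name>."""
    target_dir = generation_output_path / output_relative_location
    target_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for result in results:
        path = target_dir / result.document_name
        # Assumption: an existing file with the same name is overwritten.
        path.write_text(result.content, encoding="utf-8")
        written.append(path)
    return written
```

Since the output root comes from the Config file, switching to another environment's Config redirects every generated file without touching the templates.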

Dimodelo Config Files

A Dimodelo Data Warehouse Studio project can contain multiple Dimodelo Config files in the Config folder of the project. The intention is that each target environment will have its own Dimodelo Config file. The Dimodelo Config files contain the following information:

  • Connection strings for Source systems, and the Staging and Data Warehouse databases.
  • Where the generated code is written to.
  • Where code is deployed to.
  • Other custom Meta Data.
  • Where Dimodelo Data Warehouse Studio finds Generation Templates, Generation Providers, Deploy Providers and Batch task providers.
When the Generate, Deploy and Batch processes are executed, Dimodelo Data Warehouse Studio uses the active Project Configuration to determine which Dimodelo Config file to use. Project Configurations are a standard Visual Studio feature.
Figure: Visual Studio Project Configurations
Dimodelo Data Warehouse Studio extends Visual Studio project configurations to accept a path to a Dimodelo Config file. For more information about Project Configurations, see MSDN: Build Configurations.
Before executing the Generate, Deploy or Batch processes, select the desired project configuration.
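As a rough illustration of how one run is tied to one environment, this hypothetical Python function maps the active project configuration name to its Dimodelo Config file. The `config_map` dictionary stands in for the Dimodelo Config path that Dimodelo adds to each Visual Studio project configuration; the configuration names and file paths in the usage example are invented.

```python
from pathlib import Path


def select_config_file(
    project_dir: Path,
    config_map: dict[str, str],
    active_configuration: str,
) -> Path:
    """Return the Dimodelo Config file attached to the active project configuration.

    config_map is a hypothetical stand-in for the per-configuration setting
    (configuration name -> Config file path relative to the project).
    """
    try:
        relative = config_map[active_configuration]
    except KeyError:
        raise LookupError(
            f"project configuration {active_configuration!r} has no Dimodelo Config file"
        ) from None
    return project_dir / relative
```

For example, with `{"Debug": "Config/Dev.config", "Release": "Config/Prod.config"}` and `"Release"` active, the function returns the path to `Config/Prod.config` under the project directory, so Generate, Deploy and Batch all read the production settings.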

Conclusion

Dimodelo Data Warehouse Studio has a very flexible and extensible generation architecture. While the predefined Generation Templates will suit most situations and skill levels, advanced users who want to implement their own code framework can create Generation Templates and Generation Providers to suit their needs. Shortly we will post an article describing how to create your own generation template for generating SSIS packages.
