Run your first ETL batch

You can run an ETL Batch through Visual Studio. Normally you would use this method while developing and modifying your solution. Running a batch in production is scheduled through the Dimodelo Management Console, a utility that sits on your server.

What’s a Batch? It’s an old-school description of a multi-step ETL job. In this case, it contains one task per source extract, persistent staging transform, dimension transform and fact transform.

Run your first batch

  1. Make sure that the correct Project Configuration is selected. The Project Configuration is associated with a Dimodelo Data Warehouse Studio Configuration and Workflow files that are used to orchestrate the batch. By default, the Development project configuration is selected. Make sure the Development project configuration is selected in your project.

  2. Review the Project Configuration. Right-click the Training Dimodelo project node in Solution Explorer then click Properties. The Training Property Pages dialog appears.
  3. In the Configuration drop-down, ensure that the active Project Configuration is selected i.e. Active(Development). Click the Configuration Properties page. Use the default Default_Shift_Workflow.xml workflow and Development.cuf Configuration File.  To modify the workflow that the Batch runs when invoked from Dimodelo Data Warehouse Studio, change the ‘Workflow Location’ property. Batch workflow and execution are discussed more in the ‘Managing and Monitoring the ETL batch’ lesson.

  1. Click Ok to close.
  2. To run the ETL Batch through Visual Studio simply select the Dimodelo>Batch menu option. The Batch menu option invokes the Batch Service, passing the Dimodelo Data Warehouse Studio Configuration File and Workflow file that are specified in the active Visual Studio Project configuration.

  1. The batch progress is reported in the Visual Studio “Dimodelo ETL Batch Progress” output tool window at the bottom of Visual Studio. You know the batch is finished when you see the Completed: BatchExecution. Id: 1. The Id is the Id number of this execution of the batch. This becomes important if you want to review the log for the batch.

Trouble Shooting

If the batch failed for any reason, there are several places you can look to understand what happened.

Dimodelo ETL Batch Progress output window

You will see exception messages in the Batch output window. It’s typically long and repeated, but you can scroll to the top of that message, and see the root cause of the issue.

Dimodelo Batch tables

In the target Data Warehouse database, there are a series of tables that capture the batch progress, duration, and status. The batch tables share the schema zbt.. If any tasks fail, the error message will be written to zbt.Task_Execution table. More details can be found in the lesson, ‘Managing and Monitoring the ETL batch’.

Use this query to get all the tasks for the latest batch:

Select T.* 
FROM 
zbt.Batch_Execution B
JOIN (Select Max(Batch_Execution_Id) AS Batch_Execution_Id FROM zbt.Batch_Execution) B1
ON B.Batch_Execution_Id = B1.Batch_Execution_Id
JOIN zbt.Phase_Execution P
ON P.Batch_Execution_Id = B.Batch_Execution_Id
JOIN zbt.Task_Execution T 
ON T.Phase_Execution_Id = P.Phase_Execution_Id
ORDER BY T.task_start_time

Dimodelo Shift Log Files

Dimodelo Shift generates comprehensive log files. Additional details of any error can also be found in either the info log or the error log. The log files can be found in the {Dimodelo Shift Install Directory}/logs.