So far, every query has been executed in real-time. In some cases, it can be helpful to persist data. Data Virtuality Server has two concepts for this:
- Materialization: Equivalent to what is commonly understood as caching, used to improve query performance. The content of a view/table is precalculated and saved to a table in the Analytical Storage. All subsequent queries involving that view/table are then served from that table. Materializations are transparent to the data consumer and can be enabled or disabled flexibly as needed.
- Replication: Allows copying data to a dedicated target table using different strategies. Typical use cases are appending increments of data to a target table to accumulate a longer history of data over time or tracking changes of data using Slowly Changing Dimensions.
Further details can be found in our documentation.
Materialize Your Data
Setting Up A Materialization Job
To improve the performance of our previously created view sales.sales_analytics, we create a materialization job by right-clicking on the view and selecting “Materialize / Create materialization job”. The simplest method is a complete materialization, where the data is replaced completely on every execution. The example below covers the more sophisticated incremental materialization for best performance.
In the dialogue, we choose incremental materialization with the following details:
- Row check field: This is the field that allows identifying whether a row has changed, typically a date or timestamp. In our case of modified_date, the algorithm will determine the local maximum modified_date and only query the source for rows where modified_date is greater than or equal to the local maximum.
- Identity field: This is the primary key of the view, to remove duplicates. In our case
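The incremental algorithm described above can be sketched in plain Python. This is a minimal in-memory illustration of the idea, not Data Virtuality Server's actual implementation; the column names modified_date and order_id and the dict-based row representation are assumptions made for the example:

```python
from datetime import date

def incremental_refresh(source_rows, materialized_rows,
                        row_check="modified_date", identity="order_id"):
    """Sketch of one incremental materialization pass."""
    # 1. Determine the local maximum of the row check field in the
    #    already-materialized data (None on the very first run).
    local_max = max((r[row_check] for r in materialized_rows), default=None)
    # 2. Query the source only for rows at or after that local maximum.
    if local_max is None:
        delta = list(source_rows)
    else:
        delta = [r for r in source_rows if r[row_check] >= local_max]
    # 3. Use the identity field to remove duplicates: rows re-fetched
    #    because they equal the local maximum replace their old copies.
    delta_ids = {r[identity] for r in delta}
    kept = [r for r in materialized_rows if r[identity] not in delta_ids]
    return kept + delta

# Example: one already-materialized row, one unchanged row, one new row.
existing = [{"order_id": 1, "modified_date": date(2024, 1, 1)}]
source = [
    {"order_id": 1, "modified_date": date(2024, 1, 1)},  # re-fetched, dedup'd
    {"order_id": 2, "modified_date": date(2024, 1, 2)},  # new since last run
]
print(len(incremental_refresh(source, existing)))  # 2
```

Note that using "greater than or equal to" rather than "greater than" means rows sharing the local maximum are fetched again, which is why the identity field is needed to avoid duplicates.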
Click “OK” and complete the wizard by specifying when the data is to be updated, e.g. daily at 12:00:
Enabling the Materialization
At this point, the materialization job is configured and waiting to be executed according to the schedule. But it is not enabled yet, meaning the query is still virtualized.
Since in our example the schedule hasn’t triggered yet, we run an initial full load of the data by right-clicking on the view and selecting “Materialize / Now”. When refreshing the list of views, you should see that the icon of the view has changed from green to red, indicating that it is now materialized:
Further details of the job, like the execution history, are available in the jobs tab.
You can disable a materialization at any time by selecting “Find optimization” from the context menu and disabling the shown entry.
Replicate Your Data
We just materialized our view sales.sales_analytics for better performance, but let’s assume the data sources do not retain all orders indefinitely, but only for a certain time window. We now want to create a replication job to regularly insert the current rows into a target table.
To replicate a view or table to the analytical storage, right-click on the view/table and select “Replicate / Create replication job”. In the replication job setup dialogue, we will set up a batch replication:
- New Analytical Storage Table: This is the name of a regular, physical table inside the Analytical Storage that will be created automatically on the job’s first run if it doesn’t exist yet.
- Identity field: This is equivalent to a primary key, in this case the order_id. On each job execution, the algorithm will retrieve all order_id values from the source view (sales.sales_analytics) and delete all overlapping rows from the target table (sales_history). Then, all rows returned from the source view are inserted into the target table.
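The delete-then-insert strategy described above can be sketched as a small in-memory example. Data Virtuality Server performs these steps in SQL against the Analytical Storage; the dict-based rows and the amount column here are assumptions made purely for illustration:

```python
def batch_replicate(source_rows, target_rows, identity="order_id"):
    """Sketch of one batch replication run."""
    # 1. Retrieve all identity values currently returned by the source view.
    source_ids = {r[identity] for r in source_rows}
    # 2. Delete all overlapping rows from the target table.
    remaining = [r for r in target_rows if r[identity] not in source_ids]
    # 3. Insert all rows returned from the source view.
    return remaining + list(source_rows)

# Example: the target already holds an old copy of order 1 plus an
# expired order 0 that the source no longer returns. Order 0 survives,
# which is how the target table accumulates history over time.
target = [{"order_id": 0, "amount": 5}, {"order_id": 1, "amount": 10}]
source = [{"order_id": 1, "amount": 12}, {"order_id": 2, "amount": 7}]
result = batch_replicate(source, target)
print(sorted(r["order_id"] for r in result))  # [0, 1, 2]
```

The key property is that rows no longer returned by the source view are left untouched in the target table, while rows that are still returned get refreshed to their current state.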
After clicking “OK”, select a schedule for when the data should be replicated, and click “Finish”. Now the job is ready and waiting to be triggered by the schedule. You can go to the jobs tab for further details of the job, or to trigger it once immediately.
Once the job has run, you can open the Analytical Storage, click refresh, and see the newly created sales_history table.
For more details on other replication algorithms, please visit our documentation.