The Backup And Archive Process

As in operational systems, the data within the data warehouse is backed up regularly to ensure that the warehouse can always be recovered after data loss, software failure or hardware failure. Backup and recovery strategies need to be put in place. In archiving, older data is removed from the system and stored in a format that allows it to be quickly restored if required. For example, in a retail sales analysis data warehouse there may be a requirement to keep data for 3 years, with only the latest 6 months kept online. In this sort of scenario there is often also a requirement to do month-on-month comparisons for this year and last year. Because those comparisons reach back further than the 6 months held online, some months of data will have to be temporarily restored from archive.
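The retail example can be turned into a small calculation. The sketch below is only illustrative: the function names `month_range` and `months_to_restore` are hypothetical, and it assumes that the online window includes the current month and that a comparison of this year with last year needs every month from January of last year up to the current month.

```python
from datetime import date


def month_range(start, end):
    """Yield (year, month) tuples from start to end, both inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def months_to_restore(today, online_months=6):
    """Return the (year, month) pairs that must come back from archive.

    Assumption: the latest `online_months` months, counting the current
    month, are online; everything older is in the archive.
    """
    current = (today.year, today.month)

    # Count months on a single axis to find where the online window starts.
    index = today.year * 12 + (today.month - 1) - (online_months - 1)
    online_start = (index // 12, index % 12 + 1)
    online = set(month_range(online_start, current))

    # Assumption: the comparison covers January of last year to this month.
    needed = month_range((today.year - 1, 1), current)
    return [ym for ym in needed if ym not in online]


if __name__ == "__main__":
    for year, month in months_to_restore(date(2024, 3, 15)):
        print(f"restore {year}-{month:02d}")
```

With a date in March 2024 and a 6-month online window, the months October 2023 to March 2024 are already online, so the nine months from January to September 2023 would be restored.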

It is common to archive data as a flat file extract, where the file is in a format that allows the data to be fast-loaded directly into the relevant fact and dimension tables. One issue that needs to be addressed is that, as the data warehouse evolves, the reference data, the structure of the fact data and any related information may change. To ensure that a restored archive is valid, you may need to extract all related data and structures as well. Data warehouses that contain summary data potentially provide a number of distinct data sources that can respond to a specific query. These sources are the detailed data itself and any number of aggregations that satisfy the query’s information need.
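One way to picture the flat file archive described above, together with the related data and structures, is the following minimal sketch. It uses Python's standard `sqlite3`, `csv` and `json` modules with an in-memory database. The table names `sales_fact` and `product_dim`, the manifest layout and all function names are assumptions made for illustration; a real warehouse would use its own bulk-load utility rather than row inserts.

```python
import csv
import io
import json
import sqlite3


def demo_connection():
    """Build a tiny in-memory warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT)")
    conn.execute("CREATE TABLE sales_fact (sale_month TEXT, product_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [("2023-01", 1, 10.0), ("2023-01", 2, 5.5), ("2023-02", 1, 7.0)],
    )
    conn.commit()
    return conn


def archive_month(conn, month):
    """Extract one month of facts as CSV text plus a JSON manifest, then delete it."""
    cur = conn.execute(
        "SELECT sale_month, product_id, amount FROM sales_fact WHERE sale_month = ?",
        (month,),
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()

    flat = io.StringIO()
    writer = csv.writer(flat)
    writer.writerow(columns)
    writer.writerows(rows)

    # Snapshot the structure and reference data the facts depend on.
    products = conn.execute(
        "SELECT product_id, product_name FROM product_dim ORDER BY product_id"
    ).fetchall()
    manifest = {
        "month": month,
        "fact_columns": columns,
        "row_count": len(rows),
        "product_dim": products,
    }

    conn.execute("DELETE FROM sales_fact WHERE sale_month = ?", (month,))
    conn.commit()
    return flat.getvalue(), json.dumps(manifest)


def restore_month(conn, flat_text, manifest_text):
    """Reload an archived month after checking that the structure still matches."""
    manifest = json.loads(manifest_text)
    reader = csv.reader(io.StringIO(flat_text))
    header = next(reader)
    if header != manifest["fact_columns"]:
        raise ValueError("archive header does not match its manifest")

    # Column names of the fact table as it exists today.
    current = [row[1] for row in conn.execute("PRAGMA table_info(sales_fact)")]
    if header != current:
        raise ValueError("fact table structure has changed since archiving")

    rows = list(reader)
    if len(rows) != manifest["row_count"]:
        raise ValueError("archive row count does not match its manifest")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

The restore step refuses to load when the fact table has gained or lost columns since the extract was taken, which is the situation the text warns about. The manifest also carries a snapshot of the dimension rows, so the archived facts can still be interpreted if the reference data has changed in the meantime.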
