The research community is addressing a number of issues in response to organisations' increasing reliance on data warehousing. Most work addresses aspects of the internal operation of a data warehouse server, such as the selection of views to materialise, the maintenance of aggregate views and the performance of OLAP queries. Issues related to data warehouse maintenance, i.e. how changes to autonomous sources should be detected and propagated to a warehouse, have by contrast been addressed only in a fragmented manner.
We have shown earlier that a number of maintenance policies based on source characteristics and timing are relevant and meaningful for single-source views. In this report we detail how this work has been extended to multiple sources. We focus on exploring policies for data integration from heterogeneous sources. As the number of policies is very large, we first analyse their behaviour intuitively with respect to broader source and policy characteristics. We then extend the single-source cost model to these policies and incorporate it into a Policy Analyser for Multiple sources (PAM), which we use to analyse the effect of source characteristics and join alternatives on various policies. We have also developed a Testbed for Maintenance of Integrated Data (TMID). We report on experiments conducted to validate the policies recommended by the tool and to confirm our initial analysis. Finally, we distil a set of heuristics for the selection of multi-source policies based on quality of service and other requirements.