Data Quality Control

The reliability of the data in Global Wave Statistics Online depends both on the validity of the analysis methods and on the quantity and quality of the observations used. It also depends on the reliability of the checking routines built in to ensure that all data processing operations were executed correctly.

The original NMIMET program included an extensive range of quality control procedures and provided corresponding diagnostic output in both tabular and graphical form. Interpreting such output calls for a degree of skill and experience, and with the quantity of data involved in this database the task would have been both tiring and time-consuming. In planning the specially adapted version of NMIMET used to generate the data, therefore, considerable effort was devoted to increasing the degree of automation in the quality control process [36].

For each run, a summary of numerical diagnostic information was printed showing the results of various quality tests, each with a predetermined failure criterion based on experience. The failure criteria could be adjusted as the work proceeded, and a warning symbol was printed against each failed test.

The tests included monitoring the values of all the modelling parameters and the standard errors of the corresponding data fits. They also included, for example, checking whether the number of joint wave and wind observations available was above an acceptable threshold, and comparing the overall mean values and standard deviations of the 'smoothed' and raw wave heights.
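
The tests described above can be sketched as a small table-driven check. This is a minimal illustration under stated assumptions, not the adapted NMIMET code: the thresholds (`MIN_JOINT_OBS`, `MAX_FIT_STD_ERROR`, `MAX_REL_DIFF`), the warning symbol `WARN`, the `Diagnostics` record and the rule in `classify` that separates "insufficient data" from "rejected" are all hypothetical, because the text does not give the actual failure criteria.

```python
from dataclasses import dataclass

# Hypothetical failure criteria: the actual NMIMET thresholds are not stated
# in the text and could be adjusted as the work proceeded.
MIN_JOINT_OBS = 50        # assumed minimum number of joint wave/wind observations
MAX_FIT_STD_ERROR = 0.10  # assumed maximum standard error of the data fitting
MAX_REL_DIFF = 0.05       # assumed tolerance on relative mean/std differences
WARN = "***"              # assumed warning symbol printed against a failed test


@dataclass
class Diagnostics:
    """Hypothetical per-dataset summary statistics (wave heights in metres)."""
    name: str
    n_joint_obs: int
    fit_std_error: float
    raw_mean: float
    raw_std: float
    fitted_mean: float
    fitted_std: float


def rel_diff(value: float, reference: float) -> float:
    """Relative difference of value from a non-zero reference value."""
    return abs(value - reference) / abs(reference)


def run_tests(d: Diagnostics) -> list[tuple[str, bool]]:
    """Apply each quality test; the boolean is True when the test passes."""
    return [
        ("joint observations", d.n_joint_obs >= MIN_JOINT_OBS),
        ("fit standard error", d.fit_std_error <= MAX_FIT_STD_ERROR),
        ("mean wave height", rel_diff(d.fitted_mean, d.raw_mean) <= MAX_REL_DIFF),
        ("std of wave height", rel_diff(d.fitted_std, d.raw_std) <= MAX_REL_DIFF),
    ]


def classify(d: Diagnostics) -> str:
    """Assumed rule: too few observations is reported separately from rejection."""
    results = dict(run_tests(d))
    if not results["joint observations"]:
        return "insufficient data"
    return "accepted" if all(results.values()) else "rejected"


def summary(d: Diagnostics) -> str:
    """Build a printable summary, flagging each failed test with WARN."""
    lines = [f"{d.name}: {classify(d)}"]
    for test, passed in run_tests(d):
        flag = "" if passed else WARN
        lines.append(f"  {test:<20}{flag}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    print(summary(Diagnostics("example A", 420, 0.03, 2.81, 1.42, 2.79, 1.40)))
    print(summary(Diagnostics("example B", 300, 0.25, 2.81, 1.42, 2.79, 1.40)))
```

Keeping the criteria as named constants reflects the point that the failure criteria could be adjusted as the work proceeded: changing a threshold does not require touching the test logic.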

This automated monitoring procedure proved effective in minimizing the risk of human error when checking such large quantities of data. Because of the basic reliability of the NMIMET analysis and the effective quality control of the input data, very few significant violations of the test criteria were detected in cases where the number of observations was acceptable. In some areas a fairly large number of datasets have insufficient wave data, but this simply means that too few observations fell into the respective data category; it occurs most often in areas where the total observation count is relatively small. Of the 4680 datasets (9 directional sectors × 5 seasons × 104 sea areas), only 9 are marked as rejected by the wave data analysis, indicating that the quality control criteria for the modelling parameters were violated even though sufficient observations were available.
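
The size of the dataset grid and the resulting rejection rate work out as follows; the percentage is simple arithmetic on the figures given in the paragraph above, taken over all datasets rather than only those with sufficient observations, and is not a separately reported statistic.

```latex
N_{\text{datasets}} = 9 \times 5 \times 104 = 4680,
\qquad
\frac{N_{\text{rejected}}}{N_{\text{datasets}}} = \frac{9}{4680} \approx 0.0019 \approx 0.19\,\%
```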

Another reassuring feature of the quality control monitoring was the low standard errors of the data fitting and the generally very close agreement between the overall mean values and standard deviations of the raw and NMIMET wave height distributions.