MeteoIODoc  2.10.0
How to write a data generator

An important point to keep in mind is that data generators are used at two very different stages (see the MeteoIO workflow):

  • during raw data editing, when a data creator is called;
  • as a last resort, as a data generator, when the requested data could not be provided otherwise.

In the first case, the GeneratorAlgorithm::create() call is used and the data is still at its original sampling rate: none of it (including the other parameters) has been filtered or resampled. In the second case, the GeneratorAlgorithm::generate() call is used most of the time, and all available data has already been filtered and resampled. The goal is then to provide reasonable values for the data points that are still missing. These are either isolated periods (for example, when a sensor was not working) that are too long for a statistical temporal interpolation, or parameters that were not measured at all. Data is then generated, generally through a parametrization based on other meteorological parameters. Sometimes even fully arbitrary data can be helpful, for example replacing missing values with a given constant so that a model can run over the data gap.

Structure

The DataGenerator class selects which data generator to use for a given parameter at any given time step. This class acts as an interface, presenting a higher level view to the caller. The data generators themselves derive from the GeneratorAlgorithm class, which standardizes their public API. An object factory creates the generators during initialization, based on the strings contained in the user's io.ini configuration file, and all constructed generators are kept in a vector for the whole lifetime of the DataGenerator object.

The GeneratorAlgorithm API defines two public generate() methods, taking a meteorological parameter index (see MeteoData) and either the meteo data for one station at one point in time, or a meteo time series for one station. These methods walk through the meteo data looking for nodata values of the requested meteo parameter. If the generator could generate data for all the nodata values it found, it returns true, and false otherwise. When false is returned, the DataGenerator object that manages the process calls the next data generator, in the order declared by the user. For a given meteo parameter, the whole process stops as soon as a generator returns true or there are no more data generators to try (as declared by the user in the configuration file).
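The chaining behaviour described above can be sketched in a few lines of C++. This is a simplified, self-contained illustration rather than MeteoIO code: `MeteoData`, the reduced `GeneratorAlgorithm` base (only the single-point `generate()` is modeled), the two toy generators and the `runChain()` helper are hypothetical stand-ins, and `nodata` is assumed to mirror `IOUtils::nodata`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Stand-in for IOUtils::nodata (value assumed for this sketch).
const double nodata = -999.;

// Minimal stand-in for MeteoData: one value per parameter index.
struct MeteoData {
    std::vector<double> values;
    double& operator()(std::size_t param) { return values[param]; }
};

// Reduced, hypothetical mirror of the GeneratorAlgorithm public API.
class GeneratorAlgorithm {
public:
    virtual ~GeneratorAlgorithm() = default;
    virtual bool generate(const std::size_t& param, MeteoData& md) = 0;
};

// Toy generator that cannot create any value: it only succeeds when nothing is missing.
class FailingGenerator : public GeneratorAlgorithm {
public:
    bool generate(const std::size_t& param, MeteoData& md) override {
        return md(param) != nodata;
    }
};

// Toy generator that replaces a missing value with a constant.
class ConstantGenerator : public GeneratorAlgorithm {
public:
    explicit ConstantGenerator(double v) : value(v) {}
    bool generate(const std::size_t& param, MeteoData& md) override {
        if (md(param) == nodata) md(param) = value;
        return true;
    }
private:
    double value;
};

// Hypothetical helper: try the generators in the user-declared order and stop
// at the first one reporting success. The vector owns the generators, so they
// live as long as the object holding it.
bool runChain(const std::vector<std::unique_ptr<GeneratorAlgorithm>>& generators,
              const std::size_t& param, MeteoData& md) {
    for (const auto& gen : generators) {
        if (gen->generate(param, md)) return true;
    }
    return false; // no generator could fill the gap
}
```

With a chain declared as FailingGenerator then ConstantGenerator, a missing value is filled by the second generator; a chain containing only FailingGenerator returns false and leaves the gap untouched.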

Implementation

To implement a new generator, create a new class in two new files in the dataGenerators subdirectory (the declaration in the .h and the implementation in the .cc), named after the generator and inheriting from GeneratorAlgorithm. You can start by copying dataGenerators/template.cc and dataGenerators/template.h and renaming them after your generator. Please make sure to update the header guard (the line "#ifndef GENERATORTEMPLATE_H" in the header) to match your generator name. Three methods need to be implemented:

  • the constructor with (const std::vector< std::pair<std::string, std::string> >& vecArgs, const std::string& i_algo, const std::string& i_section, const double& TZ)
  • bool generate(const size_t& param, MeteoData& md)
  • bool generate(const size_t& param, std::vector<MeteoData>& vecMeteo)
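A sketch of what the declaration in the new header might look like, with the header guard renamed. `MyGenerator` and `MYGENERATOR_H` are hypothetical names, and the `GeneratorAlgorithm` and `MeteoData` stand-ins replace the real MeteoIO headers only so that the snippet compiles on its own; the real base class may declare more members than shown here.

```cpp
#ifndef MYGENERATOR_H  // renamed from GENERATORTEMPLATE_H
#define MYGENERATOR_H

#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// --- Stand-ins so the snippet compiles without MeteoIO (illustration only) ---
struct MeteoData { std::vector<double> values; };

class GeneratorAlgorithm {
public:
    virtual ~GeneratorAlgorithm() = default;
    virtual bool generate(const std::size_t& param, MeteoData& md) = 0;
    virtual bool generate(const std::size_t& param, std::vector<MeteoData>& vecMeteo) = 0;
};
// -----------------------------------------------------------------------------

// Hypothetical new generator declaring the three required members.
class MyGenerator : public GeneratorAlgorithm {
public:
    MyGenerator(const std::vector< std::pair<std::string, std::string> >& vecArgs,
                const std::string& i_algo, const std::string& i_section, const double& TZ);

    bool generate(const std::size_t& param, MeteoData& md) override;
    bool generate(const std::size_t& param, std::vector<MeteoData>& vecMeteo) override;

private:
    std::string algo; // own name, for error messages and warnings
    double value;     // example of a setting parsed once in the constructor
};

// Compile-time check that the generator really implements the common interface.
static_assert(std::is_base_of<GeneratorAlgorithm, MyGenerator>::value,
              "MyGenerator must inherit from GeneratorAlgorithm");

#endif // MYGENERATOR_H
```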

The constructor is responsible for parsing the arguments (a vector of key/value string pairs) and for saving its own name internally, for use in error messages, warnings, etc. It should set all the internal variables it needs according to the parsed arguments. The goal is to avoid any parsing anywhere else (for performance reasons).
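As an illustration of doing all the parsing once, in the constructor, here is a hypothetical constant generator that reads a single `VALUE` argument. The argument key, the messages and the use of `std::invalid_argument` (MeteoIO has its own exception classes) are assumptions for this sketch, and the GeneratorAlgorithm base class is omitted to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical constant generator; only the constructor is shown.
class ConstGeneratorSketch {
public:
    ConstGeneratorSketch(const std::vector< std::pair<std::string, std::string> >& vecArgs,
                         const std::string& i_algo, const std::string& i_section,
                         const double& /*TZ*/)
        : algo(i_algo), section(i_section), value(0.)
    {
        bool has_value = false;
        // All arguments are parsed here, once; generate() only reads members.
        for (const auto& arg : vecArgs) {
            if (arg.first == "VALUE") {
                std::size_t pos = 0;
                value = std::stod(arg.second, &pos); // throws std::invalid_argument if not a number
                if (pos != arg.second.size())        // reject trailing garbage such as "1.5abc"
                    throw std::invalid_argument(algo + ": invalid VALUE '" + arg.second + "' in [" + section + "]");
                has_value = true;
            } else {
                throw std::invalid_argument(algo + ": unknown argument '" + arg.first + "' in [" + section + "]");
            }
        }
        if (!has_value)
            throw std::invalid_argument(algo + ": missing argument VALUE in [" + section + "]");
    }

    double getValue() const { return value; }

private:
    std::string algo;    // own name, reused in error and warning messages
    std::string section; // configuration section, also for messages
    double value;        // parsed once, used by every generate() call
};
```

Rejecting unknown or malformed arguments in the constructor means configuration errors surface at initialization rather than in the middle of a data run.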

The generate(const size_t& param, MeteoData& md) method compares md(param) with IOUtils::nodata and replaces it with a generated value if necessary. It returns true if no further processing is needed (i.e. no replacement was needed or the replacement succeeded), and false otherwise.

The generate(const size_t& param, std::vector<MeteoData>& vecMeteo) method compares vecMeteo[ii](param) with IOUtils::nodata for each timestamp in the vector and tries to generate data when necessary. If all missing data points could be generated (or if no data point needed to be generated), it returns true, and false otherwise.
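Both overloads can be sketched for a hypothetical generator that derives a missing parameter from another one, as `slope * source + offset`. The generator, its settings and the reduced `MeteoData` are illustrative assumptions, and `nodata` is assumed to mirror `IOUtils::nodata`. The time-series overload simply delegates to the single-point one and reports whether every gap was filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for IOUtils::nodata (value assumed for this sketch).
const double nodata = -999.;

// Minimal stand-in for MeteoData: one value per parameter index.
struct MeteoData {
    std::vector<double> values;
    double& operator()(std::size_t param) { return values[param]; }
};

// Hypothetical generator: fills a missing parameter from another one using
// value = slope * source + offset (settings would be parsed in the constructor).
class LinearGeneratorSketch {
public:
    LinearGeneratorSketch(std::size_t i_source, double i_slope, double i_offset)
        : source(i_source), slope(i_slope), offset(i_offset) {}

    // Single point: true if nothing was missing or the gap could be filled.
    bool generate(const std::size_t& param, MeteoData& md) {
        if (md(param) != nodata) return true;   // nothing to do
        const double src = md(source);
        if (src == nodata) return false;        // cannot generate at this point
        md(param) = slope * src + offset;
        return true;
    }

    // Time series: try every timestamp, then report whether all gaps were filled.
    bool generate(const std::size_t& param, std::vector<MeteoData>& vecMeteo) {
        bool all_filled = true;
        for (std::size_t ii = 0; ii < vecMeteo.size(); ++ii) {
            if (!generate(param, vecMeteo[ii])) all_filled = false;
        }
        return all_filled;
    }

private:
    std::size_t source;
    double slope, offset;
};
```

Filling every point it can before returning false means the next generator in the chain only has to deal with the remaining gaps.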

Finally, a new entry must be added to the object factory, in the GeneratorAlgorithmFactory::getAlgorithm method at the top of the file GeneratorAlgorithms.cc.
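A new factory entry typically amounts to one more name comparison. The sketch below is a simplified stand-in, not the actual `GeneratorAlgorithmFactory::getAlgorithm` signature (the real method also forwards the parsed arguments, section and time zone): the function name `makeGenerator`, the generator names `OTHERGEN` and `MYGEN`, and the use of `std::unique_ptr` and `std::invalid_argument` are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Minimal stand-in base class (the real one declares the generate() overloads).
class GeneratorAlgorithm {
public:
    explicit GeneratorAlgorithm(const std::string& i_algo) : algo(i_algo) {}
    virtual ~GeneratorAlgorithm() = default;
    const std::string& getAlgo() const { return algo; }
private:
    std::string algo;
};

// Hypothetical generators standing in for an existing and a new implementation.
class OtherGenerator : public GeneratorAlgorithm {
public:
    using GeneratorAlgorithm::GeneratorAlgorithm;
};
class MyGenerator : public GeneratorAlgorithm {
public:
    using GeneratorAlgorithm::GeneratorAlgorithm;
};

// Hypothetical factory: maps the name read from io.ini to a generator object.
std::unique_ptr<GeneratorAlgorithm> makeGenerator(const std::string& name) {
    if (name == "OTHERGEN") return std::make_unique<OtherGenerator>(name);
    if (name == "MYGEN")    return std::make_unique<MyGenerator>(name); // the new entry
    throw std::invalid_argument("The generator algorithm '" + name + "' is not implemented");
}
```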

Documentation

The newly added data generator must be added to the list of available algorithms in GeneratorAlgorithms.h with a proper description. Its class must be properly documented, similarly to the other data generators. An example can also be given in the example section of the same file. Please feel free to add any necessary bibliographic references to the bibliographic section!
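For orientation, a Doxygen comment for a new generator class could look like the following. The `@class`, `@brief`, `@details`, `@author` and `@date` commands are standard Doxygen; the class name, wording, argument description and placeholder author are hypothetical, and the layout used by the existing generators in GeneratorAlgorithms.h should take precedence over this sketch.

```cpp
#include <cassert>

/**
 * @class MyGenerator
 * @brief Fills gaps with a user-provided constant (hypothetical example).
 * @details This generator replaces every missing value of the parameter it is
 * declared for with a constant. It takes one argument:
 *  - VALUE: the constant to use (mandatory).
 *
 * Any bibliographic reference used by the parametrization would be cited here
 * and listed in the bibliographic section.
 *
 * @author (your name)
 * @date   (creation date)
 */
class MyGenerator {
public:
    explicit MyGenerator(double i_value) : value(i_value) {}
    double getValue() const { return value; } ///< the configured constant
private:
    double value; ///< constant used to fill the gaps
};
```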