In this paper, we analyze interpretable models discovered by symbolic regression from real gasification datasets of the project “Centre for Energy and Environmental Technologies” (CEET). Two statistical metrics are usually used to quantify the accuracy of CEET models against the input data: the Mean Square Error (MSE) and the Pearson Correlation Coefficient (PCC). However, if the points used to construct and test the models are not chosen randomly from the continuum of the input variable, but instead from a limited number of discrete input points, the behavior of the model between those points may well fail to reflect the physical essence of the modelled phenomenon. For example, the developed model can exhibit unexpected oscillations between the points used, and the usual statistical metrics cannot detect these anomalies. However, by using dynamic system criteria in addition to statistical metrics, suspicious models that do not fit the expected behavior can be automatically detected and discarded. This communication presents a universal method based on dynamic system criteria that can identify suitable models among all those that score well on the statistical metrics. The dynamic system criteria measure the complexity of the candidate models using approximate entropy and sample entropy. Examples are given for waste gasification, where the output data (the percentage of each particular gas in the produced mixture) are available for only six values of the input variable (the temperature in the chamber in which the process takes place). In such cases, instead of producing the expected simple spline-like curves, artificial intelligence tools can produce inappropriate oscillatory curves with sharp peaks, owing to the known tendency of symbolic regression to produce overfitted and relatively more complex models even when the underlying physical model is simple.
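As an illustration of the complexity criterion mentioned above, the following is a minimal Python sketch of sample entropy applied to two candidate curves evaluated on a dense grid of the input variable. It is not the authors' implementation: the function name `sample_entropy`, the default tolerance of 0.2 times the standard deviation, the temperature range, and both candidate curves are assumptions chosen only for illustration.

```python
import math


def sample_entropy(series, m=2, r=None):
    """Sample entropy SampEn(m, r) of a 1-D sequence (illustrative sketch)."""
    n = len(series)
    if r is None:
        # Assumed convention: tolerance r = 0.2 * standard deviation.
        mean = sum(series) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in series) / n)

    def count_matches(length):
        # Use the same n - m template start points for both lengths.
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined; reported here as infinity
    return -math.log(a / b)


# Hypothetical candidate models evaluated on a dense temperature grid.
temperatures = [700 + k for k in range(251)]
smooth = [10.0 + 0.02 * (t - 700) for t in temperatures]
oscillatory = [
    10.0 + 0.02 * (t - 700) + 1.5 * math.sin(0.9 * (t - 700))
    for t in temperatures
]

print("smooth candidate:     ", sample_entropy(smooth))
print("oscillatory candidate:", sample_entropy(oscillatory))
```

Under this criterion, a candidate whose curve is more irregular between the measured points would be expected to receive a larger entropy value, so candidates with similar MSE and PCC could be ranked by this score. The threshold at which a model is rejected is not specified here and would have to be chosen for the application.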
Published on 01/01/2022
DOI: 10.3390/axioms11090463
Licence: CC BY-NC-SA