Data are recorded even when the photovoltaic power station is shut down or in an abnormal power generation state, so the recorded data contain abnormal values. If abnormal data are used to train the prediction model, the model's error usually increases. In neural network prediction, the sample data strongly influence the prediction result, so the sample data must be preprocessed; this makes model training smooth and fast and greatly improves performance. The sample data are therefore preprocessed to eliminate redundant and abnormal records, which improves the training performance of the Elman neural network prediction model.
- Data preprocessing principles
In data mining, the quality of the training data samples is very important. The sample data must not only be complete and free of redundant elements, but the correlation between samples must also be very small. Actual sample data often contain repeated, disordered, and incomplete records that do not meet the requirements of data mining, so invalid samples that the prediction model does not need must be removed.
Different data mining algorithms have different data preprocessing methods and steps. Usually, the data preprocessing steps are as follows.
(1) Data integration
Data integration organically gathers data of different sources, units, formats, and attributes, physically or logically; it is not a simple side-by-side listing. The job of data integration is to resolve inconsistencies among the data; it also includes selecting and integrating the useful data and information.
(2) Data cleaning
The data from the previous step are still rough: raw data usually contain noise and irrelevant records, and such invalid data seriously distort the prediction results. The data are therefore cleaned further to remove invalid records.
(3) Data transformation
After integration, the attributes of the data are inconsistent, and samples with inconsistent attributes influence the prediction results differently, so data sets with different attributes must be transformed. Data transformation can be carried out through dimensional conversion, or by normalizing, summarizing, and projecting the data.
(4) Integrity analysis
After the above processing of the original data, its completeness must be checked; only a complete sample data set can yield good accuracy in the mining process. A minimal end-to-end sketch of these four steps is given below.
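As an illustration only, the following Python (pandas) sketch wires the four steps together; the file names, column names, and merge key are hypothetical placeholders, not details from the text.

```python
import pandas as pd

# (1) Data integration: gather records from different sources into one frame.
# "power.csv" and "weather.csv" are hypothetical file names.
power = pd.read_csv("power.csv", parse_dates=["timestamp"])
weather = pd.read_csv("weather.csv", parse_dates=["timestamp"])
data = pd.merge(power, weather, on="timestamp", how="inner")

# (2) Data cleaning: drop duplicate timestamps and rows with missing values.
data = data.drop_duplicates(subset="timestamp").dropna()

# (3) Data transformation: bring numeric attributes onto a common [0, 1] scale.
numeric = data.select_dtypes("number")
data[numeric.columns] = (numeric - numeric.min()) / (numeric.max() - numeric.min())

# (4) Integrity analysis: verify the cleaned set is complete before mining.
assert not data.isna().any().any(), "sample data set is incomplete"
print(f"{len(data)} complete samples ready for training")
```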
- Prediction model input data normalization processing
The input data of the short-term prediction model for photovoltaic power plants are mainly the plant's historical power generation, monitored meteorological data, and real-time numerical weather forecast data, including daily power generation, solar irradiance, temperature, and so on. Since power generation at the same time of day differs greatly between days of different weather types, separate short-term photovoltaic forecast models must be established for each weather type, and the sample data must likewise be assigned to the corresponding sub-models by weather type. These sample data are divided into training, validation, and test sets in chronological order.
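A minimal sketch of this per-weather-type chronological split, assuming the records carry a weather-type label; the column names, synthetic data, and 70/15/15 ratios are illustrative assumptions:

```python
import pandas as pd

def chronological_split(df: pd.DataFrame, train=0.7, val=0.15):
    """Split one weather-type subset into train/validation/test sets in time order."""
    df = df.sort_values("timestamp")              # preserve chronological order
    n = len(df)
    i, j = int(n * train), int(n * (train + val))
    return df.iloc[:i], df.iloc[i:j], df.iloc[j:]

# Hypothetical labeled sample data; in practice this would be the integrated
# power/weather records with a weather-type label per day.
data = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=100, freq="h"),
    "power": range(100),
    "weather_type": ["sunny", "cloudy"] * 50,
})

# One sub-model per weather type: split each subset separately.
splits = {w: chronological_split(g) for w, g in data.groupby("weather_type")}
```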
The nonlinear activation function used in neural networks is usually the sigmoid (S-shaped) function, whose output is limited to (0, 1) or (-1, 1), while unnormalized input and output data often fall outside this interval and cause neuron saturation during training. To avoid this, the data are normalized so that the sample data and output data are limited to the [0, 1] interval, as shown in formula (1):
x′ = (x − x_min) / (x_max − x_min)  (1)

where x is an original value, x_min and x_max are the minimum and maximum of the sample sequence, and x′ is the normalized value.
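A direct NumPy implementation of formula (1); the guard against the degenerate case x_max = x_min is an added assumption:

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Scale samples into [0, 1] per formula (1): x' = (x - min) / (max - min)."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid division by zero
    return (x - x_min) / span

irradiance = np.array([120.0, 540.0, 860.0, 430.0])  # hypothetical readings
print(min_max_normalize(irradiance))                 # values now lie in [0, 1]
```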
- Eliminate abnormal data
In this paper, the 3σ principle from the statistical discrimination method is used to eliminate abnormal data. The principle is to specify a confidence probability and the corresponding confidence interval; an error larger than this bound is considered not to belong to the range of random error, and the value is identified as abnormal data and eliminated, as shown in formula (2). The basic idea is to assume that the data as a whole obey a normal distribution; the probability of a data point falling in (−∞, μ−3σ) or (μ+3σ, +∞) is then very small. Therefore, data outside the interval [μ−3σ, μ+3σ] are regarded as abnormal and excluded from the input array.
|x_i − μ| > 3σ  (2)

where μ is the sample mean and σ the sample standard deviation; any x_i satisfying the inequality is rejected.
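A short sketch of the 3σ rejection of formula (2); the synthetic power series and injected outlier are illustrative:

```python
import numpy as np

def remove_outliers_3sigma(x: np.ndarray) -> np.ndarray:
    """Keep only samples satisfying |x - mu| <= 3*sigma, per formula (2)."""
    mu, sigma = x.mean(), x.std()
    return x[np.abs(x - mu) <= 3 * sigma]

rng = np.random.default_rng(0)
power = np.append(rng.normal(5.0, 0.3, 200), 60.0)  # one injected abnormal value
clean = remove_outliers_3sigma(power)
print(len(power) - len(clean), "abnormal value(s) eliminated")
```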
- Extract feature subsets
In artificial intelligence applications (such as building regression models and training classifiers), finding a suitable set of feature values to use as input variables directly affects the training rate, complexity, and generalization ability of the model. A good feature subset improves training speed, reduces complexity, and improves generalization; it also effectively avoids the curse of dimensionality and improves recognition accuracy.
In this paper, principal component analysis is used to extract the feature subset. The linear combination of the original variables with the largest variance is called the first principal component; a further linear combination is considered only when the components found so far are insufficient to represent the information in the original indicators. To avoid redundancy, the information carried by earlier principal components does not reappear in later ones. The main steps are as follows:
① Establish the correlation matrix R from the normalized original data, and calculate the eigenvalues and eigenvectors of R, as shown in formula (3):
R·u_i = λ_i·u_i,  i.e. |R − λI| = 0  (3)

where λ_1 ≥ λ_2 ≥ … ≥ λ_m are the eigenvalues of R and u_i the corresponding unit eigenvectors.
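A sketch of step ①, together with the usual follow-on of ranking components by explained variance; the 85% cumulative-variance threshold and the synthetic data are illustrative assumptions, not values from the text:

```python
import numpy as np

def pca_feature_subset(x: np.ndarray, threshold: float = 0.85):
    """Extract principal components from the correlation matrix R (formula (3))."""
    # Standardize columns; component scores are projections of standardized data.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    r = np.corrcoef(x, rowvar=False)          # correlation matrix R
    eigvals, eigvecs = np.linalg.eigh(r)      # solve R u = lambda u
    order = np.argsort(eigvals)[::-1]         # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the first k components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), threshold) + 1)
    return z @ eigvecs[:, :k], eigvals[:k]

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 6))
x[:, 5] = x[:, 0] + 0.1 * rng.normal(size=100)  # one nearly redundant feature
scores, variances = pca_feature_subset(x)
print(scores.shape, variances)
```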