Although unique expected energy models can be generated for a given photovoltaic (PV) site, a standardized model is also needed to facilitate performance comparisons across fleets. Current standardized expected energy models for PV work well with sparse data, but they exhibit significant over-estimation, which hinders accurate diagnosis of field operations and maintenance issues. This research addresses the problem by using machine learning to develop a data-driven expected energy model that more accurately infers the energy production of PV systems. Irradiance and system capacity data from 172 sites across the United States were used to train a series of models using Lasso linear regression. The trained models generally perform better than the commonly used expected energy model from the international standard IEC 61724-1. The two highest-performing models range in complexity from a third-order polynomial with 10 parameters (adjusted R² = 0.994) to a simpler second-order polynomial with 4 parameters (adjusted R² = 0.993); the latter is subject to further evaluation. Consequently, the trained models provide a more robust basis for identifying potential energy anomalies for operations and maintenance activities, as well as for informing planning-related financial assessments. We conclude with directions for future research, such as using splines to improve model continuity and better capture systems with low (≤1000 kW DC) capacity.
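As a rough illustration of the modeling approach described in this abstract, the sketch below fits a Lasso-penalized polynomial regression with a small coordinate-descent solver and reports adjusted R². It is not the authors' implementation. The synthetic data, the coefficient 0.8, the penalty value, and every function name are assumptions made for the example. A full third-order polynomial in two inputs has 10 coefficients including the intercept, which is consistent with the 10-parameter model if the two inputs are irradiance and capacity; the abstract does not confirm that reading.

```python
import random
from itertools import combinations_with_replacement
from math import prod


def poly_terms(n_vars, degree):
    """Index tuples for every monomial of order 1..degree (intercept excluded)."""
    terms = []
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(range(n_vars), d))
    return terms


def expand(row, terms):
    """Evaluate each monomial for one observation, e.g. (0, 1) -> row[0] * row[1]."""
    return [prod(row[i] for i in term) for term in terms]


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; this step is what sets Lasso coefficients to 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_fit(X, y, lam, n_sweeps=500):
    """Minimize (1/2n)*||y - b0 - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered columns
    resid = [yi - y_mean for yi in y]  # residual while all coefficients are zero
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    coef = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Covariance of column j with the residual that excludes its own term.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * coef[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - coef[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                coef[j] = new
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef


def predict(X, intercept, coef):
    return [intercept + sum(c * x for c, x in zip(coef, row)) for row in X]


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    y_mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_mean) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def demo(seed=0, n=200, lam=1e-4):
    """Synthetic stand-in data (assumption): energy ~ 0.8 * irradiance * capacity."""
    rng = random.Random(seed)
    raw = [(rng.uniform(0.1, 1.2), rng.uniform(0.1, 1.0)) for _ in range(n)]
    y = [0.8 * g * c + rng.gauss(0.0, 0.01) for g, c in raw]
    terms = poly_terms(2, 2)  # G, C, G^2, G*C, C^2
    X = [expand(row, terms) for row in raw]
    intercept, coef = lasso_fit(X, y, lam)
    # Penalize with all candidate terms, a conservative choice for this sketch.
    return adjusted_r2(y, predict(X, intercept, coef), len(terms)), coef


if __name__ == "__main__":
    score, coef = demo()
    print("adjusted R^2:", round(score, 4))
    print("coefficients:", [round(c, 4) for c in coef])
```

In practice a library solver such as scikit-learn's `Lasso` would be the more likely choice; the hand-written solver here only keeps the example dependency-free. Raising `lam` drives more coefficients to exactly zero, which is one way a second-order model could end up with fewer parameters than its full set of terms.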
Sampling is an important step in the machine learning process because it prioritizes samples that help the model best summarize the concepts required for the task at hand. How to determine the best sampling method has rarely been studied in the context of graph neural networks. In this paper, we evaluate multiple sampling methods (i.e., ascending and descending) that sample based on different definitions of centrality (i.e., VoteRank, PageRank, and degree) to observe their relation to network topology. We find that no sampling method is superior across all network topologies. Additionally, we find situations where ascending sampling provides better classification scores, showing the strength of weak ties. We then develop two strategies to predict the best sampling method: one that observes the homogeneous connectivity of the nodes, and one that observes the network topology. With both strategies, we are able to identify the best sampling direction consistently.
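To make the ascending and descending sampling directions concrete, the sketch below ranks nodes by degree centrality and selects the first k in either direction. It is only one plausible reading of the setup. The function names, the tie-breaking rule, and the toy graph are assumptions, and VoteRank and PageRank are omitted to keep the example dependency-free. The graph is assumed to be simple and undirected.

```python
def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for a simple undirected graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


def sample_by_centrality(scores, k, descending=True):
    """Pick k nodes by centrality; ties are broken by node id (an assumption)."""
    sign = -1.0 if descending else 1.0
    ranked = sorted(scores, key=lambda node: (sign * scores[node], node))
    return ranked[:k]


if __name__ == "__main__":
    # Toy graph (assumption): hub 0 linked to 1..4, plus one edge between 3 and 4.
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (3, 4)]
    scores = degree_centrality(edges)
    print("descending:", sample_by_centrality(scores, 2, descending=True))
    print("ascending: ", sample_by_centrality(scores, 2, descending=False))
```

Descending sampling favors hubs, while ascending sampling favors weakly connected nodes, which is the case the abstract links to the strength of weak ties.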
Accurate diagnosis of failures is critical for meeting photovoltaic (PV) performance objectives and avoiding safety concerns. This analysis focuses on the classification of field-collected, string-level current-voltage (IV) curves representing baseline, partial soiling, and cracked failure modes. Specifically, multiple neural network-based architectures (including convolutional and long short-term memory networks) are evaluated using domain-informed parameters across different portions of the IV curve and a range of irradiance thresholds. The analysis identified two models that classified the relatively small dataset (400 samples) with high accuracy (99%+). Findings also indicate optimal irradiance thresholds and opportunities to improve classification by focusing on specific portions of the IV curve. Such advancements are critical for expanding accurate classification of PV faults, especially those with low power loss (e.g., cracked cells) or visually similar IV curve profiles.