This paper investigates outage statistics in the Swedish power system, focusing on data quality issues such as inconsistencies and missing values, including unknown outage causes and unidentified faulty equipment. Existing research often overlooks the depth of these problems, which limits the reliability and usefulness of outage statistics. A detailed qualitative analysis of the data reveals noticeable deficiencies, and the paper proposes a structured format for outage reporting: a database with three relations (outage summary, outage breakdown, and customer breakdown). In addition, several machine learning algorithms, including decision trees and random forests, are explored and tested for predicting unknown values in the dataset. This serves two purposes: improving the accuracy of the outage data and enabling more precise analysis. The findings and proposals highlight the current challenges in outage data management and support more robust, data-driven decision-making in outage management and policy formation.
QC 20240815
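As a rough illustration of the three-relation format named in the abstract, the sketch below builds an in-memory SQLite database with outage summary, outage breakdown, and customer breakdown tables, then counts the unknown causes and unidentified equipment. Only the three relation names come from the text. Every column name, the meaning given to each breakdown table, and the demo rows are assumptions made for illustration, not the paper's actual schema or data.

```python
import sqlite3

# Hypothetical schema: only the three relation names come from the abstract.
# All columns and their meanings are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE outage_summary (
    outage_id  INTEGER PRIMARY KEY,
    start_time TEXT NOT NULL,
    end_time   TEXT,
    cause      TEXT,  -- NULL stands for an unknown outage cause
    equipment  TEXT   -- NULL stands for unidentified faulty equipment
);
CREATE TABLE outage_breakdown (
    outage_id    INTEGER NOT NULL REFERENCES outage_summary(outage_id),
    segment      TEXT NOT NULL,     -- assumed: affected part of the network
    duration_min INTEGER NOT NULL,  -- assumed: interruption length there
    PRIMARY KEY (outage_id, segment)
);
CREATE TABLE customer_breakdown (
    outage_id     INTEGER NOT NULL REFERENCES outage_summary(outage_id),
    customer_type TEXT NOT NULL,    -- assumed: e.g. household, industry
    customers     INTEGER NOT NULL CHECK (customers >= 0),
    PRIMARY KEY (outage_id, customer_type)
);
"""


def build_demo_db():
    """Create the schema and load a few made-up rows (not real outage data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO outage_summary VALUES (?, ?, ?, ?, ?)",
        [
            (1, "2023-01-05T08:00", "2023-01-05T09:30", "weather", "overhead line"),
            (2, "2023-02-11T14:10", "2023-02-11T14:40", None, None),
            (3, "2023-03-02T22:00", "2023-03-03T01:00", "equipment failure", None),
        ],
    )
    conn.executemany(
        "INSERT INTO customer_breakdown VALUES (?, ?, ?)",
        [(1, "household", 120), (1, "industry", 3), (2, "household", 45)],
    )
    conn.commit()
    return conn


def missing_value_counts(conn):
    """Count outages with an unknown cause or unidentified equipment."""
    total, no_cause, no_equipment = conn.execute(
        "SELECT COUNT(*), SUM(cause IS NULL), SUM(equipment IS NULL) "
        "FROM outage_summary"
    ).fetchone()
    return {
        "outages": total,
        "unknown_cause": no_cause,
        "unknown_equipment": no_equipment,
    }
```

Calling `missing_value_counts(build_demo_db())` reports how many demo outages lack a cause or an equipment entry. The same NULL fields are the ones a classifier such as a decision tree or random forest would be trained to predict from the remaining attributes.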