This is the accepted version of a paper presented at 42nd Annual Conference of IEEE Industrial Electronics Society (IECON).

Citation for the original published paper:

Delay-free parallelization for real-time simulation of a large active distribution grid model.
In: IEEE conference proceedings
https://doi.org/10.1109/IECON.2016.7793885

N.B. When citing this work, cite the original published paper.

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Permanent link to this version:
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192954

Delay-Free Parallelization for Real-Time Simulation of a Large Active Distribution Grid Model

Hossein Hooshyar and Luigi Vanfretti
KTH Royal Institute of Technology
Stockholm, Sweden
hosseinh@kth.se, luigiv@kth.se

Christian Dufour
OPAL-RT Technologies
Montréal, Québec, Canada
christian.dufour@opal-rt.com

Abstract—Parallel computation for real-time simulation of large power system models is conventionally performed by using the propagation delays embedded in line models. However, using the same approach to simulate distribution grids results in modeling inaccuracies, due to the short lengths of most distribution feeders. This paper illustrates the drawbacks of the delay-based parallelization technique by assessing the dynamic response of a large active distribution grid model, and applies a delay-free parallelization technique (i.e. the State-Space-Nodal algorithm) to address the inaccuracies inherent to the delay-based approach.

Keywords—distribution grid; parallel simulation; real-time simulation; SSN

I. INTRODUCTION

Since 2013, the European Commission has financed the IDE4L (Ideal Grid for All) project which aims to define, develop and demonstrate a distribution grid automation system architecture for active distribution grids [1]. In order to evaluate the merits of the proposed architecture and assess specific functions, a reference distribution grid model was developed in [2] for real-time hardware-in-the-loop simulation studies. The model was implemented in MATLAB/Simulink and was modified for use with the OPAL-RT real-time simulator.

Due to the grid model’s high computational requirements, the parallel computing capabilities of OPAL-RT’s simulator were used to meet real-time simulation constraints. This was done by using the classical propagation delay technique, in which the delays embedded in line models allow the equations to be decoupled and parallelized. However, due to the short length of the distribution feeders, this approach introduces a modeling accuracy problem, as it alters the impedance of the grid.

Because of the limitations and inaccuracies that the classical delay-based parallelization technique poses for the simulation of large distribution grids, there have been recent efforts to develop delay-free solvers, one of which has resulted in the State-Space-Nodal (SSN) algorithm [3]. The SSN algorithm is a delay-free solver and is currently part of the ARTEMIS add-on to the SimPowerSystems blockset for Simulink [4]. SSN makes it possible to simulate large distribution grids without the need to add artificial delays that alter the impedance of the grid [5].

This paper discusses the inaccuracy that results from using the conventional delay-based parallelization technique, and presents the application of the SSN solver for real-time simulation of a large active distribution grid model. The paper begins with a summary of common parallelization techniques used for power system real-time simulation in Section II. In Section III, the IDE4L grid model is described. In Section IV, the delay-based and the delay-free SSN-based approaches used for real-time simulation of the grid model are presented. In addition, the two approaches are compared in terms of computational burden and modeling accuracy. Conclusions are drawn and future work is discussed in Section V.

II. PARALLEL COMPUTATION TECHNIQUES IN POWER SYSTEM SIMULATION

Power system real-time simulation is typically challenging because of the large system of equations representing these systems. Various techniques exist to parallelize the computations required for simulation. This section summarizes common techniques used for parallel computation in power system simulation and highlights the problems of their application for the simulation of distribution grids.

A. Parallelization Based on Traveling Waves in Long Transmission Lines

For transmission system models, it is possible to take advantage of existing natural delays on long transmission lines due to traveling waves, as used in the Bergeron line model with losses and wideband frequency-dependent line models [6]. Note that the transmission lines described as “long” are those whose propagation delay is longer than the simulation time-step.
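The "long line" criterion can be checked numerically. The sketch below is a minimal illustration that assumes a lossless line, for which the travel time is the length times the square root of the per-unit-length inductance and capacitance. The line parameters and the function names are hypothetical and are not taken from the paper; the 100 μs step is the one quoted later in Section IV.A, and a delay of at least one time-step is treated as sufficient.

```python
import math


def propagation_delay(length_km, l_per_km, c_per_km):
    """Travel time (s) of a lossless line: tau = length * sqrt(L' * C')."""
    return length_km * math.sqrt(l_per_km * c_per_km)


def is_long_line(length_km, l_per_km, c_per_km, time_step):
    """True if the line delay spans at least one simulation time-step."""
    return propagation_delay(length_km, l_per_km, c_per_km) >= time_step


# Hypothetical overhead-line parameters (assumed for illustration only).
L_PER_KM = 1.0e-3    # H/km
C_PER_KM = 12.0e-9   # F/km
TIME_STEP = 100e-6   # s

for length in (2.0, 20.0, 200.0):
    tau = propagation_delay(length, L_PER_KM, C_PER_KM)
    long_flag = is_long_line(length, L_PER_KM, C_PER_KM, TIME_STEP)
    print(f"{length:6.1f} km: tau = {tau * 1e6:7.2f} us, long = {long_flag}")

# Shortest line that can decouple two subsystems at this time-step.
min_length_km = TIME_STEP / math.sqrt(L_PER_KM * C_PER_KM)
print(f"minimum decoupling length = {min_length_km:.1f} km")
```

With these assumed parameters, a line must be a few tens of kilometres long before its delay reaches one time-step, which is why feeders of a few kilometres cannot be used as natural decoupling points.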

The built-in propagation delay of such models is used to separate the network into subsystems that can be solved independently, and consequently, subsystems with smaller admittance matrices can be solved in much shorter time. Using this technique, large networks can be simulated in real-time. For instance, using the Hypersim power system real-time simulator, grids with one thousand 3-phase buses can be simulated in real-time using this technique [7].
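A rough estimate shows why smaller admittance matrices pay off. It assumes dense matrices and \(k\) equally sized subsystems, an idealization that is not stated in the paper. If factorizing an \(n \times n\) matrix costs on the order of \(n^3\) operations, then

$$
\text{cost}_{\text{monolithic}} \propto n^{3}, \qquad
\text{cost}_{\text{per subsystem}} \propto \left(\frac{n}{k}\right)^{3} = \frac{n^{3}}{k^{3}}
$$

so with \(k = 4\) subsystems, each core handles roughly \(1/64\) of the monolithic factorization effort. Practical solvers exploit sparsity and reuse factorizations, so actual gains will differ from this estimate.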

B. Parallelization Using Stubline Blocks

The delay-based parallelization technique cannot be used for power system models that do not contain “long” transmission lines. In this case, one can add an artificial decoupling element called a “Stubline”, which is basically a Bergeron line model with losses that is adjusted to have a propagation delay of exactly one time-step.

Stublines add capacitance and inductance to the model where they are inserted. The common practice is to substitute an already existing inductance with a Stubline. In this case, the additional capacitance is small and simulation accuracy can be preserved. For example, one can often replace transformer leakage inductance with Stublines, producing a Stubline Transformer that can be used to decouple equations while preserving acceptable accuracy. The application of Stubline blocks is further shown in Section IV.A.
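The link between the replaced inductance and the added capacitance can be made explicit under a lossless approximation (an assumption made here for simplicity; the Stubline described above includes losses). A lossless line with total inductance \(L\) and total capacitance \(C\) has a propagation delay \(\tau = \sqrt{LC}\). Forcing this delay to equal one time-step \(T_s\) fixes the capacitance:

$$
\tau = \sqrt{L\,C} = T_s \quad\Longrightarrow\quad C = \frac{T_s^{2}}{L}
$$

For a time-step of \(T_s = 100\ \mu\text{s}\), replacing a hypothetical 10 mH inductance gives \(C = 1\ \mu\text{F}\), whereas replacing a hypothetical 0.1 mH inductance gives \(C = 100\ \mu\text{F}\). The smaller the available inductance, the larger the artificial capacitance. This is consistent with the statement that large inductances, such as transformer leakage, are the preferred substitution points.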

C. Parallelization Using State-Space-Nodal Algorithm

Stublines are difficult to use in power system models where neither long lines nor large inductances exist (that could be replaced by Stublines). This is a common difficulty when modeling typical distribution grids. These types of systems can also be described as lumped. This makes the real-time simulation much more difficult and the parallelization technique must be implicit within the solver itself (i.e. no delay should be added to the model). The State-Space-Nodal method can provide a solution to this problem [3]. This is further discussed in Section IV.B.

III. THE IDE4L REFERENCE DISTRIBUTION GRID MODEL

This section describes the reference distribution grid model of the IDE4L project on which different parallelization techniques have been applied and compared. The grid is a 79 bus multi-phase network including numerous components of 10 different types, each with electrical and mechanical parts, various controllers and protection systems, to emulate the behavior of an active distribution grid. Details of the component models are explained in [2]. As shown in Fig. 1, the grid model includes four different voltage levels: HV (220 kV), MV (36 kV), LV (6.6 kV), and residential LV (0.4 kV).

1) HV section:

The HV section is a 6-bus network adopted from the Roy Billinton Transmission Test System (RBTS) with 50 MW wind farm generation added [8].

2) MV section:

The MV section is based on the IEEE 34 bus test feeder, with the main differences being that three constant power loads (total of 110 kVA) are replaced by motor load models to incorporate motor load dynamics, and two 1.5 MW wind farms are added to the middle and end of the feeder [9]. In addition, a circuit breaker supervised by an overcurrent protection relay and a three-phase recloser are added to the beginning and middle of the MV feeder, respectively.

3) LV and residential LV sections:

The LV and residential LV sections are based upon the IEEE 37 bus test feeder except that two constant power loads (total of 150 kVA) are replaced by motor load models, and two PV farm models (total of 1.05 MW), three residential PV system models (total of 0.77 MW), and two battery storage models (total of 325 Ah) are added to the feeder [9]. In addition, two sets of single-phase reclosers are added to this section.

As can be inferred from the grid model description, it is quite a complex network that requires a high computational effort for real-time simulation. The approach used for the real-time simulation of the grid model is discussed in the next section.

IV. IMPROVED MODEL CONFIGURATION FOR REAL-TIME SIMULATION

This section describes the delay-based approach used for the real-time simulation of the reference distribution grid model and introduces the SSN-based approach that reduces computational requirements and also removes unwanted dynamics from the simulation results.

A. Parallel Computation Using Propagation Delays

The preliminary approach used in [2] to comply with real-time simulation constraints (in this case, 100 μs) is the classical technique based on the propagation delays that are embedded in line models. Using this approach, the grid model is distributed across a total of 11 cores of an OPAL-RT simulator (HV section on 2 cores, MV section on 4 cores, and LV and residential LV sections on 5 cores), as shown in Fig. 2.

Splitting the grid model into 11 sub-models was done by using the ARTEMIS distributed parameters line block (based on Bergeron’s travelling wave line model) for the transmission line section 100-101 allowing for separation between cores 1 and 2, and Stubline blocks for the remainder of required separation points. The Stubline block, modeled based on [10], implements an N-phase transmission line model with exactly one time-step propagation delay. Therefore, it is possible to decouple the state-space equation system of the network on both sides. The main reason to use Stubline blocks for parallelization is that distribution feeders are too short to be modeled as Bergeron lines.

Model splitting at distribution feeder sections 816-824, 854-852, 858-834, 701-702, 713-704, 703-730, and 733-734, is carried out by modeling part of these feeder sections by Stubline blocks. The principle behind such a modeling is to deduct an equal part of the feeder impedance from all three phases so that the deducted part represents a balanced line section and it can be modeled by a Stubline block. The Stubline block can also be used in combination with a transformer model to form a component called Stubline Transformer. The Stubline Transformer exhibits a decoupling delay between its primary and secondary sides. The point of building such a component is to move the secondary winding leakage inductance and resistance to a Stubline block that is in series with the winding itself. Using this principle, model splitting at transformer sections 104-106 and 832-888 was carried out by using Stubline Transformer blocks.

Despite its advantage for parallelizing short lines, the Stubline block has a negative impact on modeling accuracy because it modifies the impedance of the network at the point of insertion, typically by adding an artificial capacitance [5]. For instance, the Stubline blocks used in the LV section of the reference distribution grid model add a total of 249.998 μF of artificial capacitance, as shown in Fig. 2. Hence, the use of Stubline blocks for parallelization is not considered an ideal solution. The negative impact of the added capacitance is illustrated in Section IV.E using sample simulation results from the reference distribution grid model.

B. Parallel Computation Using the Delay-Free SSN Solver

This section presents a new simulation configuration for real-time simulation of the IDE4L reference distribution grid model in which parallelization is achieved using the SSN solver. SSN is a real-time delay-free solver based on the well-known nodal admittance algorithm with some additional features. Details of the solver can be found in [3].
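The general idea of coupling independently computed groups through a nodal equation, without inserting a delay, can be sketched as follows. This is not the SSN implementation of [3] or of any OPAL-RT product. It is a toy example with two series R-L branches feeding a shared resistive node, discretized with backward Euler for brevity, and all names and parameter values are hypothetical. Each group builds a Norton equivalent on its own, a single nodal equation couples the groups within the same time-step, and the result is compared with a monolithic solve of the whole circuit.

```python
import math

H = 100e-6                                   # time-step (s)
BRANCHES = [(0.5, 1.0e-3), (0.8, 2.0e-3)]    # hypothetical (R, L) per group
G_LOAD = 1.0 / 10.0                          # load conductance at the shared node (S)


def norton_equivalent(r, l, h, i_prev, v_src):
    """Backward-Euler companion model: i = g * (v_src - v_node) + hist."""
    g = 1.0 / (r + l / h)
    hist = g * (l / h) * i_prev
    return g, g * v_src + hist               # conductance, current injected at v_node = 0


def step_grouped(branches, currents, v_srcs, g_load, h):
    # 1) Each group builds its equivalent independently (parallelizable).
    eqs = [norton_equivalent(r, l, h, i, v)
           for (r, l), i, v in zip(branches, currents, v_srcs)]
    # 2) One small nodal equation couples the groups in the same time-step.
    v_node = sum(j for _, j in eqs) / (g_load + sum(g for g, _ in eqs))
    # 3) Each group updates its own state from the node voltage (parallelizable).
    return [j - g * v_node for g, j in eqs], v_node


def step_monolithic(branches, currents, v_srcs, g_load, h):
    # Backward Euler on the full 2-state system, solved by Cramer's rule.
    (r1, l1), (r2, l2) = branches
    rl = 1.0 / g_load
    a11, a12 = r1 + l1 / h + rl, rl
    a21, a22 = rl, r2 + l2 / h + rl
    b1 = v_srcs[0] + (l1 / h) * currents[0]
    b2 = v_srcs[1] + (l2 / h) * currents[1]
    det = a11 * a22 - a12 * a21
    i1 = (b1 * a22 - a12 * b2) / det
    i2 = (a11 * b2 - a21 * b1) / det
    return [i1, i2], rl * (i1 + i2)


def simulate(stepper, n_steps=200):
    currents, trace = [0.0, 0.0], []
    for n in range(1, n_steps + 1):
        w_t = 2.0 * math.pi * 50.0 * n * H
        v_srcs = [100.0 * math.sin(w_t), 100.0 * math.sin(w_t - 0.3)]
        currents, v_node = stepper(BRANCHES, currents, v_srcs, G_LOAD, H)
        trace.append(v_node)
    return trace


grouped = simulate(step_grouped)
monolithic = simulate(step_monolithic)
max_diff = max(abs(a - b) for a, b in zip(grouped, monolithic))
print(f"max |v_grouped - v_monolithic| = {max_diff:.3e}")
```

Because the nodal equation is solved within the time-step, the grouped result matches the monolithic one to rounding error. A delay-based split would instead use the node voltage from the previous step.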

Following the SSN approach, the grid model was partitioned into 11 SSN groups (HV section into 2 groups, MV section into 4 groups, and LV and residential LV sections into 5 groups), as shown in Fig. 3. In this way, for each section, the SSN solver can use threaded processes to compute the groups in parallel without any delay. Hence, there is no need to use Stubline blocks, which add artificial capacitances, for parallelization at short lines. Note that only the two Stubline blocks used as Stubline Transformers at sections 104-106 and 832-888 are kept. This is because the existing leakage inductance of typical power system transformers is large enough to produce only a small equivalent capacitance in the Stubline. Therefore, this usage of Stublines does not create modeling inaccuracy and can be used with any solver, including SSN.

Fig. 2. Parallelization of the IDE4L reference distribution grid model using Stublines.

The next three sections compare the two different parallelization configurations, used for real-time simulation of the reference grid model, in terms of computational burden and modeling accuracy.

C. Computational Burden

Table I compares the two parallelization techniques, discussed in the previous two sections, in terms of computational requirements.

As shown in Figs. 2 and 3, while the delay-based model setup needs to be distributed across 11 cores, the SSN-based one requires 7 cores to comply with the same real-time simulation constraints. Note that since the SSN-based setup requires fewer cores, it can run on a less costly hardware setup.

In addition, Table I compares the maximum computation time for each time-step for both parallelization techniques while running on Intel-Xeon Processor-E5-2687W (Xeon V3) cores. Running the grid model with the SSN-based parallelization on Xeon V3 cores has resulted in computation times of 17.58 μs, 50.29 μs, and 45.17 μs for HV, MV, and LV sections, respectively.
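The margin left by these computation times can be worked out directly. The short sketch below compares the SSN-based figures quoted above with the 100 μs real-time constraint from Section IV.A. The variable and function names are illustrative, and the calculation covers only the reported computation time. Any additional overhead, such as I/O or inter-core communication, is not given in the paper and is not modeled.

```python
TIME_STEP_US = 100.0  # real-time constraint quoted in Section IV.A

# Maximum SSN-based computation time per time-step, as quoted in the text (us).
SSN_MAX_COMPUTE_US = {"HV": 17.58, "MV": 50.29, "LV": 45.17}


def utilization(compute_us, step_us=TIME_STEP_US):
    """Fraction of the time-step consumed by computation."""
    return compute_us / step_us


for section, t_us in SSN_MAX_COMPUTE_US.items():
    used = utilization(t_us)
    print(f"{section}: {t_us:6.2f} us of {TIME_STEP_US:.0f} us "
          f"({used:.1%} used, {TIME_STEP_US - t_us:.2f} us spare)")

meets_real_time = all(t_us < TIME_STEP_US for t_us in SSN_MAX_COMPUTE_US.values())
print("all sections within the time-step:", meets_real_time)
```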

<table>
<thead>
<tr>
<th>TABLE I. COMPUTATIONAL REQUIREMENTS OF THE MODEL PARALLELIZATIONS</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Maximum Computation Time per Time-Step (μs)</strong></td>
</tr>
<tr>
<td><strong>Total Number of Cores for Real-Time Simulation</strong></td>
</tr>
</tbody>
</table>

D. SSN In-Step Parallelization Effectiveness

The SSN performance, shown in Table I, was obtained by allocating 3 cores to each of the MV and LV networks. However, the parallelization effectiveness of the SSN solver can be further evaluated by comparing the maximum computation time (as shown in Table I) for different core allocations versus the “serial SSN” case, i.e. allocating only 1 core to each SSN solution.

Table II compares the real-time performance (i.e. the maximum time to compute 1 time-step of a complete simulation) of the LV and MV networks against the number of cores (nb_core) allocated to each one of them. As shown in the table, the in-step parallelization of the SSN solver improves the performance by a factor close to 1.4 at best.

Neglecting the LU factorization part of the SSN solution, one would expect an improvement factor equal to nb_core; however, as the table shows, we are far from it experimentally. In addition, when nb_core = 4, the performance declines and it becomes even worse than the “serial SSN” case (nb_core = 1). This is due to several reasons. First of all, the LU factorization is not negligible because it is in fact an $O(n^3)$ operation where $n$ is the number of nodes and is subsequently the order of nodal admittance matrices to be solved. Note that the LV and MV networks have 15 SSN nodes each.

Another important reason is that all cores share the same memory map, including the L3 cache, which is an important part of today’s processors such as the Intel Xeon. In some cases (and we believe this is one of them), this shared memory mapping creates so-called “cache thrashing” (i.e., processes overwrite the cache locations of other processes), which forces some data to be fetched from the slower main memory. Cache systems are automated low-level microprocessor mechanisms that cannot be programmed by users. Nevertheless, some high-level coding styles can lead to better cache performance, and this is ongoing work at OPAL-RT Technologies.

<table>
<thead>
<tr>
<th></th>
<th>No. of EMTP nodes</th>
<th>No. of SSN nodes</th>
</tr>
</thead>
<tbody>
<tr>
<td>HV</td>
<td>70</td>
<td></td>
</tr>
<tr>
<td>MV</td>
<td>241</td>
<td></td>
</tr>
<tr>
<td>LV &amp; RLV</td>
<td>269</td>
<td></td>
</tr>
</tbody>
</table>

Fig. 3. Parallelization of the IDE4L reference distribution grid model using SSN. Circled numbers represent 3 SSN nodes.

TABLE II

<table>
<thead>
<tr>
<th>nb_core</th>
<th>MV Section (μs)</th>
<th>LV Section (μs)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>62.5</td>
<td>71.9</td>
</tr>
<tr>
<td>2</td>
<td>49.0</td>
<td>53.5</td>
</tr>
<tr>
<td>3</td>
<td>45.17</td>
<td>50.2</td>
</tr>
<tr>
<td>4</td>
<td>82.9</td>
<td>61.4</td>
</tr>
</tbody>
</table>
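The improvement factors discussed in this section follow directly from the table above. The sketch below recomputes them as speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of cores). The data are copied from the table; the function and variable names are illustrative.

```python
# Maximum time to compute one time-step (us), indexed by nb_core.
MAX_STEP_TIME_US = {
    "MV": {1: 62.5, 2: 49.0, 3: 45.17, 4: 82.9},
    "LV": {1: 71.9, 2: 53.5, 3: 50.2, 4: 61.4},
}


def speedup(times, nb_core):
    """Serial time divided by the time obtained with nb_core cores."""
    return times[1] / times[nb_core]


def efficiency(times, nb_core):
    """Speedup per allocated core (1.0 would be ideal scaling)."""
    return speedup(times, nb_core) / nb_core


for section, times in MAX_STEP_TIME_US.items():
    for nb_core in sorted(times):
        print(f"{section} nb_core={nb_core}: "
              f"speedup={speedup(times, nb_core):.2f}, "
              f"efficiency={efficiency(times, nb_core):.2f}")

# Core count giving the largest speedup for each section.
best = {section: max(times, key=lambda n: speedup(times, n))
        for section, times in MAX_STEP_TIME_US.items()}
print("best nb_core per section:", best)
```

Both sections peak at three cores, with a speedup of about 1.4, well below the ideal factor of 3. With four cores the MV section drops below a speedup of 1, that is, it becomes slower than the serial case.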

E. Modeling Accuracy

In order to assess the impact of Stublines on modeling accuracy, two different tests have been performed on both SSN-based and delay-based setups of the reference distribution grid model. In addition, in order to have a reference response, the grid model was simulated on a computer using pure SimPowerSystems blocks (i.e. all blocks, required for real-time simulation setup, e.g. Stublines, were removed from the model).

1) Impact on fault current profile

Figs. 4 and 5 compare the fault current profiles (RMS values) of phase 'a' for a three-phase bolted fault at buses 709 (in the LV section) and 812 (in the MV section), respectively. Note that in both tests, the feeder overcurrent protection is disabled so that permanent faults could be simulated.

As the figures show, the fault current levels of the two different model setups are quite similar. However, the delay-based model parallelization results in larger transients during the first milliseconds after the fault occurrence. This effect is more visible at bus 709, because the fault current flows through more Stublines on its way than when the fault occurs at bus 812.

In addition, as shown in Fig. 5, the delay-based model exhibits unrealistic dynamics that are not present in the responses obtained from the SSN-based model and the off-line simulation. The impact of Stubline blocks on system dynamics is further discussed in the next test.

Fig. 4. Fault current profile of phase ‘a’ for a three-phase bolted fault at bus 709.

Fig. 5. Fault current profile of phase ‘a’ for a three-phase bolted fault at bus 812.

2) Impact on steady state operating point and system dynamics

Fig. 6 shows the voltage of the MV section for a test scenario where all motor loads (total of 260 kVA) are switched on at t = 30 s and a 6-cycle three-phase bolted fault occurs on the MV feeder at bus 858 at t = 50 s. After the fault is applied at t = 50 s, the three-phase MV recloser, installed at the feeder section 832-858, detects and isolates the fault within 2 cycles. The fault is cleared during the first open interval of the recloser.

As shown in Fig. 6, during quasi-steady-state operation (i.e. before the fault occurrence), the two model setups show quite similar responses. However, after the fault occurrence, a sequence of disconnections and reconnections of loads and distributed generation occurs before the grid returns to normal operation. Although such dynamics are expected, due to the operation of voltage-based self-protection at loads and distributed generators, the delay-based model setup exhibits drastic voltage variations that are not present in the dynamic response of the SSN-based model setup.

As shown in Fig. 6, the off-line simulation result confirms the modeling accuracy of the SSN-based parallelization technique. This implies that the artificial capacitances, added by the Stubline blocks, adversely impact the dynamic response of the system.

V. CONCLUSIONS AND FUTURE WORK

The paper showed that the classical delay-based parallelization technique is not adequate for real-time simulation of distribution grids. This was shown by performing different tests on an active distribution grid model. The SSN algorithm, as a delay-free solver, was used to overcome the modeling inaccuracies created by the delay-based technique. In addition, the paper showed that the SSN solver has lower computational requirements, so a less costly hardware setup is sufficient.

As future work, the SSN version of the reference distribution grid model of the IDE4L project, presented in this paper, will be included as a demo in the next release of ARTEMiS [4].

Fig. 6. System voltage (phase ‘a’) of the MV section obtained from the delay-based real-time, SSN-based real-time, and off-line simulations.

REFERENCES

[1] EU-FP7 IDE4L project official website at http://www.ide4l.eu


