We present algorithms for the solution of banded systems of equations on parallel architectures, in particular ensemble architectures, i.e., architectures that have a large number of processing elements. Each processor has its own local storage. The band is considered dense. Concurrent elimination of a single variable yields a linear speed-up for ensembles configured as tori, or Boolean cubes if N>m, with a maximum ensemble size of m(m+R) (or 2m(m+R)) processors for a banded system of N equations, bandwith 2m + 1 and R right hand sides. The minimum attainable computational complexity is of order O(N). Concurrent elimination of multiple variables as well as concurrent elemination of each such variable yields a minimum complexity of O(m+ m log2N/m) for a total of (2m + R)N ensemble nodes. To attain this complexity the ensemble should be configured as clusters, each in the form of a torus of dimension m by 2m + R, or a Booelan cube of appropiate dimension. Furthermore, corresponding processors in different clusters are assumed to be interconnected to form a binary tree, shuffle-exchange, perfect shuffle, or Boolean cube network. The number of clusters should be of order 0(N/m) for minimum computational comlpexity