`Threaded(balance=true/false, show_distribution=true/false)`
`balance=true`
attempts to balance the total workload across threads, where the workload of each thread is estimated from the sizes of the submatrices it has to handle.
`show_distribution=true`
shows how many submatrices each thread is processing and what their total computational weight is.
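
To make the idea of size-based balancing concrete, here is a minimal, language-neutral sketch in Python. It is not the implementation behind `Threaded`; it only illustrates one common heuristic (greedy "heaviest first, to the least-loaded thread"). The cost model (weight = number of entries, `rows * cols`) and the helper names `balance_submatrices` and `print_distribution` are assumptions made for this illustration.

```python
import heapq


def balance_submatrices(shapes, n_threads):
    """Greedily assign submatrices to threads so total weights stay close.

    `shapes` is a list of (rows, cols) tuples. The weight of a submatrix is
    ASSUMED to be rows * cols; the real cost model may differ.
    Returns (assignment, loads): per-thread lists of submatrix indices and
    the per-thread total weights.
    """
    weights = [rows * cols for rows, cols in shapes]
    # Min-heap of (current load, thread id): the least-loaded thread pops first.
    heap = [(0, t) for t in range(n_threads)]
    assignment = [[] for _ in range(n_threads)]
    # Place the heaviest submatrices first, each on the least-loaded thread.
    order = sorted(range(len(shapes)), key=lambda i: weights[i], reverse=True)
    for idx in order:
        load, t = heapq.heappop(heap)
        assignment[t].append(idx)
        heapq.heappush(heap, (load + weights[idx], t))
    loads = [sum(weights[i] for i in part) for part in assignment]
    return assignment, loads


def print_distribution(assignment, loads):
    """Report the submatrix count and total weight per thread (hypothetical helper)."""
    for t, (part, load) in enumerate(zip(assignment, loads)):
        print(f"thread {t}: {len(part)} submatrices, total weight {load}")


if __name__ == "__main__":
    shapes = [(4, 4), (2, 2), (3, 3), (1, 1), (2, 3)]
    assignment, loads = balance_submatrices(shapes, n_threads=2)
    print_distribution(assignment, loads)
```

A heuristic like this does not guarantee an optimal split, which fits the description of balancing as an attempt; inspecting the per-thread counts and weights is a quick way to check how even the result is.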