https://gist.github.com/aeyakovenko/bf558697a0b3f377f9e8

With N4, I am seeing results that are basically only as good as sequential computation on my MacBook for the matrix multiply algorithm. Any idea why?

Thanks,
Anatoly