Performance has improved from 10 times slower than MC to only 3 times slower. About a third of that time is spent calculating gradients.