Dear insight-users,

Are there any diagrams or figures illustrating the gradient descent and regular step gradient descent optimizers? Alternatively, are there any references to papers that explain exactly how the step sizes are computed?

Thanks a lot.

Daniel Schwarz