In the video titled 'loss plot', the loss was computed as
loss[epoch] = mean_squared_error(sigmoid(X), Y), which computes the loss for a single record.
I think it should be loss[epoch] = mean_squared_error(predict(X), Y), which computes the loss over the entire dataset.
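
Here is a minimal sketch of what I mean. I don't have the exact code from the video, so everything below is an assumption for illustration: sigmoid takes one pre-activation value, predict runs the neuron over every row of X, and the toy data, the weights w and b, and the hand-written mean_squared_error helper (a stand-in for a library function) are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function for a single pre-activation value z."""
    return 1.0 / (1.0 + math.exp(-z))

def pre_activation(x, w, b):
    """Weighted sum w.x + b for one record x (hypothetical helper)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(X, w, b):
    """Apply the neuron to every record in X and return one output per record."""
    return [sigmoid(pre_activation(x, w, b)) for x in X]

def mean_squared_error(y_pred, y_true):
    """Average squared difference over all records (stand-in for a library call)."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

# Hypothetical toy data: three records with two features each.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [0.0, 1.0, 1.0]
w, b = [0.5, -0.5], 0.0

loss = {}
epoch = 0
# The loss for this epoch is taken over the predictions for the whole dataset.
loss[epoch] = mean_squared_error(predict(X, w, b), Y)
print(loss[epoch])
```

With this setup, predict(X, w, b) returns one value per record, so the loss stored for each epoch reflects all records and not just one.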
Please consider this and correct me if I'm wrong.