
kai
Programming Machine Learning: From Coding to Deep Learning (German edition)
Chapter 11, Training the network, page 191
def back(X, Y, y_hat, w2, h):
    # Gradient with respect to w2: hidden activations (with bias column)
    # times the output error, averaged over the batch
    w2_gradient = np.matmul(prepend_bias(h).T, y_hat - Y) / X.shape[0]
    # Backpropagate the error through w2 (dropping the bias row), then
    # through the sigmoid, element by element
    a_gradient = np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
    # Gradient with respect to w1: inputs (with bias column) times a_gradient
    w1_gradient = np.matmul(prepend_bias(X).T, a_gradient) / X.shape[0]
    return (w1_gradient, w2_gradient)
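For context, here is a self-contained sketch of the function together with the helpers it relies on. The definitions of sigmoid, sigmoid_gradient, and prepend_bias are assumptions modeled on the book's earlier chapters, and the array sizes in the shape check are hypothetical:

```python
import numpy as np

def sigmoid(z):
    # Logistic function, applied element-wise
    return 1 / (1 + np.exp(-z))

def sigmoid_gradient(sigmoid_output):
    # Derivative of the sigmoid expressed via its output:
    # if h = sigmoid(z), then dh/dz = h * (1 - h), element-wise
    return sigmoid_output * (1 - sigmoid_output)

def prepend_bias(M):
    # Add a column of ones for the bias term
    return np.insert(M, 0, 1, axis=1)

def back(X, Y, y_hat, w2, h):
    # Same function as quoted above, repeated so this sketch runs on its own
    w2_gradient = np.matmul(prepend_bias(h).T, y_hat - Y) / X.shape[0]
    a_gradient = np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
    w1_gradient = np.matmul(prepend_bias(X).T, a_gradient) / X.shape[0]
    return (w1_gradient, w2_gradient)

# Hypothetical sizes: 5 examples, 4 inputs, 3 hidden units, 2 outputs
rng = np.random.default_rng(0)
X = rng.random((5, 4))
Y = rng.random((5, 2))
h = rng.random((5, 3))       # hidden activations, no bias column yet
w2 = rng.random((4, 2))      # (3 hidden + 1 bias) x 2 outputs
y_hat = rng.random((5, 2))

w1_grad, w2_grad = back(X, Y, y_hat, w2, h)
print(w1_grad.shape, w2_grad.shape)  # (5, 3) (4, 2)
```

The shapes confirm that each gradient matches its weight matrix: w1_grad has one row per input plus bias, w2_grad one row per hidden unit plus bias.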
Hi Paolo,
can you explain how one arrives at the decision to multiply this expression:
np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
element by element?
Sincerely,
Kai
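One way to see why that factor is element-wise: the sigmoid is applied independently to each entry of its input, so by the chain rule each entry of the gradient picks up the sigmoid's derivative at that same entry. The tiny numerical check below is a sketch of that idea (the loss sum(sigmoid(a)) is a hypothetical stand-in, not the book's loss):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# For L = sum(sigmoid(a)), the upstream gradient is all ones, so the
# gradient w.r.t. a should be ones * h * (1 - h), element by element.
a = np.array([[0.5, -1.0], [2.0, 0.0]])
h = sigmoid(a)
analytic = np.ones_like(a) * (h * (1 - h))  # element-wise chain rule

# Numerical gradient via central differences, entry by entry
eps = 1e-6
numeric = np.zeros_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        a_plus = a.copy(); a_plus[i, j] += eps
        a_minus = a.copy(); a_minus[i, j] -= eps
        numeric[i, j] = (np.sum(sigmoid(a_plus)) - np.sum(sigmoid(a_minus))) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Because the sigmoid acts entry by entry, its Jacobian is diagonal, and multiplying by a diagonal Jacobian is exactly an element-wise product, unlike the matmul with w2, whose Jacobian mixes entries.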
kai
As I understand it, there is no general recipe for differentiating compositions of functions whose inputs are matrices with respect to those matrices, the way there is in the scalar case.
Thank you for giving me an answer that helped my understanding anyway.
I am fascinated by mathematics and its power, but seeing its beauty takes a lot of time. That is why I asked such a question.
Your book was recommended to me and described as very worthwhile, which I can confirm 100%. It is my introduction to ML. Thank you for it!