
kai
Programming Machine Learning: From Coding to Deep Learning (German edition)
Chapter 11, "Training the network", page 191
def back(X, Y, y_hat, w2, h):
    # Gradient of the loss with respect to w2, averaged over the
    # X.shape[0] examples; prepend_bias adds the bias column to h
    w2_gradient = np.matmul(prepend_bias(h).T, y_hat - Y) / X.shape[0]
    # Backpropagate through the hidden layer: w2[1:] drops the bias row,
    # and the element-wise product applies the sigmoid's derivative
    a_gradient = np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
    # Gradient of the loss with respect to w1
    w1_gradient = np.matmul(prepend_bias(X).T, a_gradient) / X.shape[0]
    return (w1_gradient, w2_gradient)
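For context, the helpers this snippet calls are defined earlier in the book; a minimal sketch of what they do, assuming the book's convention that sigmoid_gradient receives the sigmoid's output rather than its input:

import numpy as np

def prepend_bias(X):
    # Insert a column of ones in front of X (the bias inputs)
    return np.insert(X, 0, 1, axis=1)

def sigmoid_gradient(sigmoid):
    # Derivative of the sigmoid, written in terms of its output: s * (1 - s)
    return np.multiply(sigmoid, (1 - sigmoid))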
Hi Paolo,
could you explain how you arrived at the decision to multiply this expression:
np.matmul(y_hat - Y, w2[1:].T) * sigmoid_gradient(h)
element by element?
Sincerely,
Kai
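
A sketch of the chain-rule reasoning behind that line, assuming the book's forward pass, where a is the hidden layer's weighted input and h = \sigma(a) its sigmoid output (the notation here is illustrative, not from the book):

\frac{\partial L}{\partial h} = (\hat{Y} - Y)\, W_2[1{:}]^{\top},
\qquad
\frac{\partial L}{\partial a} = \frac{\partial L}{\partial h} \odot \sigma'(a)

Because the sigmoid acts on each entry separately, h_{ij} depends only on the matching a_{ij}: the Jacobian of an element-wise function is diagonal, and multiplying by a diagonal Jacobian collapses to an entry-by-entry (Hadamard) product \odot. That is the element-wise multiplication in the code, with sigmoid_gradient(h) = h \odot (1 - h) playing the role of \sigma'(a).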

kai
As I understand it, there is no general recipe for differentiating a composition of functions whose inputs are matrices with respect to those matrices, the way there is for a scalar.
Thank you for an answer that helped my understanding anyway.
I am fascinated by mathematics and its power, but it takes a lot of time to be able to see its beauty. That is why I asked such a question.
Your book was recommended to me as very worthwhile, which I can confirm 100%. It is my introduction to ML. Thank you for it!
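
One way to convince yourself that the element-wise product is correct without any matrix calculus: compare back()'s analytic gradient against a numerical (finite-difference) derivative on a tiny random network. A sketch, reusing back() and the helper definitions above; the forward pass and loss here are illustrative stand-ins (a sigmoid output layer with cross-entropy loss, for which the output gradient is also y_hat - Y):

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, w1, w2):
    h = sigmoid(np.matmul(prepend_bias(X), w1))
    y_hat = sigmoid(np.matmul(prepend_bias(h), w2))
    return (y_hat, h)

def loss(X, Y, w1, w2):
    y_hat, _ = forward(X, w1, w2)
    return -np.sum(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)) / X.shape[0]

# Tiny random problem: 4 examples, 3 inputs, 5 hidden units, 2 outputs
np.random.seed(0)
X = np.random.randn(4, 3)
Y = (np.random.rand(4, 2) > 0.5) * 1.0
w1 = np.random.randn(4, 5)   # (3 inputs + bias) x 5 hidden
w2 = np.random.randn(6, 2)   # (5 hidden + bias) x 2 outputs

y_hat, h = forward(X, w1, w2)
w1_gradient, w2_gradient = back(X, Y, y_hat, w2, h)

# Central finite differences over every entry of w1
eps = 1e-6
numeric = np.zeros_like(w1)
for i in range(w1.shape[0]):
    for j in range(w1.shape[1]):
        w1[i, j] += eps
        loss_plus = loss(X, Y, w1, w2)
        w1[i, j] -= 2 * eps
        loss_minus = loss(X, Y, w1, w2)
        w1[i, j] += eps
        numeric[i, j] = (loss_plus - loss_minus) / (2 * eps)

print(np.max(np.abs(numeric - w1_gradient)))  # tiny (~1e-9) if back() is right

A quick shape check points the same way: both np.matmul(y_hat - Y, w2[1:].T) and sigmoid_gradient(h) have shape (examples, hidden units), so replacing the * with np.matmul would not even produce conformable operands.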