How PyTorch tensors’ backward() accumulates gradient
I was not sure what “accumulated” means exactly for the behavior of PyTorch tensors’ backward() method and .grad attribute mentioned here:
“torch.Tensor is the central class of the package. If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into .grad attribute.”
Here’s some code to illustrate. Define an input tensor x with value 1 and tell PyTorch that I want it to track the gradients of x.
import torch
x = torch.ones(1, requires_grad=True); x
Output:
tensor([ 1.])
Define two tensors y and z that depend on x.
y = x**2
z = x**3
See how x.grad is accumulated from y.backward() then z.backward(): first 2, then 5 = 2 + 3, where 2 comes from dy/dx = 2x = 2 (evaluated at x=1) and 3 comes from dz/dx = 3x**2 = 3 (evaluated at x=1).
y.backward()
x.grad
Output:
tensor([ 2.])
Run:
z.backward()
x.grad
(Note 5 = 2 + 3.) Output:
tensor([ 5.])
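As a side note, torch.autograd.grad gives a related way to see this: it returns the gradients directly instead of accumulating them into .grad. A minimal sketch on fresh tensors:
import torch
x = torch.ones(1, requires_grad=True)
y = x**2
z = x**3
dy_dx, = torch.autograd.grad(y, x)  # dy/dx = 2x = 2 at x=1
dz_dx, = torch.autograd.grad(z, x)  # dz/dx = 3x**2 = 3 at x=1
dy_dx, dz_dx, x.grad  # (tensor([2.]), tensor([3.]), None): nothing was accumulated into x.grad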
Can switch y.backward() and z.backward(). Now x.grad is first 3, then 5 = 3 + 2.
x = torch.ones(1, requires_grad=True)
y = x**2
z = x**3
z.backward()
x.grad
Output:
tensor([ 3.])
Run:
y.backward()
x.grad
Output:
tensor([ 5.])
Can set x.grad back to 0 after y.backward() so that it does not accumulate.
x = torch.ones(1, requires_grad=True)
y = x**2
z = x**3
y.backward()
x.grad
Output:
tensor([ 2.])
Run:
x.grad.zero_()
Output:
tensor([ 0.])
Run:
z.backward()
x.grad
Output:
tensor([ 3.])
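This reset is the same reason training loops typically zero the gradients on every iteration; otherwise each backward() call would keep adding to the previous gradients. A minimal sketch of a manual gradient-descent loop (the data and learning rate here are made-up placeholders):
import torch
w = torch.zeros(1, requires_grad=True)
xs = torch.tensor([1.0, 2.0, 3.0])
ys = torch.tensor([2.0, 4.0, 6.0])
lr = 0.1
for _ in range(100):
    loss = ((w * xs - ys)**2).mean()
    loss.backward()            # adds this iteration's gradient into w.grad
    with torch.no_grad():
        w -= lr * w.grad       # gradient-descent step
    w.grad.zero_()             # reset; without this, gradients from all iterations would pile up
w  # converges to roughly 2.0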
Can also look at a vectorized version (y below is a vector instead of separate y and z).
Vector version
Run:
x = torch.rand((2, 1), requires_grad=True); x
Output:
tensor([[ 0.3725],
[ 0.4378]])
Run:
y = torch.zeros(3, 1)
y[0] = x[0]**2
y[1] = x[1]**3
y[2] = x[1]**4
y.backward(gradient=torch.ones(y.size()))
Cumulative grad of x[0] and x[1], respectively.
Run:
x.grad
Output:
tensor([[ 0.7450],
[ 0.9105]])
Now manually calculate the gradient and compare. Run:
2*x[0], 3*x[1]**2, 4*x[1]**3
Output:
(tensor([ 0.7450]), tensor([ 0.5749]), tensor([ 0.3356]))
Run:
2*x[0], 3*x[1]**2 + 4*x[1]**3
Output (compare with the gradient from PyTorch above):
(tensor([ 0.7450]), tensor([ 0.9105]))
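A side note on the gradient argument used above: passing gradient=torch.ones(y.size()) asks backward() for the vector-Jacobian product with an all-ones vector, which is the same as backpropagating from y.sum(). A minimal sketch with fixed values (chosen here just for illustration):
import torch
x = torch.tensor([[0.5], [0.5]], requires_grad=True)
y = torch.zeros(3, 1)
y[0] = x[0]**2
y[1] = x[1]**3
y[2] = x[1]**4
y.sum().backward()  # equivalent to y.backward(gradient=torch.ones(y.size()))
x.grad  # tensor([[1.0000], [1.2500]]) = [2*0.5, 3*0.5**2 + 4*0.5**3]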
Code is here: http://nbviewer.jupyter.org/github/yang-zhang/yang-zhang.github.io/blob/master/ds_code/pytorch_grad_accum.ipynb