# How PyTorch tensors’ `backward()` accumulates gradients

I was not sure what “accumulated” means exactly for the `backward()` method and `.grad` attribute of PyTorch tensors, as mentioned in the PyTorch documentation:

> `torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` as `True`, it starts to track all operations on it. When you finish your computation you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into `.grad` attribute.

Here’s some code to illustrate. Define an input tensor `x` with value `1` and tell PyTorch to track the gradients of `x`.

`import torch; x = torch.ones(1, requires_grad=True); x`

Output:

`tensor([ 1.])`

Define two tensors `y` and `z` that depend on `x`.

`y = x**2; z = x**3`

See how `x.grad` accumulates across `y.backward()` and then `z.backward()`: first `2`, then `5 = 2 + 3`, where `2` comes from `dy/dx = 2x = 2` (evaluated at `x = 1`) and `3` comes from `dz/dx = 3x**2 = 3` (evaluated at `x = 1`).

`y.backward(); x.grad`

Output:

`tensor([ 2.])`

Run:

`z.backward(); x.grad`

(Note 5 = 2 + 3.) Output:

`tensor([ 5.])`
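For readers who prefer a single script over separate cells, the steps above can be sketched as follows. This is a minimal sketch that assumes a recent PyTorch release. The names `grad_after_y` and `grad_after_z` are introduced here for illustration only, and `.clone()` is used because a later `backward()` call may update `x.grad` in place.

```python
import torch

x = torch.ones(1, requires_grad=True)
y = x ** 2  # dy/dx = 2x = 2 at x = 1
z = x ** 3  # dz/dx = 3x**2 = 3 at x = 1

y.backward()
# Snapshot the gradient; a later backward() may modify x.grad in place.
grad_after_y = x.grad.clone()

z.backward()
grad_after_z = x.grad.clone()

print(grad_after_y)  # holds the value 2
print(grad_after_z)  # holds the value 5 = 2 + 3
```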

We can also swap the order of `y.backward()` and `z.backward()`. Now `x.grad` is first `3`, then `5 = 3 + 2`.

`x = torch.ones(1, requires_grad=True); y = x**2; z = x**3; z.backward(); x.grad`

Output:

`tensor([ 3.])`

Run:

`y.backward(); x.grad`

Output:

`tensor([ 5.])`
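The order does not matter because each `backward()` call simply adds its contribution to `x.grad`. Since the derivative of a sum is the sum of the derivatives, the accumulated value should also match a single `backward()` call on `y + z`. The sketch below checks both claims; the helper `accumulated_grad` is a hypothetical name introduced for this example, not part of PyTorch.

```python
import torch

def accumulated_grad(order):
    """Call backward() on y = x**2 and z = x**3 in the given order; return x.grad."""
    x = torch.ones(1, requires_grad=True)
    outputs = {"y": x ** 2, "z": x ** 3}
    for name in order:
        outputs[name].backward()
    return x.grad

grad_yz = accumulated_grad(["y", "z"])
grad_zy = accumulated_grad(["z", "y"])

# Differentiate the sum y + z in a single backward pass.
x = torch.ones(1, requires_grad=True)
(x ** 2 + x ** 3).backward()
grad_sum = x.grad

print(grad_yz, grad_zy, grad_sum)  # all three hold the value 5
```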

We can reset `x.grad` to 0 after `y.backward()` so that the next call does not accumulate on top of it.

`x = torch.ones(1, requires_grad=True); y = x**2; z = x**3; y.backward(); x.grad`

Output:

`tensor([ 2.])`

Run:

`x.grad.zero_()`

Output:

`tensor([ 0.])`

Run:

`z.backward(); x.grad`

Output:

`tensor([ 3.])`
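The same reset pattern extends to any number of functions of `x`. As a rough sketch, the loop below zeroes `x.grad` before each `backward()` call so that every entry reflects one function only. The list `functions` and the name `per_function_grads` are assumptions made for this illustration.

```python
import torch

x = torch.ones(1, requires_grad=True)

# Hypothetical functions of x with derivatives 2x, 3x**2 and 4x**3.
functions = [lambda t: t ** 2, lambda t: t ** 3, lambda t: t ** 4]

per_function_grads = []
for f in functions:
    if x.grad is not None:
        x.grad.zero_()  # reset in place so the next backward() starts from 0
    f(x).backward()
    per_function_grads.append(x.grad.item())

print(per_function_grads)  # [2.0, 3.0, 4.0]
```

Without the `zero_()` call, the entries would be running totals (2, 5, 9) rather than the individual derivatives.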

We can also look at a vectorized version, where `y` below is a single vector instead of separate tensors `y` and `z`.

## Vector version

Run:

`x = torch.rand((2, 1), requires_grad=True); x`

Output:

`tensor([[ 0.3725], [ 0.4378]])`

Run:

`y = torch.zeros(3, 1); y[0] = x[0]**2; y[1] = x[1]**3; y[2] = x[1]**4; y.backward(gradient=torch.ones(y.size()))`

`x.grad` now holds the accumulated gradients with respect to `x[0]` and `x[1]`, respectively.

Run:

`x.grad`

Output:

`tensor([[ 0.7450], [ 0.9105]])`

Now calculate each term of the gradient manually and compare. Run:

`2*x[0], 3*x[1]**2, 4*x[1]**3`

Output:

`(tensor([ 0.7450]), tensor([ 0.5749]), tensor([ 0.3356]))`

Run:

`2*x[0], 3*x[1]**2 + 4*x[1]**3`

Output (compare with the gradient from PyTorch above):

`(tensor([ 0.7450]), tensor([ 0.9105]))`
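The manual comparison can also be automated. The sketch below is one possible way to do it, with two deliberate differences from the cells above: it builds `y` with `torch.cat` instead of in-place assignment into a zero tensor, and it fixes an arbitrary random seed so the run is reproducible. Passing a vector of ones as `gradient` weights every component of `y` by 1, so each entry of `x.grad` is the sum of the derivatives of all components of `y` that depend on it.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.rand((2, 1), requires_grad=True)

# y = [x0**2, x1**3, x1**4], built without in-place writes.
y = torch.cat([x[0] ** 2, x[1] ** 3, x[1] ** 4])

# A gradient of ones weights each component of y equally,
# which gives the same x.grad as y.sum().backward().
y.backward(gradient=torch.ones_like(y))

# Manual derivatives: d/dx0 = 2*x0, d/dx1 = 3*x1**2 + 4*x1**3.
with torch.no_grad():
    expected = torch.stack([2 * x[0], 3 * x[1] ** 2 + 4 * x[1] ** 3])

print(torch.allclose(x.grad, expected))  # True
```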

