How PyTorch tensors’ backward() accumulates gradient

Yang Zhang
May 28, 2018

I was not sure what “accumulated” means exactly for the behavior of PyTorch tensors’ backward() method and .grad attribute, described here:

torch.Tensor is the central class of the package. If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into .grad attribute.

Here’s some code to illustrate. Define an input tensor x with value 1 and tell PyTorch that we want to track gradients with respect to x.

import torch
x = torch.ones(1, requires_grad=True); x

Output:

tensor([ 1.])

Define two tensors y and z that depend on x.

y = x**2
z = x**3

See how x.grad accumulates from y.backward() then z.backward(): first 2, then 5 = 2 + 3, where 2 comes from dy/dx = 2x = 2 (evaluated at x=1) and 3 comes from dz/dx = 3x**2 = 3 (evaluated at x=1).

y.backward()
x.grad

Output:

tensor([ 2.])

Run:

z.backward()
x.grad

(Note 5 = 2 + 3.) Output:

tensor([ 5.])
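As a sanity check (not in the original notebook), calling backward() once on the sum y + z should give the same 5, since d(y + z)/dx = 2x + 3x**2 = 5 at x = 1. A minimal sketch with a fresh x:

import torch

x = torch.ones(1, requires_grad=True)
y = x**2
z = x**3
(y + z).backward()  # single backward pass through the sum
x.grad  # tensor([ 5.]), same as accumulating y.backward() then z.backward()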

We can switch the order of y.backward() and z.backward(). Now it’s first 3, then 5 = 3 + 2.

x = torch.ones(1, requires_grad=True)
y = x**2
z = x**3
z.backward()
x.grad

Output:

tensor([ 3.])

Run:

y.backward()
x.grad

Output:

tensor([ 5.])

We can set x.grad back to 0 after y.backward() so that it does not accumulate.

x = torch.ones(1, requires_grad=True)
y = x**2
z = x**3
y.backward()
x.grad

Output:

tensor([ 2.])

Run:

x.grad.zero_()

Output:

tensor([ 0.])

Run:

z.backward()
x.grad

Output:

tensor([ 3.])
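The same behavior is why training loops clear gradients before each backward pass: otherwise gradients from previous batches would keep accumulating in the parameters’ .grad attributes. A rough sketch of the usual idiom, with a placeholder model, data, and hyperparameters just for illustration:

import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(5):
    inputs = torch.randn(8, 3)
    targets = torch.randn(8, 1)
    optimizer.zero_grad()  # clear grads accumulated from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()        # grads are accumulated into each parameter's .grad
    optimizer.step()       # update using the freshly computed grads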

We can also look at a vectorized version (y below is a vector instead of the separate y and z above).

Vector version

Run:

x = torch.rand((2, 1), requires_grad=True); x

Output:

tensor([[ 0.3725],
[ 0.4378]])

Run:

y = torch.zeros(3, 1)
y[0] = x[0]**2
y[1] = x[1]**3
y[2] = x[1]**4
y.backward(gradient=torch.ones(y.size()))

The accumulated gradients of x[0] and x[1], respectively (for x[1], the contributions from y[1] and y[2] are summed).

Run:

x.grad

Output:

tensor([[ 0.7450],
[ 0.9105]])

Now manually calculate the gradient and compare. Run:

2*x[0], 3*x[1]**2, 4*x[1]**3

Output:

(tensor([ 0.7450]), tensor([ 0.5749]), tensor([ 0.3356]))

Run:

2*x[0], 3*x[1]**2 + 4*x[1]**3

Output (compare with the gradient from PyTorch above):

(tensor([ 0.7450]), tensor([ 0.9105]))
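For a non-scalar y, backward(gradient=v) computes a vector-Jacobian product rather than a full Jacobian, so passing a vector of ones sums the gradients of all components of y; that is why x[1]’s grad above is 3*x[1]**2 + 4*x[1]**3. A small sketch of the equivalence, assuming the same setup as above (x is drawn fresh here, so the numbers will differ from the output shown): back-propagating (y * v).sum() gives the same x.grad as y.backward(gradient=v).

import torch

x = torch.rand((2, 1), requires_grad=True)
y = torch.zeros(3, 1)
y[0] = x[0]**2
y[1] = x[1]**3
y[2] = x[1]**4
v = torch.ones(y.size())
(y * v).sum().backward()  # same x.grad as y.backward(gradient=v)
x.grad  # equals (2*x[0], 3*x[1]**2 + 4*x[1]**3)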

Code is here: http://nbviewer.jupyter.org/github/yang-zhang/yang-zhang.github.io/blob/master/ds_code/pytorch_grad_accum.ipynb
