Link to Library:

and trained using Amazon product data.

This library supports

  • Predicting categories using the pretrained model.
  • Training from scratch, with a transformers model as the starting point.
  • Transfer learning from the pretrained model.

Pretrained model

Language Modeling Example with Pytorch Lightning and 🤗 Huggingface Transformers.

Language modeling fine-tuning adapts a pre-trained language model to a new domain and benefits downstream tasks such as classification. The script here fine-tunes masked language modeling (MLM) models, including ALBERT, BERT, DistilBERT and RoBERTa, on a text dataset. Details about the models can be found in the Transformers model summary.

The Transformers part of the code is adapted from examples/language-modeling/. Fine-tuning causal language modeling (CLM) models can be done in a similar way, following

PyTorch Lightning is “The lightweight PyTorch wrapper for high-performance AI research. Scale your models…

I’ve converted the LaBSE model weights to PyTorch model weights and shared them on

LaBSE is from Language-agnostic BERT Sentence Embedding by Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, and Wei Wang of Google AI.

Abstract from the paper

We adapt multilingual BERT to produce language-agnostic sentence embeddings for 109 languages. While English sentence embeddings have been obtained by fine-tuning a pretrained BERT model, such models have not been applied to multilingual sentence embeddings. Our model combines masked language model (MLM) and translation language model (TLM) pretraining with a translation ranking task using bi-directional dual encoders. The resulting multilingual sentence embeddings improve average…

Pytorch’s Tensor class has a storage() and a stride() method. They are not very often used directly, but can sometimes be helpful when you need to take a closer look at the underlying data. (I’ll show an example of using them to illustrate the difference between Tensor.expand() and Tensor.repeat() at the end.)

As explained in Pytorch’s document, storage() simply “returns the underlying storage”, which is relatively straightforward.

But the explanation of stride in Pytorch’s document is a bit difficult for me to understand:

Each strided tensor has an associated torch.Storage, which holds its data. These tensors provide multi-dimensional, strided view of…
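As a rough illustration of what "strided view" means, here is a minimal plain-Python sketch. It does not use PyTorch; the helper names `contiguous_strides` and `element_at` are made up for this example, and the flat list stands in for what storage() would expose. The idea is that a stride tells you how many storage positions to skip when an index along that dimension increases by one.

```python
def contiguous_strides(shape):
    """Strides (in elements) of a row-major tensor with the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def element_at(storage, strides, index, offset=0):
    """Look up one element of a strided view over a flat storage list."""
    return storage[offset + sum(i * s for i, s in zip(index, strides))]


storage = [0, 1, 2, 3, 4, 5]          # flat data shared by every view below
strides = contiguous_strides((2, 3))  # (3, 1)

# Element [1][2] of the 2x3 view lives at position 1*3 + 2*1 = 5.
value = element_at(storage, strides, (1, 2))

# A transposed 3x2 view reuses the same storage with swapped strides (1, 3).
transposed_value = element_at(storage, (1, 3), (2, 1))  # also position 5

# An expand()-style broadcast uses stride 0, so one stored value is read
# repeatedly and no data is copied.
expanded_row = [element_at([7], (0,), (i,)) for i in range(3)]
```

The stride-0 case hints at the difference between Tensor.expand() and Tensor.repeat(): an expanded tensor keeps pointing at the original storage, while a repeated one gets new, larger storage.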

This post gives some examples for the gradient argument in Pytorch's backward function. The math of backward(gradient) is explained in this tutorial and these threads (thread-1, thread-2), along with some examples. Those were very helpful, but I wish there were more examples on how the numbers in the example correspond to the math, to help me understand more easily. I could not find many such examples so I will make some and write them here, so that I can look back when I forget this in two weeks.
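To make the numbers concrete, here is a small plain-Python sketch of the quantity that backward(gradient) accumulates: for a vector output y = f(x) and a gradient argument v, it is the vector-Jacobian product v^T J. The function `f`, its hand-written Jacobian, and the helper names are assumptions chosen for illustration, not code from the post.

```python
def f(x):
    """Example vector function y = f(x) with two inputs and two outputs."""
    x0, x1 = x
    return [x0 * x1, x0 + x1 ** 2]


def jacobian(x):
    """Analytic Jacobian J[i][j] = d y_i / d x_j of f at x."""
    x0, x1 = x
    return [[x1, x0],
            [1.0, 2.0 * x1]]


def vector_jacobian_product(v, J):
    """Return v^T J, one entry per input variable."""
    n_inputs = len(J[0])
    return [sum(v[i] * J[i][j] for i in range(len(v))) for j in range(n_inputs)]


x = [2.0, 3.0]
J = jacobian(x)                       # [[3.0, 2.0], [1.0, 6.0]]

# v = [1, 1] weights both outputs equally, like differentiating y[0] + y[1].
grad_sum = vector_jacobian_product([1.0, 1.0], J)    # [4.0, 8.0]

# v = [1, 0] picks out the gradient of y[0] alone.
grad_y0 = vector_jacobian_product([1.0, 0.0], J)     # [3.0, 2.0]
```

With v = [1, 1] the result [4.0, 8.0] is the sum of the two Jacobian rows, which is what you would expect in x.grad after calling backward on y with that gradient argument.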

In the examples, I run code in torch, write down the math, and…

cross-entropy loss at different probabilities for the correct class

Cross-entropy loss is used for classification machine learning models. Often, as the machine learning model is being trained, the average value of this loss is printed on the screen. But it is not always obvious how well the model is doing from looking at this value.

The formula of cross entropy in Python is

import numpy as np

def cross_entropy(p):
    return -np.log(p)

where p is the probability the model guesses for the correct class.
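To get a feel for the scale of this loss, the sketch below tabulates it for a few probabilities. It uses the standard-library math module instead of NumPy so it runs on its own, and the probabilities are picked only for illustration.

```python
import math


def cross_entropy(p):
    """Loss for one example, where p is the probability of the correct class."""
    return -math.log(p)


# Probabilities chosen for illustration only.
for p in (0.1, 0.5, 0.7, 0.9, 0.99):
    print(f"p = {p:<4}  loss = {cross_entropy(p):.4f}")
```

The loss is about 2.30 at p = 0.1, about 0.69 at p = 0.5, and about 0.11 at p = 0.9, so a printed average loss well below 0.69 suggests the model usually gives the correct class more than half of the probability.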

For example, for a model that classifies images as an apple, an orange, or an onion, if the image is an apple and the model predicts probabilities {“apple”: 0.7…

Here are some notes on setting up docker for Kaggle (especially on installing and enabling nbextensions). I had to do this from time to time and wanted to write the steps down for the record. I put it here in case it’s useful for someone else.

Why use docker for Kaggle

Scaling in dropout

Several times I have confused myself over how and why a dropout layer scales its input. I’m writing down some notes before I forget again.

Link to Jupyter notebook:

The Pytorch doc says:

Furthermore, the outputs are scaled by a factor of 1/(1-p) during training. This means that during evaluation the module simply computes an identity function.

So how is this done and why? Let’s look at some code in Pytorch.

Create a dropout layer m with a dropout rate p=0.4:

import torch
import numpy as np
p = 0.4
m = torch.nn.Dropout(p)
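The scaling by 1/(1-p) can also be sketched in plain Python. This is not PyTorch's implementation; `dropout_train` and `dropout_eval` are hypothetical helpers that mimic the behavior described in the doc, so that the effect on the average output is easy to see.

```python
import random


def dropout_train(values, p, rng):
    """Zero each value with probability p and scale survivors by 1 / (1 - p)."""
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


def dropout_eval(values):
    """At evaluation time dropout is the identity function."""
    return list(values)


p = 0.4
rng = random.Random(0)               # fixed seed so the run is repeatable
inputs = [1.0] * 100_000

train_out = dropout_train(inputs, p, rng)
train_mean = sum(train_out) / len(train_out)   # close to 1.0 on average
eval_mean = sum(dropout_eval(inputs)) / len(inputs)
```

Each surviving value becomes 1/0.6, roughly 1.67, and about 60% of the values survive, so the mean during training stays close to the mean during evaluation. That is why no rescaling is needed at evaluation time.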

As explained in Pytorch doc:

This post is an abstract of a Jupyter notebook containing a line-by-line example of a multi-task deep learning model, implemented using the fastai v1 library for PyTorch. This model takes in an image of a human face and predicts their gender, race, and age.

The notebook aims to show:

  1. an example of a multi-task deep learning model;
  2. that the multi-task model makes better predictions than the individual models; and
  3. how to use the fastai library to easily implement the model.

The Jupyter notebook is working and runnable, so you can run and change the code if you like (at least it’s…

Simple example of multi-label classification using fastai v1.

%reload_ext autoreload
%autoreload 2

from fastai import *
from import *

path = untar_data(URLs.PLANET_SAMPLE)



data = ImageDataBunch.from_csv(path, folder='train', sep=' ', suffix='.jpg', ds_tfms=get_transforms(), tfms=imagenet_norm, size=224)
img, labels = data.valid_ds[-1]
" ".join(np.array(data.classes)[labels.astype(bool)])

Train last layer:

learn = ConvLearner(data, models.resnet34, metrics=Fbeta(beta=2))
Total time: 00:10
epoch  train loss  valid loss  fbeta
1      0.748020    0.649659    0.365789  (00:10)

Unfreeze and finetune:

Total time: 00:10
epoch  train loss  valid loss  fbeta
1      0.621923    0.705165    0.380751  (00:10)
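For readers unfamiliar with the fbeta column, here is a minimal sketch of the F-beta score for a single multi-label sample. It is not fastai's Fbeta implementation; the function name and the label sets are hypothetical and only illustrate the standard formula, in which beta=2 weights recall more heavily than precision.

```python
def fbeta_score(true_labels, predicted_labels, beta=2.0):
    """F-beta for one sample, given sets of true and predicted label names."""
    true_positives = len(true_labels & predicted_labels)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_labels)
    recall = true_positives / len(true_labels)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical label sets, made up for this example.
truth = {"clear", "primary", "water"}
guess = {"clear", "primary", "road"}
score = fbeta_score(truth, guess)          # precision = recall = 2/3

# Missing a true label costs more than the same number of extra guesses.
missed_one = fbeta_score({"a", "b"}, {"a"})            # recall 0.5
one_extra = fbeta_score({"a", "b"}, {"a", "b", "c"})   # precision 2/3
```

With beta=2, the prediction that misses a true label scores lower than the one that adds a wrong label, which matches the metric's emphasis on recall.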

Yang Zhang

Software Engineering SMTS at Salesforce Commerce Cloud Einstein
