How to Summarize Text With Transformer Models
Text summarization has many useful applications. For example, in 2013 a 17-year-old named Nick D'Aloisio sold a news summarization app called Summly to Yahoo for $30 million. Since then, text summarization has come a long way, and now you can implement state-of-the-art text summarization Transformer models with just a few lines of code.
Hugging Face launched a model distribution network that allows anyone to upload Transformer models. At the time of writing, 234 text summarization models have been uploaded to the network, which you can view here. In this article, I will explain how to implement what is currently the most downloaded text summarization model, "sshleifer/distilbart-cnn-12-6". I will also discuss using a model called T5.
We will use an open-source Python package called Happy Transformer, of which I am the lead maintainer. Happy Transformer is available on PyPI and can be installed with a simple pip command.
pip install happytransformer
Summarization is a "text-to-text" NLP task. This means the model produces original standalone text given a text input. So, we will import a class from Happy Transformer called HappyTextToText.
from happytransformer import HappyTextToText
As discussed, we will use the most downloaded summarization Transformer model that's available on Hugging Face's model distribution network. To do this, we will first create a HappyTextToText object. The first positional argument is the model type, which in this case is "DISTILBART". Other examples of model types include "BERT", "ROBERTA", and "T5". The second positional argument is the model's name.
happy_tt = HappyTextToText("DISTILBART", "sshleifer/distilbart-cnn-12-6")
To stay on topic, I copied the first two paragraphs from the Wikipedia page for Transformer models. We will be summarizing this text.
text = "A transformer is a deep learning model that adopts the mechanism of attention, differentially weighing the significance of each part of the input data. It is used primarily in the field of natural language processing (NLP) and in computer vision (CV). Like recurrent neural networks (RNNs), transformers are designed to handle sequential input data, such as natural language, for tasks such as translation and text summarization. However, unlike RNNs, transformers do not necessarily process the data in order. Rather, the attention mechanism provides context for any position in the input sequence. For example, if the input data is a natural language sentence, the transformer does not need to process the beginning of the sentence before the end. Rather, it identifies the context that confers meaning to each word in the sentence. This feature allows for more parallelization than RNNs and therefore reduces training times."
From here, we simply need to call happy_tt's generate_text() method to generate a summary for the text above.
result = happy_tt.generate_text(text)
Let's now print the result with print(result).
Result: TextToTextResult(text=' A transformer is a deep learning model that adopts the mechanism of attention...
As you can see, the output is a dataclass with a single field called text. We can isolate this field with print(result.text), which gives the output below.
Result: A transformer is a deep learning model that adopts the mechanism of attention . It is used primarily in the field of natural language processing (NLP) and computer vision (CV)
That's a great result!
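If you have not worked with dataclasses before, the following minimal sketch shows the same access pattern without loading a model. The class here is a hypothetical stand-in that only mirrors the printed output above; the real class is defined inside Happy Transformer, and the sample string is shortened for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextToTextResult:
    # Hypothetical stand-in that mirrors the printed output above.
    # The real class is provided by Happy Transformer.
    text: str


# A shortened sample string, not real model output.
result = TextToTextResult(text=" A transformer is a deep learning model.")

print(result)       # prints the whole dataclass, including the field name
print(result.text)  # prints only the generated string
```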
By default, an algorithm called "greedy" is used to generate text. At each step, this algorithm simply selects the most likely next token. However, greedy decoding is prone to repetition and often generates uncreative text. You can visit Happy Transformer's documentation to learn how to use different algorithms, such as top-k sampling and beam search. In this tutorial, we will discuss how to use top-k sampling.
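To make the difference concrete, here is a small sketch of a single generation step under both strategies. It uses made-up probabilities rather than a real model, and the function names greedy_pick and top_k_pick are hypothetical helpers for illustration, not part of Happy Transformer.

```python
import random

# Toy next-token probabilities (made-up numbers, not from a real model).
next_token_probs = {
    "attention": 0.40,
    "model": 0.30,
    "network": 0.20,
    "banana": 0.07,
    "xylophone": 0.03,
}


def greedy_pick(probs):
    """Return the single most likely token."""
    return max(probs, key=probs.get)


def top_k_pick(probs, k, rng):
    """Sample one token from the k most likely tokens, weighted by probability."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [p for _, p in top]
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs give the same samples

print("greedy:", greedy_pick(next_token_probs))
print("top-k :", [top_k_pick(next_token_probs, k=3, rng=rng) for _ in range(5)])
```

Greedy always returns the same token for the same distribution, which is one reason it tends to repeat itself. Top-k sampling varies its choice but never picks from the unlikely tail ("banana" and "xylophone" here when k=3).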
If you want to learn about the theory behind different text generation algorithms, like top-k sampling, then I suggest you check out my latest course. This course covers how to implement and train a text generation model called GPT-Neo. It also covers how to create a web app to display the model with 100% Python.
First off, we need to import a class called TTSettings.
from happytransformer import TTSettings
Now, we can create a TTSettings object and set its parameters so that we'll use top-k sampling. Please visit Happy Transformer's documentation for more information on what the parameters mean and other ones to adjust.
top_k_sampling_settings = TTSettings(do_sample=True, top_k=50, temperature=0.7, max_length=50)
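As a rough intuition for the temperature setting, the sketch below shows how dividing a model's raw scores by a temperature below 1 sharpens the resulting probabilities. The scores are made up, and softmax_with_temperature is a hypothetical helper written for this illustration; it is not how you call Happy Transformer.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (1.0, 0.7):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

With the lower temperature, the highest-scoring token receives a larger share of the probability, so sampling becomes more conservative.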
Let's call happy_tt's generate_text() method and provide the object above to the args parameter.
result = happy_tt.generate_text(text, args=top_k_sampling_settings)
print(result.text)
Result: A transformer is a deep learning model that adopts the mechanism of attention . It uses the attention mechanism to differentially weight significance of each part of the input data . It is used primarily in the field of natural language processing .
I'm going to briefly discuss using a model called T5 to summarize text. T5 is unusual in that it is not fine-tuned specifically for summarization. Instead, it was fine-tuned for various text-to-text tasks and can perform any of them with a single model. The task is selected by a prefix on the input, so for summarization we prepend "summarize: " to the text. You can find more detail about the model here.
happy_t5 = HappyTextToText("T5", "t5-base")
input_text = "summarize: " + text
result = happy_t5.generate_text(input_text)
print(result.text)
Result: transformers are deep learning models that adopt the mechanism of attention . they are designed to handle sequential input data for tasks such as translation . unlike recurrent neural networks, transformers do not necessarily process the data in order .
Pretty good, eh?
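Since T5 relies on the prefix to know which task to perform, it can be handy to wrap the prefixing step in a small function. The sketch below is one way to do it; add_task_prefix is a hypothetical helper name, not part of Happy Transformer, and the sample sentence is shortened for illustration.

```python
def add_task_prefix(text, prefix="summarize: "):
    """Prepend a T5-style task prefix unless the text already starts with it."""
    if text.startswith(prefix):
        return text
    return prefix + text


article = "A transformer is a deep learning model that adopts the mechanism of attention."
t5_input = add_task_prefix(article)
print(t5_input)
```

The resulting string can be passed to generate_text() in the same way as above. The check at the top keeps the prefix from being added twice if the function is called on text that is already prepared.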
You can also use T5 for other tasks, like translation, but that's a topic for another time.
And that's it! Go implement summarization features into your apps. Maybe you could create a news summarization app for a particular niche, or perhaps you could summarize lengthy books into something short that someone could read in an evening.
Stay happy, everyone!