What is Summarization?
Text summarization is the process of condensing a block of text into a shorter version that keeps its main points.
Generative AI models like ChatGPT, GPT-3.5, GPT-4, LLaMA 3, Yi 34B, and Mixtral 8x7B are very good at text summarization.
Let's say you have the following block of text:
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and
the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side.
During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest
man-made structure in the world, a title it held for 41 years until the Chrysler Building in New
York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to
the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the
Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second
tallest free-standing structure in France after the Millau Viaduct.
This technical description is quite long, and a general reader may not need all of these details
to grasp the main idea. So we now want to use machine learning to summarize this piece of text
automatically.
A summarization model would return something like this:
The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its
base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel
Tower surpassed the Washington Monument to become the tallest man-made structure in the world.
Interesting, isn't it? As you can see, the general idea is still there, but tons of details were
stripped.
The summary is less than half the length of the original text!
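One way to quantify how much shorter a summary is: divide its word count by the word count of the original. Below is a minimal sketch in Python, applied to the two texts above. The helper name compression_ratio is hypothetical, and splitting on whitespace is only a rough approximation of word counting.

```python
def compression_ratio(original: str, summary: str) -> float:
    """Word count of the summary divided by the word count of the original.

    Words are split naively on whitespace (an assumption made for simplicity).
    """
    original_words = len(original.split())
    if original_words == 0:
        raise ValueError("original text is empty")
    return len(summary.split()) / original_words


original = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. Its base is "
    "square, measuring 125 metres (410 ft) on each side. During its "
    "construction, the Eiffel Tower surpassed the Washington Monument to "
    "become the tallest man-made structure in the world, a title it held for "
    "41 years until the Chrysler Building in New York City was finished in "
    "1930. It was the first structure to reach a height of 300 metres. Due to "
    "the addition of a broadcasting aerial at the top of the tower in 1957, "
    "it is now taller than the Chrysler Building by 5.2 metres (17 ft). "
    "Excluding transmitters, the Eiffel Tower is the second tallest "
    "free-standing structure in France after the Millau Viaduct."
)

summary = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building. Its base is square, measuring 125 metres (410 ft) "
    "on each side. During its construction, the Eiffel Tower surpassed the "
    "Washington Monument to become the tallest man-made structure in the world."
)

ratio = compression_ratio(original, summary)
print(f"The summary is {ratio:.0%} of the original length")
```

A ratio below 0.5 means the summary is less than half as long as the source text.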
There are several types of summarization. For example, "headline generation" is about generating a
very short sentence, perfectly suited for a blog or news title. "Dialogue summarization" is about
converting a whole dialogue into a condensed version.
"Extractive summarization" means that the summary is only made of sentences from the original text,
while "abstractive summarization" means that new content can be created in the summary.
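To make the extractive idea concrete, here is a minimal sketch in Python that picks whole sentences from the input without rewriting them. It scores sentences by average word frequency, which is just one simple heuristic chosen for illustration and not how the generative models mentioned earlier work. The names split_sentences and extractive_summary, the stop-word list, and the sentence-splitting rule are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {
    "the", "a", "an", "is", "it", "its", "in", "of", "to", "as", "and",
    "on", "by", "at", "for", "was", "than", "now",
}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def extractive_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, unchanged and in original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

Every sentence in the result appears verbatim in the input, which is what makes the summary extractive. An abstractive system would instead generate new sentences, typically with a language model.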