How to Find Keywords in Texts Easily with Python, KeyBERT, and Machine Learning#
KeyBERT is a minimal and efficient keyword extraction technique that leverages BERT embeddings. It can be used to extract keywords from text and is particularly useful for summarizing and categorizing large datasets. Digital humanists may find KeyBERT beneficial in their research for understanding key themes, characters, or ideas in textual data.
Why use KeyBERT?#
KeyBERT allows individuals to easily capture the keywords of a document with minimal code.
How does it work?#
KeyBERT takes an input text and vectorizes it with a BERT model of your choosing. Next, it compares the vector of each word in the text to the vector of the document as a whole. This gives KeyBERT a measure of how similar each individual word is to the document; in other words, it knows the relevance of each word to the document as a whole. Like other keyword extraction methods, this rests on the presumption that the words most similar to the document are its keywords, or the potential subjects of the text.
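The comparison step can be sketched in a few lines of plain Python. This is a minimal illustration, not KeyBERT's own code: it assumes cosine similarity as the comparison measure, and the three-dimensional vectors are made-up stand-ins for real BERT embeddings, which have hundreds of dimensions. The names `cosine_similarity` and `rank_candidates` are hypothetical helpers introduced only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(doc_vector, word_vectors, top_n=5):
    """Score every candidate word against the document and keep the top_n."""
    scored = [
        (word, round(cosine_similarity(vector, doc_vector), 4))
        for word, vector in word_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors (assumed values, for illustration only).
doc_vector = [0.9, 0.1, 0.3]
word_vectors = {
    "horses": [0.8, 0.2, 0.3],
    "snow": [0.5, 0.6, 0.2],
    "razor": [0.1, 0.2, 0.9],
}

ranked = rank_candidates(doc_vector, word_vectors, top_n=2)
print(ranked)
```

The word whose vector points in nearly the same direction as the document vector ends up first, which is the same shape of result, a list of (keyword, score) tuples, that KeyBERT returns.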
Installation and Setup#
Before using KeyBERT, you’ll need to install the package. You can install it using pip:
pip install keybert
Preparing the Texts#
For this lesson, we’ll use three random sections of the book Dracula by Bram Stoker.
text1 = """
Presently the horses began to scream, and tore at their tethers till I
came to them and quieted them. When they did feel my hands on them, they
whinnied low as in joy, and licked at my hands and were quiet for a
time. Many times through the night did I come to them, till it arrive to
the cold hour when all nature is at lowest; and every time my coming was
with quiet of them. In the cold hour the fire began to die, and I was
about stepping forth to replenish it, for now the snow came in flying
sweeps and with it a chill mist. Even in the dark there was a light of
some kind, as there ever is over snow; and it seemed as though the
snow-flurries and the wreaths of mist took shape as of women with
trailing garments. All was in dead, grim silence only that the horses
whinnied and cowered, as if in terror of the worst. I began to
fear--horrible fears; but then came to me the sense of safety in that
ring wherein I stood. I began, too, to think that my imaginings were of
the night, and the gloom, and the unrest that I have gone through, and
all the terrible anxiety. It was as though my memories of all Jonathan’s
horrid experience were befooling me; for the snow flakes and the mist
began to wheel and circle round, till I could get as though a shadowy
glimpse of those women that would have kissed him. And then the horses
cowered lower and lower, and moaned in terror as men do in pain. Even
the madness of fright was not to them, so that they could break away. I
feared for my dear Madam Mina when these weird figures drew near and
circled round. I looked at her, but she sat calm, and smiled at me; when
I would have stepped to the fire to replenish it, she caught me and held
me back, and whispered, like a voice that one hears in a dream, so low
it was:--
"""
text2 = """
I only slept a few hours when I went to bed, and feeling that I could
not sleep any more, got up. I had hung my shaving glass by the window,
and was just beginning to shave. Suddenly I felt a hand on my shoulder,
and heard the Count’s voice saying to me, “Good-morning.” I started, for
it amazed me that I had not seen him, since the reflection of the glass
covered the whole room behind me. In starting I had cut myself slightly,
but did not notice it at the moment. Having answered the Count’s
salutation, I turned to the glass again to see how I had been mistaken.
This time there could be no error, for the man was close to me, and I
could see him over my shoulder. But there was no reflection of him in
the mirror! The whole room behind me was displayed; but there was no
sign of a man in it, except myself. This was startling, and, coming on
the top of so many strange things, was beginning to increase that vague
feeling of uneasiness which I always have when the Count is near; but at
the instant I saw that the cut had bled a little, and the blood was
trickling over my chin. I laid down the razor, turning as I did so half
round to look for some sticking plaster. When the Count saw my face, his
eyes blazed with a sort of demoniac fury, and he suddenly made a grab at
my throat. I drew away, and his hand touched the string of beads which
held the crucifix. It made an instant change in him, for the fury passed
so quickly that I could hardly believe that it was ever there.
"""
text3 = """
I knew that there were at least three graves to find--graves that are
inhabit; so I search, and search, and I find one of them. She lay in her
Vampire sleep, so full of life and voluptuous beauty that I shudder as
though I have come to do murder. Ah, I doubt not that in old time, when
such things were, many a man who set forth to do such a task as mine,
found at the last his heart fail him, and then his nerve. So he delay,
and delay, and delay, till the mere beauty and the fascination of the
wanton Un-Dead have hypnotise him; and he remain on and on, till sunset
come, and the Vampire sleep be over. Then the beautiful eyes of the fair
woman open and look love, and the voluptuous mouth present to a
kiss--and man is weak. And there remain one more victim in the Vampire
fold; one more to swell the grim and grisly ranks of the Un-Dead!...
"""
Usage#
Now that we have our texts, let’s go ahead and import the KeyBERT class.
from keybert import KeyBERT
Now we can instantiate the class. By default, KeyBERT uses the all-MiniLM-L6-v2 transformer, but you can use other models available from the sentence-transformers library, and even models from Flair, Gensim, and spaCy. Since we are not passing any keyword arguments in the example below, we get the default all-MiniLM-L6-v2 model.
model = KeyBERT()
With our model loaded, we can now extract the keywords. To do this, we only need to pass the text to the model.
model.extract_keywords(text1)
[('horses', 0.4038),
('gloom', 0.3456),
('fears', 0.3123),
('snow', 0.3074),
('fright', 0.2953)]
Pretty cool, right? Our output is a list of tuples. Each tuple holds a keyword and its similarity score to the document. We can also pass a keyword argument, keyphrase_ngram_range, which lets us specify whether to allow n-grams such as bigrams or trigrams. It expects a tuple with the lower and upper n-gram sizes; (1, 3), for example, allows keyphrases of one to three words.
model.extract_keywords(text1, keyphrase_ngram_range=(1, 3))
[('horses began scream', 0.4998),
('grim silence horses', 0.4989),
('horses whinnied cowered', 0.4751),
('silence horses whinnied', 0.4713),
('horses whinnied', 0.4546)]
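To see what an n-gram range of (1, 3) means in practice, here is a small plain-Python sketch of how candidate phrases can be built from a list of tokens. KeyBERT relies on scikit-learn's CountVectorizer for this step; the hypothetical `ngram_candidates` function below only illustrates the idea and is not the library's implementation. The tokens come from the passage above, already lowercased and with stop words removed.

```python
def ngram_candidates(tokens, ngram_range):
    """Return every run of adjacent tokens whose length falls in ngram_range."""
    low, high = ngram_range
    candidates = []
    for size in range(low, high + 1):
        for start in range(len(tokens) - size + 1):
            candidates.append(" ".join(tokens[start:start + size]))
    return candidates


tokens = ["grim", "silence", "horses", "whinnied"]
candidates = ngram_candidates(tokens, (1, 3))

for phrase in candidates:
    print(phrase)
```

Every one of these phrases becomes a candidate keyphrase that is then scored against the document, which is why a wider range yields overlapping results such as "horses whinnied" and "silence horses whinnied".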
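We can also pass stop words to the model to remove them. Stop words are very common words, such as “the” or “and”, that carry little meaning on their own and can throw off models. KeyBERT already filters English stop words by default, which is why none appear in the outputs above; passing stop_words explicitly makes that choice visible and lets you swap in a different list.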
model.extract_keywords(text1, keyphrase_ngram_range=(1, 2), stop_words="english")
[('horses whinnied', 0.4546),
('silence horses', 0.4493),
('horses began', 0.4117),
('horses', 0.4038),
('horses cowered', 0.3993)]
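The effect of stop word removal on the candidate phrases can be sketched as follows. This is a simplified illustration under stated assumptions: `TOY_STOP_WORDS` is a deliberately tiny, hypothetical stop list (the real "english" list is much longer), and the regular expression is a rough stand-in for the tokenizer KeyBERT actually uses.

```python
import re

# Hypothetical, deliberately tiny stop list for illustration only.
TOY_STOP_WORDS = {"the", "to", "and", "at", "their"}


def content_tokens(text, stop_words):
    """Lowercase the text, split it into words, and drop the stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


def bigrams(tokens):
    """Join each pair of neighbouring tokens into a two-word phrase."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


sentence = "Presently the horses began to scream, and tore at their tethers"
tokens = content_tokens(sentence, TOY_STOP_WORDS)
candidates = bigrams(tokens)

print(tokens)
print(candidates)
```

Because the stop words are dropped before the phrases are formed, words that were not adjacent in the original sentence can end up side by side. That is how a phrase like "horses began scream" can appear in the results even though the text reads "horses began to scream".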
To see where the keywords occur in the text, we can pass the keyword argument highlight. This prints the document with the keywords highlighted, which is a great way to inspect the results in a notebook.
model.extract_keywords(text2, highlight=True)
only slept few hours when went to bed and feeling that could not sleep any more got up had hung my shaving glass by the window and was just beginning to shave Suddenly felt hand on my shoulder and heard the Count voice saying to me Good morning started for it amazed me that had not seen him since the reflection of the glass covered the whole room behind me In starting had cut myself slightly but did not notice it at the moment Having answered the Count salutation turned to the glass again to see how had been mistaken This time there could be no error for the man was close to me and could see him over my shoulder But there was no reflection of him in the mirror The whole room behind me was displayed but there was no sign of man in it except myself This was startling and coming on the top of so many strange things was beginning to increase that vague feeling of uneasiness which always have when the Count is near but at the instant saw that the cut had bled little and the blood was trickling over my chin laid down the razor turning as did so half round to look for some sticking plaster When the Count saw my face his eyes blazed with sort of demoniac fury and he suddenly made grab at my throat drew away and his hand touched the string of beads which held the crucifix It made an instant change in him for the fury passed so quickly that could hardly believe that it was ever there
[('shave', 0.3691),
('shaving', 0.3665),
('startling', 0.3437),
('slept', 0.3282),
('saw', 0.2966)]
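Since every call returns a plain list of (keyword, score) tuples, the results are easy to post-process. The sketch below defines a hypothetical helper, `keywords_by_text`, that runs extraction over several named texts and keeps only keywords at or above a chosen score. It assumes nothing about the model beyond an extract_keywords method like the one used in this lesson, and the `min_score` cutoff is an illustrative parameter, not a KeyBERT feature.

```python
def keywords_by_text(model, texts, min_score=0.0, **kwargs):
    """Extract keywords for each named text and filter them by score.

    model     -- any object with an extract_keywords(text, **kwargs) method
    texts     -- dict mapping a label to the text to analyse
    min_score -- keep only keywords whose score is at least this value
    kwargs    -- passed through unchanged to extract_keywords
    """
    results = {}
    for name, text in texts.items():
        pairs = model.extract_keywords(text, **kwargs)
        results[name] = [(word, score) for word, score in pairs if score >= min_score]
    return results
```

With the objects from this lesson, a call might look like keywords_by_text(model, {"text1": text1, "text2": text2, "text3": text3}, min_score=0.3), optionally adding keyword arguments such as keyphrase_ngram_range=(1, 2).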