NER (Entity Extraction)
For large inputs (above 128,000 tokens), you will need to use asynchronous mode; see the documentation for details.
NER (Named Entity Recognition)
What is NER?
NER is one of the most common NLP techniques used in production today.
It addresses a practical challenge: extracting entities from a block of text.
Let's say you have the following sentence:
John Doe is a web developer at Google.
You would like to automatically detect that "John Doe" is a name, "web developer" is a job title, and "Google" is a company. This is exactly what NER does.
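To make the expected output concrete, here is a minimal sketch of the kind of structured result NER produces for the sentence above. It uses a plain dictionary lookup rather than a trained model, so it only illustrates the shape of the result, not how a real NER model works. The `KNOWN_ENTITIES` table and the `extract_entities` function are hypothetical names introduced for this illustration.

```python
import re

# Hypothetical lookup table standing in for a trained NER model.
KNOWN_ENTITIES = {
    "John Doe": "name",
    "web developer": "job title",
    "Google": "company",
}


def extract_entities(text):
    """Return (start, entity_text, label) tuples sorted by position in the text."""
    found = []
    for phrase, label in KNOWN_ENTITIES.items():
        # re.escape makes sure the phrase is matched literally.
        for match in re.finditer(re.escape(phrase), text):
            found.append((match.start(), match.group(), label))
    return sorted(found)


entities = extract_entities("John Doe is a web developer at Google.")
for start, entity_text, label in entities:
    print(f"{start:>2}  {entity_text!r} -> {label}")
```

A real model generalizes to names and companies it has never seen, which a fixed lookup table cannot do.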
Why Use NER?
The world is full of unstructured data, especially the web. Being able to extract structured information from it can give access to a lot of valuable information. Here are a few examples.
Sort Customer Requests
When dealing with large volumes of customer requests (support, sales, ...), applying NER helps sort incoming requests automatically. For example, you could extract the type of product mentioned in a request and route it to the right service accordingly.
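The routing step could look like the sketch below. It assumes the entities have already been extracted and are available as a list of dictionaries with `text` and `type` keys. That format, the `ROUTES` table, and the queue names are all assumptions made for this illustration, not the actual API response.

```python
# Hypothetical mapping from a product name to the team that handles it.
ROUTES = {
    "router": "network-support",
    "invoice": "billing",
    "laptop": "hardware-support",
}
DEFAULT_QUEUE = "general-support"


def route_request(entities):
    """Pick a queue based on the first recognized product entity.

    `entities` is assumed to be a list of dicts such as
    {"text": "router", "type": "product"}.
    """
    for entity in entities:
        if entity["type"] == "product":
            queue = ROUTES.get(entity["text"].lower())
            if queue:
                return queue
    # No known product was mentioned: fall back to a catch-all queue.
    return DEFAULT_QUEUE


request_entities = [
    {"text": "Alice", "type": "name"},
    {"text": "Router", "type": "product"},
]
print(route_request(request_entities))
```

Falling back to a default queue keeps requests from being dropped when no known product is detected.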
Extract Financial Data
Extracting and consolidating financial data can be long and tedious. NER can speed this up by helping you pull out the right data in seconds.
Pre-process Resumes/Applications
HR departments sometimes struggle to read through every application. Automatically highlighting relevant entities, such as company names and skills, can save them time.
Extract Leads
Many B2B leads can be found on public websites or in company brochures, but extracting them manually can be tedious. With NER you can automatically extract a person, along with their job title and company, if they exist.
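As a rough sketch of the "if they exist" part, the snippet below assembles a lead record from already extracted entities and leaves missing fields empty. The `Lead` class, the `build_lead` function, and the entity format (dicts with `text` and `type` keys) are hypothetical and only serve this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lead:
    person: str
    job_title: Optional[str] = None  # None when no job title was found
    company: Optional[str] = None    # None when no company was found


def build_lead(entities):
    """Build a Lead from the first entity of each type, or None if no person."""
    first_by_type = {}
    for entity in entities:
        # setdefault keeps the first occurrence of each entity type.
        first_by_type.setdefault(entity["type"], entity["text"])
    if "person" not in first_by_type:
        return None
    return Lead(
        person=first_by_type["person"],
        job_title=first_by_type.get("job title"),
        company=first_by_type.get("company"),
    )


lead = build_lead([
    {"text": "John Doe", "type": "person"},
    {"text": "web developer", "type": "job title"},
    {"text": "Google", "type": "company"},
])
print(lead)
```

Keeping only the first entity of each type is a simplification; a page that mentions several people would need a way to associate each person with the right job title and company.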
Use GPU
Controls whether the model runs on a GPU. Machine learning models generally run much faster on GPUs.
Searched Entity
This is the entity you are looking for. It only applies to GPT models, so it is ignored if you are using spaCy. You can use anything, like `positions`, `countries`, `programming languages`, `frameworks`, `restaurant names`... If you use a singular term, you are more likely to get a single result, while if you use a plural term, the model will try to extract several entities from the text.