We often want specific types of information from data, such as names, organizations, quantities, dates, designations, and subjects.
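Some of these fields, such as dates, follow fairly predictable surface patterns, so a rule-based pass can catch them without a trained model. Below is a minimal sketch using Python's `re` module. It assumes dates written like "August 4, 1961" and year ranges written like "2009 to 2017". The pattern names and the `extract_dates` helper are hypothetical, made up for illustration.

```python
import re

# Hypothetical patterns: they only cover dates like "August 4, 1961"
# and year ranges like "2009 to 2017"; real data needs more rules.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")
YEAR_RANGE_RE = re.compile(r"\b(\d{4}) to (\d{4})\b")

def extract_dates(text):
    """Return full dates and (start, end) year ranges found in text."""
    dates = DATE_RE.findall(text)          # no capture groups -> whole matches
    ranges = YEAR_RANGE_RE.findall(text)   # two capture groups -> tuples
    return dates, ranges

text = ("Barack Obama was born on August 4, 1961, in Honolulu, Hawaii. "
        "He served two terms from 2009 to 2017.")
print(extract_dates(text))  # (['August 4, 1961'], [('2009', '2017')])
```

Hand-written rules like these are brittle. They work best as a complement to a statistical NER model for the fields that have a fixed format.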


I worked on a project in the public sector where the client wanted an interface in which a user enters a name and the NER script returns the related information, along with the IDs of the individual documents in which that name appears.
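One way to support that kind of lookup is an inverted index from each entity name to the documents that mention it. Below is a minimal sketch. It assumes the entities have already been extracted per document, for example with an NER script like the NLTK one shown later. The document IDs, the `extracted` data, and the `build_index`/`lookup` helpers are hypothetical, made up for illustration, and are not the project's actual implementation.

```python
from collections import defaultdict

# Hypothetical input: entities already extracted per document,
# keyed by made-up document IDs.
extracted = {
    "DOC-001": [("Barack Obama", "PERSON"), ("Honolulu", "GPE")],
    "DOC-002": [("Michelle Obama", "PERSON"), ("Barack Obama", "PERSON")],
    "DOC-003": [("Harvard Law School", "ORGANIZATION")],
}

def build_index(docs):
    """Map each lowercased entity name to its label(s) and document IDs."""
    index = defaultdict(lambda: {"labels": set(), "doc_ids": set()})
    for doc_id, entities in docs.items():
        for name, label in entities:
            entry = index[name.lower()]
            entry["labels"].add(label)
            entry["doc_ids"].add(doc_id)
    return index

def lookup(index, name):
    """Return (labels, doc_ids) for a name, or None if it was never seen."""
    entry = index.get(name.lower())  # .get() does not create a default entry
    if entry is None:
        return None
    return sorted(entry["labels"]), sorted(entry["doc_ids"])

index = build_index(extracted)
print(lookup(index, "barack obama"))  # (['PERSON'], ['DOC-001', 'DOC-002'])
```

Lowercasing the keys makes the lookup case-insensitive. It does not merge variants such as "Obama" and "Barack Obama", which would need an extra entity-resolution step.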

Extracting specific information from the dataset.

Here is a simple example of NER in Python using NLTK →

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk import pos_tag, ne_chunk

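# Tokenizer, POS tagger, and NE chunker models. pos_tag() needs the tagger model too.
# Resource names differ across NLTK versions (newer releases use the *_tab / *_eng names).
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker')
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')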

paragraph = """
Barack Obama was born on August 4, 1961, in Honolulu, Hawaii. He was the 44th President of the United States and served two terms from 2009 to 2017.
Obama graduated from Columbia University and Harvard Law School. His wife, Michelle Obama, was the First Lady of the United States.
"""

sentences = sent_tokenize(paragraph)

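def ner_on_sentence(sentence):
    # Tokenize, tag parts of speech, then chunk the tagged tokens into named entities
    tokens = word_tokenize(sentence)
    pos_tags = pos_tag(tokens)
    named_entities = ne_chunk(pos_tags)
    return named_entities

def extract_entities(tree):
    # Walk the chunk tree: leaves are (word, tag) tuples, entity subtrees carry a label
    entities = []
    if hasattr(tree, 'label'):
        if tree.label() in ['PERSON', 'ORGANIZATION', 'GPE', 'LOCATION', 'FACILITY', 'GSP']:
            # Join the words of a multi-token entity, e.g. "Harvard Law School"
            entity = ' '.join(c[0] for c in tree)
            entities.append((entity, tree.label()))
        else:
            # Root 'S' node (or any unlisted label): recurse into its children
            for child in tree:
                entities.extend(extract_entities(child))
    return entities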

for sentence in sentences:
    named_entities = ner_on_sentence(sentence)
    entities = extract_entities(named_entities)
    for entity, label in entities:
        print(f'{entity} - {label}')

The output should look like this (exact results can vary with the NLTK model version) →

Barack Obama - PERSON
Honolulu - GPE
Hawaii - GPE
President - ORGANIZATION
United States - GPE
Obama - PERSON
Columbia University - ORGANIZATION
Harvard Law School - ORGANIZATION
Michelle Obama - PERSON
First Lady - ORGANIZATION
United States - GPE

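GPE → Geo-Political Entity, ORGANIZATION → organization, PERSON → person's name. The chunker is not perfect: "President" and "First Lady" are titles, yet they are tagged as ORGANIZATION, so the results still need review.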
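The raw output repeats entities ("United States" appears twice), so it often helps to group and deduplicate results before showing them to a user. Below is a minimal sketch that works on the (entity, label) pairs printed above, copied by hand. The `group_by_label` helper is hypothetical, written for illustration.

```python
from collections import defaultdict

# The (entity, label) pairs from the output above, copied by hand.
pairs = [
    ("Barack Obama", "PERSON"), ("Honolulu", "GPE"), ("Hawaii", "GPE"),
    ("President", "ORGANIZATION"), ("United States", "GPE"),
    ("Obama", "PERSON"), ("Columbia University", "ORGANIZATION"),
    ("Harvard Law School", "ORGANIZATION"), ("Michelle Obama", "PERSON"),
    ("First Lady", "ORGANIZATION"), ("United States", "GPE"),
]

def group_by_label(pairs):
    """Group entities by label, dropping repeats but keeping first-seen order."""
    groups = defaultdict(list)
    for entity, label in pairs:
        if entity not in groups[label]:
            groups[label].append(entity)
    return dict(groups)

for label, names in group_by_label(pairs).items():
    print(f"{label}: {', '.join(names)}")
# PERSON: Barack Obama, Obama, Michelle Obama
# GPE: Honolulu, Hawaii, United States
# ORGANIZATION: President, Columbia University, Harvard Law School, First Lady
```

A plain list is used instead of a set so that the display order matches the order in which entities appear in the text.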