
Natural Language Processing in Apple Ecosystem

Natural Language Processing is an important part of Machine Learning. In this post, I’m going through the most popular NLP features provided by the SDK on iOS, iPadOS, and macOS.

Text preprocessing

When you work with natural language text, it’s often useful to do some preprocessing before you start using sophisticated Deep Learning models. For this, Apple offers a few very useful tools. Let’s talk about them:

Language Identification

// All snippets in this post need the NaturalLanguage framework.
import NaturalLanguage

let recognizer = NLLanguageRecognizer()
recognizer.processString("订阅我的频道")
if let lang = recognizer.dominantLanguage {
    print(lang.rawValue)
}
let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
for (key, probability) in hypotheses {
    print("\(key.rawValue) - \(probability)")
}
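
The hypotheses come back as a dictionary of language to probability, so in practice you may want to drop the unlikely candidates and sort the rest. Below is a minimal sketch of that post-processing step in plain Swift. The helper name `confidentLanguages`, the threshold, and the sample probabilities are my own assumptions for illustration, not part of the framework or real recognizer output.

```swift
// Hypothetical helper (not part of NaturalLanguage): keeps only the
// hypotheses whose probability reaches `threshold`, most likely first.
func confidentLanguages(
    from hypotheses: [String: Double],
    threshold: Double
) -> [(code: String, probability: Double)] {
    return hypotheses
        .filter { $0.value >= threshold }
        .sorted { $0.value > $1.value }
        .map { (code: $0.key, probability: $0.value) }
}

// Made-up probabilities, for illustration only.
let sample = ["zh-Hans": 0.9, "ja": 0.08, "ko": 0.02]
for entry in confidentLanguages(from: sample, threshold: 0.05) {
    print("\(entry.code) - \(entry.probability)")
}
```

To feed it the recognizer output, the `NLLanguage` keys would first have to be converted to strings through their `rawValue`.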

Tokenization

Using NLTokenizer to enumerate words, rather than simply splitting components by whitespace, ensures correct behavior in multiple scripts and languages. For example, neither Chinese nor Japanese uses spaces to delimit words.

let tokenizer = NLTokenizer(unit: .word)
let text = "订阅我的频道"
tokenizer.string = text
tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, _ in
    print(text[tokenRange])
    return true
}
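
For contrast, here is a small sketch of what the naive approach does with the same string. It uses only the Swift standard library; the English sentence is my own sample text.

```swift
// Splitting on spaces works for English but not for unspaced scripts.
let english = "Subscribe to my channel"
let chinese = "订阅我的频道"

let englishParts = english.split(separator: " ")
let chineseParts = chinese.split(separator: " ")

print(englishParts.count) // 4 pieces, one per word
print(chineseParts.count) // 1 piece: the whole string, no word boundaries found
```

The Chinese string comes back as a single piece, which is why a language-aware tokenizer is the safer choice.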

Lemmatization

Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item. The Natural Language framework can convert a word into its base form. For example, “worked” and “working” are different forms of the word “work” and have the same core meaning.

let text = "I was running yesterday. My legs hurt a lot."
let tagger = NLTagger(tagSchemes: [.lemma])
tagger.string = text
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lemma) { tag, tokenRange in
    if let tag = tag {
        print("\(text[tokenRange]) - \(tag.rawValue)")
    }
    return true
}

Sentiment Analysis

The range of a sentiment score is [-1.0, 1.0]. A score of 1.0 is the most positive, a score of -1.0 is the most negative, and a score of 0.0 is neutral.

let tagger = NLTagger(tagSchemes: [.sentimentScore])
let text = "This new iPhone is great!"
tagger.string = text
let (sentiment, _) = tagger.tag(at: text.startIndex, unit: .paragraph, scheme: .sentimentScore)
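// The tag is optional and carries the score as a string, so unwrap and parse it.
if let sentiment = sentiment, let score = Double(sentiment.rawValue) {
    print(score)
}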
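
Since the score arrives as text in the tag’s raw value, a typical next step is to parse it and map it to a coarse label. Here is a minimal sketch in plain Swift. The helper name `sentimentLabel` and the 0.25 cutoff are my own assumptions, not something Apple recommends, so tune them for your data.

```swift
// Hypothetical helper: turns the tag's raw string into a coarse label.
// The 0.25 cutoff is an arbitrary assumption, not an Apple recommendation.
func sentimentLabel(forRawScore rawScore: String, cutoff: Double = 0.25) -> String {
    // Parse the text and reject anything outside the documented [-1.0, 1.0] range.
    guard let score = Double(rawScore), (-1.0...1.0).contains(score) else {
        return "unknown"
    }
    if score > cutoff { return "positive" }
    if score < -cutoff { return "negative" }
    return "neutral"
}

print(sentimentLabel(forRawScore: "0.8"))  // positive
print(sentimentLabel(forRawScore: "-0.6")) // negative
```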

Word Tagging

Identifying Parts of Speech

Classify nouns, verbs, adjectives, and other parts of speech in a string. This can be used, for example, in a FAQ or Help section to suggest content based on keywords such as nouns. Reference: https://developer.apple.com/documentation/naturallanguage/nltagscheme/2976610-lexicalclass

let text = "I love working from home, I save a lot of time on commuting."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    if let tag = tag {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}

Identifying Named Entities like People, Places, and Organizations

let text = "Apple was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976, in Los Altos, California"

let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text

let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]

tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    // Get the most likely tag, and print it if it's a named entity.
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }

    // Get the top hypothesis with its confidence score; raise maximumCount to get more.
    let (hypotheses, _) = tagger.tagHypotheses(at: tokenRange.lowerBound, unit: .word, scheme: .nameType, maximumCount: 1)
    print(hypotheses)

    return true
}

Word Embedding

A word embedding maps strings to vectors so that words with similar meanings end up close to each other, which makes it easier to group them semantically. What we can do:

Get vector for word

// The embedding may be unavailable and the word may be unknown, so unwrap both.
if let embedding = NLEmbedding.wordEmbedding(for: .english),
   let vector = embedding.vector(for: "chair") {
    print(vector)
}

Compute distance between two words

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    let distance = embedding.distance(between: "couch", and: "sofa")
    print(distance)
}
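
To get a feeling for what a distance between two vectors means, here is a sketch of the textbook cosine distance in plain Swift. The framework’s default distance type is cosine, but I am only assuming the general idea here: the values it reports may be computed or scaled differently, so don’t expect this helper to reproduce them exactly. The name `cosineDistance` is my own.

```swift
// Textbook cosine distance: 1 - (a · b) / (|a| * |b|).
// Returns nil for mismatched lengths, empty input, or zero-length vectors.
// Assumption: this is NOT guaranteed to match the values NLEmbedding reports.
func cosineDistance(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }
    let dot = zip(a, b).reduce(0.0) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    guard normA > 0, normB > 0 else { return nil }
    return 1.0 - dot / (normA * normB)
}

// Same direction gives a small distance, opposite direction a large one.
if let same = cosineDistance([1, 0], [2, 0]),
   let opposite = cosineDistance([1, 0], [-1, 0]) {
    print(same, opposite)
}
```

With this definition, a smaller value means the two vectors point in a more similar direction.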

Get nearest neighbors for word

if let embedding = NLEmbedding.wordEmbedding(for: .english) {
    embedding.enumerateNeighbors(for: "chair", maximumCount: 5) { string, distance in
        print("\(string) - \(distance)")
        return true
    }
}

Get nearest neighbors for vector

if let embedding = NLEmbedding.wordEmbedding(for: .english),
   let vectorA = embedding.vector(for: "couch"),
   let vectorB = embedding.vector(for: "window") {
    // Add the two word vectors element by element.
    let vector = zip(vectorA, vectorB).map(+)
    embedding.enumerateNeighbors(for: vector, maximumCount: 5) { string, distance in
        print("\(string) - \(distance)")
        return true
    }
}

Sentence Embedding

A sentence embedding maps whole sentences to vectors, which makes it easier to compare them or to group them semantically. What we can do:

Get vector for sentence

if let embedding = NLEmbedding.sentenceEmbedding(for: .english) {
    let sentence = "This is a sentence."

    if let vector = embedding.vector(for: sentence) {
        print(vector)
    }
}

Compute distance between two sentences, for example to find the best matching section in a help menu

if let embedding = NLEmbedding.sentenceEmbedding(for: .english) {
    let sentence = "I'm working from home."

    let dist = embedding.distance(between: sentence, and: "He is working in the office.")
    print(dist)
}
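
The help menu idea boils down to picking the candidate with the smallest distance to the user’s query. Here is a minimal sketch of that selection step in plain Swift. The helper `bestMatch`, the section titles, and the length-based stand-in metric are all my own assumptions for illustration. In a real app you would pass a closure that calls the sentence embedding instead.

```swift
// Hypothetical helper: returns the candidate with the smallest distance
// to the query. The distance function is injected, so any metric works.
func bestMatch(
    for query: String,
    in candidates: [String],
    distance: (String, String) -> Double
) -> String? {
    return candidates.min(by: { distance(query, $0) < distance(query, $1) })
}

// Stand-in metric for illustration only: difference in character count.
let lengthDistance: (String, String) -> Double = { a, b in
    Double(abs(a.count - b.count))
}

let sections = ["Billing", "Account settings", "Privacy"]
if let match = bestMatch(for: "Account setting", in: sections, distance: lengthDistance) {
    print(match) // Account settings
}
```

With a sentence embedding at hand, the closure could be `{ embedding.distance(between: $0, and: $1) }`, which keeps the selection logic independent of the metric.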
