The big noise in NLP at the moment comes from the application of deep learning to language, and from the enormous and impenetrable language models that huge organisations such as Google and Microsoft have produced to do this.
If we all end up using Google BERT, or Microsoft’s and others’ alternatives, to build NLP solutions, we will all end up with effectively the same products, and all be in thrall to whatever defects, prejudices and misunderstandings are embedded in them.
There is another way.
This is to take an analytical approach. Generations of work by grammarians and computational linguists have given us tools to do this. The new deep learning models of language are effectively covering the same ground. The difference with the analytical technique is that the elements and the process are predictable, capable of analysis, and leave an audit trail.
The particular example of an NLP application we will look at here is a chatbot that is connected to, and interrogates, a knowledge graph. This also illustrates nicely that the recognition function is a small part of the entire solution. We’ll illustrate this with a real-world application used in the recent UK election campaign.
If you build a chatbot using Google BERT and it misunderstands or misinterprets requests from your customers in a way that results in some kind of loss or damage, you will have no way to understand how this happened, no redress from Google and no obvious means to fix it.
The process I’m going to describe uses a range of techniques that I invite you to investigate in detail separately. In a few thousand words we can only gloss over the most important points.
The first technique is based on “traditional” chatbot design. Chatbots have been around for many years. (I first put one online around 2006 using the long defunct Microsoft Messenger chat system.)
Such systems analyse incoming text by using a recognition tree. The root nodes are the first words in an incoming request, and subsequent nodes represent following words. At points along the tree, as well as leaf nodes, a text can be said to be recognised, and a response or action can be associated with those nodes.
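A recognition tree of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the code of any real chatbot framework: the `Node` class, the function names and the sample phrases are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One word in the recognition tree; `response` marks a recognised point."""
    children: dict = field(default_factory=dict)
    response: Optional[str] = None


def add_phrase(root: Node, phrase: str, response: str) -> None:
    """Walk or create one node per word and attach the response to the last."""
    node = root
    for word in phrase.lower().split():
        node = node.children.setdefault(word, Node())
    node.response = response


def recognise(root: Node, text: str) -> Optional[str]:
    """Follow the incoming words down the tree, remembering the deepest
    node that carries a response; stop when the words leave the tree."""
    node, best = root, None
    for word in text.lower().split():
        node = node.children.get(word)
        if node is None:
            break
        if node.response is not None:
            best = node.response
    return best


# Hypothetical phrases and responses, purely for illustration.
root = Node()
add_phrase(root, "hello", "Hi there.")
add_phrase(root, "help", "Ask me who someone is.")
add_phrase(root, "help me", "What do you need help with?")

print(recognise(root, "Hello"))
print(recognise(root, "help me please"))
print(recognise(root, "goodbye"))
```

Note that “help” is recognised part-way along the branch that also leads to “help me”, which is the “points along the tree, as well as leaf nodes” behaviour described above.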
A WordNet is a dictionary and thesaurus of a particular language. It represents a huge amount of human effort in creating the structure. The first WordNet, for English, was built at Princeton, and WordNets for many other languages have been created since. The key structural element in a WordNet is hypernymy. Hypernymy is an “is a kind of” relation. Thus, in a typical WordNet, all nouns are arranged in an enormous tree with a single root for the concept “thing”. In a WordNet the nodes are not words, but synsets: groups of words which share a meaning in one of their usages.
This tree is therefore a tree of concepts. The upper levels of the tree were first proposed by Aristotle, added to by Roget, and expanded by Princeton.
Concepts are (mostly) universal to all languages, and thus using this basic concept tree the construction of WordNets for other languages is simplified.
Nouns and verbs form hypernymy trees: nouns with a single root, verbs with some 300 root nodes. All the other parts of speech can be grouped too, but have limited hypernymy.
One can therefore build an address for any concept based on its path from a given root. Lineages are these paths expressed in a textual form. For example:
Give birth to: verb:023,44,0
One of the principal tasks associated with concept-based processing is to determine if a particular object is a type of another. So, if a particular rule is defined for an object near the top of a hierarchy, it ought to apply to objects further down. Given two lineages a and b, if a starts with b then a is a kind of b.
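The “starts with” test can be sketched as follows. The only lineage taken from the text is `verb:023,44,0`; the others are derived from it for illustration. Comparing whole path steps rather than raw characters is my own assumption here, made so that `verb:023,4` is not mistaken for an ancestor of `verb:023,44,0`.

```python
def parse_lineage(lineage: str) -> tuple:
    """Split 'verb:023,44,0' into ('verb', '023', '44', '0')."""
    pos, _, path = lineage.partition(":")
    steps = tuple(step for step in path.split(",") if step)
    return (pos, *steps)


def is_kind_of(a: str, b: str) -> bool:
    """True when lineage `a` lies at or below lineage `b` in the tree."""
    pa, pb = parse_lineage(a), parse_lineage(b)
    return pa[:len(pb)] == pb


give_birth_to = "verb:023,44,0"

print(is_kind_of(give_birth_to, "verb:023,44"))  # an ancestor on the same path
print(is_kind_of(give_birth_to, "verb:"))        # the bare part of speech
print(is_kind_of(give_birth_to, "verb:023,4"))   # shares characters, not steps
print(is_kind_of("verb:023,44", give_birth_to))  # a parent is not a kind of its child
```

A bare `verb:` or `noun:` acts as “any verb” or “any noun” under this scheme, since every lineage of that part of speech starts with it.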
When trying to recognise text using WordNet, the principal problem is that most common words are associated with more than one concept. We disambiguate these concepts by using context when reading or listening. This is much harder for a machine to do. The solution embedded into both deep learning models and my formulation is to recognise sequences.
Informally, if words were randomly distributed in sentences then neither my technique nor deep sequence learning would work. But language is not randomly distributed. Language exists to convey information, and as such must conform to rules the listener can recognise. The result is that while a single word-to-lineage association often results in the wrong recognition, a sequence of three or four is seldom misidentified. Previous versions of this idea have been called “concept strings” and a more thorough description can be found here.
If you try to build a recognition tree with just words, the trees can grow very large and be unwieldy to edit. If you imagine all the different ways a single question can be posed, with all the different synonyms that might be used, you can see that the resulting tree becomes difficult to maintain.
If you build a tree using concepts, or a mixture of words and concepts, the tree is dramatically simplified. Also, the tree’s author can select exactly the right concept for each node, knowing that any word with that meaning will be recognised.
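To make the saving concrete, here is a rough sketch of a tree whose nodes are a mixture of literal words and lineages. Everything in it is hypothetical: the lexicon, the lineage strings and the single branch are invented, and a real system would take its lineages from a WordNet.

```python
# Hypothetical lexicon: each word lists the lineages of the concepts it can mean.
# The lineage strings are invented for illustration, not real WordNet addresses.
LEXICON = {
    "are": ["verb:7,1"],
    "is": ["verb:7,1"],
    "were": ["verb:7,1"],
    "connected": ["verb:12,3,0", "verb:30,2"],
    "linked": ["verb:12,3,1"],
}

# One branch: inner nodes are dicts, a leaf is the recognised intent.
# Keys containing ':' are concepts; all other keys are literal words.
TREE = {
    "how": {
        "verb:7,1": {                  # any form of "be"
            "they": {
                "verb:12,3": "connection question",   # any kind of "join"
            }
        }
    }
}


def is_kind_of(a: str, b: str) -> bool:
    """Step-wise prefix test: lineage `a` is at or below lineage `b`."""
    pa = [p for p in a.replace(":", ",").split(",") if p]
    pb = [p for p in b.replace(":", ",").split(",") if p]
    return pa[:len(pb)] == pb


def step_matches(key: str, word: str) -> bool:
    """A literal key must equal the word; a concept key matches any word
    one of whose lineages is a kind of that concept."""
    if ":" not in key:
        return key == word
    return any(is_kind_of(lineage, key) for lineage in LEXICON.get(word, []))


def recognise(tree: dict, text: str):
    """Walk the tree one word at a time; return the leaf intent or None."""
    node = tree
    for word in text.lower().rstrip("?").split():
        if not isinstance(node, dict):
            return None            # the text runs on past a leaf
        for key, child in node.items():
            if step_matches(key, word):
                node = child
                break
        else:
            return None            # no child accepts this word
    return node if isinstance(node, str) else None


print(recognise(TREE, "How are they connected?"))
print(recognise(TREE, "How were they linked?"))
print(recognise(TREE, "How are they related?"))
```

One four-node branch covers every combination of “is/are/were” with “connected/linked”. A word-only tree would need a separate branch for each.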
Figure 1: The recognition tree for the far-left graph
The above tree illustrates how simple this approach can make the process of building a Chatbot. The lineages lack transparency, it’s true, but this tree is all that is required to interrogate the knowledge graph we will describe at the end of the article.
Chatbot frameworks offer a set of tools for getting to the point where an “intent” (the desire of the user) is recognised, but then leave the developer to hard code the action to be taken.
Using Dr Andy’s IP’s DARL rule-based language, this coding can be dramatically simplified too. Each node in the tree above can have a DARL fragment of varying size associated with it. In the case shown, which illustrates the fallback response when nothing else matches, a simple DARL rule tells the system to respond with the text “I really don’t know the answer to that”.
The recognition tree contains “value:text” elements. In this case we’re only extracting text, but we can have a range of other value types too. The values found in the text corresponding to that point in the tree are extracted and are accessible to the DARL rules. In the example later, these texts are used for searches into the knowledge graph.
As an example of this, the following piece of DARL code is associated with phrases matching “how are X and Y connected”. It is the most complicated piece of DARL used in this example.
(Note that, because concepts are used for “are” and “connected”, this will match “how is X and Y linked?”, “How were X and Y linked?” etc. X and Y will be replaced with any text that the knowledge graph recognises; so, for instance, “how are Jeremy Corbyn and Paul Mason connected?” is recognised with “Jeremy Corbyn” as X and “Paul Mason” as Y.)
output textual val1;
output textual val2;
if anything then val1 will be Value["value:text","0"];
if anything then val2 will be Value["value:text","1"];
if anything then response will be Graph["path",val1,val2,"noun:","noun:"];
This code extracts the two text values that occur inside the question, the “X” and “Y” of the example, and passes a request to the Graph interface to get a textual response ready for output.
Value and Graph are both instances of Stores, the DARL mechanism for interfacing with external data sources.
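For readers who don’t know DARL, the extraction step can be imitated in Python. This is only a sketch of the idea, not how DARL or its Value store is implemented: the `match` and `extract` functions are invented, literal words stand in for the concepts a real tree would use, and I am assuming that the “0” and “1” in the rules simply index the captured texts in order.

```python
SLOT = "value:text"   # the marker used in the recognition tree


def match(pattern, words, values=()):
    """Return the captured slot texts if `words` fits `pattern`, else None."""
    if not pattern:
        return list(values) if not words else None
    head, rest = pattern[0], pattern[1:]
    if head == SLOT:
        # Try the shortest capture first, then grow it one word at a time.
        for n in range(1, len(words) + 1):
            captured = " ".join(words[:n])
            found = match(rest, words[n:], values + (captured,))
            if found is not None:
                return found
        return None
    if words and words[0].lower() == head:
        return match(rest, words[1:], values)
    return None


def extract(pattern, text):
    """Strip a trailing question mark, split into words and match."""
    return match(pattern, text.rstrip("?").split())


# Literal words here; the real tree uses concepts for "are" and "connected".
PATTERN = ["how", "are", SLOT, "and", SLOT, "connected"]

values = extract(PATTERN, "How are Jeremy Corbyn and Paul Mason connected?")
print(values)
```

Here `values[0]` and `values[1]` play the part of `val1` and `val2` in the rules above, ready to be handed to the graph search.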
It has long been clear that the most general way to represent real-world information is in the form of a graph. Databases and frameworks have been created to handle such data, such as Apache TinkerPop (how I hate that name) and Neo4j. Graphs are composed of nodes and edges. Nodes are things; edges are relationships between things. Nodes and edges can both have attributes which represent extra data about them.
A graph database becomes a knowledge graph when it imposes some structural order on those elements. Exactly what that structure should be is a subject of philosophical debate, but computer scientists rush in where philosophers fear to tread — because we can’t wait the centuries it may take for them to make their minds up.
The structure I have chosen is that imposed by WordNet, and thus philosophers back to Aristotle.
Each node is annotated with a lineage corresponding to what kind of noun it represents. Each edge is annotated with a lineage representing what kind of verb it represents. The elements of WordNet themselves can also be represented in the same way. My formalism of a knowledge graph can therefore contain both abstract elements, which detail what objects exist in the world the graph represents and how they can be linked together, and the real-world elements it records. An example will follow.
It’s outside the scope of this article, but this formalism, mixing the abstract and the real, permits the recognition and storage of association information that can be used in context disambiguation and in ensuring that the real-world data stored complies with common-sense reasoning.
A few months before the recent pivotal election in the UK, a political group contacted me with a data set. This was a directed graph of the main people and organisations in the far left of politics in the UK. As in the US, the recent election was particularly polarised, with parties on both sides moving to historically extreme positions.
Figure 2: The far-left graph data
The technical problem was how to represent this information so that it could easily be interrogated by users who wanted access to this data, principally journalists. I proposed a chatbot interface.
The data consisted of people and organisations, and vast amounts of biographical data on them. The relationships, the edges, were not so well annotated. The data was converted, annotated with the appropriate lineages and entered into a Gremlin database, in this case Microsoft Azure’s Cosmos DB with its Gremlin API.
The very first stage of the design was to brainstorm what kind of questions could be asked of this data set. Questions of the following forms were identified:
· Who is X?
· What is Y?
· Who is X connected to?
· What is Y connected to?
· Who is connected to X?
· What is connected to Y?
· How are X and Y connected?
Other texts like “hello” and “help” were also added to the tree. Note that in the above examples, “who” and “what” imply that different kinds of response are requested. “Who is X connected to?” implies that only people should be returned.
This gave rise to the tree in Figure 1 and identified a small set of interactions required with the graph database:
· Text: returns a summary of the textual attributes associated with an object.
· Links: returns a list of linked objects descended from a particular lineage.
· Path: returns the names of the objects in a path between two objects.
· Attribute: returns an attribute associated with an object, if present.
In each case lineages are used to disambiguate kinds of objects sought.
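The four interactions can be sketched against a tiny in-memory graph. This is not the Gremlin code used in the real system. The names, lineage strings and attributes below are placeholders invented for the example, and treating edges as traversable in both directions is my own simplifying assumption.

```python
from collections import deque

# Invented toy data: names, lineages and attributes are all placeholders.
PERSON, ORG = "noun:1,0", "noun:1,1"
NODES = {
    "Alice A": {"lineage": PERSON, "attrs": {"bio": "Founder of Example Club."}},
    "Bob B": {"lineage": PERSON, "attrs": {"bio": "Editor of Sample News."}},
    "Example Club": {"lineage": ORG, "attrs": {"bio": "A campaign group."}},
    "Sample News": {"lineage": ORG, "attrs": {}},
}
EDGES = [  # (from, to, verb lineage)
    ("Alice A", "Example Club", "verb:5,2"),
    ("Bob B", "Example Club", "verb:5,3"),
    ("Bob B", "Sample News", "verb:5,4"),
]


def is_kind_of(a: str, b: str) -> bool:
    """Step-wise prefix test: lineage `a` is at or below lineage `b`."""
    pa = [p for p in a.replace(":", ",").split(",") if p]
    pb = [p for p in b.replace(":", ",").split(",") if p]
    return pa[:len(pb)] == pb


def neighbours(name: str) -> list:
    """Objects one edge away, ignoring edge direction (an assumption)."""
    out = [dst for src, dst, _ in EDGES if src == name]
    out += [src for src, dst, _ in EDGES if dst == name]
    return out


def text(name: str) -> str:
    """Text: a summary of the textual attributes of an object."""
    return " ".join(NODES[name]["attrs"].values())


def attribute(name: str, key: str):
    """Attribute: one attribute of an object, or None if absent."""
    return NODES[name]["attrs"].get(key)


def links(name: str, lineage: str = "noun:") -> list:
    """Links: linked objects whose lineage descends from `lineage`."""
    return [n for n in neighbours(name)
            if is_kind_of(NODES[n]["lineage"], lineage)]


def path(start: str, goal: str):
    """Path: names along a shortest route, found by breadth-first search."""
    queue, seen = deque([[start]]), {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for n in neighbours(route[-1]):
            if n not in seen:
                seen.add(n)
                queue.append(route + [n])
    return None


print(links("Example Club", PERSON))     # "Who is connected to Example Club?"
print(path("Alice A", "Sample News"))    # "How are Alice A and Sample News connected?"
```

Passing `PERSON` rather than the bare `noun:` to `links` is what turns a “what” question into a “who” question.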
In practice, given a database full of obscure names, users could not be expected to spell the names properly. All the searches in the graph database were made fuzzy, using a metric based on the Levenshtein distance.
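A fuzzy lookup of this sort might look like the sketch below. The Levenshtein function is the standard dynamic-programming algorithm. The normalisation by the longer string’s length and the 0.3 cut-off are my own assumptions for illustration, not the metric used in the production system.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def best_match(query: str, names: list, max_ratio: float = 0.3):
    """Closest name by edit distance scaled to 0..1, or None if too far.
    The scaling and the threshold are assumed values, chosen for the demo."""
    if not names:
        return None
    query = query.lower()
    scored = [(levenshtein(query, n.lower()) / max(len(query), len(n), 1), n)
              for n in names]
    ratio, name = min(scored)
    return name if ratio <= max_ratio else None


names = ["Jeremy Corbyn", "Paul Mason"]
print(best_match("Jeremy Corbin", names))   # one letter wrong
print(best_match("zzz", names))             # nothing close enough
```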
The chatbot was a success, and was the first chatbot used in a UK political campaign. The general framework used was the Darl Bot system, which runs on top of the Microsoft Bot Framework on Azure. The bot itself performed little local processing, handing off recognition and response to the https://darl.dev site where the bot model was hosted, which in turn accessed the Darl Knowledge Graph, now known as ThinkBase. All these elements are hosted on Azure and are set to scale automatically on demand.
This article illustrates an analytic (i.e. non-machine-learning) mechanism for creating a chatbot capable of answering questions about the contents of a large knowledge graph. Development time is low for this approach, and the process is transparent and auditable. Maintenance and extension are simple, and the task is one amenable to even an only slightly technical developer.