Natural Language Processing, or NLP, is probably one of the hottest fields in the tech industry today. With the advent of machine learning, and especially neural networks, the field has come a long way from simply using a dictionary to try to predict what word might come next in a sentence. I want to use this blog post to talk about some cool stuff I’ve learned while exploring this widely applicable field.
Throughout the past month or so, I’ve been working through the NLP Specialization by deeplearning.ai on Coursera, as an introduction to the field as a whole. The first few weeks were just boilerplate ML stuff like logistic regression and Naive Bayes, but when I got to word embeddings in Week 3, I got very interested. Very basically, word embeddings are a way to represent words in the English language as high-dimensional vectors, in a manner that preserves the relationships between words through metrics like distance or angles. I’ll use a quick example to explain.
The idea in the image is that each word has been mapped to a vector, which we are using colors to represent. We try to obtain this mapping so that we can use the vector representations to perform mathematical operations, in a way that allows us to extract meaning. For example, let’s say we wanted to build a system that could complete the following analogy: man is to king as woman is to ______. With the proper word vectors, the vector king – man should be similar to the vector answer – woman. Thus, answer = king – man + woman, which in a good vector space would point us to queen.
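To make this concrete, here’s a minimal sketch of the analogy arithmetic using NumPy. The embeddings below are made-up 2D toy vectors chosen purely for illustration (real embeddings are learned and have hundreds of dimensions); the idea is just to show the king – man + woman computation and a cosine-similarity lookup.

```python
import numpy as np

# Hypothetical toy embeddings, made up for illustration only.
vec = {
    "man":   np.array([1.0, 0.2]),
    "woman": np.array([1.0, 1.2]),
    "king":  np.array([3.0, 0.3]),
    "queen": np.array([3.0, 1.3]),
    "apple": np.array([0.1, 2.0]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means the vectors point the same way.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# man is to king as woman is to ___  =>  answer = king - man + woman
target = vec["king"] - vec["man"] + vec["woman"]

# Find the closest word to the target, excluding the query words themselves.
best = max((w for w in vec if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(vec[w], target))
print(best)  # queen
```

With these toy vectors the arithmetic lands exactly on queen; with real learned embeddings it lands *near* queen, which is why the nearest-neighbor search is needed.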
Having this representation is all nice and good since we can perform cool tasks like analogy solving, part-of-speech tagging, and entity identification, but the big question is how we get these vectors in the first place. It turns out the answer is Word2Vec, a family of shallow neural network techniques for learning these word embeddings. One of the models described in the paper, the continuous bag of words (CBOW) model, works at a high level as follows: use a 3-layer (input, hidden, output) neural net to accomplish some basic supervised learning task, like predicting a center word given its context words (i.e. predicting “therefore” given the input “I think, _____ I am”). The network learns two weight matrices to accomplish this task, and the rows/columns of these matrices can be used as word embeddings. To change the dimensionality of the embeddings, all you have to do is change the dimensionality of the hidden layer, and you’re good to go.
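The steps above can be sketched in plain NumPy. This is a deliberately stripped-down CBOW sketch, not the optimized Word2Vec implementation: a tiny toy corpus, a full softmax instead of the negative-sampling trick used in practice, and a made-up embedding size of 10. The point is just to show that the rows of the first weight matrix end up being the embeddings.

```python
import numpy as np

# Toy corpus; in practice you'd train on a large text collection.
corpus = "i think therefore i am i think thus i exist".split()
vocab = sorted(set(corpus))
word2idx = {w: i for i, w in enumerate(vocab)}
V, N = len(vocab), 10          # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(V, N))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(N, V))   # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Build (context, center) training pairs with a window of 2.
window = 2
pairs = []
for i in range(len(corpus)):
    ctx = [corpus[j]
           for j in range(max(0, i - window), min(len(corpus), i + window + 1))
           if j != i]
    pairs.append(([word2idx[c] for c in ctx], word2idx[corpus[i]]))

lr = 0.05
for epoch in range(200):
    for ctx_ids, center_id in pairs:
        h = W1[ctx_ids].mean(axis=0)   # hidden layer: average of context embeddings
        y = softmax(W2.T @ h)          # predicted distribution over the vocabulary
        err = y.copy()
        err[center_id] -= 1.0          # gradient of cross-entropy w.r.t. the logits
        W2 -= lr * np.outer(h, err)
        grad_h = W2 @ err
        W1[ctx_ids] -= lr * grad_h / len(ctx_ids)

# After training, each row of W1 is that word's embedding.
embedding = W1[word2idx["think"]]
print(embedding.shape)   # (10,)
```

Changing `N` changes the embedding dimensionality, exactly as described above: the hidden layer’s size is the vector size.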
Clearly the process is pretty complicated, but you can get some pretty cool results if you see it through to fruition. As an example, I’ve taken the below graph from this Medium article, which I highly recommend reading as well.
This is basically how one would visualize word embeddings, by projecting the high-dimensional vectors down to two dimensions and plotting them on a plane, with the key difference that the vocabulary here consists entirely of emojis. If you look closely, you can see how some of the emojis are clustered together due to similar meanings (such as the large group of flags at the bottom, or the animals towards the left). And this graph was obtained through the (relatively) simple techniques presented in the aforementioned paper, so who knows how sophisticated this will get eventually? For now, it’s fun to look at pretty graphs like these and think about how we can use NLP to change lives.
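If you want to try this kind of visualization yourself, here’s one minimal way to do the 2D projection step, using PCA via NumPy’s SVD. The words and their 10-dimensional embeddings here are random placeholders standing in for real learned vectors; with real embeddings, the resulting (x, y) points are what you’d feed into a scatter plot like the one above.

```python
import numpy as np

# Placeholder embeddings: 5 words, 10 dimensions each (random for illustration).
rng = np.random.default_rng(1)
words = ["cat", "dog", "fish", "car", "bus"]
X = rng.normal(size=(len(words), 10))

# PCA via SVD: center the data, then project onto the top 2 principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords = Xc @ Vt[:2].T          # shape (5, 2): one (x, y) point per word

for w, (x, y) in zip(words, coords):
    print(f"{w}: ({x:.2f}, {y:.2f})")
```

PCA is the simplest option; tools like t-SNE are also commonly used for embedding plots because they tend to preserve local clusters better.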
For more info about NLP, I recommend checking out the aforementioned specialization, or this YouTube video that we made as part of our summer hackathon. Also, please be sure to check out this Medium article for more detailed information about the Emoji2Vec example I used. Till next time!