Write a post and I will tell you who you are…



So let’s try to answer the first question and check the distribution of each personality in the dataset.
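A minimal sketch of how such a distribution could be counted. The `sample` list is a made-up stand-in for the type column of the Kaggle dataset, and `type_shares` is a hypothetical helper name, not something taken from the original analysis.

```python
from collections import Counter


def type_shares(labels):
    """Return each type's share of the dataset in percent, largest first."""
    counts = Counter(label.upper() for label in labels)
    total = sum(counts.values())
    return {t: round(100 * n / total, 1) for t, n in counts.most_common()}


# Hypothetical toy labels; the real ones would be read from the dataset.
sample = ["INFP", "INFJ", "INFP", "ENTP", "INTJ", "INFP", "ISTJ", "INTP"]
shares = type_shares(sample)

for mbti_type, share in shares.items():
    print(f"{mbti_type}: {share}%")
```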

It seems that there are many more introverts than extraverts in the world… Hmm, on second thought, that distribution doesn’t seem right. Let’s check the statistics from the authors of the test.

That’s interesting! The most popular types in the above table seem to be ISTJ and ISFJ, with shares of 11.6% and 13.8%. These results are completely different from the ones I obtained when counting the distribution in the Kaggle dataset, where those two types are each represented by around 2–2.5% of people.

We see that the discrepancies apply to virtually all types. It looks like people with types INFP, INFJ, INTP and INTJ are most likely to post on a personality types forum.

Moreover, when we recreate the table on the left of the provided image, we can see that it is completely different as well. People with the letters I, F and P in their type acronyms are overrepresented. In our further analysis we must remember that this data is imbalanced.
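The per-letter view can be sketched in the same spirit: for each of the four axes, count how many authors carry each letter. The sample labels and the helper name `letter_shares` are assumptions made for illustration only.

```python
from collections import Counter

# The four MBTI axes, each as a pair of opposing letters.
AXES = ("IE", "NS", "TF", "JP")


def letter_shares(labels):
    """Return the percentage of authors whose type contains each letter."""
    counts = Counter()
    for label in labels:
        counts.update(label.upper())  # counts every letter of the code
    total = len(labels)
    return {
        letter: round(100 * counts[letter] / total, 1)
        for pair in AXES
        for letter in pair
    }


# Hypothetical toy labels, not the real dataset.
sample = ["INFP", "INFJ", "INFP", "ENTP", "INTJ", "INFP", "ISTJ", "INTP"]
letters = letter_shares(sample)
print(letters)
```

A strongly lopsided pair (for example I far above E) is the imbalance to keep in mind when judging a classifier's accuracy later.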

Conclusion

  1. Removing links.
  2. Removing all digits and punctuation.
  3. Lowercasing all letters.
  4. Removing stop words.
  5. *At first I used lemmatization, but it resulted in a significant reduction in accuracy, so I abandoned it in further analysis.
  6. Replacing every word with a numerical representation.

All of these steps form a fairly classical NLP pipeline. I will not discuss it in detail, because this post could grow to a horrendous size and bore less technical readers. If you want to know more, you will find lots of fantastic articles on Medium describing natural language processing pipelines.
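For readers who would still like a concrete picture, here is a rough sketch of steps 1 to 4 and 6 using only the Python standard library. The stop-word list is a tiny illustrative one, the function names are hypothetical, and the integer vocabulary is just one simple way to represent words numerically; the post does not say which representation was actually used.

```python
import re

# Tiny illustrative stop-word list (an assumption); real pipelines use fuller ones.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are",
              "to", "of", "in", "i", "it", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # also drops non-ASCII letters


def clean_post(text):
    """Apply steps 1-4 and return the remaining tokens."""
    text = URL_RE.sub(" ", text)         # 1. remove links
    text = text.lower()                  # 3. lowercase (done before step 2
                                         #    so the regex only needs a-z)
    text = NON_LETTER_RE.sub(" ", text)  # 2. remove digits and punctuation
    return [t for t in text.split() if t not in STOP_WORDS]  # 4. stop words


def build_vocab(token_lists):
    """Step 6: give every distinct word an integer id, in order of appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Replace known words with their ids; unknown words are skipped."""
    return [vocab[t] for t in tokens if t in vocab]


tokens = clean_post("I love this video: https://youtu.be/abc123 and the 3 cats!")
print(tokens)  # ['love', 'video', 'cats']
```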

For training I used four separate classifiers, one per trait pair, each deciding which of the two attributes should be assigned to the author: (I)ntroverts or (E)xtraverts, (T)hinkers or (F)eelers, and so on.
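One way to set this up is to split each four-letter type into four binary targets, one per classifier. The sketch below shows only that label-splitting step; the key names and the choice of which letter maps to 1 are assumptions, and the classifiers themselves are left out because the post does not name the model used.

```python
# The four MBTI axes; the first letter of each pair is encoded as 1.
AXES = (("I", "E"), ("N", "S"), ("T", "F"), ("J", "P"))


def binary_targets(mbti_type):
    """Turn a code such as 'INFP' into one 0/1 label per axis."""
    code = mbti_type.upper()
    if len(code) != 4 or any(ch not in pair for ch, pair in zip(code, AXES)):
        raise ValueError(f"not a valid four-letter type: {mbti_type!r}")
    return {f"{a}/{b}": int(ch == a) for ch, (a, b) in zip(code, AXES)}


print(binary_targets("INFP"))
# {'I/E': 1, 'N/S': 1, 'T/F': 0, 'J/P': 0}
```

Each classifier then sees the same post features but only its own column of labels.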

After the first training, the results were satisfying… Suspiciously satisfying. But, full of complacency, I decided to go on and try to answer which words are most significant for each trait. And here are the results.

Sigh… I forgot to remove the type indicators from the posts! We can see that the words most valuable to our classifier in deciding whether the author is an introvert (the highest bars in the blue part of the plot) were the personality type codes themselves. And on the other side, the most valuable words used by extraverts (the highest, but negative, bars in the red part) are those with ‘e’ in the four-letter code. That’s not fair! Let’s go back to our feature engineering process and remove these sneaky words.
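A minimal sketch of such a cleanup step, assuming the leak comes from the sixteen four-letter codes and their plural forms (for example "INFPs"). The regular expression and the helper name `strip_type_codes` are illustrative choices, not the code used for the results in this post.

```python
import re
from itertools import product

# All sixteen codes, built from the four letter pairs: INTJ, INTP, ...
TYPE_CODES = ["".join(p) for p in product("IE", "NS", "TF", "JP")]

# Match a whole-word code in any letter case, optionally followed by "s".
TYPE_RE = re.compile(r"\b(?:" + "|".join(TYPE_CODES) + r")s?\b", re.IGNORECASE)


def strip_type_codes(text):
    """Remove type codes from a post and tidy up the leftover whitespace."""
    return " ".join(TYPE_RE.sub(" ", text).split())


print(strip_type_codes("As an INFP I get along with entps and Intj friends"))
# As an I get along with and friends
```

Because the pattern ignores case, it can run either before or after the lowercasing step.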


