Problem Statement: Build a content-based movie recommender system using natural language processing. Given a movie name as input, it should return the top 3 recommended movies.
To achieve this goal, the following steps were taken:
Data Gathering (Unstructured Data)
1. Select text data from the Netflix data set.
2. The columns director, cast, type of content (movie or TV show), listed_in, and description are used to improve the relevance of the results.
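As a rough illustration of the column selection, the sketch below reads a CSV with Python's standard csv module and keeps only the text columns. The header names "title" and "type", the helper select_text_columns, and the two sample rows are assumptions made up for the example; they are not taken from the real data set.

```python
import csv
import io

# Assumed header names; "type" and "title" are guesses at the CSV headers.
TEXT_COLUMNS = ["director", "cast", "type", "listed_in", "description"]

# Tiny made-up sample standing in for the real CSV file.
SAMPLE_CSV = """title,director,cast,type,listed_in,description,release_year
Example Movie,Jane Doe,"Actor One, Actor Two",Movie,Dramas,A quiet drama about a family.,2019
Example Show,,Actor Three,TV Show,Comedies,A sitcom set in an office.,2020
"""

def select_text_columns(csv_file):
    """Keep the title plus the text columns, replacing missing values with ''."""
    rows = []
    for record in csv.DictReader(csv_file):
        row = {"title": record.get("title") or ""}
        for column in TEXT_COLUMNS:
            row[column] = record.get(column) or ""
        rows.append(row)
    return rows

rows = select_text_columns(io.StringIO(SAMPLE_CSV))
```

Replacing missing values with empty strings keeps the later concatenation step simple, since every selected column is then guaranteed to be a string.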
Data Pre-processing:
1. Convert to lowercase.
2. Remove trailing spaces, stopwords, line breaks, and punctuation.
3. For now, URLs are removed, but they could be used for further mining.
4. Remove digits and numbers.
5. Tokenize and lemmatise the text for the BOW.
6. Concatenate all the columns into a new column, ‘summary’.
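The pre-processing steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the stopword list is a tiny placeholder, the names preprocess and build_summary are hypothetical, and lemmatisation is left out because it needs an external library (for example NLTK's WordNetLemmatizer).

```python
import re
import string

# Tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "with", "for", "on"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Map every punctuation character to a space so words do not get glued together.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(text):
    """Lowercase, strip URLs, punctuation, digits and stopwords; return tokens."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)   # remove URLs before punctuation breaks them up
    text = text.translate(PUNCT_TABLE)
    text = re.sub(r"\d+", " ", text)
    tokens = text.split()               # whitespace split also drops line breaks
    return [token for token in tokens if token not in STOPWORDS]

def build_summary(row, columns):
    """Concatenate the cleaned columns of one row into a single summary string."""
    tokens = []
    for column in columns:
        tokens.extend(preprocess(row.get(column, "")))
    return " ".join(tokens)

# Made-up row for illustration only.
row = {"director": "Jane Doe", "description": "A heist in 2 acts.\nSee https://example.com "}
summary = build_summary(row, ["director", "description"])
print(summary)  # jane doe heist acts see
```

URLs are removed before punctuation so that the pattern still sees the intact "https://" prefix.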
Feature Extraction and BOW (Bag of Words) Creation:
1. Create TF-IDF vectors using 1-grams and 2-grams.
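To make the vectorization step concrete, here is a small pure-Python sketch of TF-IDF over 1-grams and 2-grams. It assumes a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1 and plain relative term frequency; other TF-IDF variants would give different weights. The function names ngrams and tfidf_vectors are hypothetical, and in practice a library vectorizer such as scikit-learn's TfidfVectorizer with ngram_range=(1, 2) would likely be used instead.

```python
import math
from collections import Counter

def ngrams(tokens):
    """Return all 1-grams followed by all 2-grams of a token list."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def tfidf_vectors(summaries):
    """Map each summary string to a sparse {term: weight} TF-IDF vector."""
    term_counts = [Counter(ngrams(summary.split())) for summary in summaries]
    n_docs = len(term_counts)
    # Document frequency: in how many summaries each term appears at least once.
    doc_freq = Counter(term for counts in term_counts for term in counts)
    vectors = []
    for counts in term_counts:
        total = sum(counts.values())
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

vectors = tfidf_vectors(["crime drama", "crime comedy"])
```

In this toy input, "crime" appears in both summaries and so gets a lower weight than "drama", which appears in only one.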
Create Cosine Similarity Matrix
- Create a cosine similarity matrix from the TF-IDF vectors.
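A minimal sketch of the similarity matrix, assuming each TF-IDF vector is stored as a sparse {term: weight} dictionary. The toy vectors and the function names cosine_similarity and similarity_matrix are hypothetical; for a data set of realistic size, a vectorized library routine would be a better fit than this nested loop.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def similarity_matrix(vectors):
    """Return the full n x n matrix of pairwise cosine similarities."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]

# Hypothetical toy vectors standing in for the TF-IDF output.
vectors = [
    {"crime": 1.0, "drama": 2.0},
    {"crime": 1.0, "comedy": 2.0},
    {"cooking": 3.0},
]
matrix = similarity_matrix(vectors)
```

Row i of the matrix holds the similarity of title i to every title, including itself on the diagonal.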
Recommender function: compares the input (movie/TV show title) against the cosine similarity scores in the matrix and returns the indices of the 3 most similar titles.
Recommend content as per input (Chat flow)
1. Give a movie or TV show title as input.
2. Find the index of the given content in the data set; if it is not present, ask for input again.
3. Get the indices of the rows most similar to the input using the cosine similarity matrix created above.
4. Select the indices of the top 3 cosine scores.
5. Return the content titles at these indices.
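The chat flow above might look roughly like the following sketch. The names recommend and chat, the case-insensitive title matching, and the toy titles and scores are all assumptions for illustration. The input title itself is skipped when ranking, since its similarity to itself is always the highest.

```python
def recommend(title, titles, matrix, top_n=3):
    """Return the top_n titles most similar to `title`, or None if it is unknown."""
    lookup = {name.lower(): index for index, name in enumerate(titles)}
    index = lookup.get(title.strip().lower())
    if index is None:
        return None
    scores = matrix[index]
    # Rank every other row by similarity, highest first; skip the input itself.
    ranked = sorted(
        (i for i in range(len(titles)) if i != index),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_n]]

def chat(titles, matrix, read=input, write=print):
    """Keep asking for a title until one is found, then show the recommendations."""
    while True:
        result = recommend(read("Enter a movie or TV show title: "), titles, matrix)
        if result is not None:
            write("Recommended: " + ", ".join(result))
            return result
        write("Title not found, please try again.")

# Hypothetical titles and similarity scores, for illustration only.
titles = ["Alpha", "Bravo", "Charlie", "Delta", "Echo"]
matrix = [
    [1.0, 0.9, 0.1, 0.5, 0.3],
    [0.9, 1.0, 0.2, 0.4, 0.6],
    [0.1, 0.2, 1.0, 0.7, 0.8],
    [0.5, 0.4, 0.7, 1.0, 0.0],
    [0.3, 0.6, 0.8, 0.0, 1.0],
]
print(recommend("alpha", titles, matrix))  # ['Bravo', 'Delta', 'Echo']
```

Calling chat(titles, matrix) runs the interactive loop; passing read and write as parameters keeps the loop easy to exercise without a terminal.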