Word2Vec as a Recommendation Engine

Continuing the work of a term project I completed this semester, I created a Python module titled database2vector that uses Word2Vec, a simple neural network for producing word embeddings, to compute similarity ratings between items. Depending on the database it is applied to, the module can generate purchase recommendations for a user based on the purchase histories stored in the database.

Word2Vec determines similarity between words from context, i.e. the words surrounding each word in the sentences of a finite corpus. Word2Vec has two architectures: Continuous Bag-of-Words (CBOW) and Skip-gram. In both, the network uses a "target" item (i.e. a word) and the "context" around it (i.e. the sentence) to produce distributed representations. Which architecture database2vector uses, along with the other training hyper-parameters, can be set through command-line arguments. Additionally, the model can be saved as a keyed vector so that it does not have to be regenerated on each use.

In the example below, a purchase "history" is generated by using "CustomerID" as the "target field" of the context generation. Essentially, rows are grouped by shared "CustomerID", producing a list of each customer's purchases.
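The grouping step can be sketched in plain Python as follows; the row tuples and stock codes are made up for illustration, standing in for records from the retail database.

```python
from collections import defaultdict

# Toy rows standing in for the database: (CustomerID, StockCode) pairs.
rows = [
    ("C1", "85123A"),
    ("C2", "71053"),
    ("C1", "84406B"),
    ("C2", "85123A"),
    ("C1", "71053"),
]

def build_histories(rows):
    """Group purchases by the target field (CustomerID) into one list each."""
    histories = defaultdict(list)
    for customer_id, stock_code in rows:
        histories[customer_id].append(stock_code)
    # Each customer's list of purchases becomes one "sentence" for Word2Vec.
    return list(histories.values())

print(build_histories(rows))
# → [['85123A', '84406B', '71053'], ['71053', '85123A']]
```

Treating each customer's basket as a "sentence" is what lets Word2Vec's notion of context carry over: items bought by the same customer play the role of co-occurring words.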

[Figure: model generation of database2vector using an online retail data set.]

Finally, after the Word2Vec model is generated from the customer purchase histories, recommendations can be made for a given product.
