Continuing the work of a term project I completed this semester, I created a Python module titled database2vector that utilizes Word2Vec, a simple neural network used to produce word embeddings, as a method of creating similarity ratings between items. Depending on the database that this is applied to, the module can be used to generate recommendations for user purchases based on the history of the user purchases in the database.
Word2Vec determines similarity between words based on context, i.e. the words surrounding it in sentences of a finite corpus. There exist two models for Word2Vec: Continuous Bag-Of-Words, and Skip-gram. In both architectures, the network uses a “target” item (i.e. word) and the “context” around it (i.e. the sentence) to produce distributed representations. Which model is used in database2vector, as well as other hyper-parameters for training, can be set through command line arguments. Additionally, the model can be saved as a keyed vector such that it does not have to be regenerated each use.
In the example below, a purchase “history” is generated by using “CustomerID” as the “target field” of the context generation. Essentially, rows will be grouped based on shared “CustomerIDs”, making a list of all of their purchases.
Finally, after the word2vec model is generated using the history of customer purchases, recommendations can be made based on a given product.