Product Recommendation Engine

Learn how to build a product recommendation engine using collaborative filtering and Pinecone.

In this example, we will generate product recommendations for ecommerce customers based on previous orders and trending items. This example covers preparing the vector embeddings, creating and deploying the Pinecone service, writing data to Pinecone, and finally querying Pinecone to receive a ranked list of recommended products.

Data Preparation

Import Python Libraries

import os
import time
import random
import numpy as np
import pandas as pd
import scipy.sparse as sparse
import itertools

Load the (Example) Instacart Data

We are going to use the Instacart Market Basket Analysis dataset for this task.

The data used throughout this example is a set of files describing customers' orders over time. The main focus is on the orders.csv file, where each line represents a relation between a user and an order. In other words, each line contains a user_id (the user who made the order) and an order_id. Note that this table contains no information about products. Product information for specific orders is stored in the order_products__*.csv files.

order_products_train = pd.read_csv('data/order_products__train.csv')
order_products_prior = pd.read_csv('data/order_products__prior.csv')
products = pd.read_csv('data/products.csv')
orders = pd.read_csv('data/orders.csv')

order_products = pd.concat([order_products_train, order_products_prior])

Preparing data for the model

The collaborative filtering model used in this example requires only users’ historical preferences on a set of items. Because the data contains no explicit ratings, we use the number of orders in which a user bought a product as a “confidence” value: the more often a user orders a product, the stronger the interaction between that user and that product.

The dataframe named data stores these confidence values and serves as the basis for the model.

customer_order_products = pd.merge(orders, order_products, how='inner',on='order_id')

# creating a table with "confidences"
data = customer_order_products.groupby(['user_id', 'product_id'])[['order_id']].count().reset_index()
data.columns=["user_id", "product_id", "total_orders"]
data.product_id = data.product_id.astype('int64')

# Create a lookup frame so we can get the product names back in readable form later.
products_lookup = products[['product_id', 'product_name']].drop_duplicates()
products_lookup['product_id'] = products_lookup.product_id.astype('int64')
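
To make the confidence table concrete, the following minimal sketch reproduces the same counting logic in plain Python on a few made-up rows. The rows and the count_orders helper are hypothetical; they only illustrate what the groupby and count above compute, namely the number of orders in which each user bought each product.

```python
from collections import Counter

# Hypothetical (order_id, user_id, product_id) rows standing in for the merged table.
rows = [
    (1, 10, 500),
    (2, 10, 500),
    (2, 10, 501),
    (3, 11, 500),
]

def count_orders(rows):
    """Count how many order rows exist for each (user_id, product_id) pair."""
    return Counter((user_id, product_id) for _order_id, user_id, product_id in rows)

confidences = count_orders(rows)
for (user_id, product_id), total_orders in sorted(confidences.items()):
    print(user_id, product_id, total_orders)
```

User 10 ordered product 500 in two different orders, so that pair gets a confidence of 2, while every other pair gets 1.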

We will create two prototype users here and add them to our data dataframe. Each user buys only one kind of product:

  • The first user buys only Mineral Water
  • The second user buys only baby products: No More Tears Baby Shampoo and Baby Wash & Shampoo

These users will later be used for querying and for examining the model results.

data_new = pd.DataFrame([[data.user_id.max() + 1, 22802, 97],
                         [data.user_id.max() + 2, 26834, 89],
                         [data.user_id.max() + 2, 12590, 77]
                        ], columns=['user_id', 'product_id', 'total_orders'])
data_new
   user_id  product_id  total_orders
0   206210       22802            97
1   206211       26834            89
2   206211       12590            77
data = pd.concat([data, data_new]).reset_index(drop=True)
data.tail()
          user_id  product_id  total_orders
13863744   206209       48697             1
13863745   206209       48742             2
13863746   206210       22802            97
13863747   206211       26834            89
13863748   206211       12590            77

In the next step, we will first extract user and item unique ids, in order to create a CSR (Compressed Sparse Row) matrix.

users = list(np.sort(data.user_id.unique()))
items = list(np.sort(products.product_id.unique()))
purchases = list(data.total_orders)

# create zero-based index position <-> user/item ID mappings
index_to_user = pd.Series(users)

# create reverse mapping from user ID to matrix position; the +1 offset matches
# the matrices below, which are indexed by raw ID and sized len(users) + 1
user_to_index = pd.Series(data=index_to_user.index + 1, index=index_to_user.values)

# create zero-based index position <-> item/user ID mappings
index_to_item = pd.Series(items)

# create reverse mapping from item/user ID to index positions
item_to_index = pd.Series(data=index_to_item.index, index=index_to_item.values)

# Get the rows and columns for our new matrix
products_rows = data.product_id.astype(int)
users_cols = data.user_id.astype(int)

# Create a sparse matrix for our users and products containing number of purchases
sparse_product_user = sparse.csr_matrix((purchases, (products_rows, users_cols)), shape=(len(items) + 1, len(users) + 1))
sparse_product_user.data = np.nan_to_num(sparse_product_user.data, copy=False)

sparse_user_product = sparse.csr_matrix((purchases, (users_cols, products_rows)), shape=(len(users) + 1, len(items) + 1))
sparse_user_product.data = np.nan_to_num(sparse_user_product.data, copy=False)
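
As a rough illustration of what the CSR format stores, the sketch below converts (row, column, value) triplets into the three CSR arrays by hand. It is a toy in plain Python, not how SciPy implements csr_matrix; the coo_to_csr name and the sample triplets are assumptions made for this example. Duplicate coordinates are summed, which mirrors SciPy's documented behavior when building a matrix from triplets.

```python
def coo_to_csr(n_rows, triplets):
    """Convert (row, col, value) triplets into CSR arrays (indptr, indices, data)."""
    # Merge duplicate coordinates by summing their values.
    merged = {}
    for row, col, value in triplets:
        merged[(row, col)] = merged.get((row, col), 0) + value

    indptr = [0] * (n_rows + 1)
    indices, data = [], []
    # Walk the entries in row-major order so each row's entries are contiguous.
    for row, col in sorted(merged):
        indptr[row + 1] += 1          # count the entries in each row
        indices.append(col)           # column index of this entry
        data.append(merged[(row, col)])
    # Turn the per-row counts into cumulative offsets.
    for row in range(n_rows):
        indptr[row + 1] += indptr[row]
    return indptr, indices, data

# Toy matrix with 3 rows: row 0 has one entry, row 1 is empty, row 2 has two.
indptr, indices, data = coo_to_csr(3, [(0, 1, 3), (2, 0, 5), (2, 2, 1)])
print(indptr, indices, data)
```

Row i's entries live in data[indptr[i]:indptr[i + 1]], which is why looking up all purchases for one row is cheap in this layout.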

Implicit Model

In this section we will demonstrate how to create and train a recommender model using the implicit library. The recommendation model is based on the algorithms described in the paper “Collaborative Filtering for Implicit Feedback Datasets”, with performance optimizations described in “Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering”.

!pip install --quiet -U implicit
import implicit
from implicit import evaluation

#split data into train and test sets
train_set, test_set = evaluation.train_test_split(sparse_product_user, train_percentage=0.9)

# initialize a model
model = implicit.als.AlternatingLeastSquares(factors=100,
                                             regularization=0.05,
                                             iterations=50,
                                             num_threads=1)

alpha_val = 15
train_set = (train_set * alpha_val).astype('double')

# train the model on a sparse matrix of item/user/confidence weights
model.fit(train_set, show_progress = True)
WARNING:root:OpenBLAS detected. Its highly recommend to set the environment variable 'export OPENBLAS_NUM_THREADS=1' to disable its internal multithreading



  0%|          | 0/50 [00:00<?, ?it/s]
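
For intuition about what alternating least squares does, here is a small dense sketch in NumPy that follows the formulation in the implicit-feedback paper: a binary preference p = 1 where a user interacted with an item, and a confidence c = 1 + alpha * r that grows with the interaction count r. This is a toy under those assumptions, not the implicit library's implementation (which is sparse and heavily optimized); the function names and the 4x4 matrix are made up for illustration.

```python
import numpy as np

def weighted_loss(R, X, Y, alpha, reg):
    """Confidence-weighted squared error plus L2 regularization."""
    P = (R > 0).astype(float)      # binary preference
    C = 1.0 + alpha * R            # confidence grows with the interaction count
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + reg * ((X ** 2).sum() + (Y ** 2).sum()))

def solve_side(fixed, C, P, reg):
    """Solve one regularized weighted least-squares problem per row of C and P."""
    k = fixed.shape[1]
    out = np.empty((C.shape[0], k))
    for i in range(C.shape[0]):
        weighted = fixed * C[i][:, None]            # rows of `fixed` scaled by confidence
        A = fixed.T @ weighted + reg * np.eye(k)    # normal-equation matrix
        b = weighted.T @ P[i]
        out[i] = np.linalg.solve(A, b)
    return out

def toy_implicit_als(R, factors=2, alpha=15.0, reg=0.05, iterations=10, seed=0):
    """Alternate between solving for user factors X and item factors Y."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    history = [weighted_loss(R, X, Y, alpha, reg)]
    for _ in range(iterations):
        X = solve_side(Y, C, P, reg)        # update users with items fixed
        Y = solve_side(X, C.T, P.T, reg)    # update items with users fixed
        history.append(weighted_loss(R, X, Y, alpha, reg))
    return X, Y, history

# Hypothetical users x items count matrix with two groups of shoppers.
R = np.array([[3, 1, 0, 0],
              [2, 4, 0, 0],
              [0, 0, 5, 1],
              [0, 0, 1, 2]], dtype=float)
X, Y, history = toy_implicit_als(R)
print("loss before:", history[0], "loss after:", history[-1])
```

Because each half-step exactly minimizes the loss with the other side held fixed, the recorded loss never increases from one iteration to the next.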

We will evaluate the model using the library's built-in ranking_metrics_at_k function:

test_set = (test_set * alpha_val).astype('double')
evaluation.ranking_metrics_at_k(model, train_set.T, test_set.T, K=100,
                         show_progress=True, num_threads=1)
  0%|          | 0/206212 [00:00<?, ?it/s]





{'precision': 0.27489359984895717,
 'map': 0.04460861877969595,
 'ndcg': 0.14436536385146576,
 'auc': 0.6551648380086259}
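
To give a feel for what such ranking metrics measure, the sketch below computes precision at K and average precision at K for a single user using common textbook definitions. The implicit library may normalize these metrics differently, so treat this as an illustration of the idea rather than a reimplementation; the function names and the toy lists are assumptions.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually interacted with."""
    top = recommended[:k]
    if not top:
        return 0.0
    hits = sum(1 for item in top if item in relevant)
    return hits / len(top)

def average_precision_at_k(recommended, relevant, k):
    """Average of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0

# Hypothetical ranked recommendations and held-out purchases for one user.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}

p = precision_at_k(recommended, relevant, k=4)
ap = average_precision_at_k(recommended, relevant, k=4)
print(p, ap)
```

Precision only counts how many of the top K items were hits, whereas average precision also rewards placing the hits near the top of the list.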

This is what item and user factors look like. These vectors will be stored in our vector index later and used for recommendation.

model.item_factors[1:3]
array([[ 0.01009897,  0.00260342,  0.00165942,  0.01748168,  0.00649343,
        -0.01647822,  0.01860397, -0.01009837,  0.01125452,  0.01987451,
        -0.00579512,  0.00421128,  0.01707346, -0.00212536,  0.01915585,
         0.03640049, -0.01142028,  0.01023709,  0.00446458, -0.00143529,
        -0.00024208,  0.00909473, -0.01408565,  0.02619351,  0.00210135,
        -0.00378899,  0.01231347,  0.00278133,  0.00071992,  0.00915809,
         0.01640408,  0.00880539, -0.00648519, -0.01160682,  0.00664212,
        -0.00406996,  0.01543106,  0.00690582,  0.00898032,  0.00277333,
         0.00626428, -0.01610408,  0.01018737,  0.0008459 ,  0.02026955,
        -0.01055363, -0.00107795,  0.01484767,  0.01800155, -0.00275021,
        -0.0018283 , -0.00346971,  0.00077051, -0.01080908,  0.00037001,
        -0.00290308,  0.00491365, -0.01362148, -0.00129594,  0.00192484,
         0.00101756, -0.00051836,  0.00603317,  0.01611738,  0.00511096,
        -0.0053055 ,  0.01907502,  0.01232757,  0.01042075,  0.01301588,
         0.00567376,  0.0152219 ,  0.02414433,  0.01395251,  0.00916175,
         0.01294622,  0.00187435,  0.01768819,  0.01806206,  0.01500281,
         0.01065951,  0.02733074,  0.00765102,  0.00435439, -0.01976543,
         0.01680202,  0.00840835,  0.00042277, -0.00216795,  0.00113048,
        -0.00012699,  0.01142939,  0.01374972, -0.00985129,  0.00935802,
         0.00541372,  0.01037668,  0.02024015, -0.00793628, -0.00261189],
       [ 0.00088747,  0.00581244,  0.00074211,  0.00428396,  0.00124957,
         0.00699728,  0.00304013,  0.00676518,  0.00414387,  0.00205417,
         0.0029335 ,  0.00505301,  0.00522107,  0.00404108,  0.00236721,
         0.00406507,  0.00101947,  0.00298186,  0.00049156,  0.00279067,
         0.00343525,  0.00175488,  0.00907208,  0.00276436,  0.00414505,
         0.00458229,  0.00363405,  0.00375954,  0.00198171,  0.00270804,
         0.00479605,  0.00120687,  0.00249341,  0.00051512, -0.00110135,
         0.00844493,  0.00641403,  0.00101385,  0.00484058,  0.00632413,
         0.00334539,  0.00232208,  0.00288551,  0.00755766,  0.00279979,
         0.00587453,  0.00742234,  0.00580525,  0.00412665,  0.00347631,
         0.00433106,  0.00427196,  0.00670939,  0.00304596,  0.00385384,
         0.00222394,  0.00511582,  0.00354225,  0.00200116,  0.00717725,
         0.00186237,  0.00434178,  0.00102088,  0.00222063,  0.00230367,
         0.00420666,  0.00698098,  0.00549557,  0.00345657,  0.00642341,
         0.00036   ,  0.00464778,  0.00284442,  0.00530352,  0.00218676,
         0.00493103,  0.00179086,  0.0041003 ,  0.00497837,  0.0068793 ,
         0.00429972,  0.00396508,  0.00451153,  0.00486684,  0.00272128,
         0.00467645,  0.00423267,  0.00388015,  0.00339444,  0.00115735,
         0.00807636,  0.00298532,  0.00143811,  0.00293057,  0.00590145,
         0.00418158,  0.00488713,  0.00097365, -0.00083799,  0.00363581]],
      dtype=float32)
model.user_factors[1:3]
array([[ 7.24285245e-01,  5.59004486e-01,  4.96992081e-01,
        -4.15437818e-01, -1.94785964e+00, -2.23764396e+00,
        -1.76767483e-02, -2.21530461e+00, -6.52559578e-01,
         2.78620571e-01,  6.03808701e-01,  1.27670407e-01,
         3.06052566e-01, -9.93388355e-01, -5.34315288e-01,
         1.20948291e+00, -2.11217976e+00,  1.67127061e+00,
         1.03314137e+00,  8.54326487e-01,  1.85733151e+00,
         5.69297194e-01, -8.93577933e-01,  1.76394248e+00,
         1.28939009e+00,  3.32375497e-01, -2.60327369e-01,
         4.21450347e-01, -1.72091925e+00,  1.10491872e+00,
        -1.86411276e-01, -3.51959467e-02, -1.41517222e+00,
        -9.19971287e-01,  4.63204056e-01, -4.07809407e-01,
         1.23038590e+00, -8.25872004e-01, -1.50579488e+00,
         8.65903348e-02, -7.29649186e-01, -5.21384776e-01,
         1.59157085e+00, -8.51297379e-01,  2.81686401e+00,
        -8.55669677e-01, -3.48052949e-01, -5.16085029e-01,
         8.01080287e-01,  1.04207866e-01, -2.72860657e-02,
        -5.18645883e-01, -1.77561533e+00, -1.22266948e+00,
        -1.74415603e-01,  3.58568132e-01, -8.37117255e-01,
        -1.45265543e+00,  2.43810445e-01,  5.80842435e-01,
        -5.91480255e-01,  1.29645097e+00,  1.47483099e+00,
        -6.84086800e-01, -7.20921755e-01, -1.11399984e+00,
         2.38089368e-01,  2.19725475e-01,  3.29073220e-01,
        -6.45937538e-03,  2.44079873e-01,  1.26761782e+00,
         7.07967520e-01,  1.21964478e+00,  1.10735869e+00,
         1.02583379e-01, -2.92189389e-01,  5.52688181e-01,
         1.61700773e+00,  5.11932790e-01, -2.67194122e-01,
         1.47362947e+00, -1.13380539e+00,  1.40330446e+00,
         4.91484731e-01,  1.36100423e+00,  1.80482656e-01,
         9.14917171e-01,  6.22740746e-01, -1.88607132e+00,
        -1.34071469e+00, -2.27820247e-01,  1.15018475e+00,
        -1.23491549e+00, -4.78476077e-01, -4.65549737e-01,
         9.11170244e-01,  2.07606936e+00,  1.04314007e-01,
         1.81862903e+00],
       [ 8.30793440e-01,  3.86868089e-01, -1.63957000e-01,
         6.93703368e-02,  1.53786719e+00, -5.87535620e-01,
         3.72619987e+00,  1.22163899e-01, -8.54973614e-01,
         1.11186251e-01, -1.42095876e+00, -8.75619590e-01,
        -1.81247914e+00, -9.44502056e-01,  8.14570427e-01,
        -5.43736219e-01, -6.02845371e-01,  2.01962996e+00,
         1.60777140e+00,  2.20254612e+00,  2.08239055e+00,
         8.16642225e-01, -4.42571700e-01,  6.22263908e-01,
         6.29432023e-01, -1.16571808e+00,  2.32731175e+00,
        -1.12640738e+00,  1.60938001e+00,  4.67458010e+00,
        -1.46235943e+00,  1.46000063e+00,  1.11922979e-01,
        -2.55218220e+00,  7.85077095e-01,  8.50843608e-01,
        -1.10671151e+00, -6.06540870e-03,  2.76003122e-01,
        -9.57318366e-01, -1.30121040e+00, -3.81188631e-01,
         2.17489243e+00,  8.48001361e-01,  2.24089599e+00,
        -1.32857335e+00,  9.44799244e-01,  2.29169533e-01,
         1.10746622e+00, -3.48530680e-01, -2.12854624e+00,
         4.96270150e-01, -1.30754066e+00,  1.41697776e+00,
         2.73206377e+00,  1.48888981e+00, -1.58728147e+00,
         1.58903934e-03,  1.66406441e+00, -1.75263867e-01,
         2.02891684e+00, -1.95949566e+00,  1.52711666e+00,
         8.71322572e-01,  1.82597125e+00,  1.37408182e-01,
        -1.81464672e+00, -1.04905093e+00, -2.37590694e+00,
         8.15740228e-01,  1.64217085e-01,  1.99734032e+00,
        -1.54955173e+00, -5.57012379e-01,  1.32525837e+00,
        -1.30014801e+00,  1.32985008e+00, -3.50400567e+00,
         2.45490909e-01, -2.43037295e+00, -2.74685884e+00,
        -2.12384558e+00, -1.42703640e+00, -6.69254959e-01,
         1.30702591e+00, -2.15909433e+00,  1.44703603e+00,
        -2.29611732e-02,  1.82583869e+00,  1.57409739e+00,
        -3.97216320e-01, -6.94107652e-01,  2.89623165e+00,
         2.33722359e-01, -5.27708590e-01,  1.04344904e+00,
         8.51706207e-01, -4.50546294e-01,  1.38413882e+00,
         2.07552814e+00]], dtype=float32)

Configure Pinecone

Install and set up Pinecone

!pip install --quiet -U pinecone-client
import pinecone
# Load Pinecone API key
api_key = os.getenv('PINECONE_API_KEY') or 'YOUR_API_KEY'
# Set Pinecone environment. Default environment is us-west1-gcp
env = os.getenv('PINECONE_ENVIRONMENT') or 'us-west1-gcp'

pinecone.init(api_key=api_key, environment=env)

Get a Pinecone API key if you don't have one.
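
The snippet above silently falls back to the 'YOUR_API_KEY' placeholder when the environment variable is missing. If you would rather fail early with a clear message, a helper along the following lines is one option. The require_env name and the DEMO_API_KEY variable are hypothetical and used only for this sketch.

```python
import os

def require_env(name, placeholder="YOUR_API_KEY"):
    """Return the environment variable `name`, or raise if it is unset or still a placeholder."""
    value = os.environ.get(name, "").strip()
    if not value or value == placeholder:
        raise RuntimeError(f"Set the {name} environment variable before running this example.")
    return value

# Demo with a made-up variable so the sketch runs without a real key.
os.environ["DEMO_API_KEY"] = "abc123"
print(require_env("DEMO_API_KEY"))
```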

# List all indexes associated with your API key; the list should be empty on the first run
pinecone.list_indexes()
[]

Create an Index

# Set a name for your index
index_name = 'shopping-cart-demo'
# Make sure an index with the same name does not already exist
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)
pinecone.create_index(name=index_name, dimension=100)

Connect to the new index

index = pinecone.Index(index_name=index_name)

Load Data

We now prepare all items (the products that one can buy) for upload and display some example products with their vector representations.

# Get all of the items
all_items = [title for title in products_lookup['product_name']]

# Transform items into factors
items_factors = model.item_factors

# Prepare item factors for upload
items_to_insert = list(zip(all_items, items_factors[1:].tolist()))
display(items_to_insert[:2])
[('Chocolate Sandwich Cookies',
  [0.010098974220454693,
   0.0026034200564026833,
   0.0016594183398410678,
   0.017481675371527672,
   0.006493427790701389,
   -0.016478220000863075,
   0.018603969365358353,
   -0.010098369792103767,
   0.01125451922416687,
   0.019874505698680878,
   -0.005795117933303118,
   0.00421128049492836,
   0.017073458060622215,
   -0.0021253626327961683,
   0.019155845046043396,
   0.036400485783815384,
   -0.01142028160393238,
   0.010237086564302444,
   0.004464581608772278,
   -0.0014352924190461636,
   -0.00024208369723055512,
   0.009094727225601673,
   -0.014085653237998486,
   0.02619350701570511,
   0.002101349411532283,
   -0.0037889881059527397,
   0.012313470244407654,
   0.002781332703307271,
   0.0007199185783974826,
   0.009158086962997913,
   0.016404075548052788,
   0.008805392310023308,
   -0.006485185585916042,
   -0.01160681527107954,
   0.006642122287303209,
   -0.004069960676133633,
   0.015431062318384647,
   0.006905817426741123,
   0.008980315178632736,
   0.002773326588794589,
   0.0062642814591526985,
   -0.0161040760576725,
   0.010187366977334023,
   0.0008458984084427357,
   0.02026955410838127,
   -0.010553630068898201,
   -0.0010779497679322958,
   0.014847667887806892,
   0.018001552671194077,
   -0.0027502067387104034,
   -0.0018282983219251037,
   -0.0034697114024311304,
   0.000770510989241302,
   -0.010809078812599182,
   0.0003700107627082616,
   -0.002903081476688385,
   0.004913648124784231,
   -0.01362148392945528,
   -0.001295942347496748,
   0.0019248360767960548,
   0.0010175565257668495,
   -0.0005183601751923561,
   0.006033174227923155,
   0.016117379069328308,
   0.005110959522426128,
   -0.00530549930408597,
   0.019075021147727966,
   0.012327569536864758,
   0.01042074803262949,
   0.01301588024944067,
   0.005673760548233986,
   0.015221904963254929,
   0.024144325405359268,
   0.01395251415669918,
   0.009161749854683876,
   0.012946223840117455,
   0.0018743481487035751,
   0.017688188701868057,
   0.018062060698866844,
   0.015002812258899212,
   0.010659514926373959,
   0.02733074128627777,
   0.0076510170474648476,
   0.0043543861247599125,
   -0.019765431061387062,
   0.016802024096250534,
   0.008408350870013237,
   0.0004227694298606366,
   -0.002167945960536599,
   0.0011304811341688037,
   -0.0001269889180548489,
   0.01142938993871212,
   0.013749724254012108,
   -0.00985129363834858,
   0.009358019568026066,
   0.0054137222468853,
   0.010376684367656708,
   0.020240148529410362,
   -0.007936276495456696,
   -0.0026118927635252476]),
 ('All-Seasons Salt',
  [0.0008874664781615138,
   0.0058124433271586895,
   0.0007421106565743685,
   0.00428396463394165,
   0.001249574706889689,
   0.006997276097536087,
   0.0030401344411075115,
   0.006765175145119429,
   0.004143866710364819,
   0.0020541702397167683,
   0.002933498937636614,
   0.005053007043898106,
   0.00522107258439064,
   0.004041083622723818,
   0.002367211040109396,
   0.004065068904310465,
   0.0010194696951657534,
   0.0029818632174283266,
   0.0004915563040412962,
   0.0027906731702387333,
   0.0034352506045252085,
   0.0017548849573358893,
   0.009072077460587025,
   0.002764355158433318,
   0.004145053215324879,
   0.004582288675010204,
   0.003634049789980054,
   0.0037595359608531,
   0.00198170798830688,
   0.002708042971789837,
   0.004796050023287535,
   0.0012068713549524546,
   0.0024934052489697933,
   0.0005151224322617054,
   -0.001101348432712257,
   0.00844493042677641,
   0.006414031144231558,
   0.001013854518532753,
   0.0048405807465314865,
   0.006324129644781351,
   0.0033453928772360086,
   0.0023220758885145187,
   0.002885512774810195,
   0.007557660341262817,
   0.002799794776365161,
   0.005874533671885729,
   0.007422335911542177,
   0.0058052497915923595,
   0.004126648418605328,
   0.0034763067960739136,
   0.004331058822572231,
   0.004271955695003271,
   0.00670938566327095,
   0.0030459642875939608,
   0.0038538381922990084,
   0.0022239401005208492,
   0.005115816835314035,
   0.003542253514751792,
   0.002001164946705103,
   0.007177253719419241,
   0.0018623704090714455,
   0.004341782070696354,
   0.0010208759922534227,
   0.0022206329740583897,
   0.002303670858964324,
   0.004206661134958267,
   0.006980976089835167,
   0.005495565943419933,
   0.003456572536379099,
   0.006423408165574074,
   0.0003599990450311452,
   0.004647782538086176,
   0.0028444179333746433,
   0.005303522571921349,
   0.0021867596078664064,
   0.004931030794978142,
   0.0017908598529174924,
   0.0041002980433404446,
   0.004978368990123272,
   0.006879299879074097,
   0.004299724940210581,
   0.0039650811813771725,
   0.004511528182774782,
   0.00486684450879693,
   0.0027212793938815594,
   0.004676445387303829,
   0.0042326669208705425,
   0.003880152478814125,
   0.003394442144781351,
   0.0011573455994948745,
   0.008076360449194908,
   0.0029853193555027246,
   0.0014381115324795246,
   0.0029305710922926664,
   0.005901449825614691,
   0.004181584343314171,
   0.004887125454843044,
   0.0009736462379805744,
   -0.0008379911305382848,
   0.0036358062643557787])]

Insert items into the index

def chunks(iterable, batch_size=100):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))
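
To see how the chunks helper behaves, the following self-contained sketch repeats its definition and runs it on a small range. The input values are arbitrary and chosen only to show that the final batch may be shorter than batch_size.

```python
import itertools

def chunks(iterable, batch_size=100):
    """Yield successive tuples of at most batch_size items from iterable."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))

batches = list(chunks(range(7), batch_size=3))
print(batches)  # [(0, 1, 2), (3, 4, 5), (6,)]
```

Because the helper consumes an iterator lazily, it never needs to hold more than one batch in memory beyond the input itself.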
print('Index statistics before upsert:', index.describe_index_stats())

# Use the first 64 characters of each product name as the vector ID
for batch in chunks([(name[:64], factors) for name, factors in items_to_insert]):
    index.upsert(vectors=batch)

print('Index statistics after upsert:', index.describe_index_stats())
Index statistics before upsert: {'dimension': 0, 'namespaces': {}}
Index statistics after upsert: {'dimension': 100, 'namespaces': {'': {'vector_count': 49677}}}
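
Because the upsert uses a (truncated) product name as the vector ID, and an upsert overwrites any existing vector with the same ID, products that share a name or the same first 64 characters end up as a single vector. This is one possible reason a vector count could come out lower than the number of products. A minimal sketch for spotting such collisions before uploading is shown below; the find_id_collisions helper and the sample names are hypothetical.

```python
from collections import defaultdict

def find_id_collisions(names, max_len=64):
    """Group names by their truncated ID and return the groups with more than one name."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:max_len]].append(name)
    return {vector_id: full for vector_id, full in groups.items() if len(full) > 1}

# Made-up product names: two share a 64-character prefix, two are exact duplicates.
long_prefix = "x" * 64
names = [
    long_prefix + " Vanilla",
    long_prefix + " Chocolate",
    "Mineral Water",
    "Mineral Water",
    "Popcorn",
]
collisions = find_id_collisions(names)
print(len(collisions), "colliding IDs")
```

If collisions matter for your application, using a stable unique key such as the product_id as the vector ID avoids them.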

The following helper function will be used for analysing recommendations later. It returns the top N products that a user bought in the past, ranked by the number of orders.

def products_bought_by_user_in_the_past(user_id: int, top: int = 10):

    selected = data[data.user_id == user_id].sort_values(by=['total_orders'], ascending=False)

    selected['product_name'] = selected['product_id'].map(products_lookup.set_index('product_id')['product_name'])
    selected = selected[['product_id', 'product_name', 'total_orders']].reset_index(drop=True)
    if selected.shape[0] < top:
        return selected

    return selected[:top]

Query for Recommendations

We now retrieve the user factors for the two prototype users that we created earlier for testing purposes, plus one randomly chosen existing user. We also display some of these factors so you can see what they look like.

user_ids = [206210, 206211, 103593]
user_factors = model.user_factors[user_to_index[user_ids]]

display(user_factors[1:])
array([[-2.446773  , -0.62870413, -0.9166386 , -1.0933994 ,  0.9897131 ,
        -2.166681  ,  0.09873585,  1.1049409 ,  1.6753025 ,  1.5794269 ,
         1.8142459 ,  1.5048354 ,  0.7157051 , -0.7888281 ,  0.06156079,
        -1.6539581 , -0.15790005,  0.5999737 , -1.4803663 , -0.03179923,
         0.91451246,  0.14260213, -1.1541293 , -0.01566206, -1.3449577 ,
        -2.232925  , -0.88052607,  0.19183849,  0.3109626 ,  1.32479   ,
         0.16483077, -0.8045166 ,  1.36922   ,  0.81774026,  1.3368418 ,
         2.8871357 ,  2.4540865 , -1.908394  ,  2.8208447 , -1.3499558 ,
        -0.90089166,  1.0632626 ,  1.8107275 , -0.83986664,  1.1764408 ,
        -1.6621239 , -1.4636188 , -2.3367987 , -1.2510904 ,  0.4349534 ,
         0.08233324,  1.0688674 , -0.41190436,  1.6045849 , -2.3667567 ,
        -1.8557758 , -0.1931467 ,  0.10383442,  1.3932719 ,  1.3465406 ,
        -0.17274773,  0.41542327, -1.0992794 ,  1.7954347 , -0.9157203 ,
        -0.3183454 ,  0.7724282 , -0.5658835 ,  1.0758705 , -1.7377888 ,
         2.0294137 , -2.1382923 ,  1.0606468 ,  1.800927  , -1.3713943 ,
         1.0659586 ,  0.31013912, -0.5963934 ,  0.69738954,  1.383554  ,
         1.0078012 , -2.7117298 , -1.7087    ,  0.4050448 ,  3.548174  ,
         0.27247337, -0.16570352, -0.92676795, -1.2243328 ,  0.63455725,
        -1.5337977 , -2.8735108 ,  1.2812912 , -0.11600056,  1.2358317 ,
         0.5591759 , -0.63913107,  1.2325013 ,  1.3712876 , -1.3370212 ],
       [ 1.70396   , -1.5320156 ,  2.8847353 ,  0.32170388,  1.3340172 ,
        -1.1947397 ,  1.9013127 , -0.4816413 , -2.0899863 , -1.2761233 ,
        -1.8430734 , -0.6221577 ,  0.8063771 ,  1.2961249 ,  0.18268324,
        -3.2958453 , -0.31202024,  3.8049164 ,  0.73393685,  1.7682556 ,
         0.372242  ,  1.002703  ,  0.32070097,  0.2046866 ,  0.9008953 ,
         1.3807229 ,  1.1176021 ,  0.1957425 , -1.3196671 ,  2.1180258 ,
         0.48846507,  0.76666814, -0.30274457, -2.5167181 ,  0.3489467 ,
         2.0131872 , -1.5119745 , -0.91736513,  1.3228838 , -1.5192536 ,
        -1.1463904 , -1.0334512 ,  1.2355485 , -0.21977787,  2.3017268 ,
        -1.4751832 , -0.6216355 ,  0.3089897 , -0.85497165, -0.31444585,
        -3.100829  ,  2.390458  ,  0.07399248, -0.09938905, -1.0162137 ,
         1.9475894 , -0.9248195 , -1.084834  ,  0.39212215,  0.6491842 ,
         1.2028612 , -1.0323097 ,  2.6522071 , -0.8172474 ,  1.0873827 ,
        -2.9416876 , -0.06957518, -0.7316911 , -0.7430743 ,  0.319504  ,
        -0.9984044 ,  0.06710945, -3.003772  ,  0.6744962 ,  2.1210036 ,
        -0.4559903 ,  0.6154137 , -1.7743443 ,  0.5672013 ,  1.004357  ,
        -1.8588076 ,  0.05864619,  0.01209994,  2.0575655 , -1.1680491 ,
         0.3783967 ,  1.6527759 ,  1.5397102 , -0.2965242 ,  2.5335467 ,
        -0.40009058, -0.66989446, -1.6143844 ,  0.7761751 , -1.0538983 ,
         0.48226374,  1.2432365 ,  2.1671696 ,  1.7070205 ,  0.2968687 ]],
      dtype=float32)

Model recommendations

We will now retrieve recommendations from our model directly, just to have these results as a baseline.

print("Model recommendations\n")

# perf_counter measures elapsed wall-clock time, which is what we want to compare
start_time = time.perf_counter()
recommendations0 = model.recommend(userid=user_ids[0], user_items=sparse_user_product)
recommendations1 = model.recommend(userid=user_ids[1], user_items=sparse_user_product)
recommendations2 = model.recommend(userid=user_ids[2], user_items=sparse_user_product)
print("Time needed for retrieving recommended products: " + str(time.perf_counter() - start_time) + ' seconds.\n')

print('\nRecommendations for person 0:')
for recommendation in recommendations0:
    product_id = recommendation[0]
    print(products_lookup[products_lookup.product_id == product_id]['product_name'].values)

print('\nRecommendations for person 1:')
for recommendation in recommendations1:
    product_id = recommendation[0]
    print(products_lookup[products_lookup.product_id == product_id]['product_name'].values)

print('\nRecommendations for person 2:')
for recommendation in recommendations2:
    product_id = recommendation[0]
    print(products_lookup[products_lookup.product_id == product_id]['product_name'].values)
Model recommendations

Time needed for retrieving recommended products: 0.0625 seconds.


Recommendations for person 0:
['Sparkling Water']
['Soda']
['Smartwater']
['Zero Calorie Cola']
['Natural Artesian Water']
['Natural Spring Water']
['Distilled Water']
['Sparkling Natural Mineral Water']
['Spring Water']
['Drinking Water']

Recommendations for person 1:
['Baby Wipes Sensitive']
['YoKids Squeezers Organic Low-Fat Yogurt, Strawberry']
['Organic Blackberries']
['Organic Whole Milk']
['Eggo Pancakes Minis']
['Natural California Raisins Mini Snack Boxes']
['100% Raw Coconut Water']
['White Buttermints']
['Danimals Strawberry Explosion Flavored Smoothie']
['Strawberry Explosion/Banana Split Smoothie']

Recommendations for person 2:
['Organic Golden Delicious Apple']
['Organic Red Delicious Apple']
['Bartlett Pears']
['Organic Blackberries']
['Bag of Organic Bananas']
['Black Seedless Grapes']
['Organic Braeburn Apple']
['Organic Blueberries']
["Organic D'Anjou Pears"]
['White Peach']
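
When timing calls like these, the choice of clock matters. time.process_time counts only the CPU time used by the current process, so time spent waiting (for example on a network response) is not included, while time.perf_counter measures elapsed wall-clock time. The sketch below shows the difference, using time.sleep as a stand-in for waiting on a remote service; the measure helper is hypothetical.

```python
import time

def measure(fn):
    """Run fn once and return (wall-clock seconds, CPU seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

# Sleeping consumes almost no CPU, much like waiting for a network reply.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

For comparing a local model call with a remote query, wall-clock time is usually the more meaningful number, because it reflects what a caller actually waits for.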

Query the index

Let's now query the index to check how quickly we retrieve results. Please note that query speed depends in part on your internet connection.

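# Query by user factors (the two prototype users; the last, random user is left out)

# perf_counter measures elapsed wall-clock time, including the network round trip
start_time = time.perf_counter()
query_results = index.query(queries=user_factors[:-1].tolist(), top_k=10)
print("Time needed for retrieving recommended products using Pinecone: " + str(time.perf_counter() - start_time) + ' seconds.\n')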

for _id, res in zip(user_ids, query_results.results):
    print(f'user_id={_id}')
    df = pd.DataFrame(
        {
            'products': [match.id for match in res.matches],
            'scores': [match.score for match in res.matches]
        }
    )
    print("Recommendation: ")
    display(df)
    print("Top buys from the past: ")
    display(products_bought_by_user_in_the_past(_id, top=15))
Time needed for retrieving recommended products using Pinecone: 0.03125 seconds.

user_id=206210
Recommendation:
   products                                            scores
0  Mineral Water                                       0.919242
1  Zero Calorie Cola                                   0.716640
2  Orange & Lemon Flavor Variety Pack Sparkling F...   0.631119
3  Sparkling Water                                     0.603575
4  Milk Chocolate Almonds                              0.577868
5  Extra Fancy Unsalted Mixed Nuts                     0.577714
6  Popcorn                                             0.565397
7  Organic Coconut Water                               0.547605
8  Drinking Water                                      0.542832
9  Tall Kitchen Bag With Febreze Odor Shield           0.538533
Top buys from the past:
   product_id  product_name                total_orders
0  22802       Mineral Water               97

user_id=206211
Recommendation:
   products                                            scores
0  Baby Wash & Shampoo                                 0.731054
1  No More Tears Baby Shampoo                          0.695655
2  Size 6 Baby Dry Diapers                             0.526953
3  Natural Applesauce Snack & Go Pouches               0.478145
4  White Buttermints                                   0.475006
5  Size 5 Cruisers Diapers Super Pack                  0.474203
6  Go-Gurt SpongeBob SquarePants Strawberry Ripti...   0.461982
7  Baby Wipes Sensitive                                0.461840
8  Original Detergent                                  0.456813
9  Stage 1 Newborn Hypoallergenic Liquid Detergent     0.456143
Top buys from the past:
   product_id  product_name                total_orders
0  26834       No More Tears Baby Shampoo  89
1  12590       Baby Wash & Shampoo         77

Note: In this run, querying Pinecone returned results faster than retrieving recommendations from the model directly. Your timings will vary with your hardware and your internet connection.
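
The two result lists are not identical, and two differences in setup are likely contributors. First, the implicit model ranks items by the inner product of user and item factors and, by default, leaves out items the user has already bought, whereas the index query returns every nearby item, including past purchases. Second, the index above was created without a metric argument, which in Pinecone defaults to cosine similarity. The NumPy sketch below, with made-up two-dimensional vectors and a hypothetical top_k helper, shows how inner product and cosine similarity can rank the same items differently.

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return [int(i) for i in np.argsort(-scores)[:k]]

user = np.array([1.0, 0.0])
items = np.array([
    [3.0, 3.0],   # item 0: large norm, 45 degrees away from the user vector
    [0.5, 0.0],   # item 1: small norm, perfectly aligned with the user vector
    [0.0, 2.0],   # item 2: orthogonal to the user vector
])

dot = items @ user
cos = dot / (np.linalg.norm(items, axis=1) * np.linalg.norm(user))

print("by inner product:", top_k(dot, 2))
print("by cosine:       ", top_k(cos, 2))
```

Inner product favors item 0 because of its large norm, while cosine similarity ignores vector length and favors the perfectly aligned item 1.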

All that’s left to do is surface these recommendations on the shopping site, or feed them into other applications.

Clean up

Delete the index once you are sure that you no longer need it. A deleted index cannot be recovered.

pinecone.delete_index(index_name)

Summary

In this example we used Pinecone to quickly build and deploy a product recommendation engine based on collaborative filtering.

Once deployed, the product recommendation engine can index new data, retrieve recommendations in milliseconds, and send results to production applications.