# Postgres to Pinecone Syncing

> In this chapter, we'll learn how to leverage the Pinecone Airbyte connector to synchronize a Postgres database with a Pinecone index.

## Introduction

A core part of many applications is a relational database - and in many AI applications it’s very common to use Pinecone alongside such a database. PostgreSQL, commonly referred to as Postgres, is one of the most popular open-source relational databases. It is known for its robustness, scalability, and stability, making it a popular choice for many applications.

In order to facilitate the process of keeping a Postgres database in sync with your Pinecone index, we’ll make use of Airbyte’s Postgres **source** connector and the Pinecone **destination** connector. This integration streamlines the process of creating embeddings and simplifies the data pipeline, making it easier for you to focus on building your application.

## The Pinecone Airbyte connector

The Pinecone Airbyte Connector is a data integration solution that allows you to easily connect multiple data sources. It leverages Airbyte's source connectors and Pinecone's capabilities to streamline the data integration process and enable seamless data flow across different systems. The connector is highly adaptable and flexible to various data needs, with no need for complex integration steps or on-premises deployment. The connector offers the ability to embed and [upsert](https://docs.pinecone.io/docs/insert-data#upserting-records-with-sparse-values) information, with specific column and metadata fields defined by the user. This enables greater flexibility and customization, making it easier to handle specific data integration needs.

[Read about the basics of the Pinecone connector.](https://www.pinecone.io/learn/series/airbyte/airbyte-and-pinecone-intro/)

## The scenario

For this example, consider an e-commerce website that allows users to search for products based on their description. To enable better results, the search functionality will leverage semantic search powered by Pinecone. The product data will be stored in a Postgres database. Our goal will be to enable this semantic search whenever a new product is created in the database - or when any of the products are updated.

We’re going to use a table with the following structure:

```sql
CREATE TABLE public.products ( 
  id character varying(255) NOT NULL DEFAULT nextval('products_id_seq'::regclass),
  name text NOT NULL,
  sku character varying(255) NULL,
  description character varying(255) NULL,
  price money NULL,
  last_updated date NULL DEFAULT now() 
);
```

We load the table with a set of fictitious products found [in this file](https://raw.githubusercontent.com/pinecone-io/semantic-search-postgres/main/data/products.csv). Here’s a sample:

| id | name | sku | description | price | last_updated |
| --- | --- | --- | --- | --- | --- |
| 3d6ae001-ef80-4cbd-8e8e-5e296bbf80a6 | Pro System | SKU-fLRM-842 | This is a monetize user-centric markets product with Implemented local framework features. | $344.87 | 2023-09-20 22:51:32 |
| ee47e11c-e06b-4281-8531-4be90978549d | Lite Device | SKU-wxJp-879 | This is a iterate customized models product with User-centric zero-defect success features. | $845.00 | 2023-09-03 22:51:32 |
| 05d6f029-1ede-446d-ad07-855bdc55a02d | Ultra Controller | SKU-raQP-504 | This is a facilitate plug-and-play technologies product with Realigned regional product features. | $242.11 | 2023-09-09 22:51:32 |

We’ll set up a pipeline which will consume the data found in our `products` table in the Postgres database, and automatically generate and upsert embeddings for the `description` column. We’ll save the `id` of each `product` as metadata, so that we can reference the information once we retrieve the semantically relevant entries from Pinecone:

![architecture](https://cdn.sanity.io/images/vr8gru94/production/2c3864e7612d381ac8e14c0d714cd5d4e10334af-1558x458.png)


1. Postgres tracks changes in the `products` table for any insert, update, or delete event.
2. The Airbyte Postgres connector picks up the changes when it is triggered.
3. When the Airbyte Pinecone connector is triggered, it takes the target column (in our case, `description`) and sends its contents to the configured embedding provider to produce embeddings. The metadata is also passed to the connector based on its configuration (more on configuring the connector later).
4. The embeddings and metadata are upserted into Pinecone.
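The pipeline steps above can be sketched in a few lines of Python. This is a conceptual illustration only - the real embedding and upserting happen inside the Airbyte connectors - and `fake_embed` is a stand-in for a real embedding provider, which would return a 1536-dimension vector:

```python
# Conceptual sketch of steps 3-4: embed the `description` column and
# attach the row's `id` as metadata, producing a Pinecone-style record.

def fake_embed(text: str) -> list[float]:
    # Stand-in for a real embedding provider; returns a tiny toy vector.
    return [float(len(text)), float(text.count(" "))]

def row_to_record(row: dict) -> dict:
    """Turn a Postgres `products` row into an upsert-ready record."""
    return {
        "id": row["id"],
        "values": fake_embed(row["description"]),
        "metadata": {"id": row["id"]},
    }

row = {
    "id": "3d6ae001-ef80-4cbd-8e8e-5e296bbf80a6",
    "description": "This is a monetize user-centric markets product.",
}
record = row_to_record(row)
```

Keeping the `id` in metadata is what lets the application map a Pinecone match back to the full product row in Postgres.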

## Postgres Integration with Pinecone

In this portion of the guide, we’ll walk through the setup and replication options for using Postgres with Pinecone.

### Setting up Postgres Integration with Pinecone

1. Install and configure the Postgres Airbyte connector
2. Connect to the Postgres database by providing the necessary details such as the host, port, database name, username, and password
3. Configure the data replication settings, such as the tables to be synced, the frequency of syncing, and the data mapping
4. Verify and test the data replication process to ensure that data is being synced accurately and efficiently
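The connection details gathered in steps 2 and 3 end up as a source configuration payload. The snippet below is illustrative only - the exact field names and structure are defined by the Airbyte Postgres source’s specification, and the values are placeholders:

```json
{
  "host": "db.example.com",
  "port": 5432,
  "database": "storefront",
  "schemas": ["public"],
  "username": "airbyte_user",
  "password": "********",
  "ssl_mode": { "mode": "require" },
  "replication_method": { "method": "Xmin" }
}
```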

### Postgres Connector Configuration

To set up the Postgres connector, start by adding a new Postgres “Source”.

![postgres connector](https://cdn.sanity.io/images/vr8gru94/production/2a81fe3d787a83acc48ae9b11fe62d384c78cde2-403x86.png)


Name the source `Postgres` - one connector may serve many connections.

![create a source](https://cdn.sanity.io/images/vr8gru94/production/ba6510344d5864267472505430075e729de85d1d-640x151.png)


Next, provide the host, port and the database name for your Postgres instance. You may optionally provide a specific schema - although by default `public` should suffice.

![postgres setup](https://cdn.sanity.io/images/vr8gru94/production/ba5ff47c47f704eb26e123b172e879036f10ac57-635x406.png)


Provide the username and password for your database. It’s advisable to create a user specific to the Airbyte connector. By default, the SSL mode is set to `require` - you should modify this if your database configuration differs. You can read more about connecting with SSL in the [Postgres Airbyte connector documentation](https://docs.airbyte.com/integrations/sources/postgres#connecting-with-ssl-or-ssh-tunneling).

![password](https://cdn.sanity.io/images/vr8gru94/production/cffdb91e019c2956f9e994d4d6e12e6850431d79-626x375.png)


At the time of this writing, the Pinecone connector doesn’t support the CDC update method, so we’ll choose the second option, which leverages the Xmin system column.

Xmin replication is a cursor-less replication method for Postgres. Cursor-less syncs enable syncing new or updated rows without explicitly choosing a [cursor](https://www.postgresql.org/docs/current/plpgsql-cursors.html) field. Instead, the xmin system column (available in all Postgres databases) is used to track inserts and updates to your source data.
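A simplified mental model of xmin-based syncing: Postgres stamps every row version with the transaction id (xmin) that created it, and the connector remembers the highest value it has seen, fetching only newer rows on the next sync. This toy sketch ignores real-world details Airbyte handles for you, such as 32-bit xmin wraparound:

```python
# Toy model of cursor-less xmin replication: keep a high-water mark and
# fetch only rows whose xmin exceeds it.

def incremental_sync(rows: list[dict], last_seen_xmin: int) -> tuple[list[dict], int]:
    """Return the new/updated rows and the new high-water mark."""
    changed = [r for r in rows if r["xmin"] > last_seen_xmin]
    high_water = max((r["xmin"] for r in rows), default=last_seen_xmin)
    return changed, high_water

table = [
    {"xmin": 1001, "id": "a", "description": "first product"},
    {"xmin": 1002, "id": "b", "description": "second product"},
]

# First sync picks up everything.
changed, mark = incremental_sync(table, last_seen_xmin=0)

# An UPDATE rewrites the row with a new xmin, so the next sync sees it.
table[0] = {"xmin": 1003, "id": "a", "description": "first product, edited"}
changed, mark = incremental_sync(table, last_seen_xmin=mark)
```

Because an `UPDATE` always produces a new row version with a fresh xmin, edits to existing products are picked up just like inserts - which is exactly the behavior our scenario needs.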

![xmin](https://cdn.sanity.io/images/vr8gru94/production/cc756c1bed2946ab04c55def3ee7cad4c7699e22-625x483.png)


Finally, you’ll have to allow ingress to your Postgres database. You’ll find the list of IPs that must be whitelisted for the connector to work properly in the last section of the connector setup.

![white list ips](https://cdn.sanity.io/images/vr8gru94/production/a16bfe928cc44ebad37f5de38cbf7aa0523e4680-626x352.png)


### Pinecone Connector Configuration

We covered the basics of how the Pinecone connector works in the [previous chapter](https://www.pinecone.io/learn/series/airbyte/airbyte-and-pinecone-intro/), so we’ll just highlight the changes that are relevant for our setup. Since the Pinecone connector targets specific text fields for embeddings and metadata, we’ll have to create a separate connector that will handle the connection for our Postgres source.

![destination name](https://cdn.sanity.io/images/vr8gru94/production/0b23b1528871c9a25fc9bad08dbccb9ce8fa427d-520x166.png)


In the “Processing” section, set the metadata field to be stored to `id` and the text field to embed to `description`.
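Conceptually, these two settings tell the connector which field to embed and which fields to carry along as metadata. A minimal sketch of that behavior, assuming our configuration (the real logic lives in the Airbyte destination):

```python
# Sketch of the "Processing" step: embed the configured text field,
# keep only the configured metadata fields.

TEXT_FIELDS = ["description"]
METADATA_FIELDS = ["id"]

def process(record: dict) -> tuple[str, dict]:
    """Split a record into text-to-embed and metadata-to-store."""
    text = " ".join(str(record[f]) for f in TEXT_FIELDS)
    metadata = {f: record[f] for f in METADATA_FIELDS}
    return text, metadata

text, metadata = process({
    "id": "ee47e11c-e06b-4281-8531-4be90978549d",
    "name": "Lite Device",
    "description": "This is a iterate customized models product.",
    "price": "$845.00",
})
```

Fields like `name` and `price` are simply dropped here; only `description` is embedded and only `id` travels to Pinecone as metadata.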

![destination processing](https://cdn.sanity.io/images/vr8gru94/production/52dc4193ff8fd3ef5ed791c44f1645d76d71c93a-519x401.png)


Point the connector to a 1536-dimension index called `airbyte-postgres-products-table`, then provide the Pinecone environment and API key. (1536 matches the output dimensionality of OpenAI’s `text-embedding-ada-002`, a common choice of embedding model.)



![indexing settings](https://cdn.sanity.io/images/vr8gru94/production/c49b29c37029f3d17eba61895343ce75d8d19bb8-527x305.png)


### Create a connection

We’ll start creating a new connection by selecting the Postgres source:

![define source](https://cdn.sanity.io/images/vr8gru94/production/a674e0eed4f4d478b8d23234dc9dd1ece79220d7-1091x333.png)


Then select the newly created `Pinecone - Postgres - Products Table` destination:

![define destination](https://cdn.sanity.io/images/vr8gru94/production/53695697640da62bc770630eac0f2a4bbe1cbb76-1092x404.png)


In the next screen, we can set the frequency at which the target table will be synchronized. We’ll also select the specific table in Postgres that we’d like to sync:

![configure connection](https://cdn.sanity.io/images/vr8gru94/production/eb9b278caead2292335e39ae7cc01358ef045d7b-1184x987.png)


Once our connection is created, we can trigger a manual sync by clicking “Sync now”:

![pre sync](https://cdn.sanity.io/images/vr8gru94/production/0e2c260415ea120521256f14e7c603884cb094bc-1185x458.png)


As the connector finishes upserting embeddings into Pinecone, we’ll see the updated vector count in the console:

![indexing done](https://cdn.sanity.io/images/vr8gru94/production/3e3e889d110810b1174bac4a4af262ff0890d4f5-1020x321.png)


## Testing

To test the integration, we built a [semantic search application](https://github.com/pinecone-io/semantic-search-postgres) that searches through the descriptions of each product. We’ll start with a simple search for the term `without friction`:

![initial table](https://cdn.sanity.io/images/vr8gru94/production/61bd25ffb7f244a3104f38a03660f13ff5cdecf3-754x373.png)


Since we’re performing a semantic search (and not just searching for keywords), our results will include entries with the word `frictionless` - which is semantically equivalent to `without friction`.
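The reason this works is that semantically similar phrases map to nearby embedding vectors, and the index ranks matches by vector similarity rather than shared keywords. A toy illustration with made-up three-dimensional vectors (real embeddings have 1536 dimensions):

```python
# Why "frictionless" matches "without friction": their embeddings are
# close, so their cosine similarity is high. Vectors here are invented
# purely for illustration.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.2]            # embedding of "without friction"
frictionless = [0.85, 0.15, 0.25]  # embedding of a "frictionless" description
unrelated = [0.1, 0.9, 0.1]        # embedding of an unrelated description

assert cosine_similarity(query, frictionless) > cosine_similarity(query, unrelated)
```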

Next, we’ll edit the first three entries, and remove the word “frictionless” from the description.



![e2e first step](https://cdn.sanity.io/images/vr8gru94/production/ad7a91c09467fa2b9697edb2d3147329b46b6c0f-748x372.png)


When we query again before the Postgres database has been synchronized, the same three products still appear for the `without friction` query - even though the word `frictionless` no longer appears in them. Let’s synchronize the database with Pinecone again.

![re-sync](https://cdn.sanity.io/images/vr8gru94/production/12412e289a167d87804a7816e40f51e34c300617-1113x196.png)


Once the synchronization has completed, we’ll check our application once again:



![final](https://cdn.sanity.io/images/vr8gru94/production/8c55c4e074eb167a103fa20250f1653fc82a039e-735x425.png)


This time, as expected, the original three products don’t show up, and instead we get only those whose descriptions include the term “frictionless”. The Airbyte Postgres connector picked up the changes, and then the Airbyte Pinecone connector re-embedded the changed entries - which resulted in the updated query results.

## Conclusion

With Airbyte's connectors, you can easily create embeddings and simplify your data pipeline - **without writing a single line of code**. This integration works for both modest and very large datasets, and efficiently keeps your Postgres database and Pinecone index in sync. In the next chapter, we’ll dive into how to use Airbyte with one of the most popular document stores: MongoDB.