With Lantern CLI, you can generate embeddings for your Postgres data locally, without calling any third-party APIs.


Get Available Models

lantern-cli show-models

You will see output like this:

[*] [Lantern Embeddings] Available Models

BAAI/bge-small-en - type: textual, downloaded: true
transformers/multi-qa-mpnet-base-dot-v1 - type: textual, downloaded: false
microsoft/all-mpnet-base-v2 - type: textual, downloaded: false
thenlper/gte-base - type: textual, downloaded: true
clip/ViT-B-32-textual - type: textual, downloaded: true
llmrails/ember-v1 - type: textual, downloaded: true
microsoft/all-MiniLM-L12-v2 - type: textual, downloaded: true
BAAI/bge-base-en - type: textual, downloaded: true
intfloat/e5-large-v2 - type: textual, downloaded: false
intfloat/e5-base-v2 - type: textual, downloaded: true
thenlper/gte-large - type: textual, downloaded: true
BAAI/bge-large-en - type: textual, downloaded: true
clip/ViT-B-32-visual - type: visual, downloaded: true

downloaded - whether the model's ONNX file and tokenizer have already been downloaded (if not, they are downloaded automatically on the first run)

type - textual or visual; for a visual model, provide either an image URL or an image path as input to generate embeddings for image data
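If you want to check from a script which models still need to be downloaded, the listing can be parsed line by line. The following is a minimal sketch that assumes the output keeps the `name - type: ..., downloaded: ...` layout shown above; `ModelInfo` and `parse_models` are hypothetical helper names, not part of Lantern CLI.

```python
import re
from typing import NamedTuple


class ModelInfo(NamedTuple):
    name: str
    kind: str         # "textual" or "visual"
    downloaded: bool  # True if the model files are already on disk


# Matches lines such as "BAAI/bge-small-en - type: textual, downloaded: true".
# Assumption: the layout is the one shown in the sample output above.
MODEL_LINE = re.compile(
    r"^(?P<name>\S+) - type: (?P<kind>\w+), downloaded: (?P<downloaded>true|false)$"
)


def parse_models(output: str) -> list[ModelInfo]:
    """Extract model entries from `lantern-cli show-models` output, skipping other lines."""
    models = []
    for line in output.splitlines():
        match = MODEL_LINE.match(line.strip())
        if match:
            models.append(
                ModelInfo(match["name"], match["kind"], match["downloaded"] == "true")
            )
    return models


# A shortened copy of the sample output from this page.
SAMPLE = """[*] [Lantern Embeddings] Available Models

BAAI/bge-small-en - type: textual, downloaded: true
intfloat/e5-large-v2 - type: textual, downloaded: false
clip/ViT-B-32-visual - type: visual, downloaded: true
"""

models = parse_models(SAMPLE)
pending = [m.name for m in models if not m.downloaded]   # fetched on first run
visual = [m.name for m in models if m.kind == "visual"]  # need image input
print("Not yet downloaded:", pending)
print("Visual models:", visual)
```

In practice the text passed to `parse_models` would be the captured standard output of `lantern-cli show-models`.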

Set Up Data

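You can skip this step if you already have data in your database. The statement below assumes that an articles table with an id primary key column and a text title column already exists.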

INSERT INTO articles (title) VALUES ('What is vector search'), ('Getting your AI application up and running in minutes'), ('HNSW vs IVFFLAT');

Run Embedding Generation

lantern-cli create-embeddings \
  --model 'microsoft/all-MiniLM-L12-v2' \
  --uri 'postgresql://[username]:[password]@localhost:5432/[db]' \
  --table 'articles' \
  --column 'title' \
  --out-column 'title_embedding' \
  --pk id \
  --batch-size 100
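When the command is launched from a script, the `[username]`, `[password]`, and `[db]` placeholders have to be filled in, and a password containing characters such as `@` or `/` would otherwise break the URI. The sketch below percent-encodes the credentials and assembles the same argument list as the command above. `build_uri` and `build_command` are hypothetical helpers, the credentials are made up, and it is an assumption that Lantern CLI accepts standard percent-encoded Postgres URIs.

```python
import shlex
from urllib.parse import quote


def build_uri(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a Postgres connection URI with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including '/' and '@'.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}"
    )


def build_command(uri, model, table, column, out_column, pk="id", batch_size=100):
    """Return the argv list for `lantern-cli create-embeddings`.

    Only flags that appear in the command above are used.
    """
    return [
        "lantern-cli", "create-embeddings",
        "--model", model,
        "--uri", uri,
        "--table", table,
        "--column", column,
        "--out-column", out_column,
        "--pk", pk,
        "--batch-size", str(batch_size),
    ]


# Made-up credentials, for illustration only.
uri = build_uri("postgres", "p@ss/word", "localhost", 5432, "testdb")
cmd = build_command(
    uri, "microsoft/all-MiniLM-L12-v2", "articles", "title", "title_embedding"
)

# Print the shell-quoted command for inspection.
# To execute it instead: subprocess.run(cmd, check=True)
print(shlex.join(cmd))
```

Passing the arguments as a list rather than one shell string avoids a second layer of shell quoting around the URI.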

Verify Results

You can now query the database and see that embeddings have been generated for your data.

SELECT title_embedding FROM articles;

Follow the Lantern Extras documentation to see how to generate embeddings inside Postgres!

CLI parameters

Run lantern-cli create-embeddings --help to list the available CLI parameters:

Usage: lantern-cli create-embeddings [OPTIONS] --model <MODEL> --uri <URI> --table <TABLE> --column <COLUMN> --out-column <OUT_COLUMN>

  -m, --model <MODEL>            Model name
  -u, --uri <URI>                Fully associated database connection string including db name
  -t, --table <TABLE>            Table name
  -s, --schema <SCHEMA>          Schema name [default: public]
  -p, --pk <PK>                  Table primary key column name [default: id]
  -c, --column <COLUMN>          Column name to generate embeddings for
      --out-uri <OUT_URI>        Output db uri, fully associated database connection string including db name. Defaults to
      --out-table <OUT_TABLE>    Output table name. Defaults to table
      --out-column <OUT_COLUMN>  Output column name
  -b, --batch-size <BATCH_SIZE>  Batch size
  -d, --data-path <DATA_PATH>    Data path
      --visual                   If model is visual
  -o, --out-csv <OUT_CSV>        Output csv path. If specified result will be written in csv instead of database
  -f, --filter <FILTER>          Filter which will be used when getting data from source table
  -l, --limit <LIMIT>            Limit will be applied to source table if specified
      --stream                   Stream data to output table while still generating
  -h, --help                     Print help
  -V, --version                  Print version
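The optional flags listed above can be combined with the required ones. As a rough sketch, the hypothetical helper below turns a few optional settings into extra arguments. The flag names come from the help text above, but the value formats are assumptions: the filter is treated here as a SQL condition on the source table, and the CSV path is made up.

```python
import shlex


def optional_args(schema=None, filter_sql=None, limit=None, out_csv=None,
                  stream=False, visual=False):
    """Translate optional settings into create-embeddings flags.

    Settings left at their default are omitted, so the CLI defaults apply.
    """
    args = []
    if schema is not None:
        args += ["--schema", schema]
    if filter_sql is not None:
        # Assumption: the filter is a SQL condition applied to the source table.
        args += ["--filter", filter_sql]
    if limit is not None:
        args += ["--limit", str(limit)]
    if out_csv is not None:
        # Per the help text, results go to this CSV file instead of the database.
        args += ["--out-csv", out_csv]
    if stream:
        args.append("--stream")
    if visual:
        args.append("--visual")
    return args


# Required arguments, mirroring the usage line in the help output.
base = [
    "lantern-cli", "create-embeddings",
    "--model", "microsoft/all-MiniLM-L12-v2",
    "--uri", "postgresql://[username]:[password]@localhost:5432/[db]",
    "--table", "articles",
    "--column", "title",
    "--out-column", "title_embedding",
]

# Example: embed at most 1000 filtered rows and write them to a CSV file.
extra = optional_args(filter_sql="title IS NOT NULL", limit=1000,
                      out_csv="/tmp/title_embeddings.csv")
print(shlex.join(base + extra))
```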