Skip to content

What are embeddings?

Embeddings are numerical representations of real-world objects that machine learning (ML) and artificial intelligence (AI) systems use to understand complex knowledge domains like humans do.

Example

A bird-nest and a lion-den are analogous pairs, while day-night are opposite terms. Embeddings convert real-world objects into complex mathematical representations that capture inherent properties and relationships between real-world data. The entire process is automated, with AI systems self-creating embeddings during training and using them as needed to complete new tasks.

Advantages of using embeddings

Dimentionality reduction:

DS use embeddings to represent high-dimensional data in a low-dimensional space. In data science, the term dimension typically refers to a feature or attribute of the data. Higher-dimensional data in AI refers to datasets with many features or attributes that define each data point.

Train large language Models

Embeddings improve data quality when training/re-training large language models (LLMs).

Types of embeddings

  • Image Embeddigns - With image embeddings, engineers can build high-precision computer vision applications for object detection, image recognition, and other visual-related tasks.

  • Word Embeddings - With word embeddings, natural language processing software can more accurately understand the context and relationships of words.

  • Graph Embeddings - Graph embeddings extract and categorize related information from interconnected nodes to support network analysis.

What are Vectors?

ML models cannot interpret information intelligibly in their raw format and require numerical data as input. They use neural network embeddings to convert real-word information into numerical representations called vectors.

Vectors are numerical values that represent information in a multi-dimensional space. They help ML models to find similarities among sparsely distributed items.

The Conference (Horror, 2023, Movie)

Upload (Comedy, 2023, TV Show, Season 3)

Crypt Tales (Horror, 1989, TV Show, Season 7)

Dream Scenario (Horror-Comedy, 2023, Movie)

Their embeddings are shown below

The Conference (1.2, 2023, 20.0)

Upload (2.3, 2023, 35.5)

Crypt Tales (1.2, 1989, 36.7)

Dream Scenario (1.8, 2023, 20.0)

Embedding Models?

Data scientists use embedding models to enable ML models to comprehend and reason with high-dimensional data.

Types of embedding models are shown below

PCA

Principal component analysis (PCA) is a dimensionality-reduction technique that reduces complex data types into low-dimensional vectors. It finds data points with similarities and compresses them into embedding vectors that reflect the original data.

SVD

Singular value decomposition (SVD) is an embedding model that transforms a matrix into its singular matrices. The resulting matrices retain the original information while allowing models to better comprehend the semantic relationships of the data they represent.


Was this page helpful?
-->