Skip to content


Generative model vs Discriminative model: A generative AI model could be trained on a dataset of images of cats and then used to generate new images of cats. A discriminative AI model could be trained on a dataset of images of cats and dogs and then used to classify new images as either cats or dogs.

Foundation model: A foundation model is a large AI model pre-trained on a vast quantity of data that was "designed to be adapted” (or fine-tuned) to a wide range of downstream tasks, such as sentiment analysis, image captioning, and object recognition.

Prompt: A prompt is a short piece of text that is given to the large language model as input, and it can be used to control the output of the model in many ways.

We see embeddings as fundamental to AI and deep learning. Embeddings are the core of how deep learning models represent structures, mappings, hierarchy and manifolds that are learned by models. They proliferate modern deep learning from transformers to encoders, decoders, auto-encoders, recommendation engines, matrix decomposition, SVD, graph neural networks, and generative models — they are simply everywhere.

Embeddings are dense, low-dimensional representations of high-dimensional data. They are an extremely powerful tool for input data representation, compression, and cross-team collaboration. They show you relationships in unstructured data, such as the example below showing that from word embeddings we learn man is to woman as king is to queen (from the corpus the vectors were trained on).

Embeddings are foundational because:

  • They provide a common mathematical representation of your data
  • They compress your data
  • They preserve relationships within your data
  • They are the output of deep learning layers providing comprehensible linear views into complex non-linear relationships learned by models


LangChain is an open source framework for building applications based on large language models (LLMs).

LLMs are large deep-learning models pre-trained on large amounts of data that can generate responses to user queries—for example, answering questions or creating images from text-based prompts.

LangChain provides tools and abstractions to improve the customization, accuracy, and relevancy of the information the models generate. For example, developers can use LangChain components to build new prompt chains or customize existing templates.

LangChain also includes components that allow LLMs to access new data sets without retraining.

Agents, Chains

What are AI agents?

Agents are used in LLMs as a reasoning engine to determine which actions to take and in which order to take them. So instead of using an LLM directly, you can use an agent and give it access to a set of tools and context to deliver a sequence based on your intended task

Why use an agent instead of a LLM directly? Agents offer benefits like memory, planning, and learning over time.

What are complex agents? However, complex agents require more complex testing scenarios and engineering to build and maintain. These architectures may include modules for memory, planning, learning over time, and iterative processing

What is a chain? A chain is a sequence of calls to components that can be combined to create a single, coherent application. For example, we can create a chain that takes user input, formats it with a prompt template, and then passes the formatted response to an LLM.

Custom Chaining? What happens when an applications requires not just a predetermined chain of calls to LLMs, but potentially an unknown chain that depends on the user's input?

In these types of chains, there is an “agent” which has access to a suite of tools.

Depending on the user input, the agent can then decide which, if any, of these tools to call.

LLM Frameworks Frameworks like LlamaIndex and LangChain are great for leveraging agents when developing applications powered by LLMs.

ReAct(Reason + Act) Agent Architecture The Reason and Action (ReAct) framework: 1.Has the agent reason over the next action 2.Constructs a command 3.Executes the action

Prompt Engineering

Prompt engineering is a practice used to guide an LLM’s response by querying the model in a more specific way. For example, instead of merely asking a model to “create a quarterly report,” a user could provide specific figures or data in their input, providing context and leading the LLM to a more relevant and accurate output. However, copying large amounts of information into an LLM’s input is not only cumbersome but is also limited by constraints on the amount of text that can be inputted into these models at once.

This is where the concept of “search and retrieval” can enhance LLM capabilities. By pairing LLMs with an efficient search and retrieval system that has access to your proprietary data, you can overcome the limitations of both static training data and manual input. This approach is the best of both worlds, combining the expansive knowledge of LLMs with the specificity and relevance of your own data, all in real-time.


LLMs are trained on vast datasets, but these will not include your specific data (things like company knowledge bases and documentation). Retrieval-Augmented Generation (RAG) addresses this by dynamically incorporating your data as context during the generation process. This is done not by altering the training data of the LLMs but by allowing the model to access and utilize your data in real-time to provide more tailored and contextually relevant responses.

In RAG, your data is loaded and prepared for queries. This process is called indexing. User queries act on this index, which filters your data down to the most relevant context. This context and your query then are sent to the LLM along with a prompt, and the LLM provides a response.

RAG is a critical component for building applications such a chatbots or agents and you will want to know RAG techniques on how to get data into your application.

RAG Steps

To ensure accuracy and relevance, there are six stages to follow within RAG that will in turn be a part of any larger RAG application.

  • Data Indexing: Before RAG can retrieve information, the data must be aggregated and organized in an index. This index acts as a reference point for the retrieval engine.

  • Input Query Processing: The user’s input is processed and understood by the system, forming the basis of the search query for the retrieval engine.

  • Search and Ranking: The retrieval engine searches the indexed data and ranks the results in terms of relevance to the input query.

  • Prompt Augmentation: The most relevant information is then combined with the original query. This augmented prompt serves as a richer source for response generation.

  • Response Generation: Finally, the generation engine uses this augmented prompt to create an informed and contextually accurate response.

  • Evaluation: A critical step in any pipeline is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures on accuracy, faithfulness, and speed of responses.

By following these steps, RAG systems can provide answers that are not just accurate but also reflect the most current and relevant information available. This process is adaptable to both online and offline modes, each with its unique applications and advantages.