

Bedrock πŸ§—β€β™‚οΈ

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Why use Bedrock?

Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources.

Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.
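
Because Bedrock sits behind the standard AWS SDKs, calling a model looks like any other boto3 call. The sketch below is a minimal illustration, assuming the Anthropic Claude model shown is enabled in your account and region; each FM family defines its own JSON request and response schema.

```python
import json
import boto3

# Bedrock exposes every FM behind a single runtime API.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative model ID and request body; adjust both to the FM you have enabled.
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Summarize our return policy in two sentences."}],
    }),
)

print(json.loads(response["body"].read()))
```

Switching to a different FM is largely a matter of changing modelId and the request body, which is the single-API flexibility described below.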

Advantages

  • Leading FMs: Amazon Bedrock helps you rapidly adapt and take advantage of the latest generative AI innovations with easy access to a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon.

    The single-API access of Amazon Bedrock, regardless of the models you choose, gives you the flexibility to use different FMs and upgrade to the latest model versions with minimal code changes.

  • Fine-tuning: Model customization helps you deliver differentiated and personalized user experiences. To customize models for specific tasks, you can privately fine-tune FMs using your own labeled datasets in just a few quick steps.

  • RAG: To equip the FM with up-to-date proprietary information, organizations use RAG, a technique that involves fetching data from company data sources and enriching the prompt with that data to deliver more relevant and accurate responses.

    Knowledge Bases for Amazon Bedrock is a fully managed RAG capability that allows you to customize FM responses with contextual and relevant company data (a minimal query sketch follows this list).

  • Agents: Agents for Amazon Bedrock plan and execute multistep tasks using company systems and data sources—from answering customer questions about your product availability to taking their orders. With Amazon Bedrock, you can create an agent in just a few quick steps by first selecting an FM and providing it access to your enterprise systems, knowledge bases, and AWS Lambda functions to securely execute your APIs.

    An agent analyzes the user request and automatically calls the necessary APIs and data sources to fulfill the request. Agents for Amazon Bedrock offer enhanced security and privacyβ€”no need for you to engineer prompts, manage session context, or manually orchestrate tasks.
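
As a concrete companion to the RAG bullet above, the following is a minimal sketch of querying Knowledge Bases for Amazon Bedrock through the bedrock-agent-runtime API; the knowledge base ID, model ARN, and question are placeholders for illustration.

```python
import boto3

# Knowledge Bases are queried through the Bedrock agent runtime API.
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Placeholder IDs/ARNs; substitute the knowledge base and model you have configured.
response = client.retrieve_and_generate(
    input={"text": "What is our refund window for online orders?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID_PLACEHOLDER",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

print(response["output"]["text"])               # generated answer grounded in your documents
for citation in response.get("citations", []):  # the retrieved source chunks behind the answer
    print(citation)
```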

SageMaker 🧘

SageMaker JumpStart 🎩

Amazon SageMaker JumpStart is a machine learning hub that provides access to a wide range of public ML models and seamlessly integrates them into AWS infrastructure managed by SageMaker.

The hub is particularly useful for applications that need to implement common use cases that publicly available models can already solve.
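
As a rough sketch of what that integration looks like in practice, the snippet below deploys a public JumpStart model with the SageMaker Python SDK; the model ID and instance type are illustrative and should come from the JumpStart catalog and your own capacity planning.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative JumpStart model ID; browse the JumpStart catalog for the model you need.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")

# Deploys the packaged model and container to a SageMaker-managed real-time endpoint.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

print(predictor.predict({"inputs": "Write a haiku about data pipelines."}))

# Delete the endpoint when finished so you stop paying for the instance.
predictor.delete_endpoint()
```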

CodeWhisperer πŸ”Š

Amazon CodeWhisperer provides generative coding capabilities across multiple programming languages, supporting productivity enhancements such as code generation, proactive scanning for vulnerabilities with suggested code remediations, and automatic suggestions for code attribution.

Trainium 🚊

AWS Trainium is the machine learning (ML) chip that AWS purpose built for deep learning (DL) training of 100B+ parameter models. Each EC2 Trn1 instance deploys up to 16 Trainium accelerators to deliver a high-performance, low-cost solution for DL training in the cloud.

How do Trainium and Inferentia work

AWS Neuron SDK helps developers train models on Trainium accelerators (and deploy them on AWS Inferentia accelerators).

It natively integrates with popular frameworks, such as PyTorch and TensorFlow, so that you can continue to train on Trainium accelerators and use your existing code and workflows.
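
On a Trn1 instance with the Neuron SDK installed, a PyTorch training loop keeps its usual shape; the device comes from the torch-xla layer that torch-neuronx plugs into. The toy model and synthetic data below are purely illustrative.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # shipped with the Neuron SDK's torch-neuronx stack

device = xm.xla_device()               # maps to NeuronCores on the Trn1 instance

# Toy model and synthetic data, just to show the loop structure.
model = nn.Linear(128, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):
    x = torch.randn(32, 128).to(device)
    y = torch.randint(0, 2, (32,)).to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    xm.optimizer_step(optimizer)       # steps the optimizer and triggers execution of the XLA graph
```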

Inferentia 📈

AWS Inferentia is the companion ML chip that AWS purpose built for high-performance, low-cost deep learning inference. Models trained on Trainium can be compiled with the Neuron SDK and served on Inferentia-based EC2 Inf1 and Inf2 instances.

SageMaker Deployment Types

After you train your machine learning model, you can deploy it using Amazon SageMaker to get predictions. Amazon SageMaker supports the following ways to deploy a model, depending on your use case:

Real-time Inference ⏩

Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements.

These endpoints are fully managed and support autoscaling.

Deployments and Endpoints

You can deploy one or more models to an endpoint with Amazon SageMaker. When multiple models share an endpoint, they jointly use the compute resources behind it, such as the ML instances, CPUs, and accelerators.
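
Once a model has been deployed behind a real-time endpoint, invoking it is a single boto3 call. The sketch below assumes a hypothetical endpoint name and a JSON payload; the actual content type and body depend on how the model container was packaged.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# "my-endpoint" is a placeholder for an endpoint you have already deployed.
response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",
    ContentType="application/json",
    Body=json.dumps({"inputs": [1.0, 2.0, 3.0]}),
)

print(json.loads(response["Body"].read()))
```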

Serverless Inference βš™οΈ

For workloads that have idle periods between traffic spikes and can tolerate cold starts, use Serverless Inference.
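
With the SageMaker Python SDK, making an endpoint serverless is mostly a matter of passing a serverless config at deploy time instead of an instance type. This is a minimal sketch; the container image, model artifact, role, memory size, and concurrency are placeholders.

```python
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

# Placeholder container image, model artifact, and execution role.
model = Model(
    image_uri="<inference-container-image-uri>",
    model_data="s3://my-bucket/model/model.tar.gz",
    role="<sagemaker-execution-role-arn>",
)

serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=2048,   # memory allocated per concurrent invocation
    max_concurrency=5,        # concurrent invocations before requests are throttled
)

# No instance count or type: capacity is provisioned on demand and scales to zero when idle.
predictor = model.deploy(serverless_inference_config=serverless_config)
```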

Asynchronous Inference πŸ“ž

SageMaker Asynchronous Inference is a capability in SageMaker that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1GB), long processing times (up to one hour), and near real-time latency requirements.
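
An endpoint becomes asynchronous by attaching an async inference config at deploy time. The sketch below is illustrative; the S3 output path, SNS topic ARNs, instance type, and model definition are placeholders (the SNS topics correspond to the notification step in the walkthrough that follows).

```python
from sagemaker.async_inference import AsyncInferenceConfig
from sagemaker.model import Model

# Placeholder model definition; reuse your own container image and artifact.
model = Model(
    image_uri="<inference-container-image-uri>",
    model_data="s3://my-bucket/model/model.tar.gz",
    role="<sagemaker-execution-role-arn>",
)

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-results/",        # where responses are written
    max_concurrent_invocations_per_instance=4,
    notification_config={                               # optional success/error SNS topics
        "SuccessTopic": "arn:aws:sns:us-east-1:123456789012:inference-success",
        "ErrorTopic": "arn:aws:sns:us-east-1:123456789012:inference-error",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=async_config,
)
```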

Optimizing Model Inference with Amazon SageMaker: A Step-by-Step Guide

In today's data-driven world, leveraging the power of models for inference is crucial. Amazon SageMaker offers a robust solution for asynchronous model inference, allowing organizations to efficiently handle large volumes of requests. Here are the 10 steps that illustrate how SageMaker manages async model inference seamlessly:

  1. Input Upload: The user begins by uploading the input request file to an Amazon S3 bucket, which serves as the storage for data to be processed.
  2. Invoking the Async Endpoint: Once the input is securely stored, a Data Scientist (DS) invokes the asynchronous endpoint, providing a reference to the original S3 bucket path of the uploaded data.
  3. Output Location for Status Checks: The asynchronous endpoint responds immediately with a reference to the S3 output location (and an inference ID), which users or services can use to check the status of their inference requests as needed.
  4. Enqueuing the Request: The endpoint then enqueues the model request into an internal queue, prioritizing the requests based on specific needs and optimizing workflow management.
  5. Triggering SageMaker Compute: Following queuing, the endpoint triggers the SageMaker compute resources, initiating the inference process.
  6. Batch File Processing: The compute resources retrieve the model input batch file, parse its contents, divide the data into manageable chunks, and commence parallel processing across multiple nodes. This parallelization enhances efficiency and reduces latency.
  7. Batch Inference Execution: SageMaker conducts batch inference, applying the trained model to the input data and generating predictions.
  8. Storing Outputs: Upon completion of the inference, the model outputs are written back to the designated S3 bucket for further use.
  9. Notifying the SNS Topic: If configured, SageMaker sends a success or failure notification to an Amazon Simple Notification Service (SNS) topic, keeping stakeholders informed about the inference results.
  10. User/Service Notification: Lastly, the SNS service notifies the user or service regarding the completion status of the inference request, ensuring streamlined communication and timely updates.

By following these structured steps, Amazon SageMaker enables organizations to efficiently process large-scale model inferences asynchronously, ultimately enhancing productivity and decision-making capabilities.
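
Client-side, steps 1–3 and 8–10 above reduce to a short flow: upload the payload to S3, invoke the async endpoint with that S3 location, and read the result from the returned output location (or wait for the SNS notification). Bucket, key, and endpoint names below are placeholders.

```python
import boto3

s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")

# Step 1: upload the request payload to S3.
s3.upload_file("request.json", "my-bucket", "async-inputs/request.json")

# Steps 2-4: invoke the async endpoint with a pointer to the input; the request is queued internally.
response = runtime.invoke_endpoint_async(
    EndpointName="my-async-endpoint",
    InputLocation="s3://my-bucket/async-inputs/request.json",
    ContentType="application/json",
)

# Steps 8-10: the prediction appears at this S3 location once processing finishes
# (and the SNS topic is notified, if one was configured).
print(response["OutputLocation"])
```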

Tip

Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process, so you only pay when your endpoint is processing requests.
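
Scale-to-zero is configured through Application Auto Scaling by registering the endpoint variant with a minimum capacity of 0 and scaling on the queue backlog. This is a sketch under assumed endpoint/variant names and an illustrative target value.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-async-endpoint/variant/AllTraffic"   # placeholder endpoint/variant

# Allow the variant to scale all the way down to zero instances when idle.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=4,
)

# Scale out when the approximate backlog of queued requests per instance grows.
autoscaling.put_scaling_policy(
    PolicyName="async-backlog-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": "my-async-endpoint"}],
            "Statistic": "Average",
        },
    },
)
```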

Batch Inference πŸ“¦

To get predictions for an entire dataset, use SageMaker batch transform. See Use batch transform to run inference with Amazon SageMaker.
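
A minimal batch transform sketch with the SageMaker Python SDK; the container image, model artifact, role, S3 paths, and instance type are placeholders, and the CSV/line-split settings assume one record per line.

```python
from sagemaker.model import Model

# Placeholder container image, model artifact, and execution role.
model = Model(
    image_uri="<inference-container-image-uri>",
    model_data="s3://my-bucket/model/model.tar.gz",
    role="<sagemaker-execution-role-arn>",
)

# Create a transformer that writes predictions back to S3.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-predictions/",
)

# Run inference over the whole dataset, splitting the input file on newlines.
transformer.transform(
    data="s3://my-bucket/batch-inputs/dataset.csv",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()
```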

