AI-900 Exam Prep

Cognitive Search

You can use the knowledge mining results from Azure Cognitive Search to populate your chatbot's knowledge base.

NLP

Natural language processing (NLP) is used for tasks such as sentiment analysis, topic detection, language detection, key phrase extraction, and document categorization.
Text Analytics is the NLP solution in Azure.

Model Deployment

To consume a deployed model for inference, you need:
- The REST endpoint for your service
- The key for your service
Real-time endpoints intended for production are deployed to an Azure Kubernetes Service (AKS) cluster.
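
A minimal sketch of how the endpoint and key fit together when calling a deployed model. The URL, key, and payload are placeholders, and sending the key as a Bearer token is an assumption about how the service authenticates. The request is only built here, not sent.

```python
import json
import urllib.request

# Placeholder values: substitute the REST endpoint and key of your own service.
ENDPOINT = "https://example.invalid/score"
API_KEY = "<your-service-key>"


def build_scoring_request(endpoint: str, key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST request for a scoring endpoint."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the service accepts the key as a Bearer token.
        "Authorization": f"Bearer {key}",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


request = build_scoring_request(ENDPOINT, API_KEY, {"data": [[1.0, 2.0, 3.0]]})
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; the shape of the input payload depends on the deployed model.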

Feature Engineering

Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
In machine learning and statistics, feature selection is the process of selecting a subset of relevant, useful features to use in building an analytical model. Feature selection helps narrow the field of data to the most valuable inputs. Narrowing the field of data helps reduce noise and improve training performance.
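
The scaling and normalization techniques mentioned above can be sketched in plain Python. This is an illustration of min-max scaling and z-score standardization in general, not of what Azure Machine Learning runs internally; the `ages` column is made up.

```python
from statistics import mean, pstdev


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: list[float]) -> list[float]:
    """Shift values to mean 0 and scale them to standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20.0, 30.0, 40.0, 60.0]
print(min_max_scale(ages))  # [0.0, 0.25, 0.5, 1.0]
print(z_score(ages))
```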

AutoML

Automated machine learning, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity all while sustaining model quality.
During training, Azure Machine Learning creates a number of pipelines in parallel that try different algorithms and parameters for you. The service iterates through ML algorithms paired with feature selections, where each iteration produces a model with a training score. The higher the score, the better the model is considered to "fit" your data. It stops once it hits the exit criteria defined in the experiment.
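
The iterate-score-stop loop can be sketched as follows. This is a toy, sequential model of the idea, not the service's implementation (which runs pipelines in parallel); the candidate names, scores, and the two exit criteria are hypothetical.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    algorithm: str
    featurization: str
    score: float  # made-up training score, higher is better


# Hypothetical results standing in for real training runs.
CANDIDATES = [
    Candidate("LogisticRegression", "StandardScaler", 0.81),
    Candidate("DecisionTree", "MinMaxScaler", 0.77),
    Candidate("GradientBoosting", "StandardScaler", 0.93),
    Candidate("RandomForest", "MinMaxScaler", 0.95),
]


def run_sweep(candidates: list[Candidate], max_iterations: int,
              target_score: float) -> Optional[Candidate]:
    """Try candidates in order, keep the best, stop at the first exit criterion met."""
    best = None
    for iteration, candidate in enumerate(candidates, start=1):
        if best is None or candidate.score > best.score:
            best = candidate
        # Exit criteria: a good enough score, or the iteration budget is used up.
        if candidate.score >= target_score or iteration >= max_iterations:
            break
    return best


best = run_sweep(CANDIDATES, max_iterations=10, target_score=0.90)
print(best.algorithm, best.score)  # GradientBoosting 0.93
```

Because the target score is reached on the third iteration, the fourth candidate is never tried, even though it would have scored higher.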
In machine learning, if you have labeled data, that means your data is marked up, or annotated, to show the target, which is the answer you want your machine learning model to predict.
Accuracy is simply the proportion of correctly classified instances. It is usually the first metric you look at when evaluating a classifier. However, when the test data is unbalanced (where most of the instances belong to one of the classes), or you are more interested in the performance on either one of the classes, accuracy doesn't really capture the effectiveness of a classifier.
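
A small worked example of why accuracy can mislead on unbalanced data. The labels are made up: 95 negatives, 5 positives, and a "classifier" that always predicts the majority class.

```python
def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Proportion of instances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Proportion of actual positives that were predicted as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


y_true = [0] * 95 + [1] * 5   # unbalanced: only 5% positives
y_pred = [0] * 100            # always predicts the majority class

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The model scores 95% accuracy while finding none of the positive instances, which is the situation the paragraph above warns about.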

Form Recognizer

Form Recognizer applies advanced machine learning to accurately extract text, key/value pairs, and tables from documents. With just a few samples, Form Recognizer tailors its understanding to your documents, both on-premises and in the cloud. Turn forms into usable data at a fraction of the time and cost, so you can focus more time acting on the information rather than compiling it.

Regression

For regression problems, the label column must contain numeric data that represents the response variable. Ideally the numeric data represents a continuous scale.
Linear regression attempts to establish a linear relationship between one or more independent variables and a numeric outcome, or dependent variable.
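
A minimal sketch of linear regression with a single independent variable, fitted by ordinary least squares. The data points are made up and lie exactly on y = 2x + 1, so the fit recovers that line.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = y_bar - slope * x_bar
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]   # independent variable
ys = [3.0, 5.0, 7.0, 9.0]   # numeric label (dependent variable)

slope, intercept = fit_line(xs, ys)
print(slope, intercept)          # 2.0 1.0
print(slope * 5.0 + intercept)   # prediction for x = 5 -> 11.0
```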

Classification

Classification is a machine learning method that uses data to determine the category, type, or class of an item or row of data.
Two-class classification provides the answer to simple two-choice questions such as Yes/No or True/False.

Clustering

Clustering is a method of grouping similar data points into clusters. It is also called segmentation.
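
One common clustering algorithm is k-means; the notes above do not name a specific algorithm, so it is used here only as an illustration. The sketch clusters made-up one-dimensional values around two starting centroids chosen by hand.

```python
def k_means_1d(points: list[float], centroids: list[float],
               iterations: int = 10) -> tuple[list[float], list[list[float]]]:
    """Cluster 1-D points around the given starting centroids."""
    centroids = list(centroids)
    clusters: list[list[float]] = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


spend = [10.0, 12.0, 11.0, 50.0, 52.0, 49.0]   # hypothetical customer spend
centroids, clusters = k_means_1d(spend, centroids=[10.0, 50.0])
print(clusters)  # [[10.0, 12.0, 11.0], [50.0, 52.0, 49.0]]
```

No labels are involved: the two customer segments emerge from the similarity of the values alone.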

AML Designer

Azure Machine Learning designer lets you visually connect datasets and modules on an interactive canvas to create machine learning models.
Azure Machine Learning designer is a drag-and-drop interface for building machine learning pipelines in Azure Machine Learning workspaces.
You can drag-and-drop datasets and modules onto the canvas.
Pipeline Draft: As you edit a pipeline in the designer, your progress is saved as a pipeline draft. You can edit a pipeline draft at any point by adding or removing components, configuring compute targets, creating parameters, and so on.
Pipeline Job: Each time you run a pipeline, the configuration of the pipeline and its results are stored in your workspace as a pipeline job. You can go back to any pipeline job to inspect it for troubleshooting or auditing. Cloning a pipeline job creates a new pipeline draft for you to continue editing.

Note

Enterprise workspaces are no longer available as of September 2020. The basic workspace now has all the functionality of the enterprise workspace.

Custom Vision

Azure AI Custom Vision is an image recognition service that lets you build, deploy, and improve your own image identifier models. An image identifier applies labels to images, according to their visual characteristics. Each label represents a classification or object. Custom Vision allows you to specify your own labels and train custom models to detect them.
The Custom Vision service works only on image files.
The Custom Vision service uses a machine learning algorithm to analyze images. You, the developer, submit groups of images that feature and lack the characteristics in question. You label the images yourself at the time of submission. Then, the algorithm trains to this data and calculates its own accuracy by testing itself on those same images.
Custom Vision functionality can be divided into two features.
1. Image classification applies one or more labels to an image.
2. Object detection is similar, but it also returns the coordinates in the image where the applied label(s) can be found.

Validation Set

The validation dataset is different from the test dataset, which is also held back from training but is used to give an unbiased estimate of the final tuned model's skill.
A validation dataset is a sample of data that is used to estimate model skill while tuning the model's hyperparameters.
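
A minimal sketch of holding out separate validation and test sets. The 60/20/20 split and the fixed seed are assumptions chosen for illustration; the notes above do not prescribe proportions.

```python
import random


def split_dataset(rows, train: float = 0.6, validation: float = 0.2, seed: int = 0):
    """Shuffle rows and split them into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                  # used to fit the model
        shuffled[n_train:n_train + n_val],   # used while tuning hyperparameters
        shuffled[n_train + n_val:],          # held back for the final estimate
    )


train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))  # 60 20 20
```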

Metrics

The Model evaluation module outputs a confusion matrix showing the number of true positives, false negatives, false positives, and true negatives, as well as ROC, Precision/Recall, and Lift curves.
The F1 score, also known as the balanced F-score or F-measure, is used to evaluate a classification model.
AUC-ROC, the area under the ROC curve (AUC), is used to evaluate a classification model.
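
The classification metrics above can be derived from the four confusion-matrix counts. A short sketch with made-up counts; F1 is the harmonic mean of precision and recall.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical confusion-matrix counts for a two-class model.
tp, fn, fp, tn = 40, 10, 20, 30

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(precision, 3), round(recall, 3), round(f1, 3), accuracy)
# 0.667 0.8 0.727 0.7
```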
R-squared (R²), or the coefficient of determination, represents the predictive power of the model as a value between -inf and 1.00. A value of 1.00 means a perfect fit; because the fit can be arbitrarily poor, scores can be negative.
RMS-loss, or Root Mean Squared Error (RMSE), also called Root Mean Square Deviation (RMSD), measures the difference between the values predicted by a model and the values observed from the environment that is being modeled.
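
A short sketch of both regression metrics on made-up values. The second set of predictions is deliberately worse than predicting the mean, which shows how R-squared can go negative.

```python
from math import sqrt
from statistics import mean


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root of the mean squared difference between observed and predicted values."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """1 minus the ratio of residual variation to total variation."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # assumes y_true is not constant
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
reversed_preds = [4.0, 3.0, 2.0, 1.0]

print(r_squared(observed, perfect), rmse(observed, perfect))  # 1.0 0.0
print(r_squared(observed, reversed_preds))                    # -3.0
print(round(rmse(observed, reversed_preds), 3))               # 2.236
```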

Object detection

Object detection is similar to tagging, but the API returns the bounding box coordinates (in pixels) for each object found. For example, if an image contains a dog, cat and person, the Detect operation will list those objects together with their coordinates in the image.

You can use this functionality to process the relationships between the objects in an image. It also lets you determine whether there are multiple instances of the same tag in an image.
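
A sketch of post-processing detection results. The `detections` list is a hand-written, hypothetical structure (a tag plus a pixel bounding box), not the actual response format of the API; it only illustrates counting repeated tags and reasoning about where objects sit relative to each other.

```python
from collections import Counter

# Hypothetical detection results: tag plus bounding box in pixels.
detections = [
    {"tag": "dog", "box": {"x": 10, "y": 40, "w": 120, "h": 90}},
    {"tag": "person", "box": {"x": 200, "y": 20, "w": 80, "h": 200}},
    {"tag": "dog", "box": {"x": 320, "y": 60, "w": 100, "h": 80}},
    {"tag": "cat", "box": {"x": 150, "y": 100, "w": 40, "h": 30}},
]


def center_x(detection: dict) -> float:
    """Horizontal center of a detection's bounding box."""
    box = detection["box"]
    return box["x"] + box["w"] / 2


# Multiple instances of the same tag.
tag_counts = Counter(d["tag"] for d in detections)
repeated = sorted(tag for tag, n in tag_counts.items() if n > 1)
print(repeated)  # ['dog']

# A simple spatial relationship: objects ordered from left to right.
left_to_right = [d["tag"] for d in sorted(detections, key=center_x)]
print(left_to_right)  # ['dog', 'cat', 'person', 'dog']
```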

LUIS

Language Understanding (LUIS) is a cloud-based API service that applies custom machine-learning intelligence to a user's conversational, natural language text to predict overall meaning, and pull out relevant, detailed information.
Design your LUIS model with categories of user intentions called intents. Each intent needs examples of user utterances. Each utterance can provide data that needs to be extracted with machine-learning entities.
Key phrase extraction / broad entity extraction: Identify important concepts in text, including key phrases and named entities such as people, places, and organizations.
Utterances are input from the user that your app needs to interpret. To train LUIS to extract intents and entities from them, it's important to capture a variety of different example utterances for each intent. Active learning, or the process of continuing to train on new utterances, is essential to machine-learned intelligence that LUIS provides.

Relation between intent and utterance

Each intent needs to have example utterances, at least 15. If you have an intent that does not have any example utterances, you will not be able to train LUIS. If you have an intent with one or very few example utterances, LUIS will not accurately predict the intent.
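
The rule above can be checked before training with a few lines of code. The `training_data` dictionary is a hypothetical local structure, not a LUIS app export format, and the generated utterances are placeholders (real examples should vary in wording).

```python
MIN_EXAMPLES = 15  # minimum number of example utterances stated above

# Hypothetical training data: intent name -> example utterances.
training_data = {
    "BookFlight": [f"book a flight to city {i}" for i in range(15)],
    "CancelBooking": ["cancel my booking", "I want to cancel"],
    "CheckWeather": [],
}


def review_intents(data: dict[str, list[str]], minimum: int = MIN_EXAMPLES) -> dict[str, str]:
    """Label each intent as 'ok', 'too few examples', or 'no examples'."""
    report = {}
    for intent, utterances in data.items():
        if not utterances:
            report[intent] = "no examples"       # training would not be possible
        elif len(utterances) < minimum:
            report[intent] = "too few examples"  # predictions would be unreliable
        else:
            report[intent] = "ok"
    return report


print(review_intents(training_data))
```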

Chit-chat can be added to give the bot a professional greeting and make it more user-friendly.
The None intent is filled with utterances that are outside of your domain.
With QnA Maker, you can extract question-answer pairs from semi-structured content, including FAQ pages, support websites, Excel files, SharePoint documents, product manuals, and policies.
To ensure GDPR compliance with the Language Understanding (LUIS) API in the context of a Bot Framework implementation, it is important to handle user data appropriately. Deleting the utterances from the Review endpoint utterances helps to remove any personal data that may have been collected during the interaction with the bot. This step ensures that user data is not stored or retained longer than necessary, aligning with GDPR principles of data minimization and data retention.
Text to speech enables your applications, tools, or devices to convert text into humanlike synthesized speech. The text to speech capability is also known as speech synthesis. Use humanlike prebuilt neural voices out of the box, or create a custom neural voice that's unique to your product or brand.

Remember

Intents are for tasks or actions.

Entities are for objects: names, dates, times, numbers, measurements, currency.
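
One way to picture the distinction: the intent selects which action to run, and the entities are the arguments passed to it. The `prediction` dictionary below is a simplified, hand-written stand-in, not the real LUIS response schema, and the handler names are hypothetical.

```python
# Hypothetical, simplified prediction for one utterance.
prediction = {
    "query": "Book a flight to Paris on Friday",
    "topIntent": "BookFlight",                               # the task or action
    "entities": {"destination": "Paris", "date": "Friday"},  # the objects
}


def book_flight(destination: str, date: str) -> str:
    return f"Booking a flight to {destination} on {date}"


def fallback(**_entities) -> str:
    return "Sorry, I did not understand that."


# Each intent maps to the action that fulfils it.
HANDLERS = {"BookFlight": book_flight}


def handle(prediction: dict) -> str:
    """Pick the action from the intent and pass the entities as its arguments."""
    handler = HANDLERS.get(prediction["topIntent"], fallback)
    return handler(**prediction["entities"])


print(handle(prediction))  # Booking a flight to Paris on Friday
```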

Key Vault soft delete

Soft-delete behavior

With this feature, the DELETE operation on a key vault or key vault object is a soft-delete, effectively holding the resources for a given retention period (90 days by default) while giving the appearance that the object is deleted. The service further provides a mechanism for recovering the deleted object, essentially undoing the deletion.
In older releases, soft-delete was an optional Key Vault behavior that was not enabled by default and had to be turned on via the CLI or PowerShell. In current releases, soft-delete is enabled by default on new key vaults and cannot be turned off once enabled.

Purge protection: When purge protection is on, a vault or an object in the deleted state cannot be purged until the retention period of 90 days has passed. These vaults and objects can still be recovered, assuring customers that the retention policy will be followed.

Purge protection is an optional Key Vault behavior and is not enabled by default. It can be turned on via the CLI or PowerShell.
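
The interaction between soft-delete and purge protection can be modeled as a small toy class. This is a local simulation of the rules described above, not Key Vault SDK code; the class name, object names, and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)  # retention period quoted in the text


@dataclass
class SoftDeletedObject:
    name: str
    deleted_at: datetime
    purge_protection: bool

    def retention_ends(self) -> datetime:
        return self.deleted_at + RETENTION

    def can_recover(self, now: datetime) -> bool:
        """Recovery is possible while the object is still being retained."""
        return now < self.retention_ends()

    def can_purge(self, now: datetime) -> bool:
        """Purge protection blocks purging until the retention period has passed."""
        if self.purge_protection:
            return now >= self.retention_ends()
        return True


deleted_at = datetime(2024, 1, 1)
protected = SoftDeletedObject("db-password", deleted_at, purge_protection=True)
unprotected = SoftDeletedObject("api-key", deleted_at, purge_protection=False)

day_30 = deleted_at + timedelta(days=30)
print(protected.can_recover(day_30), protected.can_purge(day_30))      # True False
print(unprotected.can_recover(day_30), unprotected.can_purge(day_30))  # True True
```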