Azure Monitor ¶

Azure Monitor delivers a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. This information helps you understand how your applications are performing and proactively identify issues that affect them and the resources they depend on.

Alert states:
- Alert state (New, Acknowledged, Closed): set by a user, such as an admin.
- Monitor condition (Fired, Resolved): set by the system.

Log Analytics Workspace¶

A Log Analytics workspace is a unique environment for log data from Azure Monitor and other Azure services, such as Microsoft Sentinel and Microsoft Defender for Cloud. The connected sources, configuration, and the repository are managed per workspace.

Logs¶

Logs are event-based data. Syslog on Linux is a typical example: the data is not uniform, and the format can vary from source to source.
Free-form or structured
Stored in a Log Analytics workspace
Analysis via Kusto Query Language (KQL): The first step in writing queries is to understand which table contains the information you need. Some examples include Event, Syslog, Heartbeat, and Alert.

Three ways to view error events in the Event table. The two search forms match "error" in any column, while the where form filters on the event level:

search in (Event) "error"

Event | search "error"

Event | where EventLevelName == "Error"

Count the records in the Perf table for each counter path

Perf 
| summarize AggregatedValue = count() by CounterPath

Count of agent nodes that are sending a heartbeat each day in the last month

Heartbeat 
| where TimeGenerated > startofday(ago(31d))
| summarize nodes = dcount(Computer) by bin(TimeGenerated, 1d)    
| render timechart
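To make the aggregation in the heartbeat query concrete, here is a minimal Python sketch of what `dcount(Computer) by bin(TimeGenerated, 1d)` computes. The sample rows and the function name `nodes_per_day` are hypothetical, and the sketch counts exactly, whereas `dcount` in KQL returns an estimate.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows standing in for the Heartbeat table:
# (TimeGenerated, Computer)
heartbeats = [
    (datetime(2024, 5, 1, 0, 5), "vm-a"),
    (datetime(2024, 5, 1, 9, 30), "vm-a"),
    (datetime(2024, 5, 1, 23, 59), "vm-b"),
    (datetime(2024, 5, 2, 12, 0), "vm-a"),
]


def nodes_per_day(rows):
    """Count distinct computers in each one-day bin."""
    computers_by_day = defaultdict(set)
    for time_generated, computer in rows:
        day = time_generated.date()           # plays the role of bin(TimeGenerated, 1d)
        computers_by_day[day].add(computer)   # a set keeps each computer once per day
    return {day: len(names) for day, names in sorted(computers_by_day.items())}


daily_nodes = nodes_per_day(heartbeats)
for day, count in daily_nodes.items():
    print(day, count)
```

Two computers report on the first day and one on the second, so a node that stops sending heartbeats shows up as a drop in the daily count.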

Log Analytics¶

Log Analytics is a service for aggregating log data in a single pane so that it can be analyzed, visualized, and queried via KQL.

Agents¶

Azure Monitor Agent (AMA) collects monitoring data from the guest operating system of Azure and hybrid virtual machines and delivers it to Azure Monitor.

Windows agents
Linux agents

Cost control using agents

Through the agent configuration, you declare which logs to collect and at what level of detail. This gives you granular control over what is ingested into your workspace.

The data sources you can configure are:

Windows Event Logs: Select which event log items to ingest into the workspace.
Windows Performance Counters: Select the performance counters of Windows servers and their sample rate.
Linux Performance Counters: Select the performance counters of Linux servers and their sample rate.
Syslog: Control which Syslog facilities to ingest.
IIS Logs: Enable collection of W3C-format log files from IIS servers.

DCR¶

Azure Monitor Agent uses data collection rules (DCR), where you define which data you want each agent to collect. Data collection rules let you manage data collection settings at scale and define unique, scoped configurations for subsets of machines. You can define a rule to send data from multiple machines to multiple destinations across regions and tenants.
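As an illustration of what a data collection rule declares, the sketch below builds one as a Python dictionary and checks that every data flow points at a declared destination. The field names follow the DCR JSON layout as I understand it and should be checked against the current schema; the resource ID, the rule contents, and the helper `undefined_destinations` are hypothetical.

```python
import json

# Hypothetical workspace resource ID; substitute your own values.
WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)

# Assumed DCR shape: data sources say what to collect, destinations say
# where it can go, and data flows connect streams to destinations.
dcr = {
    "location": "eastus",
    "properties": {
        "dataSources": {
            "syslog": [{
                "name": "authSyslog",
                "streams": ["Microsoft-Syslog"],
                "facilityNames": ["auth", "authpriv"],
                "logLevels": ["Warning", "Error", "Critical"],
            }],
            "performanceCounters": [{
                "name": "cpuEveryMinute",
                "streams": ["Microsoft-Perf"],
                "samplingFrequencyInSeconds": 60,
                "counterSpecifiers": ["\\Processor(_Total)\\% Processor Time"],
            }],
        },
        "destinations": {
            "logAnalytics": [
                {"name": "centralWorkspace", "workspaceResourceId": WORKSPACE_ID},
            ],
        },
        "dataFlows": [{
            "streams": ["Microsoft-Syslog", "Microsoft-Perf"],
            "destinations": ["centralWorkspace"],
        }],
    },
}


def undefined_destinations(rule):
    """Return destination names used in dataFlows but never declared."""
    props = rule["properties"]
    declared = {
        dest["name"]
        for dest_list in props["destinations"].values()
        for dest in dest_list
    }
    used = {name for flow in props["dataFlows"] for name in flow["destinations"]}
    return sorted(used - declared)


missing = undefined_destinations(dcr)
print(json.dumps(dcr["properties"]["dataFlows"], indent=2))
print("undeclared destinations:", missing)
```

Limiting the facilities, log levels, and sampling frequency in the rule is how the cost control described earlier is applied in practice.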

Data retention¶

Data in each table in a Log Analytics workspace is retained for a specified period of time after which it's either removed or archived with a reduced retention fee.
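Since a workspace is billed for ingestion and for retention beyond the free 31 days, a rough estimate can be sketched as below. This is a simplified steady-state model, not the actual billing formula, and both prices are hypothetical placeholders to be replaced with the rates for your region.

```python
FREE_RETENTION_DAYS = 31  # free retention period for a Log Analytics workspace


def billable_retention_days(retention_days):
    """Days of retention that are charged after the free period."""
    return max(0, retention_days - FREE_RETENTION_DAYS)


def estimate_monthly_cost(daily_ingest_gb, retention_days,
                          price_per_gb_ingested, price_per_gb_month_retained):
    """Rough monthly estimate, assuming constant daily ingestion."""
    # Ingestion: every GB that arrives during a 30-day month is charged once.
    ingestion = daily_ingest_gb * 30 * price_per_gb_ingested
    # Retention: at steady state, the data older than the free period
    # amounts to daily volume times the number of billable days.
    retained_gb = daily_ingest_gb * billable_retention_days(retention_days)
    retention = retained_gb * price_per_gb_month_retained
    return ingestion + retention


# Hypothetical prices, for illustration only.
estimate = estimate_monthly_cost(
    daily_ingest_gb=10, retention_days=90,
    price_per_gb_ingested=2.0, price_per_gb_month_retained=0.1,
)
print(round(estimate, 2))
```

With a retention setting of 31 days or less, the retention term is zero and only ingestion contributes.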

Activity log¶

With the help of an activity log, you can get insights into different operations occurring at the subscription level.

Categories¶

As Azure Activity Log is a subscription-wide logging system, Azure has divided the logs ingested into different categories.

Administrative: Contains the record of all create, update, delete, and action operations performed through Resource Manager. Examples of Administrative events include create virtual machine and delete network security group.
Service Health: Contains the record of any service health incidents that have occurred in Azure. An example of a Service Health event is SQL Azure in East US is experiencing downtime.
Resource Health: Contains the record of any resource health events that have occurred to your Azure resources. An example of a Resource Health event is Virtual Machine health status changed to unavailable.
Alert: Contains the record of activations for Azure alerts. An example of an Alert event is CPU % on myVM has been over 80 for the past 5 minutes.
Autoscale: Contains the record of any events related to the operation of the autoscale engine based on any autoscale settings you have defined in your subscription. An example of an Autoscale event is Autoscale scale up action failed.
Recommendation: Contains recommendation events from Azure Advisor.
Security: Contains the record of any security alerts generated by Microsoft Defender for Cloud (formerly Azure Defender).
Policy: Contains the record of every effect action performed when Azure Policy is evaluated.

Metrics¶

Metrics are numerical values that are ingested from Azure resources used to represent the state of the system at a particular point in time.

Metric Example

For a virtual machine, the metrics available will be CPU Percentage, Network In, Network Out, Memory, etc. On the other hand, if you take a storage account, the available metrics will be Number Of Requests, Number Of Failed Requests, Number Of API Calls, etc.

Time-series data sampled at short intervals
Frequently updated
Near real-time data
Visualizations via Metrics Explorer
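Because metrics are numeric samples over time, a condition such as "CPU % has been over 80 for the past 5 minutes" comes down to aggregating a recent window and comparing it with a threshold. The sketch below assumes average aggregation; the sample values and the function name `breaches_threshold` are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def breaches_threshold(samples, now, window, threshold):
    """True if the average of the samples inside the window exceeds the threshold."""
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return bool(recent) and mean(recent) > threshold


now = datetime(2024, 5, 1, 12, 0)
# Hypothetical CPU percentage samples: (minutes ago, value)
cpu_samples = [
    (now - timedelta(minutes=minutes_ago), value)
    for minutes_ago, value in [(7, 20.0), (4, 85.0), (3, 90.0), (2, 88.0), (1, 92.0), (0, 95.0)]
]

alert_fires = breaches_threshold(cpu_samples, now, timedelta(minutes=5), 80.0)
print(alert_fires)
```

The sample from seven minutes ago falls outside the five-minute window, so only the five most recent values are averaged.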

Distributed Tracing¶

In monolithic architectures, we've gotten used to debugging with call stacks. Call stacks are brilliant tools for showing the flow of execution (Method A called Method B, which called Method C), along with details and parameters about each of those calls. This technique is great for monoliths or services running on a single process. But how do we debug when the call is across a process boundary, not simply a reference on the local stack?

That's where distributed tracing comes in.

Distributed tracing is the equivalent of call stacks for modern cloud and microservices architectures, with the addition of a simplistic performance profiler thrown in.
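One common way to stitch calls together across process boundaries is to pass a trace identifier in a request header. The sketch below follows the W3C Trace Context `traceparent` layout (version, trace ID, parent span ID, flags); the helper names are hypothetical, and this is an illustration of the idea rather than how any particular SDK implements it.

```python
import secrets


def new_traceparent():
    """Start a new trace: 32 hex chars of trace ID, 16 hex chars of span ID."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    return f"00-{trace_id}-{span_id}-01"   # version 00, sampled flag 01


def child_traceparent(parent):
    """Header for an outgoing call: same trace ID, fresh span ID."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()            # created by the first service
outgoing = child_traceparent(incoming)  # sent to the next service
print(incoming)
print(outgoing)
```

Every service logs its work under the shared trace ID, which is what lets a tracing backend rebuild the cross-process "call stack".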

Diagnostic Settings¶

Diagnostic settings define where a resource's logs and metrics are sent, for example a Log Analytics workspace, a storage account, or an event hub.

Action groups¶

What are action groups?

An action group is a collection of notification preferences that can be reused in multiple alerts. The notifications and actions that you define inside the action group are executed when the alert fires. You can create multiple action groups with different notification preferences, and these can be used across your alerts.

They can be found in Monitor → Alerts → Action Groups

Action groups consist of two parts: notifications and actions

Notification types¶

Email/SMS: Delivered straight to the recipient's inbox or phone, so they do not rely on another Azure resource (such as a function, logic app, or runbook) being available to handle the alert.
Email Azure Resource Manager Role: You can send email notifications to Azure RBAC roles such as Owner, Contributor, Reader, Monitoring Contributor, and Monitoring Reader that are assigned at the subscription scope. All user principals assigned any of these roles are notified when the alert is triggered. Azure AD groups and service principals are excluded from the email notification.

Action types¶

Azure app push notification: Sends a push notification to the Azure mobile app.
Azure Function: Runs a small piece of code on serverless compute when the alert fires.
Logic App: Runs a business process or workflow.
Secure Webhook/Webhook: Calls an HTTPS or HTTP endpoint so that an external application can react to the alert.
ITSM: Integrates ITSM tools such as ServiceNow so that a ticket is created in the tool whenever an alert is triggered.
Automation runbook: Runs an Azure Automation runbook.
Event Hub: Streams the alert event to other systems.
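A webhook, function, or logic app action receives the alert as a JSON payload. The sketch below shows how a receiver might pull out the essentials, assuming the action group is configured to use the common alert schema. The sample payload is hypothetical and trimmed to a few fields, and `summarize_alert` is an illustrative helper, not part of any SDK.

```python
import json

# Hypothetical, trimmed payload in the shape of the common alert schema.
sample_body = json.dumps({
    "schemaId": "azureMonitorCommonAlertSchema",
    "data": {
        "essentials": {
            "alertRule": "cpu-over-80",
            "severity": "Sev2",
            "monitorCondition": "Fired",
            "alertTargetIDs": [
                "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
                "/providers/Microsoft.Compute/virtualMachines/myVM"
            ],
        },
        "alertContext": {},
    },
})


def summarize_alert(body):
    """Extract the fields a receiver typically needs to route the alert."""
    payload = json.loads(body)
    if payload.get("schemaId") != "azureMonitorCommonAlertSchema":
        raise ValueError("not a common alert schema payload")
    essentials = payload["data"]["essentials"]
    return {
        "rule": essentials["alertRule"],
        "severity": essentials["severity"],
        "condition": essentials["monitorCondition"],   # "Fired" or "Resolved"
        "targets": essentials.get("alertTargetIDs", []),
    }


summary = summarize_alert(sample_body)
print(summary["rule"], summary["severity"], summary["condition"])
```

Checking `condition` lets the receiver open a ticket when the alert fires and close it when the alert resolves.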

Insights¶

Insights are service-specific monitoring features in Azure Monitor.

VM Insights¶

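Used to monitor virtual machines (VMs) and virtual machine scale sets (VMSS).
Formerly called Azure Monitor for VMs.
Requires an agent on each machine: Azure Monitor Agent today, the Log Analytics agent in older deployments.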

Network Insights¶

No agent installation required for this.

Container Insights¶

Used to monitor containers

App Insights (for Devs)¶

Application Insights is an extension of Azure Monitor and provides Application Performance Monitoring (also known as “APM”) features. APM tools are useful to monitor applications from development, through test, and into production.

What to use for logging?

In addition to collecting Metrics and application Telemetry data, which describe application activities and health, Application Insights can also be used to collect and store application trace logging data.

Features¶

Metrics and alerts
Application Map
Profiler
Usage Analytics

Instrumentation¶

At a basic level, instrumenting is simply enabling an application to capture telemetry.

There are two methods to instrument your application:

Manual instrumentation
Automatic instrumentation (auto-instrumentation)

It can be:

Runtime instrumentation
Build-time instrumentation

Instrumentation key

The instrumentation key identifies the Application Insights resource that receives the application's telemetry. It is stored in, and can be copied from, the Application Insights resource.

Auto-instrumentation¶

Auto-instrumentation works through an agent for Application Insights: it enables telemetry collection through configuration, without touching the application's code. Although it is more convenient, it tends to be less configurable, and it is not available for all languages.

The Application Insights agent or SDK pre-processes telemetry and metrics before sending the data to Azure where it's ingested and processed further before being stored in Azure Monitor Logs.

Network Watcher¶

Network Watcher is a regional service for monitoring networks. It monitors IaaS resources, not PaaS.

Azure Network Watcher provides tools to monitor, diagnose, view metrics, and enable or disable logs for resources in an Azure virtual network. Network Watcher is designed to monitor and repair the network health of IaaS (Infrastructure-as-a-Service) products including Virtual Machines (VM), Virtual Networks, Application Gateways, Load balancers, etc.

NSG flow logs

NSG flow logs is a feature of Azure Network Watcher that allows you to log information about IP traffic flowing through an NSG. These logs will show inbound and outbound flows on a per-rule basis.
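Each flow in an NSG flow log is recorded as a comma-separated tuple inside the JSON. The sketch below parses the first eight fields, which to my knowledge are shared by both log versions (version 2 appends flow state and byte and packet counts). The sample tuple and the names `FlowTuple` and `parse_flow_tuple` are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

PROTOCOLS = {"T": "TCP", "U": "UDP"}
DIRECTIONS = {"I": "inbound", "O": "outbound"}
DECISIONS = {"A": "allowed", "D": "denied"}


@dataclass
class FlowTuple:
    timestamp: datetime
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    direction: str
    decision: str


def parse_flow_tuple(raw):
    """Parse the leading eight fields of a flow tuple; extra fields are ignored."""
    ts, src_ip, dst_ip, src_port, dst_port, proto, direction, decision = raw.split(",")[:8]
    return FlowTuple(
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),  # Unix epoch seconds
        src_ip=src_ip,
        dst_ip=dst_ip,
        src_port=int(src_port),
        dst_port=int(dst_port),
        protocol=PROTOCOLS[proto],
        direction=DIRECTIONS[direction],
        decision=DECISIONS[decision],
    )


# Illustrative tuple: outbound TCP flow to port 443 that the NSG allowed.
flow = parse_flow_tuple("1542110377,10.0.0.4,13.67.143.118,44931,443,T,O,A")
print(flow)
```

Filtering parsed tuples on `decision == "denied"` is a quick way to see which traffic the NSG rules are blocking.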

Monitoring Tools¶

Topology map¶

As resources are added to a virtual network, it can become difficult to understand what resources are in a virtual network and how they relate to each other. The topology capability enables you to generate a visual diagram of the resources in a virtual network and the relationships between the resources.

Connection Monitor¶

Monitors connectivity between Azure resources on the network.

Network Performance Monitor¶

Monitors network performance and connectivity between VNets, ExpressRoute circuits, and so on.

Diagnostic Tools¶

Next Hop¶

Next Hop is used to verify whether traffic is being routed to the expected destination. It is most useful when you use user-defined routes (UDRs) and want to confirm that the routing rules are working.

With Next Hop, you can easily find which route table is used to route traffic from a source to a destination.
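The route that Next Hop reports is chosen by longest prefix match, with user-defined routes winning over system routes when prefixes tie. The sketch below reproduces that selection over a hypothetical set of effective routes; the route values and the `SOURCE_PRIORITY` table are assumptions made for illustration.

```python
import ipaddress

# Hypothetical effective routes: (address prefix, next hop type, route source)
routes = [
    ("0.0.0.0/0", "Internet", "Default"),
    ("10.0.0.0/16", "VnetLocal", "Default"),
    ("10.0.2.0/24", "VirtualAppliance", "User"),
]

# Assumed tie-break when two routes have the same prefix length.
SOURCE_PRIORITY = {"User": 2, "VirtualNetworkGateway": 1, "Default": 0}


def next_hop(destination, route_list):
    """Return (next hop type, source) for the most specific matching route."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop, source)
        for prefix, hop, source in route_list
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    _network, hop, source = max(
        matches,
        key=lambda m: (m[0].prefixlen, SOURCE_PRIORITY.get(m[2], 0)),
    )
    return hop, source


print(next_hop("10.0.2.5", routes))   # covered by the user-defined /24
print(next_hop("10.0.1.5", routes))   # only the VNet /16 and the default route match
```

If the first call did not return the virtual appliance, the UDR would either be missing from the subnet's route table or be shadowed by a more specific prefix.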

IP Flow verify¶

IP Flow Verify can be used to quickly troubleshoot connectivity issues from or to a remote IP address from a local IP address.

Example

When you create a VM, a default NSG is assigned to it. Let’s assume that even after opening the ports you are not able to connect to the VM remotely via RDP. To understand which rule is blocking the connectivity from the remote IP to the VM, you can use IP Flow Verify.

Effective security rules¶

An NSG can be applied at the subnet level and at the NIC level. When both are in play, it can be hard to tell which rules actually apply; effective security rules show the combined set of rules applied to the traffic.
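To show why two NSGs can be confusing, here is a simplified model of inbound evaluation: rules are checked in priority order (lowest number first), the first match decides, and traffic must be allowed by the subnet NSG and then by the NIC NSG. The rule set is hypothetical, and only the default DenyAllInBound rule is modelled, not the other default rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    priority: int             # lower number is evaluated first
    source_prefix: str        # CIDR; "0.0.0.0/0" matches any IPv4 source
    dest_port: Optional[int]  # None matches any port
    access: str               # "Allow" or "Deny"


def first_match(nsg, source_ip, dest_port):
    """Return the highest-precedence rule that matches the flow, if any."""
    source = ipaddress.ip_address(source_ip)
    for rule in sorted(nsg, key=lambda r: r.priority):
        source_ok = source in ipaddress.ip_network(rule.source_prefix)
        port_ok = rule.dest_port is None or rule.dest_port == dest_port
        if source_ok and port_ok:
            return rule
    return None


def inbound_verdict(subnet_nsg, nic_nsg, source_ip, dest_port):
    """Return (access, scope, rule name) for an inbound flow."""
    verdict = None
    for scope, nsg in (("subnet", subnet_nsg), ("nic", nic_nsg)):
        rule = first_match(nsg, source_ip, dest_port)
        if rule is None or rule.access == "Deny":
            return ("Deny", scope, rule.name if rule else None)
        verdict = ("Allow", scope, rule.name)
    return verdict


DENY_ALL = Rule("DenyAllInBound", 65500, "0.0.0.0/0", None, "Deny")
# RDP was opened on the subnet NSG but not on the NIC NSG.
subnet_nsg = [Rule("AllowRdp", 300, "0.0.0.0/0", 3389, "Allow"), DENY_ALL]
nic_nsg = [DENY_ALL]

verdict = inbound_verdict(subnet_nsg, nic_nsg, "203.0.113.10", 3389)
print(verdict)
```

The flow passes the subnet NSG but is dropped by the default deny rule on the NIC, which is the kind of answer IP Flow Verify and effective security rules surface.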

Packet Capture¶

Capture packets from VM for analysis when a condition is met.

Connection Troubleshoot¶

Using Connection Troubleshoot, you can check the connectivity from a virtual machine to another VM, FQDN, URI, or IPv4 address.

VPN Diagnostic¶

Used to diagnose/troubleshoot VPN related issues.

ITSMC¶

IT Service Management Connector (ITSMC) allows you to connect Azure to a supported IT Service Management (ITSM) product or service. Azure services like Azure Log Analytics and Azure Monitor provide tools to detect, analyze, and troubleshoot problems with your Azure and non-Azure resources. But the work items related to an issue typically reside in an ITSM product or service. ITSMC provides a bidirectional connection between Azure and ITSM tools to help you resolve issues faster.

Supported ITSM tools

ITSMC supports connections with the following ITSM tools:
- ServiceNow
- System Center Service Manager
- Provance
- Cherwell

Performance Counters¶

Performance counters in Windows and Linux provide insight into the performance of hardware components, the OS, and applications. Azure Monitor can collect performance counters through its agents at frequent intervals for near-real-time (NRT) analysis, in addition to aggregating performance data for longer-term analysis and reporting. Note that waagent (the Azure Linux Agent) handles VM provisioning and extensions; it is not the agent that collects these counters.

Misc Notes¶

The Heartbeat table can tell you which systems have not sent a heartbeat in the last x minutes.
Authentication (auth facility) events are stored in the Syslog table.
Network Watcher connection troubleshoot checks a direct TCP connection from a virtual machine to another virtual machine (VM), fully qualified domain name (FQDN), URI, or IPv4 address.
The activity log keeps data for 90 days.
Performance counters are stored in the Perf table.
The Usage table stores hourly usage data for each table in the workspace.
Log Analytics is billed for data ingestion and data retention.
The first 31 days of data retention are free in Log Analytics.
NSG flow logs are stored in JSON format.
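As a sketch of the heartbeat check in the first note, the logic is: take each computer's most recent heartbeat and flag the ones older than the chosen threshold. The rows, the reference time, and the function name `silent_computers` are hypothetical.

```python
from datetime import datetime, timedelta


def silent_computers(rows, now, max_age):
    """Return computers whose latest heartbeat is older than max_age."""
    latest = {}
    for time_generated, computer in rows:
        # Keep only the most recent heartbeat per computer.
        if computer not in latest or time_generated > latest[computer]:
            latest[computer] = time_generated
    return sorted(name for name, seen in latest.items() if now - seen > max_age)


now = datetime(2024, 5, 1, 12, 0)
# Hypothetical rows standing in for the Heartbeat table: (TimeGenerated, Computer)
rows = [
    (datetime(2024, 5, 1, 11, 58), "vm-a"),
    (datetime(2024, 5, 1, 11, 30), "vm-b"),
    (datetime(2024, 5, 1, 11, 40), "vm-b"),
]

silent = silent_computers(rows, now, timedelta(minutes=15))
print(silent)
```

Here `vm-b` was last seen 20 minutes before the reference time, so it is the only machine reported.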