Azure Monitor¶
Azure Monitor delivers a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. This information helps you understand how your applications are performing and proactively identify issues that affect them and the resources they depend on.
States:
- Alert state: set by a user, such as an admin
- Monitor state: set by the system
Log Analytics Workspace¶
A Log Analytics workspace is a unique environment for log data from Azure Monitor and other Azure services, such as Microsoft Sentinel and Microsoft Defender for Cloud. The connected sources, configuration, and the repository are managed per workspace.
Logs¶
- Logs are event-based data. Syslog on Linux is one example: the log data is not consistent, and the format may vary from source to source.
- Free form or structured
- Stored in a Log Analytics workspace
- Analysis via Kusto Query Language (KQL): The first step in writing queries is to understand which table contains the information you need. Some examples include Event, Syslog, Heartbeat, and Alert.
search in (Event) "error"

Event | search "error"

Event | where EventLevelName == "Error"

Perf
| summarize AggregatedValue = count() by CounterPath

Heartbeat
| where TimeGenerated > startofday(ago(31d))
| summarize nodes = dcount(Computer) by bin(TimeGenerated, 1d)
| render timechart
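To make the Heartbeat query concrete, here is a minimal Python sketch of what `summarize nodes = dcount(Computer) by bin(TimeGenerated, 1d)` computes: the number of distinct computers per one-day bin. The sample rows are hypothetical and only stand in for the Heartbeat table. KQL's `dcount` returns an estimate, while this sketch counts exactly.

```python
from collections import defaultdict
from datetime import datetime, timezone


def nodes_per_day(rows):
    """Count distinct computers per UTC day, keyed by ISO date string."""
    computers_by_day = defaultdict(set)
    for time_generated, computer in rows:
        # Equivalent of bin(TimeGenerated, 1d): truncate the timestamp to its day.
        day = time_generated.astimezone(timezone.utc).date()
        computers_by_day[day].add(computer)
    return {
        day.isoformat(): len(names)
        for day, names in sorted(computers_by_day.items())
    }


# Hypothetical Heartbeat rows: (TimeGenerated, Computer)
sample_rows = [
    (datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc), "vm-a"),
    (datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc), "vm-a"),
    (datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc), "vm-b"),
    (datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc), "vm-b"),
]

for day, count in nodes_per_day(sample_rows).items():
    print(day, count)
```

Repeated heartbeats from the same machine on the same day count once, which is why the query is a reasonable way to chart how many nodes were reporting each day.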
Log Analytics¶
Log Analytics is a service that aggregates log data in a single pane so that it can be analyzed, visualized, and queried via KQL.
Agents¶
Azure Monitor Agent (AMA) collects monitoring data from the guest operating system of Azure and hybrid virtual machines and delivers it to Azure Monitor. There are:
- Windows agents
- Linux agents
Cost control using agents
With the agent configuration, you can declare which logs you want the agents to collect and what level of logging information you need. This gives you granular control over what is ingested into your workspace.
The log types you can configure are:
- Windows Event Logs: Select which event log items you want to ingest to the workspace.
- Windows Performance Counters: Select performance counters of Windows servers and the sample rate.
- Linux Performance Counters: Performance counters for Linux servers and their sample rate.
- Syslog: Control which facilities in Syslog you want to ingest.
- IIS Logs: Enables collection of W3C format log files from an IIS server.
DCR¶
Azure Monitor Agent uses data collection rules (DCR), where you define which data you want each agent to collect. Data collection rules let you manage data collection settings at scale and define unique, scoped configurations for subsets of machines. You can define a rule to send data from multiple machines to multiple destinations across regions and tenants.
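As a rough illustration of what a rule contains, the sketch below models the body of a DCR as a Python dictionary with data sources, destinations, and data flows that connect the two. The field names follow the DCR JSON layout as understood here and should be checked against the DCR schema reference. The resource ID, the names `cpuEveryMinute`, `authWarnings`, and `centralWorkspace`, and the helper `undefined_destinations` are all hypothetical.

```python
# Placeholder workspace resource ID; fill in real values before use.
WORKSPACE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)

dcr_properties = {
    # What to collect from each machine the rule is associated with.
    "dataSources": {
        "performanceCounters": [
            {
                "name": "cpuEveryMinute",
                "streams": ["Microsoft-Perf"],
                "samplingFrequencyInSeconds": 60,
                "counterSpecifiers": ["\\Processor(_Total)\\% Processor Time"],
            }
        ],
        "syslog": [
            {
                "name": "authWarnings",
                "streams": ["Microsoft-Syslog"],
                "facilityNames": ["auth"],
                "logLevels": ["Warning", "Error", "Critical"],
            }
        ],
    },
    # Where the data can go.
    "destinations": {
        "logAnalytics": [
            {"name": "centralWorkspace", "workspaceResourceId": WORKSPACE_ID}
        ]
    },
    # Which streams go to which destinations.
    "dataFlows": [
        {
            "streams": ["Microsoft-Perf", "Microsoft-Syslog"],
            "destinations": ["centralWorkspace"],
        }
    ],
}


def undefined_destinations(properties):
    """Return destination names used by data flows but never declared."""
    declared = {
        dest["name"]
        for dest_list in properties["destinations"].values()
        for dest in dest_list
    }
    used = {name for flow in properties["dataFlows"] for name in flow["destinations"]}
    return sorted(used - declared)


print(undefined_destinations(dcr_properties))
```

Narrowing `facilityNames`, `logLevels`, and the sampling frequency is where the cost control described above actually happens, since only what the rule selects gets ingested.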
Data retention¶
Data in each table in a Log Analytics workspace is retained for a specified period of time, after which it's either removed or archived with a reduced retention fee.
Activity log¶
With the help of an activity log, you can get insights into different operations occurring at the subscription level.
Categories¶
As Azure Activity Log is a subscription-wide logging system, Azure has divided the logs ingested into different categories.
- Administrative: Contains the record of all create, update, delete, and action operations performed through Resource Manager. Examples of Administrative events include create virtual machine and delete network security group.
- Service Health: Contains the record of any service health incidents that have occurred in Azure. An example of a Service Health event is SQL Azure in East US is experiencing downtime.
- Resource Health: Contains the record of any resource health events that have occurred to your Azure resources. An example of a Resource Health event is Virtual Machine health status changed to unavailable.
- Alert: Contains the record of activations for Azure alerts. An example of an Alert event is CPU % on myVM has been over 80 for the past 5 minutes.
- Autoscale: Contains the record of any events related to the operation of the autoscale engine based on any autoscale settings you have defined in your subscription. An example of an Autoscale event is Autoscale scale up action failed.
- Recommendation: Contains recommendation events from Azure Advisor.
- Security: Contains any security alerts generated by Azure Defender for Servers.
- Policy: Whenever Azure Policy is evaluated, the effect action is logged in this category.
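Because every entry belongs to exactly one category, a common first step when reviewing the activity log is to group or filter by it. The sketch below does this over a few hypothetical, heavily simplified entries. Real entries carry many more fields, and the exact field names and category spellings used here are assumptions.

```python
from collections import Counter

# Hypothetical, simplified activity log entries based on the examples above.
events = [
    {"category": "Administrative", "operation": "Create virtual machine"},
    {"category": "Administrative", "operation": "Delete network security group"},
    {"category": "Autoscale", "operation": "Autoscale scale up action failed"},
    {
        "category": "ResourceHealth",
        "operation": "Virtual Machine health status changed to unavailable",
    },
]


def count_by_category(entries):
    """Return how many entries fall into each category."""
    return dict(Counter(entry["category"] for entry in entries))


def filter_by_category(entries, category):
    """Return only the entries that belong to the given category."""
    return [entry for entry in entries if entry["category"] == category]


print(count_by_category(events))
print(filter_by_category(events, "Autoscale"))
```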
Metrics¶
Metrics are numerical values that are ingested from Azure resources and represent the state of the system at a particular point in time.
Metric Example
For a virtual machine, the metrics available will be CPU Percentage, Network In, Network Out, Memory, etc. On the other hand, if you take a storage account, the available metrics will be Number Of Requests, Number Of Failed Requests, Number Of API Calls, etc.
- Short, time-based data
- Frequently updated
- Near real-time data
- Visualizations via Metrics Explorer
Distributed Tracing¶
In monolithic architectures, we've gotten used to debugging with call stacks. Call stacks are brilliant tools for showing the flow of execution (Method A called Method B, which called Method C), along with details and parameters about each of those calls. This technique is great for monoliths or services running on a single process. But how do we debug when the call is across a process boundary, not simply a reference on the local stack?
That's where distributed tracing comes in.
!!! info ''
    Distributed tracing is the equivalent of call stacks for modern cloud and microservices architectures, with the addition of a simplistic performance profiler thrown in.
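One way to picture how a call is followed across process boundaries: each request carries a shared trace ID, and every hop adds its own span ID. The sketch below uses the W3C Trace Context `traceparent` header format (`version-traceid-spanid-flags`) as an example carrier. That choice is an assumption made for illustration, not something these notes prescribe, and the function names are hypothetical.

```python
import secrets


def new_traceparent():
    """Start a new trace at the first service that receives the request."""
    trace_id = secrets.token_hex(16)  # 32 hex chars, shared by every hop
    span_id = secrets.token_hex(8)    # 16 hex chars, unique per operation
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(incoming):
    """Continue a trace downstream: keep the trace ID, mint a new span ID."""
    version, trace_id, _parent_span_id, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header):
    """Extract the trace ID that ties all hops of one request together."""
    return header.split("-")[1]


# Service A receives a request and calls service B, which calls service C.
header_a = new_traceparent()
header_b = child_traceparent(header_a)
header_c = child_traceparent(header_b)

for name, header in (("A", header_a), ("B", header_b), ("C", header_c)):
    print(name, header)
```

Grouping telemetry by the trace ID reconstructs the cross-service equivalent of "Method A called Method B, which called Method C".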
Diagnostic Settings¶
Diagnostic settings define where a resource's logs and metrics will be stored.
Action groups¶
What are action groups?
An action group is a collection of notification preferences that can be reused in multiple alerts. The notifications and actions that you define inside the action group will be executed when the alert is fired. You can create multiple action groups with different notification preferences, and these can be used across your alerts.
They can be found in Monitor → Alerts → Action Groups.
Action groups consist of two parts: notifications and actions.
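The reuse described above can be sketched as a small data model: one action group holds notifications and actions, and several alert rules point at the same group. This is only a conceptual sketch. The class names, fields, and targets are hypothetical and do not mirror the Azure resource schema.

```python
from dataclasses import dataclass, field


@dataclass
class ActionGroup:
    name: str
    notifications: list = field(default_factory=list)  # e.g. email, SMS
    actions: list = field(default_factory=list)        # e.g. webhook, runbook


@dataclass
class AlertRule:
    name: str
    action_groups: list = field(default_factory=list)

    def fire(self):
        """Return every notification and action that runs when the alert fires."""
        triggered = []
        for group in self.action_groups:
            triggered += [f"notify:{target}" for target in group.notifications]
            triggered += [f"action:{target}" for target in group.actions]
        return triggered


# One action group, defined once.
ops_team = ActionGroup(
    name="ops-team",
    notifications=["email:ops@example.com"],
    actions=["webhook:https://example.com/hook"],
)

# Two different alerts reuse the same group.
cpu_alert = AlertRule("High CPU", [ops_team])
disk_alert = AlertRule("Low disk space", [ops_team])

print(cpu_alert.fire())
print(disk_alert.fire())
```

Changing the notification list on `ops_team` changes what both alerts do, which is the practical benefit of keeping preferences in a shared group.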
Notification types¶
- Email/SMS: These work even if Azure is down, whereas other notification types need Azure to be running.
- Email Azure Resource Manager Role: You can send email notifications to Azure RBAC roles such as Owner, Contributor, Reader, Monitoring Contributor, and Monitoring Reader that are assigned at the subscription scope. All user principals assigned any of these roles are notified when the alert is triggered. Azure AD groups and service principals are excluded from the email notification.
Action types¶
- Azure app push notification
- Azure Function: Using serverless compute, you can run small chunks of code when the alert is fired.
- Logic App: Run a business process.
- Secure Webhook/Webhook: The HTTPS or HTTP endpoint that an external application uses to communicate.
- ITSM: You can integrate ITSM tools like ServiceNow so that whenever an alert is triggered, a corresponding ticket is created in the ITSM tool.
- Automation runbook
- Event Hub: Ingest events into other systems.
Insights¶
Insights are service-specific monitoring features in Azure.
VM Insights¶
- Used to monitor VMs and VMSS.
- Also called Azure Monitor for VMs.
- Requires the Log Analytics agent to be installed.
Network Insights¶
- No agent installation required for this.
Container Insights¶
- Used to monitor containers
App Insights (for Devs)¶
Application Insights is an extension of Azure Monitor and provides Application Performance Monitoring (also known as “APM”) features. APM tools are useful to monitor applications from development, through test, and into production.
What to use for logging?
In addition to collecting metrics and application telemetry data, which describe application activities and health, Application Insights can also be used to collect and store application trace logging data.
Features¶
- Metrics and alerts
- Application Map
- Profiler
- Usage Analytics
Instrumentation¶
At a basic level, instrumenting is simply enabling an application to capture telemetry.
There are two methods to instrument your application:
- Manual instrumentation
- Automatic instrumentation (auto-instrumentation)
It can be:
- Runtime instrumentation
- Build-time instrumentation
Instrumentation key
The instrumentation key identifies the Application Insights resource that an instrumented application sends its telemetry to. It is stored in the Application Insights resource.
Auto-instrumentation¶
It's an agent for Application Insights. Auto-instrumentation enables telemetry collection through configuration without touching the application's code. Although it's more convenient, it tends to be less configurable. It's also not available in all languages.
The Application Insights agent or SDK pre-processes telemetry and metrics before sending the data to Azure, where it's ingested and processed further before being stored in Azure Monitor Logs.
Network Watcher¶
Network Watcher is a regional service used to monitor networks. It can monitor IaaS but not PaaS.
Azure Network Watcher provides tools to monitor, diagnose, view metrics, and enable or disable logs for resources in an Azure virtual network. Network Watcher is designed to monitor and repair the network health of IaaS (Infrastructure-as-a-Service) products including Virtual Machines (VM), Virtual Networks, Application Gateways, Load balancers, etc.
NSG flow logs
NSG flow logs is a feature of Azure Network Watcher that allows you to log information about IP traffic flowing through an NSG. These logs show inbound and outbound flows on a per-rule basis.
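Inside the flow log JSON, each flow is recorded as a comma-separated tuple. The sketch below parses the first eight fields of such a tuple (timestamp, source IP, destination IP, source port, destination port, protocol, direction, decision), which is the version 1 layout as understood here. Version 2 tuples append further fields, which this sketch ignores. The sample tuple and the `Flow` class are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

PROTOCOLS = {"T": "TCP", "U": "UDP"}
DIRECTIONS = {"I": "inbound", "O": "outbound"}
DECISIONS = {"A": "allowed", "D": "denied"}


@dataclass
class Flow:
    timestamp: datetime
    source_ip: str
    destination_ip: str
    source_port: int
    destination_port: int
    protocol: str
    direction: str
    decision: str


def parse_flow_tuple(raw):
    """Parse the first eight comma-separated fields of a flow tuple."""
    epoch, src, dst, sport, dport, proto, direction, decision = raw.split(",")[:8]
    return Flow(
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        source_ip=src,
        destination_ip=dst,
        source_port=int(sport),
        destination_port=int(dport),
        protocol=PROTOCOLS.get(proto, proto),
        direction=DIRECTIONS.get(direction, direction),
        decision=DECISIONS.get(decision, decision),
    )


# Hypothetical tuple: an inbound RDP attempt that the NSG denied.
sample = "1700000000,203.0.113.10,10.0.0.4,50000,3389,T,I,D"
print(parse_flow_tuple(sample))
```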
Monitoring Tools¶
Topology map¶
As resources are added to a virtual network, it can become difficult to understand what resources are in a virtual network and how they relate to each other. The topology capability enables you to generate a visual diagram of the resources in a virtual network and the relationships between the resources.
Connection Monitor¶
Monitors connectivity between Azure resources on the network.
Network Performance Monitor¶
Monitors network performance and connectivity between VNets, ExpressRoute, etc.
Diagnostic Tools¶
Next Hop¶
Next Hop is used to check whether traffic is routed to the expected destination. It is most useful when you use user-defined routes (UDRs) and want to verify that the routing rules are working.
By using Next Hop, you can easily find which route table is used for routing the traffic from a source to a destination.
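The core of a next-hop lookup is longest-prefix matching: among all routes whose address prefix contains the destination, the most specific one wins. The sketch below shows only that step over a hypothetical route table. It deliberately ignores tie-breaking between user-defined, BGP, and system routes with the same prefix, and the route entries are made up for illustration.

```python
import ipaddress

# Hypothetical effective routes: (address prefix, next hop type, next hop IP)
ROUTES = [
    ("0.0.0.0/0", "Internet", None),
    ("10.0.0.0/16", "VnetLocal", None),
    ("10.0.2.0/24", "VirtualAppliance", "10.0.1.4"),  # user-defined route
]


def next_hop(destination_ip, routes=ROUTES):
    """Return (next hop type, next hop IP) of the most specific matching route."""
    destination = ipaddress.ip_address(destination_ip)
    matches = []
    for prefix, hop_type, hop_ip in routes:
        network = ipaddress.ip_network(prefix)
        if destination in network:
            matches.append((network.prefixlen, hop_type, hop_ip))
    if not matches:
        return None
    # The longest prefix is the most specific route.
    _, hop_type, hop_ip = max(matches, key=lambda match: match[0])
    return hop_type, hop_ip


print(next_hop("10.0.2.7"))
print(next_hop("10.0.5.9"))
print(next_hop("198.51.100.20"))
```

If a UDR is meant to send traffic through a virtual appliance but the lookup returns a different hop type, the route's prefix or its association with the subnet is the first thing to check.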
IP Flow verify¶
IP Flow Verify can be used to quickly troubleshoot connectivity issues between a local IP address and a remote IP address, in either direction.
Example
When you create a VM, a default NSG is assigned to it. Let’s assume that even after opening the ports you are not able to connect to the VM remotely via RDP. To understand which rule is blocking the connectivity from the remote IP to the VM, you can use IP Flow Verify.
Effective security rules¶
As you know, you can apply an NSG at the subnet level and at the NIC level. Sometimes this can get complicated. With the help of effective security rules, you can find the effective rules applied to the traffic.
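Both IP Flow Verify and effective security rules come down to the same evaluation: within an NSG, rules are checked in priority order (lowest number first) and the first match decides. For inbound traffic, the subnet NSG must allow the flow before the NIC NSG is consulted. The sketch below models this with a simplified rule that matches only on source prefix and destination port. The rule sets and names are hypothetical, apart from the default `DenyAllInBound` rule at priority 65500.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int   # lower number = evaluated first
    source: str     # CIDR prefix, or "*" for any source
    port: object    # destination port number, or "*" for any port
    access: str     # "Allow" or "Deny"


def rule_matches(rule, source_ip, port):
    source_ok = rule.source == "*" or (
        ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.source)
    )
    port_ok = rule.port == "*" or rule.port == port
    return source_ok and port_ok


def evaluate_nsg(rules, source_ip, port):
    """Return (access, rule name) for the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule_matches(rule, source_ip, port):
            return rule.access, rule.name
    return "Deny", "no matching rule"


def evaluate_inbound(subnet_rules, nic_rules, source_ip, port):
    """Inbound traffic must pass the subnet NSG first, then the NIC NSG."""
    for level, rules in (("subnet", subnet_rules), ("nic", nic_rules)):
        access, name = evaluate_nsg(rules, source_ip, port)
        if access == "Deny":
            return "Deny", f"{level}:{name}"
    return "Allow", f"nic:{name}"


subnet_nsg = [
    Rule("AllowRdpFromOffice", 100, "203.0.113.0/24", 3389, "Allow"),
    Rule("DenyAllInBound", 65500, "*", "*", "Deny"),
]
nic_nsg = [
    Rule("DenyRdp", 200, "*", 3389, "Deny"),
    Rule("DenyAllInBound", 65500, "*", "*", "Deny"),
]

print(evaluate_inbound(subnet_nsg, nic_nsg, "203.0.113.10", 3389))
```

In this made-up setup the port is open at the subnet level, yet RDP still fails because of the NIC-level rule, which is exactly the kind of situation these two tools are meant to expose.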
Packet Capture¶
Capture packets from VM for analysis when a condition is met.
Connection Troubleshoot¶
Using Connection Troubleshoot, you can check the connectivity from a virtual machine to another VM, FQDN, URI, or IPv4 address.
VPN Diagnostic¶
Used to diagnose/troubleshoot VPN related issues.
ITSMC¶
IT Service Management Connector (ITSMC) allows you to connect Azure to a supported IT Service Management (ITSM) product or service. Azure services like Azure Log Analytics and Azure Monitor provide tools to detect, analyze, and troubleshoot problems with your Azure and non-Azure resources. But the work items related to an issue typically reside in an ITSM product or service. ITSMC provides a bidirectional connection between Azure and ITSM tools to help you resolve issues faster.
Supported ITSM tools
ITSMC supports connections with the following ITSM tools:
- ServiceNow
- System Center Service Manager
- Provance
- Cherwell
Performance Counters¶
Performance counters in Windows and Linux provide insight into the performance of hardware components, OS, and applications. Azure Monitor can collect performance counters from Log Analytics agents such as waagent (Azure Linux Agent) at frequent intervals for Near Real Time (NRT) analysis, in addition to aggregating performance data for longer-term analysis and reporting.
Misc Notes¶
- The Heartbeat table can tell which systems have not sent a heartbeat in the last x minutes.
- Auth events are part of the Syslog table.
- Network Watcher connection troubleshoot provides the capability to check a direct TCP connection from a virtual machine to a virtual machine (VM), fully qualified domain name (FQDN), URI, or IPv4 address.
- The activity log saves data for 90 days.
- Performance counters are stored in the Perf table.
- The Usage table stores hourly usage data for each table in the workspace.
- Log Analytics is billed for data ingestion and data retention.
- 31 days of data retention is free in Log Analytics.
- NSG flow logs are stored in JSON format.