Cloud Skills Challenge - DevOps Engineer

Introduction

Deep dive into the DevOps Engineer challenge from Microsoft Learn.

This challenge is a personal way of staying up to date in a neutral context. I use Azure and GitHub every day as part of my job, but, like everyone, I can improve my knowledge of these platforms and their best practices. Besides, my daily usage is driven by my company’s needs, which can be far from best practices and limited to specific topics.

Here are my notes. I capture what I consider interesting and make a quick summary. Feel free to use them as you need.

Modules

Capture Web Application Logs with App Service Diagnostics Logging

Link

  • File system log:
    • Windows supports Azure Blob Storage as a logging target.
    • Contains Stderr and Stdout outputs except for Windows ASP.NET and ASP.NET Core applications, which are managed by IIS.
    • If file system logging is enabled on Windows, it is automatically turned off after 12 hours for “performance reasons”, so it is not usable in production.
    • The Quota and Retention period are available for Azure Blob Storage and Linux file system logs.
    • Storage location:
      • Windows & File System: Virtual drive in D:\Home.
      • Windows & Azure Blob Storage: Storage in year, month, date and time hierarchies.
      • Linux: Docker log files accessible via SSH.
    • Windows storage access: Az CLI, Kudu and the storage browser depending on log destination.
  • Azure Application Insights:
    • More info: Monitoring, performance, designed for production.
    • SDK-based, so it requires configuration.
    • Billable service.
    • Doc.
  • Log stream panel:
    • Real-time analysis.
    • Connects to a single instance, so it is not useful for a multi-instance application, and therefore not for production.
    • Accessible with curl but requires FTPS credentials.

A good introduction to logging for Azure App Service web applications, but too Windows-centric.

Control and organize Azure resources with Azure Resource Manager

Link

  • Resource group organization: resource type, lifecycle, RBAC, price.
  • Tags are essential for finance, organization, metadata, automation, etc.
  • Use of policies: application of standards, organization, impact on resources.
  • RBAC: use, scope, best practices.
  • Resource locking: read-only, delete. Be careful with side effects! Ex: listing the keys of a storage account is no longer possible with a read-only lock.
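
To make the tagging point concrete, here is a minimal sketch of the kind of check a tagging policy enforces. It runs on plain dictionaries and does not call Azure. The required tag names and the resource names are assumptions chosen for illustration, not a standard from the module.

```python
# Hypothetical tagging standard: every resource must carry these tags.
REQUIRED_TAGS = {"cost-center", "environment", "owner"}


def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Return the required tag names that are absent or empty on a resource."""
    return {
        name
        for name in REQUIRED_TAGS
        if not resource_tags.get(name, "").strip()
    }


# Hypothetical inventory, as it could be exported from a resource listing.
resources = {
    "vm-web-01": {"cost-center": "1234", "environment": "prod", "owner": "team-web"},
    "st-logs-01": {"environment": "prod"},
}

for name, tags in resources.items():
    gaps = missing_tags(tags)
    status = "compliant" if not gaps else f"missing {sorted(gaps)}"
    print(f"{name}: {status}")
```

In a real subscription, this rule would more likely live in an Azure Policy definition than in a script. The sketch only shows the logic being enforced.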

Back to the basics! Quick summary of some Azure services.

Deploy Spring microservices to Azure

Link

  • Create the Spring Cluster.
  • Create a GitHub repository.
  • Create a MySQL database and integrate it into the Spring Boot Cluster.
  • Create a Spring Cloud gateway.
  • Short introduction to distributed tracing provided by default.
  • Scaling the application from the Azure portal. Unfortunately, there’s nothing about monitoring here and no introduction to the subject. Before scaling, a user needs to know what’s going on!

This module shows how PaaS services can quickly expose an application to the end user, but it lacks content on troubleshooting and monitoring. I’d appreciate some monitoring content alongside the distributed tracing unit.

Microsoft Azure Well-Architected Framework - Performance efficiency

Link

  • Identify the target: what a platform can offer vs what the company needs.
  • Identify potential growth and impact on the application.
  • Forecast capacity requirements.
  • POC and test application and infrastructure behavior.
  • Verify that the infrastructure can deliver the capacity.
  • Test performance in development and production.
  • Reproducible performance control process throughout the lifecycle.
  • Allocate time to technical debt and performance.
  • Azure Application Insights can help find hot spots.

A high-level module on performance problems, including a gentle introduction to the DevOps shift-left methodology. Debt, performance, and reliability are linked; when well managed, they help maintain velocity and anticipate problems. Like tests, they need to be addressed as early as possible, and this is only possible with tools and processes.

Microsoft Azure Well-Architected Framework - Operational excellence

Link

  • Introduction to project management with Azure DevOps. There is nothing on GitHub Projects; the doc must be too old.
  • Continuous improvement with feedback, knowledge sharing, retrospective and success sharing.
  • A/B testing, data-driven decision making.
  • Introduction to Scrum, tools and hygiene.
  • Keeping alerts and telemetry relevant.
  • Introduction to infrastructure as code.
  • Importance of automation: simple workflows, design for automation, computerized standards.
  • Feature flags: limit exposure of new features for metrics and easy deactivation.
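
The feature flag idea can be sketched as a percentage rollout. This is a minimal illustration, assuming a hash-based bucketing scheme; the flag name and user identifiers are hypothetical, and the module does not prescribe this implementation.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Decide whether a flag is on for a user.

    The user is hashed into a stable bucket between 0 and 99, so the same
    user always gets the same answer for a given flag and percentage.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


# Hypothetical flag exposed to 20% of users; setting it to 0 disables it for everyone.
users = [f"user-{i}" for i in range(10)]
exposed = [u for u in users if is_enabled("new-checkout", u, 20)]
print(f"{len(exposed)} of {len(users)} users see the new feature")
```

Because the bucket is derived from the flag and the user, raising the percentage only adds users; nobody who already had the feature loses it.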

What’s interesting about this module is the importance of project management. Technology isn’t everything; process and knowledge are essential.

Analyze your Azure infrastructure by using Azure Monitor logs

Link

  • Data collected:
    • Application data: Data relating to your customized application code.
    • Operating system data: Data from the Windows or Linux virtual machines hosting your application.
    • Azure resource data: Data relating to the operations of an Azure resource, such as a web application or a load balancer.
    • Azure subscription data: Data relating to your subscription. This includes data on Azure status and availability.
    • Azure tenant data: Data relating to your Azure services at the organization level, such as Microsoft Entra.
  • Activation of logging and diagnostic agent with Log Analytics Workspace.
  • Logs: Time-stamped information organized in the form of records. The type of event and quantity are difficult to predict.
  • Metrics: Numerical values describing an aspect of the system at a given time. They are collected at regular intervals, so the quantity is easy to predict.
  • Introduction to Kusto.

Good overview of the Azure Monitor service. The exercises with Kusto are good, but a dedicated module might be better. This language is a pillar of the platform, and even though it’s simple and close to SQL, it takes time to master.

Monitor cloud resources

Link

  • Logs: Permanent, immutable records containing information on changes, tasks, and errors. They are intended to be read by a human being in their current state.
    • Log management platform: Analyzes and creates a response based on content. It provides correlation, normalization, and reporting.
    • Essential for fault diagnosis and prediction.
  • Metrics: Represent a system’s relative health, stability, and availability. It is a quantitative value.
    • Correlation-based: Metrics are based on the composition and comparison of ratios.
    • Can be high-level: user session duration, user sentiment.
    • Part of service delivery.
  • Traces: Records of execution paths between services.
    • They enable tracking low-level calls between components using graphs and performance.
    • Useful for analyzing and tracing errors and their impact.
  • Application Performance Management (APM) is agent-based or agentless:
    • The agent can be an SDK in a Web page. It’s a tool that retrieves data from where it is. Messages between agents are machine-to-machine oriented and can lead to huge amounts of data that are difficult to manage. The protocol and use of data need to be analyzed.
    • Agentless retrieves metrics and logs from the outside world in a human format. Generally, it is based on logs and passive data.
  • Indicator and correlation measurements:
    • Simple: CPU, inactivity, garbage collection invocation, error rate, response time.
    • Complex:
      • Saturation point: The point at which the system begins to enter a bad state by increasing the waiting times, the size of a queue, or error messages impacting the end user.
      • Application performance index (Apdex): A score ranging from 0 (all users frustrated) to 1 (all users satisfied), based on an application’s response time from the end user’s point of view.
    • Framework:
  • Remediation planning or how to fix when something goes or will go wrong:
    • Ticketing system or IT operations management (ITOM) integrates with alerts and notifications.
    • KPI monitoring and alerts:
      • Activity indicators.
      • Mean time to detection (MTTD).
      • Mean time to resolution (MTTR).
      • Percentage of system impact on problems.
  • Daily remediation:
    • Nothing is normal anymore; services are constantly changing.
    • Continuously improve the system to stay ahead of saturation and keep it agile and understandable.
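
The indicators above can be computed with a few lines of code. The following is a minimal sketch, assuming the usual Apdex definition (satisfied up to a threshold T, tolerating up to 4T, frustrated beyond, giving a score between 0 and 1) and assuming that both MTTD and MTTR are measured from the start of the incident. The threshold, response times and incident timestamps are made-up sample data.

```python
from datetime import datetime
from statistics import mean


def apdex(response_times_s: list[float], threshold_s: float) -> float:
    """Apdex = (satisfied + tolerating / 2) / total, between 0 and 1."""
    if not response_times_s:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    return (satisfied + tolerating / 2) / len(response_times_s)


def mean_minutes(incidents: list[dict], start_key: str, end_key: str) -> float:
    """Average duration in minutes between two timestamps of each incident."""
    return mean(
        (i[end_key] - i[start_key]).total_seconds() for i in incidents
    ) / 60


# Sample data (hypothetical): response times in seconds and two incidents.
samples = [0.2, 0.4, 0.9, 1.5, 3.0]
incidents = [
    {
        "started": datetime(2024, 1, 10, 10, 0),
        "detected": datetime(2024, 1, 10, 10, 5),
        "resolved": datetime(2024, 1, 10, 10, 45),
    },
    {
        "started": datetime(2024, 1, 12, 14, 0),
        "detected": datetime(2024, 1, 12, 14, 15),
        "resolved": datetime(2024, 1, 12, 15, 15),
    },
]

print(f"Apdex (T=0.5s): {apdex(samples, 0.5):.2f}")
print(f"MTTD: {mean_minutes(incidents, 'started', 'detected'):.0f} min")
print(f"MTTR: {mean_minutes(incidents, 'started', 'resolved'):.0f} min")
```

Some teams measure MTTR from detection rather than from the start of the incident; whichever convention is chosen, it needs to stay the same over time for the trend to mean anything.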

A key module on monitoring and how to define it within your organization.

React to state changes in your Azure services by using Event Grid

Link

  • Event Grid:
    • Event-based routing service.
    • Events are state changes in resources; they can be filtered and customized.
    • Delivery is retried for up to 24 hours to ensure the message arrives.
    • Webhook manager for calling an endpoint outside Azure.
  • Logic Apps:
    • Can be triggered by the Event Grid.
    • Designer and JSON view.
    • Allows logical flow and conditions.
    • Plenty of connectors.
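
Event filtering can be illustrated locally. The sketch below mimics the filters an Event Grid subscription offers (event types, subject prefix and suffix) on plain dictionaries shaped like the Event Grid event schema. It does not call Azure; the `SubscriptionFilter` class and the sample events are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SubscriptionFilter:
    """Local stand-in for a subscription filter (hypothetical helper)."""

    included_event_types: set = field(default_factory=set)  # empty means all types
    subject_begins_with: str = ""
    subject_ends_with: str = ""

    def matches(self, event: dict) -> bool:
        if (
            self.included_event_types
            and event["eventType"] not in self.included_event_types
        ):
            return False
        subject = event["subject"]
        return subject.startswith(self.subject_begins_with) and subject.endswith(
            self.subject_ends_with
        )


# Sample events using a few fields of the Event Grid schema.
events = [
    {
        "id": "1",
        "eventType": "Microsoft.Resources.ResourceWriteSuccess",
        "subject": "/subscriptions/sub-id/resourceGroups/rg-demo/providers/Microsoft.Compute/virtualMachines/vm-01",
    },
    {
        "id": "2",
        "eventType": "Microsoft.Resources.ResourceDeleteSuccess",
        "subject": "/subscriptions/sub-id/resourceGroups/rg-demo/providers/Microsoft.Storage/storageAccounts/stdemo",
    },
]

vm_writes = SubscriptionFilter(
    included_event_types={"Microsoft.Resources.ResourceWriteSuccess"},
    subject_begins_with="/subscriptions/sub-id/resourceGroups/rg-demo/",
)

for event in events:
    if vm_writes.matches(event):
        print(f"forward event {event['id']} to the Logic App")
```

In Azure, this filtering is configured on the subscription itself, so events that do not match never reach the handler.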

Interesting exercise using Event Grid with the Logic App on a good use case.

Design a full-stack monitoring strategy on Azure

Link

  • Full stack monitoring: monitoring of infrastructure and services (with Azure Monitor), applications (with Application Insights) and security (with Microsoft Defender and Sentinel).
  • Microsoft Defender for Cloud:
    • Integrated natively into the platform, it centralizes security information in one place.
    • Connects to a Log Analytics workspace to capture and analyze logs.
    • Analyzes data, network, application security, identity, and access.
    • Recommendation based on scoring simplifying prioritization. Ex: apply MFA.
    • Just-in-time (JIT) access authorizes access to virtual machines for a limited period and audits it.
    • Adaptive Application Control to verify the conformity of processes running on a virtual machine. If an unusual process is running, trigger an alert.
    • Suggest remedies, alerts, and preventive measures.
  • Microsoft Sentinel:
    • Threat hunting, alerts and proactive responses based on user and platform logs.
    • Multi-cloud and on-premises solution.
    • Plethora of connectors, including Microsoft Entra and Office 365.
    • Custom or built-in workbooks and alerts.
    • Incident investigation and management.
  • Application Insights is used to monitor application performance and availability, as well as user behavior.
  • Azure Monitor insights hub for monitoring resources on the platform:
    • VM insights for monitoring at scale across subscriptions and obtaining processes and network topology.
    • Container insights for monitoring AKS across subscriptions and getting metrics, logs and performance data.
    • Prometheus support for PromQL querying and Grafana display.
  • Azure Monitor can be used with managed Grafana.
  • All collected information can be used to create alert rules.

The security topic is new, but much of the monitoring information is redundant with previous modules.

Introduction to GitHub

Link

  • Code repository with integrated tools for code security, CI/CD, AI, wiki, etc.
  • Code snippets with gist.
  • Git flows with branches and commits, but also GitHub features based on pull requests and reviewers.
  • Collaboration with issues, pull requests, and discussions.
  • Keeping updated with a notification system on users, repository and organisation.
  • GitHub Pages for static hosting.

Overall GitHub guide. It’s a nice tutorial that is really useful for those new to the platform.

Migrate your repository by using GitHub best practices

Link

  • GitHub is a cloud-based solution, and everything on GitHub must be considered compromised.
  • History can be kept.
  • Large files require Git LFS.
  • Understand collaboration files such as .gitignore or README.md to simplify collaboration. Here are some best practices.

A getting-started module with hands-on practice around GitHub.

Upload your project by using GitHub best practices

Link

  • Import project in GitHub.
  • Add metadata to the project, such as the email used for commits.
  • Import or create a project with the GitHub website, the Git command or the GitHub CLI.

The migration example from another code management platform is interesting.

Manage repository changes by using pull requests on GitHub

Link

  • Pull Requests and review as a communication tool.

The module focuses on GitHub practices around Pull Request and processes.

Settle competing commits by using merge conflict resolution on GitHub

Link

  • Git branch and rebase management.

Basic good practices around Git and the integration in GitHub.

Search and organize repository history by using GitHub

Link

  • Search on GitHub in repositories and pull requests.
  • Links between pull requests, issues, and comments based on tags and keywords.

Common standard GitHub usage.

Manage an InnerSource program by using GitHub

Link

  • InnerSource applies open source best practices within a company.
    • Keeping transparency, communication, and history.
    • Reduce friction by letting one team contribute to another team’s code, and even fork the repository if needed.
    • Standardize practices.
  • Repository permissions:
    • Read level is recommended for non-code contributors who want to view or discuss the project.
    • Triage level is recommended for contributors who need to proactively manage issues and pull requests without write access.
    • Write level is recommended for contributors who actively push to the project.
    • Maintain level is recommended for project managers who manage the repository without access to sensitive or destructive actions.
    • Admin level is recommended for people who need full access to the project, including sensitive and destructive actions like managing security or deleting a repository.
  • README.md, CODEOWNERS, and CONTRIBUTING.md can be put in the .github folder. They are essential to retrieve information at scale.
  • Standardize processes using templates for Pull Requests and Issues.

Great module on how to use InnerSource and measure if it’s working.

Communicate effectively on GitHub using Markdown

Link

  • Broad description of the Markdown language.
  • Introduction to the GitHub-Flavored Markdown (GFM).
  • Reference using a URL, the syntax #, GH- or Username/Repository# for Pull Request and Issues. Similar syntax for commit and users.
  • Slash commands, such as /code or /details.
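
The short reference forms for issues and pull requests can be recognized with a regular expression. This is a simplified sketch for illustration, not GitHub's actual autolinking logic; the pattern and the sample sentence are assumptions.

```python
import re

# Matches "#123", "GH-123" and "owner/repo#123" when they start a word.
REFERENCE = re.compile(
    r"(?<![\w/])"                            # not in the middle of a word or path
    r"(?:(?P<repo>[\w.-]+/[\w.-]+)#|GH-|#)"  # optional owner/repo, or a short form
    r"(?P<number>\d+)\b"
)


def find_references(text: str) -> list[tuple]:
    """Return (repository or None, number) for each reference found."""
    return [
        (m.group("repo"), int(m.group("number")))
        for m in REFERENCE.finditer(text)
    ]


comment = "Fixes #12, relates to GH-34 and octo-org/octo-repo#56."
for repo, number in find_references(comment):
    target = repo or "current repository"
    print(f"{target}: #{number}")
```

A reference without a repository part points to the repository where the comment is written.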

Good review of the Markdown/GFM on GitHub to be more efficient on day-to-day tasks.

Maintain a secure repository by using GitHub best practices

Link

  • Handle the security at the beginning of the project.
  • Security is like deployment or tests; it must shift left.
  • Use the Security tab of the repository:
    • Security policies allow you to specify how to report a security vulnerability in your project by adding a SECURITY.md file to your repository.
    • Dependabot alerts notify you when GitHub detects that your repository uses a vulnerable dependency or malware.
    • Security advisories that you can use to privately discuss, fix, and publish information about security vulnerabilities in your repository.
    • Code scanning that helps you find, triage, and fix vulnerabilities and errors in your code with CodeQL.
    • Secret scanning to prevent credential leaks. Remove the secret from the history by rewriting it, then contact GitHub to clear some cache. Of course, don’t forget to revoke the leaked credentials.
  • Use .gitignore for sensitive files.
  • Assume your GitHub account is compromised.
  • Use branch protection rules.
  • Use a CODEOWNERS file.

Good inputs on the security shift left. However, a deep dive into CodeQL would have been appreciated.

Automate DevOps processes by using GitHub Apps

Link

  • Use API to extend GitHub functionalities.
  • Permission can be at the repository level.
  • OAuth App:
    • The application acts on behalf of a user.
    • Consumes a seat in the GitHub organization.
    • For reading, writing, and modifying user data.
  • GitHub App:
    • Installed on a personal account, an organization or a repository.
    • Needs admin access.
    • Does not consume a GitHub seat.
    • Needs a token.
    • Customizable permissions.
    • Supports authentication on behalf of the user, like an OAuth App.
  • Events can be:
    • GitHub webhooks.
    • Polling.
  • GitHub tokens:
    • GitHub personal access tokens (ghp):
      • Individual token for a user.
      • Can be fine-grained.
    • GitHub App user-to-server tokens (ghu):
      • Issued when a GitHub App acts on behalf of a user, for example through the device flow.
    • GitHub Application Installation tokens (ghs):
      • Valid for a short period.
      • Perfect for installation actions.
    • OAuth access tokens (gho):
      • For headless CLI.
      • Can be acquired using the web application flow.
      • Act on behalf of the user.
    • Refresh tokens (ghr):
      • Used to renew an expired token.
  • There are rate limits on token delivery. GitHub lets you monitor and manage them.
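
Since each token type carries a recognizable prefix, a small helper can identify a token before it ends up in a log. This is a minimal sketch using the prefixes GitHub documents for its token formats; the helper names are hypothetical and the sample tokens are fake placeholders.

```python
# Prefixes documented by GitHub for its token formats.
TOKEN_PREFIXES = {
    "github_pat_": "fine-grained personal access token",
    "ghp_": "personal access token",
    "gho_": "OAuth access token",
    "ghu_": "GitHub App user-to-server token",
    "ghs_": "GitHub App installation token",
    "ghr_": "refresh token",
}


def token_kind(token: str) -> str:
    """Return the token type inferred from its prefix, or 'unknown'."""
    for prefix, kind in TOKEN_PREFIXES.items():
        if token.startswith(prefix):
            return kind
    return "unknown"


def redact(token: str) -> str:
    """Keep only the prefix so that the secret part never reaches a log."""
    for prefix in TOKEN_PREFIXES:
        if token.startswith(prefix):
            return prefix + "*" * 8
    return "*" * 8


# Fake placeholder values, built by concatenation on purpose.
fake_tokens = ["ghp_" + "x" * 36, "ghs_" + "y" * 36, "not-a-token"]
for token in fake_tokens:
    print(f"{redact(token)} -> {token_kind(token)}")
```

The prefix only tells the type of the token; it says nothing about its validity or its permissions.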

GitHub App is a good topic and market for numerous tools. The exercise based on a video is interesting and fits well for this module.

Automate GitHub by using GitHub Script

Link

  • Based on Octokit.
  • Provides a pre-authenticated Octokit client for the repository.
  • Another way to interact with GitHub APIs and act on them.

Ex:

- uses: actions/github-script@0.8.0
  with:
    github-token: ${{secrets.GITHUB_TOKEN}}
    script: |
      github.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: "🎉 You've created this issue comment using GitHub Script!!!"
      })

Each organization creates its own processes, and GitHub Script helps a lot in implementing them.

Manage software delivery by using a release based workflow on GitHub

Link

  • Using GitHub Projects and issues to create sprints.
  • Sprints are time-boxed periods to produce incremental changes.
  • Milestones are similar to project tracking but focused on product features.
  • Long-lived branches should mainly be for releases and must come from the main branch. Also, they must be protected against modifications or deletions.
  • Use git cherry-pick to apply specific commits from one branch to another and update releases.
  • Releasing to consumers must be based on GitHub releases.

Managing project releases is simplified by Git, GitHub Releases, and GitHub Project, and this module clearly explains how to do it correctly.

Build continuous integration (CI) workflows by using GitHub Actions

Link

  • GitHub Actions automates tasks on a repository and its content based on events.
  • An Artifact is a job output besides logs that can be kept and stored.
  • Environment variables for workflows can be provided:
    • At the GitHub level in secrets and variable - by someone with permission -.
    • At the workflow and job level - defined by the developer -.
    • By GitHub directly.
  • Custom scripts for the CI can be put directly in the .github/scripts folder.
  • Reproducible steps in a workflow, such as installing dependencies, can be cached and reused between workflow runs to reduce CI time.
  • More debugging information can be provided at the runner or step level by setting the GitHub secrets ACTIONS_RUNNER_DEBUG and ACTIONS_STEP_DEBUG, respectively, to true.
  • Logs are accessible from the GitHub UI or by API.

This module introduces the GitHub Actions syntax and the possibilities offered. I’m surprised it comes after the module Automate GitHub by using GitHub Script, which requires GitHub workflows.

Build and deploy applications to Azure by using GitHub Actions

Link

  • GitHub Actions can upload a container image to Azure using prebuilt actions from the GitHub Marketplace.
  • GitHub Action can also interact with resources on cloud providers such as Azure and can create, modify, or delete cloud resources.
  • Workflow status badges can provide visibility:
    • They can display information on any branch or event on the repository.
    • They can be integrated into a website, not only GitHub.
    • Ex: ![example branch parameter.](https://github.com/mona/special-octo-eureka/actions/workflows/grading.yml/badge.svg?branch=my-workflow)
  • Workflows can use environments with custom secrets and variables. Like branches, they can be protected with reviewers or delayed with a timer.
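
The badge URL follows a predictable pattern, so it can be generated. Here is a minimal sketch that rebuilds the example above; the `badge_markdown` helper is hypothetical, and the `branch` and `event` query parameters are the optional filters of the badge URL.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def badge_markdown(
    owner: str,
    repo: str,
    workflow_file: str,
    branch: Optional[str] = None,
    event: Optional[str] = None,
    alt_text: str = "workflow status",
) -> str:
    """Build the Markdown image link for a workflow status badge."""
    url = (
        f"https://github.com/{quote(owner)}/{quote(repo)}"
        f"/actions/workflows/{quote(workflow_file)}/badge.svg"
    )
    # Optional filters: show the status for one branch or one triggering event.
    params = {}
    if branch:
        params["branch"] = branch
    if event:
        params["event"] = event
    if params:
        url += "?" + urlencode(params)
    return f"![{alt_text}]({url})"


print(
    badge_markdown(
        "mona",
        "special-octo-eureka",
        "grading.yml",
        branch="my-workflow",
        alt_text="example branch parameter.",
    )
)
```

The same URL works in any page that can render an image, which is what makes the badge usable outside GitHub.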

This module only talks about token-based authentication between Azure and GitHub, forgetting tokenless solutions such as OpenID Connect (OIDC).

Implement a code workflow in your build pipeline by using Git and GitHub

Link

This module is a complete project showing the integration between Azure DevOps - the board and pipeline - and GitHub. It’s interesting to understand how to use both and how both platforms provide similar or additional features.

Summary

This challenge is perfect for a new GitHub and Azure user. My initial goal was a success; I learned. However, what I’ve learned pales in comparison to the time spent. If you’re already familiar with Azure and GitHub, I hope this article will help you choose the most useful modules for you and save you time.