High availability solutions on Microsoft Azure by SLES for SAP Applications

This post was co-authored with Sherry Yu, Director of SAP Success Architect, SUSE.

In today’s business world, service availability and reliability are key to a successful digital transformation. Extensive downtime not only costs a business revenue and productivity, but may also cause reputational damage. SUSE and Microsoft have been working closely to provide a trusted path to SAP Solutions in the cloud, including solutions to reduce unplanned and planned downtime.

SUSE and Microsoft work together

SUSE is a leader in SAP solutions and, in particular, a developer of high availability (HA) solutions. HA solutions are first tested and supported on-premises, then documented in the official configuration guides published on SUSE’s site. Microsoft tests the solutions on Azure infrastructure, tunes the settings and configurations, and then releases Azure-specific HA configuration guides on Microsoft’s documentation site. Microsoft also actively provides feedback to SUSE and requests support for new scenarios. The working process is summarized in the chart below. It has been a smooth collaboration between SUSE and Microsoft in support of customers’ digital transformation journeys.

Representation of the collaboration process used by SUSE and Microsoft to innovate, test and release support for high availability solutions for SAP.

Solutions to reduce unplanned downtime

High availability solutions that prevent 24/7 SAP systems from being disrupted by hardware, network, and application issues are commonly based on cluster technologies. Pacemaker is an open source cluster resource manager used by various HA solutions.
For SAP HANA, the HA solutions are based on HANA System Replication (HSR). SUSE has developed resource agents to automate the failover of HANA System Replication in scale-up and scale-out scenarios.

For SAP S/4HANA and NetWeaver, the HA solutions are based on ASCS/ERS enqueue replication. SUSE’s HA solutions for ENSA1 and ENSA2 both hold the SAP HA-Interface certification. Recently, SUSE released a new architecture called Simple Mount File System, which reduces the complexity of the Pacemaker configuration for the SAP ASCS/ERS architecture. It is also SAP HA-Interface certified. Microsoft was the first cloud provider to release a configuration guide for the SAP ASCS/ERS simple mount structure.

HA for SAP ASCS/ERS on Azure with SLES for SAP Applications

The following configurations are supported on Azure based on ASCS/ERS Enqueue Replication, ENSA1 and ENSA2, respectively:

The diagram below outlines the major differences between the various scenarios:

Architectural diagram for high availability for SAP ASCS/ERS on Linux. It highlights the common elements of the HA solution (such as the Pacemaker cluster and the ILB) and depicts the different options for the SAP shared directories: NFS on Azure Files, NFS on ANF, or an NFS cluster built by the customer.

New simple mount architecture for SAP ASCS/ERS on Azure VMs with NFS

This is a new architecture that simplifies the management of shared file systems on NFS. Instead of having the cluster manage the shared file systems through a Filesystem resource agent, the OS manages them and mounts them at boot time. A new resource agent, SAPStartSrv, was created to control the start and stop of the SAP start framework of each SAP instance. The benefit is a more robust cluster architecture.

Representation of Pacemaker cluster resources in high availability configuration SAP ASCS/ERS. It shows the classic versus simple mount configuration.

This solution has been tested and released on Microsoft Azure with the official configuration guide published.
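
To make the difference between the two layouts concrete, here is a minimal Python sketch that models them as plain data. Only two things come from the description above: the cluster-managed Filesystem agent is dropped, and SAPStartSrv is added while the OS mounts the share at boot. Everything else is an assumption for illustration, including the `IPaddr2` and `SAPInstance` entries, the group name, and the mount path. It is not a Pacemaker configuration.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterGroup:
    """Illustrative model of the resources behind one SAP instance."""
    name: str
    cluster_resources: list = field(default_factory=list)  # agents run by the cluster
    os_mounts: list = field(default_factory=list)          # mounted by the OS at boot


def classic_group(name: str, mount: str) -> ClusterGroup:
    # Classic layout: the cluster mounts the shared file system itself
    # through a Filesystem resource, so nothing is left to the OS.
    # `mount` is unused here because the mount is owned by the cluster resource.
    return ClusterGroup(name, ["Filesystem", "IPaddr2", "SAPInstance"], [])


def simple_mount_group(name: str, mount: str) -> ClusterGroup:
    # Simple mount layout: the OS mounts the share at boot time and the
    # cluster only controls the SAP start framework and the instance.
    return ClusterGroup(name, ["IPaddr2", "SAPStartSrv", "SAPInstance"], [mount])


# Hypothetical instance name and path, used only for the comparison.
classic = classic_group("g-ASCS", "/usr/sap/SID/ASCS00")
simple = simple_mount_group("g-ASCS", "/usr/sap/SID/ASCS00")

removed = set(classic.cluster_resources) - set(simple.cluster_resources)
added = set(simple.cluster_resources) - set(classic.cluster_resources)
print("no longer cluster-managed:", removed)
print("newly cluster-managed:", added)
```

In this model the cluster has one fewer kind of resource that can fail on an NFS hiccup, which is the intuition behind the robustness claim.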

HA for SAP HANA on Azure with SLES for SAP Applications

HA Solutions for SAP HANA are based on HANA System Replication (HSR) in scale-up and scale-out. The following scenarios are supported on Azure:

Scenario: Scale-up HSR + Pacemaker
Guide for SLES for SAP Applications: High availability of SAP HANA on Azure VMs on SLES – Azure Virtual Machines | Microsoft Docs

• Basic HANA scale-up and HSR
• Can be used with NFS-mounted file systems
• Doesn’t include the more resilient Pacemaker configuration that handles loss of NFS mounts

Scenario: Scale-up HSR with NFS-mounted file systems
Guide for SLES for SAP Applications: High availability of SAP HANA Scale-up with ANF on SLES – Azure Virtual Machines | Microsoft Docs

• Additional Pacemaker configuration monitors the NFS file systems
• Loss of access to NFS-mounted file systems (including /hana/shared) triggers failover

Scenario: Scale-out n+m (scale-out with standby node)
Guide for SLES for SAP Applications: SAP HANA scale-out with standby with Azure NetApp Files on SLES – Azure Virtual Machines | Microsoft Docs

• Requires shared storage (ANF on Azure)
• For /hana/data and /hana/log, only NFSv4.1 is supported
• For /hana/shared, NFSv3 or NFSv4.1 is supported

Scenario: Scale-out HSR + Pacemaker
Guide for SLES for SAP Applications: SAP HANA scale-out with HSR and Pacemaker on SLES – Azure Virtual Machines | Microsoft Docs

• Includes the additional Pacemaker configuration for loss of NFS access


There are some considerations in picking the right scenario based on your business needs:

  • How critical is it to minimize downtime in the case of a failover?
  • How much additional spend are you willing to accept in order to lower downtime in case of an incident?

Azure virtual machine availability overview

Azure offers several compute deployment options and it’s important to understand their differences, especially the SLAs as noted below:

Representation of the Azure virtual machines (VMs) SLA for single VM, VMs deployed in Availability Set, and VMs deployed across Availability zones. Information about Disaster Recovery or DR regional pairs.
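
As a rough aid for comparing these options, the sketch below converts an SLA percentage into a downtime budget. The three percentages are assumptions based on commonly published Azure VM SLA figures and are not stated in the text above, so verify them against the current Azure SLA. The yearly period is also a simplification chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

# Assumed SLA percentages (illustrative; confirm against the current Azure SLA).
ASSUMED_SLA = {
    "single VM": 99.9,
    "availability set": 99.95,
    "availability zones": 99.99,
}


def max_downtime_minutes(sla_percent: float,
                         period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Downtime allowed within the period without breaching the SLA."""
    return period_minutes * (1 - sla_percent / 100)


for option, sla in ASSUMED_SLA.items():
    budget = max_downtime_minutes(sla)
    print(f"{option}: {sla}% -> about {budget:.0f} minutes per year")
```

Under these assumed figures, the budget shrinks from roughly 526 minutes per year for a single VM to roughly 53 minutes per year across availability zones, which is why the deployment option matters when sizing an HA design.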

Solutions to reduce planned downtime

Planned downtime normally is associated with maintenance of the environment. SUSE and Microsoft present the following solutions to minimize the planned downtime.

For instance, when performing maintenance on an SAP HANA system running in a cluster, whether to upgrade the OS or to apply HANA SPs, a rolling update is recommended. That means upgrading the secondary HANA node first, performing a takeover, and then upgrading the former primary HANA node. This effectively reduces the planned downtime to the time needed to perform the takeover. The same approach can be applied to SAP Central Services in an HA configuration.
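
The ordering of a rolling update can be sketched in a few lines of Python. This is a toy model under stated assumptions: `HanaNode`, `rolling_update`, the node names, and the version strings are all hypothetical. Real-world steps such as putting the cluster into maintenance mode or re-registering the former primary are left out.

```python
from dataclasses import dataclass


@dataclass
class HanaNode:
    name: str
    role: str     # "primary" or "secondary"
    version: str


def rolling_update(primary: HanaNode, secondary: HanaNode, new_version: str) -> list:
    """Apply the update in the order described above and return a step log."""
    steps = []

    # Step 1: upgrade the secondary while the primary keeps serving.
    secondary.version = new_version
    steps.append(f"upgraded {secondary.name} (secondary)")

    # Step 2: takeover. This is the only step that interrupts service.
    primary.role, secondary.role = "secondary", "primary"
    steps.append(f"takeover: {secondary.name} is now primary")

    # Step 3: upgrade the former primary, which now runs as secondary.
    primary.version = new_version
    steps.append(f"upgraded {primary.name} (former primary)")
    return steps


node1 = HanaNode("hana-node1", "primary", "old")
node2 = HanaNode("hana-node2", "secondary", "old")
for step in rolling_update(node1, node2, "new"):
    print(step)
```

Both upgrades happen on a node that is not serving as primary at that moment, so the takeover in step 2 is the only planned interruption.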

To keep SAP systems secure, system admins must apply security patches in a timely manner. SUSE provides Kernel Live Patching, which can help avoid reboots for up to one year. It’s highly practical and recommended for mission-critical HANA systems.

When performing maintenance on SAP ASCS/ERS running in the cluster, it’s essential to use the sap_vendor_cluster_connector that SUSE has developed for the SAP HA-Interface certification, to avoid split-brain. During maintenance, a system admin can stop an SAP ASCS or ERS instance via SAP tools such as sapcontrol or MMC. If the instance is managed by the cluster, the cluster connector notifies the cluster that the stop is intended, so the cluster does not interfere instead of trying to remediate the “failure.” The HA-Interface helps avoid accidents during planned maintenance windows. You can find the details and an example in this blog.
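
The decision the cluster makes can be illustrated with a small Python sketch. It is a conceptual model only: `ClusterSketch`, `notify_intended_stop`, `on_instance_down`, and the instance names are hypothetical and do not reflect the connector’s real interface.

```python
from enum import Enum


class Action(Enum):
    DO_NOT_INTERFERE = "leave the instance stopped"
    REMEDIATE = "restart or fail over the instance"


class ClusterSketch:
    """Toy model of how an announced stop changes the cluster's reaction."""

    def __init__(self) -> None:
        self.intended_stops = set()

    def notify_intended_stop(self, instance: str) -> None:
        # Conceptually what the cluster connector does when an admin
        # stops a cluster-managed instance through SAP tools.
        self.intended_stops.add(instance)

    def on_instance_down(self, instance: str) -> Action:
        # An announced stop is treated as intended, anything else as a failure.
        if instance in self.intended_stops:
            return Action.DO_NOT_INTERFERE
        return Action.REMEDIATE


cluster = ClusterSketch()
cluster.notify_intended_stop("ASCS00")
print(cluster.on_instance_down("ASCS00").value)  # announced stop
print(cluster.on_instance_down("ERS01").value)   # unannounced stop
```

Without the notification, both cases would look identical to the cluster, and it would try to recover an instance the admin stopped on purpose.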

Accelerate your SAP S/4HANA migration to Azure

SUSE and Microsoft provide solutions to automate, validate and monitor the SAP Landscape:
•    Automation: Microsoft Automation Framework for SAP provides built-in best practices to speed up provisioning and reduce errors. Deployment time is reduced from months or weeks to days. As a contributing partner, SUSE provided best practices, especially for HA deployment.
•    Validation: SUSE Project Trento, part of SLES for SAP Applications, provides rule-based autodetection of SAP configuration issues in Azure infrastructure. It can be used as a powerful pre-go-live validation tool to ensure quality. In Day 2 operations, it continuously checks the production system to detect deviations and prevent outages.
•    Monitoring: Microsoft Azure Monitor for SAP helps customers gain insights into the SAP landscape, especially HA clusters. Proactive monitoring helps fix issues before outages happen. The monitoring for clusters on SLES for SAP Applications was co-developed with SUSE.

SLES for SAP Applications

SLES for SAP Applications is the leading Linux platform for SAP HANA, SAP NetWeaver, and SAP S/4HANA solutions and is an SAP Endorsed App. Two of the many key components of SLES for SAP Applications are the High Availability Extension and Resource Agents. The High Availability Extension provides Pacemaker, an open-source cluster framework. The Resource Agents manage automated failover of SAP HANA System Replication, S/4HANA ASCS/ERS ENSA2, and NetWeaver ASCS/ERS ENSA1. On Microsoft Azure’s marketplace, the PAYG image of SLES for SAP Applications includes Live Patching.

Learn More

Microsoft Azure is an enterprise-class cloud platform optimized for SAP that provides significant cost savings, new insights from advanced analytics, and unmatched security and compliance.

SUSE has partnered with SAP for more than 20 years and has more than 130 worldwide benchmarks. Some of the world’s largest SAP workloads run on SUSE on Azure. The first reference architectures for SAP on Azure were SUSE based. As a result of the close collaboration between Microsoft and SUSE, a comprehensive portfolio of HA solutions for SAP on Azure is available to customers, leveraging the strengths of SUSE on Microsoft Azure.