A highly scalable solution for data replication using Amazon FSx for NetApp ONTAP and NetApp SnapMirror

Customers running NetApp storage arrays on premises in their own data centers often have strict network and firewall access control rules in place to secure their data, and this type of security usually introduces Network Address Translation (NAT) into the path between the arrays. ONTAP, whether on premises or in the cloud, requires storage clusters to be configured with static IP addresses, and the SnapMirror™ protocol used to replicate data between clusters doesn't support NAT in the path. This prevents connectivity between Amazon FSx for NetApp ONTAP and on-premises NetApp systems that sit behind firewalls performing NAT, so users in these environments cannot easily migrate data between their data centers and FSx for ONTAP. Ideally, they would connect via SnapMirror over the public internet, but that doesn't work in a default configuration.

NetApp SnapMirror™ is a commonly used disaster recovery (DR), backup, and replication feature of NetApp ONTAP storage, both on premises and in the cloud. Amazon FSx for NetApp ONTAP includes SnapMirror as part of a fully managed service within AWS. However, because SnapMirror doesn't support NAT and the interface IPs are verified against the file systems being peered, an Amazon Elastic Compute Cloud (Amazon EC2) based NAT device fronted with Elastic IPs (EIPs) must be deployed so that the L3 headers match between each file system and the internet.

In this post we discuss an architecture and design that streamlines and scales this migration challenge in an AWS environment. Another option would be setting up individual VPN tunnels between the data centers and AWS, but this would be challenging to manage at scale. The reason SnapMirror doesn't support NAT is that the metadata exchanged during peering, such as Logical Interface (LIF) IPs, is verified against the file systems being peered. If the addresses don't match, the connection fails. In other words, NAT itself doesn't break SnapMirror; the changing of IPs does. So, what if we could NAT the connections in a way that still lets SnapMirror verify those IPs? We just need to make sure that the L3 headers on the IP packets match, and the most flexible way to do this is with a second NAT.

Solution overview

To make the L3 headers match, we need a NAT device between each file system and the internet. This device can be a Linux instance, firewall, router, or other NAT-capable device within your data center, with the appropriate bandwidth. In this post, we deploy a Graviton-based, network-optimized EC2 instance in our Amazon Virtual Private Cloud (Amazon VPC). We use a c7gn.medium instance with 3.5 GB/s of network throughput; the instance size can be scaled based on your bandwidth requirements. Given that the workload has minimal CPU and memory requirements, this gives us the best network throughput for the cost at the time of this writing.

Prerequisites

The following prerequisites are necessary to complete this solution:

  • A NAT device that SnapMirror traffic passes through at each filer
  • A unique subnet for every file system

AWS architectural diagram showing two VPCs connected through a self-managed Linux-based NAT gateway

Example

The example configuration goes from AWS to AWS for simplicity, as shown in the preceding figure, but either side can be replaced with any NAT device that performs the same function. Similarly, the example is based on a Single-Availability Zone (AZ) FSx for ONTAP file system. If you're using a Multi-AZ file system, then we recommend deploying two EC2 instances, one in each of the Multi-AZ file system's AZs, and routing traffic through the instance in the same AZ.

EIPs

In our example we use one EIP for each FSx for ONTAP inter-cluster interface. To start, we request four EIPs that aren't yet associated with anything. Using port allocations could reduce the number of IPs needed, but that is outside the scope of this example.
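If you prefer the command line, the allocations can be requested up front. The following is a minimal sketch using the AWS CLI; the Name tags are only illustrative.

# Allocate four EIPs (two per side) and tag them so they are easy to identify later
for name in sideA-inter1 sideA-inter2 sideB-inter1 sideB-inter2; do
  aws ec2 allocate-address --domain vpc \
    --tag-specifications "ResourceType=elastic-ip,Tags=[{Key=Name,Value=$name}]"
done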

Security group

Having EIPs connected directly to Amazon EC2 without restrictions is poor security posture. Therefore, we created a security group in each VPC and allowed all traffic from all four EIPs. Technically, only TCP ports 10000, 11104, and 11105 plus ICMP are necessary, and our router only forwards those ports anyway.
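As a rough sketch, the ingress rules for one side could be created with the AWS CLI as follows; the security group ID is a placeholder for your own value, and the EIPs are the example addresses listed below.

SG_ID=sg-0123456789abcdef0   # placeholder security group ID
for eip in 18.190.143.162 3.128.12.212 3.135.134.67 3.146.166.253; do
  # SnapMirror inter-cluster ports
  aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 11104-11105 --cidr "$eip/32"
  aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
    --protocol tcp --port 10000 --cidr "$eip/32"
  # ICMP echo can be allowed similarly if you want ping for troubleshooting
done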

The following summarizes the example network configuration for the two FSx for ONTAP deployments. The IP addresses are for example purposes only; the values differ in your environment based on your network configuration.

Side A:

  • VPC: 10.1.0.0/16
  • FSx for ONTAP inter-cluster endpoint 1: 10.1.0.137
  • FSx for ONTAP inter-cluster endpoint 2: 10.1.0.125
  • inter_1 EIP: 18.190.143.162
  • inter_2 EIP: 3.128.12.212

Side B:

  • VPC: 10.2.0.0/16
  • FSx for ONTAP inter-cluster endpoint 1: 10.2.0.155
  • FSx for ONTAP inter-cluster endpoint 2: 10.2.0.110
  • inter_1 EIP: 3.135.134.67
  • inter_2 EIP: 3.146.166.253

Amazon EC2

To handle the NATs, deploy an EC2 instance running Red Hat Enterprise Linux 9 with the previously created security group attached. Red Hat isn't required; any Linux distribution that supports nftables should work for this exercise. On each of the EC2 instances, we need to associate two of our EIPs, and each EIP must be associated with a different private IP address. Finally, we must disable source/destination checking on the network interface, which allows the EC2 instance to forward packets sourced from IP addresses that it doesn't own.
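A minimal sketch of this wiring with the AWS CLI might look like the following; the ENI, instance, and allocation IDs are placeholders, and the private IPs match the Side B example used later in this post.

ENI_ID=eni-0123456789abcdef0          # placeholder network interface ID
# Add a secondary private IP so each EIP gets its own private address
aws ec2 assign-private-ip-addresses --network-interface-id "$ENI_ID" \
  --private-ip-addresses 10.2.0.186
# Associate each EIP with a different private IP on the interface
aws ec2 associate-address --allocation-id eipalloc-AAAA \
  --network-interface-id "$ENI_ID" --private-ip-address 10.2.0.70
aws ec2 associate-address --allocation-id eipalloc-BBBB \
  --network-interface-id "$ENI_ID" --private-ip-address 10.2.0.186
# Disable source/destination checking so the instance can forward NATed traffic
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check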

Screenshot of the network summary page of an EC2 instance with the private and public IP addresses highlighted

nftables

On each of these Linux instances we must add nftables rules to handle the connections. These rules create a 1:1 mapping between each FSx for ONTAP inter-cluster interface and an EIP. You can either edit the nftables configuration file directly or build the ruleset with the nft CLI; both options for our example environment follow. The configuration file shown reflects the Side A instance, and the CLI script builds the equivalent rules on the Side B instance.

Option 1: Edit the nftables configuration file directly

# Side A instance: 10.1.0.135 and 10.1.0.123 are private IPs on this NAT instance,
# each associated with one of the Side A EIPs.
table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat; policy accept;
        # Inbound: traffic arriving on the EIP-backed private IPs goes to the local FSx inter-cluster LIFs
        tcp dport 11104 ip daddr 10.1.0.135 dnat to 10.1.0.125
        tcp dport 11105 ip daddr 10.1.0.135 dnat to 10.1.0.125
        tcp dport 10000 ip daddr 10.1.0.135 dnat to 10.1.0.125
        icmp type { echo-reply, echo-request } ip daddr 10.1.0.135 dnat to 10.1.0.125
        tcp dport 11104 ip daddr 10.1.0.123 dnat to 10.1.0.137
        tcp dport 11105 ip daddr 10.1.0.123 dnat to 10.1.0.137
        tcp dport 10000 ip daddr 10.1.0.123 dnat to 10.1.0.137
        icmp type { echo-reply, echo-request } ip daddr 10.1.0.123 dnat to 10.1.0.137
        # Outbound: traffic destined for the remote FSx private IPs goes to the remote EIPs
        ip daddr 10.2.0.110 dnat to 3.146.166.253
        icmp type { echo-reply, echo-request } ip daddr 10.2.0.110 dnat to 3.146.166.253
        ip daddr 10.2.0.155 dnat to 3.135.134.67
        icmp type { echo-reply, echo-request } ip daddr 10.2.0.155 dnat to 3.135.134.67
    }

    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # Outbound: rewrite the local FSx source IPs to the EIP-backed private IPs
        ip saddr 10.1.0.125 snat to 10.1.0.135
        icmp type { echo-reply, echo-request } ip saddr 10.1.0.125 snat to 10.1.0.135
        ip saddr 10.1.0.137 snat to 10.1.0.123
        icmp type { echo-reply, echo-request } ip saddr 10.1.0.137 snat to 10.1.0.123
        # Inbound: rewrite the remote EIP source IPs to the remote FSx private IPs the local cluster expects
        tcp dport 11104 ip saddr 3.146.166.253 snat to 10.2.0.110
        tcp dport 11105 ip saddr 3.146.166.253 snat to 10.2.0.110
        tcp dport 10000 ip saddr 3.146.166.253 snat to 10.2.0.110
        icmp type { echo-reply, echo-request } ip saddr 3.146.166.253 snat to 10.2.0.110
        tcp dport 11104 ip saddr 3.135.134.67 snat to 10.2.0.155
        tcp dport 11105 ip saddr 3.135.134.67 snat to 10.2.0.155
        tcp dport 10000 ip saddr 3.135.134.67 snat to 10.2.0.155
        icmp type { echo-reply, echo-request } ip saddr 3.135.134.67 snat to 10.2.0.155
    }
}

Option 2: nftables CLI config script

#!/bin/bash
# Install nftables and enable IP forwarding in the kernel
dnf install -y nftables
echo 1 > /proc/sys/net/ipv4/ip_forward
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

# Create the pre-routing and postrouting chains in nftables.
nft add table ip nat
nft -- add chain ip nat prerouting { type nat hook prerouting priority -100 \; }
nft add chain ip nat postrouting { type nat hook postrouting priority 100 \; }

# Map incoming packets from the internet to the local FSx inter-cluster interface
# Map packets destined to 3.146.166.253(10.2.0.70) -> 10.2.0.110
nft add rule ip nat prerouting tcp dport 11104 ip daddr 10.2.0.70 dnat to 10.2.0.110
nft add rule ip nat prerouting tcp dport 11105 ip daddr 10.2.0.70 dnat to 10.2.0.110
nft add rule ip nat prerouting tcp dport 10000 ip daddr 10.2.0.70 dnat to 10.2.0.110
nft add rule ip nat prerouting icmp type { echo-request, echo-reply } ip daddr 10.2.0.70 dnat to 10.2.0.110
# Map packets destined to 3.135.134.67(10.2.0.186) -> 10.2.0.155
nft add rule ip nat prerouting tcp dport 11104 ip daddr 10.2.0.186 dnat to 10.2.0.155
nft add rule ip nat prerouting tcp dport 11105 ip daddr 10.2.0.186 dnat to 10.2.0.155
nft add rule ip nat prerouting tcp dport 10000 ip daddr 10.2.0.186 dnat to 10.2.0.155
nft add rule ip nat prerouting icmp type { echo-request, echo-reply } ip daddr 10.2.0.186 dnat to 10.2.0.155

# Map outgoing packets from the local FSx interfaces to their respective public IPs
# 10.2.0.110 -> 3.146.166.253(10.2.0.70)
nft add rule ip nat postrouting ip saddr 10.2.0.110 snat to 10.2.0.70
nft add rule ip nat postrouting icmp type { echo-request, echo-reply } ip saddr 10.2.0.110 snat to 10.2.0.70
# 10.2.0.155 -> 3.135.134.67(10.2.0.186)
nft add rule ip nat postrouting ip saddr 10.2.0.155 snat to 10.2.0.186
nft add rule ip nat postrouting icmp type { echo-request, echo-reply } ip saddr 10.2.0.155 snat to 10.2.0.186

# Rewrite the source of incoming packets from the remote EIPs back to the originating FSx internal IPs
# 3.128.12.212 -> 10.1.0.125
nft add rule ip nat postrouting tcp dport 11104 ip saddr 3.128.12.212 snat to 10.1.0.125
nft add rule ip nat postrouting tcp dport 11105 ip saddr 3.128.12.212 snat to 10.1.0.125
nft add rule ip nat postrouting tcp dport 10000 ip saddr 3.128.12.212 snat to 10.1.0.125
nft add rule ip nat postrouting icmp type { echo-request, echo-reply } ip saddr 3.128.12.212 snat to 10.1.0.125
# 18.190.143.162 -> 10.1.0.137
nft add rule ip nat postrouting tcp dport 11104 ip saddr 18.190.143.162 snat to 10.1.0.137
nft add rule ip nat postrouting tcp dport 11105 ip saddr 18.190.143.162 snat to 10.1.0.137
nft add rule ip nat postrouting tcp dport 10000 ip saddr 18.190.143.162 snat to 10.1.0.137
nft add rule ip nat postrouting icmp type { echo-request, echo-reply } ip saddr 18.190.143.162 snat to 10.1.0.137

# Map outgoing packets destined for a remote FSx interface to its respective public IP
# 10.1.0.125 -> 3.128.12.212
nft add rule ip nat prerouting ip daddr 10.1.0.125 dnat to 3.128.12.212
nft add rule ip nat prerouting icmp type { echo-request, echo-reply } ip daddr 10.1.0.125 dnat to 3.128.12.212
# 10.1.0.137 -> 18.190.143.162
nft add rule ip nat prerouting ip daddr 10.1.0.137 dnat to 18.190.143.162
nft add rule ip nat prerouting icmp type { echo-request, echo-reply } ip daddr 10.1.0.137 dnat to 18.190.143.162

# Persist the config so the nftables service restores it on boot
nft list ruleset > /etc/sysconfig/nftables.conf
systemctl enable nftables

Route tables

With both routers configured, we need to make sure that SnapMirror traffic passes through them. To do this, we update the route table associated with each FSx for ONTAP file system to send traffic destined for the remote VPC to the network interface of our EC2 instance. For example, on Side B we add a route pointing 10.1.0.0/16 to the Elastic Network Interface (ENI) of our EC2 instance. On Side A we do the inverse: 10.2.0.0/16 to its EC2 instance.

Route table associated with FSx for ONTAP
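As a sketch with the AWS CLI, the Side B route might be added as follows; the route table and ENI IDs are placeholders for your own values.

# Side B: send traffic for the Side A VPC through the NAT instance's ENI
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.1.0.0/16 \
  --network-interface-id eni-0123456789abcdef0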

FSx for ONTAP security group

As the last part of the setup, we must allow connections from the remote VPC network to our FSx for ONTAP interfaces. For this, we added 10.0.0.0/8 to the security group on both FSx for ONTAP file systems.
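A minimal AWS CLI sketch for one of the FSx for ONTAP security groups follows; the security group ID is a placeholder, and you can scope the CIDR more tightly than 10.0.0.0/8 if your addressing allows.

FSX_SG_ID=sg-0fedcba9876543210   # placeholder FSx for ONTAP security group ID
# SnapMirror inter-cluster ports from the remote private network
aws ec2 authorize-security-group-ingress --group-id "$FSX_SG_ID" \
  --protocol tcp --port 11104-11105 --cidr 10.0.0.0/8
aws ec2 authorize-security-group-ingress --group-id "$FSX_SG_ID" \
  --protocol tcp --port 10000 --cidr 10.0.0.0/8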

Peer the file systems

With the network connections in place, all that’s left is to peer the FSx for ONTAP file systems. First, we log in to Side A and start the peering request.

Side A - peering request
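The exact output is shown in the preceding screenshot; as a sketch, the Side A command might look like the following, using the Side B inter-cluster endpoint IPs from our example. The NAT instances make these private IPs reachable while keeping them consistent with what SnapMirror verifies.

# Run from the Side A ONTAP CLI: peer with Side B's inter-cluster LIFs and generate a passphrase
cluster peer create -address-family ipv4 -peer-addrs 10.2.0.155,10.2.0.110 -generate-passphrase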

Then, we log in to Side B and run the same command, this time without generating a passphrase and using the inter-cluster IPs from Side A. When prompted, we enter the passphrase generated on Side A.

Side B - peering
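Similarly, the Side B command might look like this sketch, pointed at the Side A inter-cluster endpoints; ONTAP prompts for the passphrase generated on Side A.

# Run from the Side B ONTAP CLI: peer with Side A's inter-cluster LIFs, then supply the passphrase
cluster peer create -address-family ipv4 -peer-addrs 10.1.0.137,10.1.0.125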

From here the SVMs can be peered and a SnapMirror relationship can be created.
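For completeness, the following is a hedged sketch of those final steps from the destination side; the SVM and volume names (svm_a, svm_b, vol1, vol1_dr) and the cluster name placeholder are hypothetical and depend on your environment.

# From the destination (Side B) ONTAP CLI: peer the SVMs for SnapMirror
vserver peer create -vserver svm_b -peer-vserver svm_a -peer-cluster <side_a_cluster_name> -applications snapmirror
# (Accept the request on Side A with 'vserver peer accept' if prompted.)

# Create and initialize a volume SnapMirror relationship
snapmirror create -source-path svm_a:vol1 -destination-path svm_b:vol1_dr -policy MirrorAllSnapshots
snapmirror initialize -destination-path svm_b:vol1_dr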

Cleaning up

There are costs associated with running EC2 instances and FSx for ONTAP file systems. Remember to delete or terminate these resources when they are no longer needed. To delete a file system, follow the instructions in the FSx for NetApp ONTAP user guide. To terminate your EC2 instances, go to Terminate Your Instance in the Amazon EC2 user guide.
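If you prefer the CLI, the cleanup might look like the following sketch; the IDs are placeholders, and for FSx for ONTAP you must remove any non-root volumes and SVMs before the file system can be deleted.

# Terminate the NAT instances and delete the example FSx for ONTAP file systems
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
aws fsx delete-file-system --file-system-id fs-0123456789abcdef0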

Conclusion

In this post, we outlined a reliable and scalable solution design to replicate data using Amazon FSx for NetApp ONTAP and SnapMirror over the internet, enabling connectivity between any ONTAP instance and any other ONTAP instance even with firewalls performing NAT in between. We also covered the steps to deploy the configuration, along with scripts to automate the setup and deployment of the solution.

If you have more questions, then feel free to leave a comment or read the FSx for ONTAP FAQs.
