Donavan Fritz


IP Prefixes on EC2

What are IP Prefixes?

Yesterday, AWS released IP Prefixes for EC2. An IP prefix is a contiguous, summarizable block of IPv6 or IPv4 addresses. These prefixes are assigned to a single Elastic Network Interface (ENI) in a VPC (which is typically attached to an EC2 instance). In the most basic sense, it’s this:

[diagram: logical view of an IP prefix assigned to an ENI]

Why do I want this?

It comes down to efficiency, simplicity, and scalability for container platforms. Let’s break this down:

Efficiency is gained in two ways. First, we can put more IPs on an ENI. The limit of IPs that can be assigned to an ENI is all over the place; some instance types get more than others. In the best case today, you’re looking at 30 IPs per ENI. But with IP Prefixes, we get a /80 on an ENI. That’s 2^48 IPs per ENI. That’s the same number of bits used for MAC addresses. Or the entire IPv4 internet, 65k times over, per ENI. It’s an insane amount of addresses. But the actual beauty here is that it’s a single control plane operation to set up IP Prefixes (the second efficiency gain). Once IP Prefixes are set up, the instance can use as many IPs as needed, and never re-use an IP!

Simplicity. Container networking is a mixed bag nowadays; this space is being rapidly innovated in, and there are many different configurations and players in the game. I typically see folks using an overlay on top of the AWS VPC network (which is already an overlay!). Projects like calico or docker swarm make it easy to set up such an environment. But this “easy” option comes with a lot of drawbacks. With an overlay, you’re creating a walled garden. Now you have to figure out how to ingress and egress traffic to your walled garden. We bring in observability and accounting challenges: “what was this IP + source port at this time?” How do we do service discovery? Now we have split-scope DNS. What hostname do I need to use again? Oh, it’s cluster.local here and my-company.com there. Okay, got it. Don’t get me wrong, these are not insurmountable challenges. k8s does a lot of this for you, but it could be way simpler. I consider simpler not using an overlay. Not using an overlay and using addressing from the VPC network is quantifiably simpler, in the same way that 1 is less than 2.

Scale. Scale is the product of simplicity and efficiency. By being more efficient and simple with our use of the network, we can scale our ability to use the network. That’s pretty much it. So maybe this part wasn’t as much of a “break down” as I promised. The efficiency, simplicity, and scalability benefits are coupled. But that’s also what’s so amazing. We have a single technique that checks all three boxes.

How does it work?

IP Prefixes are similar to having a routed network block (i.e. prefix) belong to a node, but slightly different in that the prefix must be from the same VPC Subnet as the ENI. This is a simpler approach that works because of AWS network magic, and it allows other features, like NACLs, to continue to operate seamlessly.

Setting up IP Prefixes

In this section, we will walk through setting up IP prefixes for an EC2 instance. This section assumes a basic level of familiarity with AWS and assumes a Nitro-based EC2 instance (IP Prefixes are only supported on Nitro) is already running in a VPC. We assume IPv6 first and consider private IPv4 secondarily, and will call out differences when needed.

Using IP prefixes does not require a new API; it’s tacked onto the existing AssignIpv6Addresses API (AssignPrivateIpAddresses, for IPv4). In this example, we use the AWS CLI to set this up:

dfritz@workstation:~$ aws ec2 assign-ipv6-addresses --network-interface-id eni-abc123 --ipv6-prefix-count 1
{
    "AssignedIpv6Prefixes": [
        "2001:db8::/80"
    ],
    "NetworkInterfaceId": "eni-abc123"
}
dfritz@workstation:~$

In the above operation, the AWS control plane assigned 2001:db8::/80 to eni-abc123. Now any traffic destined to 2001:db8::/80 will be sent to this ENI. We can also observe the configuration via the IMDS:

ubuntu@instance:~$ curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/01:12:23:34:45:56/ipv6-prefix
2001:db8::/80
ubuntu@instance:~$
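The same assignment can also be confirmed from the control-plane side. A quick sketch using describe-network-interfaces; I’m assuming here that the delegated prefixes show up in the ENI description’s Ipv6Prefixes field, with eni-abc123 being the example ENI from above:

```shell
# Query the ENI's delegated IPv6 prefixes from the AWS control plane.
aws ec2 describe-network-interfaces \
  --network-interface-ids eni-abc123 \
  --query 'NetworkInterfaces[0].Ipv6Prefixes'
```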

Now that 2001:db8::/80 is “delegated” to this EC2 instance, let’s test and validate that we have IP reachability to/from that address space. We can pick any address within 2001:db8::/80 and assign it to our interface inside the OS, like so:

ubuntu@instance:~$ ip addr add 2001:db8::100 dev eth0
ubuntu@instance:~$ ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 01:12:23:34:45:56 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.100/24 brd 192.0.2.255 scope global dynamic eth0
       valid_lft 2228sec preferred_lft 2228sec
    inet6 2001:db8::100/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link
       valid_lft forever preferred_lft forever
ubuntu@instance:~$

Since this is the first IPv6 address assigned to the EC2 instance, a simple curl can be used to validate functionality. Not only do we see that it works, but the IP observed by external systems matches what we have configured. It’s also a nice way to validate that there’s no network address translation going on.

ubuntu@instance:~$ curl -6 ifconfig.io
2001:db8::100
ubuntu@instance:~$

It was mentioned earlier that this is geared towards container platforms. We will look at a basic example with docker next, but first we need to set up an IPv4 prefix, because IPv6-only docker networking isn’t a thing.

dfritz@workstation:~$ aws ec2 assign-private-ip-addresses --network-interface-id eni-abc123 --ipv4-prefix-count 1
{
    "NetworkInterfaceId": "eni-abc123",
    "AssignedIpv4Prefixes": [
        {
            "Ipv4Prefix": "192.0.2.0/28"
        }
    ]
}
dfritz@workstation:~$

Similar to the IPv6 example above, we now have 192.0.2.0/28 assigned to eni-abc123, based on the above output.
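As with IPv6, the IPv4 prefix should also be visible from inside the instance. A sketch, assuming the IMDS path mirrors the ipv6-prefix one (same example MAC address as before):

```shell
# Fetch the delegated IPv4 prefix for this interface from instance metadata.
# Should print 192.0.2.0/28, given the assignment above.
curl http://169.254.169.254/latest/meta-data/network/interfaces/macs/01:12:23:34:45:56/ipv4-prefix
```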

Example of using IP Prefixes with Docker

We will look at an example of how container networking is drastically simplified with IP prefixes. To follow along, you’ll need docker installed.

We are flipping between two machines in this example: a workstation with a dfritz@workstation:~$ prompt, and an EC2 instance with a ubuntu@instance:~$ prompt. The docker host is the EC2 instance running ubuntu (with an IP prefix assigned as described in the first section of this post). We plan to showcase reachability to/from the internet as well as our workstation.

The first step is to create a custom docker network. The IPVlan network driver may be the easiest to use. Here’s an example:

ubuntu@instance:~$ docker network create -d ipvlan --ipv6 \
  --subnet 192.0.2.0/28 \
  --subnet 2001:db8::/80 \
  -o parent=eth0 \
  -o ipvlan_mode=l3 \
  ip-prefix
ubuntu@instance:~$

This is telling docker that any containers that run in that network will be addressed from those subnets. These subnets match the IP prefixes assigned. Once the network is set up, containers immediately get real network addresses that are globally unique (for IPv6) and NOT NAT’d (for both IPv6 and IPv4) by the docker host. No more needing to think about exposing ports; just run the container.
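To double-check that docker picked up both subnets, we can inspect the network’s IPAM configuration (a small sketch; the --format template just extracts the subnets):

```shell
# List the subnets docker associated with the ip-prefix network.
docker network inspect ip-prefix \
  --format '{{range .IPAM.Config}}{{.Subnet}}{{"\n"}}{{end}}'
```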

Here we validate egress IPv6 and IPv4 connectivity by curling ifconfig.io. This confirms reachability and that there is no IPv6 NAT’ing taking place. For IPv4, we see a different IPv4 address returned since we have public/private IPv4 addressing at play here.

ubuntu@instance:~$ docker run --net ip-prefix -it --rm curlimages/curl curl ifconfig.io
2001:db8::2
ubuntu@instance:~$ docker run --net ip-prefix -it --rm curlimages/curl curl -4 ifconfig.io
203.0.113.123
ubuntu@instance:~$ 
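Because the network is backed by real VPC addressing, a container can also be pinned to specific addresses within the delegated prefixes using docker’s --ip and --ip6 flags. A sketch; 192.0.2.8 and 2001:db8::80 are arbitrary picks from the prefixes assigned earlier:

```shell
# Run a container with fixed IPv4/IPv6 addresses from the delegated prefixes.
docker run --net ip-prefix \
  --ip 192.0.2.8 \
  --ip6 2001:db8::80 \
  -d --rm --name pinned nginx
```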

Here we prepare to validate ingress IPv6 and IPv4 by using an apache (httpd) and an nginx container to host a demo web page. We use both nginx and apache for two reasons. First, the default page is distinctly different for each; it’s easy to tell the default nginx page from the default apache page. Second, we are intentionally creating what would be a port conflict, because they both use TCP port 80 by default. By simply using IP prefixes, we showcase how exposing ports is not needed, and we fundamentally avoid any and all port conflicts.

We run each container and check the docker assigned IPv6 and IPv4 addresses for validation from our workstation. Notice how we do not specify any ports to be exposed.

ubuntu@instance:~$ docker run --net ip-prefix -d --rm --name nginx nginx
f07b3a7658060a0983245cd26b6aabe0b8c8eaeb2310a99a9870148628717fc
ubuntu@instance:~$ docker run --net ip-prefix -d --rm --name httpd httpd
15b1bf7b617f98b3d00e6534bdac79879b32837d8fc0ab982e393e8f6435a03d

ubuntu@instance:~$ docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' nginx
192.0.2.2
ubuntu@instance:~$ docker inspect -f '{{range.NetworkSettings.Networks}}{{.GlobalIPv6Address}}{{end}}' nginx
2001:db8::2
ubuntu@instance:~$ docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' httpd
192.0.2.3
ubuntu@instance:~$ docker inspect -f '{{range.NetworkSettings.Networks}}{{.GlobalIPv6Address}}{{end}}' httpd
2001:db8::3
ubuntu@instance:~$

To make things easier to grok, just remember that the nginx container ends with ::2 and .2 for IPv6 and IPv4 respectively, and the httpd container ends with ::3 and .3.

To start, from our workstation we’ll test the ::2 and .2 addresses (the nginx container). We see the well known default nginx splash page (as expected).

dfritz@workstation:~$ curl http://[2001:db8::2]
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
  ...

dfritz@workstation:~$ curl http://192.0.2.2
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
  ...

dfritz@workstation:~$

Next, from our workstation we test the ::3 and .3 addresses (the httpd container). We see the well known default apache splash page (as expected).

dfritz@workstation:~$ curl http://[2001:db8::3]
<html><body><h1>It works!</h1></body></html>
dfritz@workstation:~$  curl http://192.0.2.3
<html><body><h1>It works!</h1></body></html>
dfritz@workstation:~$

This setup provides a nice, straightforward mechanism for container networking. Without IP prefixes, one could achieve the same results by assigning individual IPv6 and IPv4 addresses to the ENI, rewriting/“exposing” ports, or running an overlay network. None of those options are as straightforward, and all of them add complexity. IP prefixes provide a drastic simplification to container networking.
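When you’re done experimenting, everything can be torn down with the mirror-image unassign calls. A sketch, assuming the same example ENI and prefixes, and assuming the --ipv6-prefixes/--ipv4-prefixes flags are the prefix-delegation counterparts of the per-address flags:

```shell
# Stop the demo containers and remove the docker network.
docker stop nginx httpd
docker network rm ip-prefix

# Release the delegated prefixes back to the VPC subnet.
aws ec2 unassign-ipv6-addresses --network-interface-id eni-abc123 \
  --ipv6-prefixes 2001:db8::/80
aws ec2 unassign-private-ip-addresses --network-interface-id eni-abc123 \
  --ipv4-prefixes 192.0.2.0/28
```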

tags: #aws #ipv6 #ip-prefix