Datacenter Concepts: 2019

Wednesday, May 22, 2019

IP Broadcasting and Multicasting in the Cloud

Broadcasting and Multicasting in the Cloud

In public clouds such as Amazon EC2, Google Compute Engine and Microsoft Azure, native support for multicast and broadcast is missing. In fact, on AWS it has been on the "to do" list since 2009 (see https://forums.aws.amazon.com/thread.jspa?messageID=280285). Broadcast and multicast are integral parts of today's network solutions, and their absence is a missed opportunity for all public cloud platforms.

Additionally, Layer 2 access in public clouds is generally limited by the design of VPCs, Security Groups and ACLs. This makes public cloud networking very different from a datacenter, where there is usually full L2 access within a VLAN, and traffic between VLANs is routed through Layer 3 interfaces such as SVIs (switched virtual interfaces).

Broadcasting, Multicasting, Anycasting & Unicasting

Before delving into broadcast and multicast, let's take a look at the most common addressing mode in IP networks: unicast, where two hosts on the network communicate with each other. It's the typical client-server topology. The vast majority of Internet traffic is unicast, with servers handling a continuous stream of requests from billions of client endpoints (mobile devices, IoT devices, traditional PCs and laptops). The main reason is the transport protocol in use, TCP, which is preferred because of its guaranteed delivery and recovery mechanisms. Since TCP is unicast only, most of the Internet is unicast. UDP, on the other hand, can be used with unicast, multicast and broadcast packets.

In broadcast addressing mode (see RFC 919, October 1984), a packet is addressed to all hosts in a local network rather than to a single host.
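As a small illustration of broadcast addressing, the sketch below derives the directed broadcast address of an IPv4 network with Python's standard ipaddress module. The function name and the sample networks are made up for this example; the limited broadcast address 255.255.255.255 is the other form of broadcast address.

```python
import ipaddress

# Limited broadcast: reaches all hosts on the local network only.
LIMITED_BROADCAST = "255.255.255.255"


def directed_broadcast(cidr: str) -> str:
    """Return the broadcast address of an IPv4 network given in CIDR form.

    strict=False lets the caller pass a host address with a prefix length
    (for example "10.0.5.17/20") instead of the exact network address.
    """
    network = ipaddress.IPv4Network(cidr, strict=False)
    return str(network.broadcast_address)


if __name__ == "__main__":
    print(directed_broadcast("192.168.1.0/24"))  # 192.168.1.255
    print(directed_broadcast("10.0.5.17/20"))    # 10.0.15.255
```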

In multicast (see RFC 966, December 1985), which can loosely be thought of as a selective form of broadcast, a packet is addressed not to all hosts but to a group of hosts called a "multicast group". Multicast groups are dynamic: any host can join, leave and rejoin a group "on the fly" using a protocol called IGMP (Internet Group Management Protocol). A multicast group is identified by an IP address from the reserved multicast range (224.0.0.0 - 239.255.255.255).

Once a host has joined a multicast group, it starts receiving messages addressed to the group. The protocol most commonly used with multicasting is the User Datagram Protocol (UDP). UDP is a very flexible protocol that can work with any addressing mode. TCP, on the other hand, works with unicast only.
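Joining a group from an application can be sketched roughly as follows with Python's standard socket module. The helper names, the group 239.1.2.3 and the port are assumptions chosen for illustration; the IP_ADD_MEMBERSHIP socket option is what causes the host to announce its membership through IGMP.

```python
import ipaddress
import socket


def is_multicast_group(address: str) -> bool:
    """True if the IPv4 address lies in 224.0.0.0 - 239.255.255.255."""
    return ipaddress.IPv4Address(address).is_multicast


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure: group address + local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_group_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket and join the given multicast group (assumed names)."""
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not a multicast group address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what triggers the IGMP membership report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

On a network that actually forwards multicast, calling open_group_receiver("239.1.2.3", 5007) and then recvfrom() on the returned socket would deliver datagrams sent to that group and port. In the public clouds discussed above the join succeeds locally, but no group traffic from other hosts arrives.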

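IPv4 has unicast, multicast and broadcast.

IPv6 has unicast, anycast and multicast. It has no broadcast; multicast is used instead.

Anycast is a relatively newer addressing mode in which a packet is delivered to only one host, typically the nearest, out of a group of hosts that share the same address. As an address type it is defined in IPv6 only, although the same effect is commonly achieved in IPv4 by announcing one address from several locations through routing.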

Broadcasting and Multicasting at Layer 2

At Layer 2 we deal with Ethernet, the most prevalent Layer 2 protocol in use today, where a PDU is called a "frame". An Ethernet frame carries a source and a destination address, each called a MAC address: a 48-bit value usually written in hexadecimal, such as 01:23:45:67:89:01. It consists of 6 octets, with the first 3 octets used as the OUI (Organizationally Unique Identifier) and the last 3 octets used to identify the device. Within the first octet of the OUI, the least significant bit (b0) indicates whether the address is unicast or multicast, and the second least significant bit (b1) indicates whether the MAC address is universally or locally administered (universally unique or only locally unique).

So for example, in 06:00:00:00:00:00 the first octet (06) is 00000110 in binary. Its b1 bit is 1, which means this is a locally administered address and not universally unique.

A MAC address in an Ethernet frame is unicast if the b0 bit is 0 and a group address (multicast or broadcast) if the b0 bit is 1. In the above example of MAC 06:00:00:00:00:00, the LSB of the first octet is 0 (06 = 0110), so the address is unicast: a frame carrying it as destination is a unicast PDU, encapsulates a unicast packet and is meant to reach only a single host/NIC/node. A broadcast frame, by contrast, is delivered to all hosts/nodes/NICs in the broadcast domain. For multicast, b0 is also set to 1, with the caveat that the frame is meant only for those hosts which have joined the specific multicast group.
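The two flag bits in the first octet can be checked with a few lines of Python. This is a minimal sketch; the function names and the returned dictionary layout are invented for illustration.

```python
def parse_mac(mac: str) -> bytes:
    """Convert a colon-separated MAC address string into its 6 raw octets."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    return octets


def classify_mac(mac: str) -> dict:
    """Decode the b0 (group) and b1 (locally administered) bits."""
    octets = parse_mac(mac)
    first = octets[0]
    return {
        "group": bool(first & 0x01),                 # b0: 0 = unicast, 1 = group
        "locally_administered": bool(first & 0x02),  # b1: 0 = universal, 1 = local
        "broadcast": octets == b"\xff" * 6,          # all ones: every host
    }


if __name__ == "__main__":
    for address in ("06:00:00:00:00:00", "ff:ff:ff:ff:ff:ff"):
        print(address, classify_mac(address))
```

For 06:00:00:00:00:00 this reports a locally administered unicast address, matching the worked example above.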

When an IP unicast packet is passed to Layer 2 so that it can be sent to the next hop, it is wrapped in a unicast Ethernet frame. The MAC address of the next hop is determined using the Address Resolution Protocol (ARP), which incidentally uses broadcast Ethernet frames to find the MAC address for a given IP. If a switch does not know which port leads to the destination MAC address of a unicast frame, it forwards the frame out of all of its ports except the one it arrived on, an action known as unicast flooding.
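The flooding behaviour can be illustrated with a toy model of a learning switch. This is a simplified sketch, not how any particular switch is implemented: the class name, port numbers and MAC addresses are made up, and details such as table aging and VLANs are left out.

```python
class LearningSwitch:
    """Toy Ethernet switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, ingress_port, src_mac, dst_mac):
        """Return the list of ports the frame is forwarded to."""
        # Learn which port leads to the sender.
        self.mac_table[src_mac.lower()] = ingress_port

        dst = dst_mac.lower()
        is_group = bool(int(dst.split(":")[0], 16) & 0x01)
        known_port = self.mac_table.get(dst)

        if is_group or known_port is None:
            # Broadcast/multicast, or unknown unicast: flood all other ports.
            return [p for p in self.ports if p != ingress_port]
        if known_port == ingress_port:
            return []  # destination sits on the ingress segment: drop
        return [known_port]


if __name__ == "__main__":
    switch = LearningSwitch([1, 2, 3])
    host_a, host_b = "02:00:00:00:00:01", "02:00:00:00:00:02"
    print(switch.receive(1, host_a, host_b))  # unknown destination: flooded
    print(switch.receive(2, host_b, host_a))  # A was learned on port 1
    print(switch.receive(1, host_a, host_b))  # B was learned on port 2
```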

IP broadcast and multicast do not use ARP. IP broadcasts are always sent to the "all-ones" Ethernet address ff:ff:ff:ff:ff:ff. Since the low bit of the first byte is a 1, this is a group address, and because all bits are set it is delivered to all hosts on the L2 network. IP multicast instead uses a formula to convert the IP multicast group address to an Ethernet address. This formula is described in RFC 1112: the low-order 23 bits of the group address are placed into the low-order 23 bits of the Ethernet address 01:00:5e:00:00:00. The group address 224.1.2.4, for example, is translated to 01:00:5e:01:02:04. The mapping is not unique: 32 different group addresses correspond to the same multicast address on the Ethernet.
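The RFC 1112 mapping copies the low-order 23 bits of the group address into the low-order 23 bits of the Ethernet address 01:00:5e:00:00:00. A minimal Python sketch follows; the function name is illustrative.

```python
import ipaddress


def multicast_ip_to_mac(group: str) -> str:
    """Map an IPv4 multicast group address to its Ethernet multicast address."""
    address = ipaddress.IPv4Address(group)
    if not address.is_multicast:
        raise ValueError(f"{group} is not a multicast group address")
    low_23_bits = int(address) & 0x7FFFFF   # keep the low-order 23 bits
    mac = 0x01005E000000 | low_23_bits      # fixed prefix 01:00:5e, next bit 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


if __name__ == "__main__":
    print(multicast_ip_to_mac("224.1.2.4"))    # 01:00:5e:01:02:04
    print(multicast_ip_to_mac("224.129.2.4"))  # 01:00:5e:01:02:04
```

Both groups produce the same Ethernet address because the 5 bits between the 224.0.0.0/4 prefix and the low-order 23 bits are discarded. This is why a NIC can receive frames for groups the host never joined, and the IP layer has to filter them out.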

Applications using IP Multicasting

While many more applications use unicast addressing, multicasting does have a few important use cases. The two main areas seem to be infrastructure for high availability solutions, and to implement "zero config" discovery mechanisms.

Examples of high availability solutions that use multicasting are the well-known keepalived (an implementation of the Virtual Router Redundancy Protocol or VRRP, an IETF standard), uCarp, the Red Hat Cluster Suite (based on the open source Corosync/OpenAIS projects) and JGroups. In this category, there is also the venerable Veritas Cluster Server (VCS). It should be mentioned that some of these projects have added unicast support recently, precisely because of the lack of multicasting in the cloud. However, in all cases multicasting remains the optimal solution. The multicast networking in this category is used to send "heartbeat" messages. All nodes listen to these messages. If a node's heartbeat is not received for a certain amount of time, the other nodes assume something went wrong and can start a corrective action. Many other solutions, such as MSCS (Microsoft Clustering Services), also use multicast to send "heartbeat" messages.
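The timeout logic behind such heartbeats can be sketched independently of the transport. The class below is a hypothetical illustration and does not reflect how keepalived, Corosync or any other product listed above is implemented; the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per node and report nodes that went silent."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_seen = {}  # node id -> time of the last heartbeat

    def record(self, node_id):
        """Call this whenever a heartbeat message from node_id arrives."""
        self.last_seen[node_id] = self.clock()

    def failed_nodes(self):
        """Return nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

In a real cluster, record() would be driven by datagrams received on the multicast group, and a non-empty result from failed_nodes() would trigger the corrective action, for example a failover.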

Examples of discovery solutions that use multicasting include Apple Bonjour/Zeroconf (built on multicast DNS and DNS service discovery), the Java-based data stores Hazelcast and EhCache, and the Oracle Grid Infrastructure. These solutions use multicast to announce a presence or a status on the network, without having to explicitly configure which other nodes exist.

Conclusion

People have tried to work around the lack of multicasting using various OS level tools. A few interesting ones are using n2n to set up a peer to peer L2 VPN between virtual machines, or using various approaches to turn multicast into unicast. Some of these approaches may have valid use cases. That said, in all cases, these solutions add significant complexity, and push what is essentially a network responsibility back into the OS.
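The simplest form of turning multicast into unicast is to send one copy of each message to an explicitly configured list of peers. The sketch below assumes exactly that; the function name and the peer list are hypothetical and it does not represent any particular tool.

```python
import socket


def fan_out(payload: bytes, peers, sock=None) -> int:
    """Send one UDP datagram per peer and return the number of copies sent.

    peers is an iterable of (host, port) tuples that must be configured
    up front, which is exactly the burden multicast would have removed.
    """
    peer_list = list(peers)
    owns_socket = sock is None
    if owns_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for host, port in peer_list:
            sock.sendto(payload, (host, port))
    finally:
        if owns_socket:
            sock.close()
    return len(peer_list)
```

The sender now transmits N copies instead of one and every node needs an up-to-date peer list, which illustrates the added complexity mentioned above.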

Tuesday, April 16, 2019

SMB/CIFS/SAMBA/NFS

SMB

So what is SMB? SMB stands for “Server Message Block.” It’s a file sharing protocol that was invented by IBM and has been around since the mid-eighties. Since it’s a protocol (an agreed upon way of communicating between systems) and not a particular software application, if you’re troubleshooting, you’re looking for the application that is said to implement the SMB protocol.

The SMB protocol was designed to allow computers to read and write files to a remote host over a local area network (LAN). The directories on the remote hosts made available via SMB are called “shares.”

CIFS

CIFS stands for “Common Internet File System.” CIFS is a dialect of SMB. That is, CIFS is a particular implementation of the Server Message Block protocol, created by Microsoft.

CIFS vs SMB

Most people, when they use either SMB or CIFS, are talking about the same exact thing. The two are interchangeable not only in a discussion but also in application – i.e., a client speaking CIFS can talk to a server speaking SMB and vice versa. Why? Because CIFS is a form of SMB.

While they are the same top level protocol, there are still differences in implementation and performance tuning (hence the different names). Protocol implementations like CIFS and SMB often handle things like file locking, performance over LAN/WAN, and mass modification of files differently.

CIFS vs SMB: Which One Should I Use?

In this day and age, you should always use the acronym SMB.

I know what you’re thinking – “but if they’re essentially the same thing, why should I always use SMB?”

Two reasons:-

1.) The CIFS implementation of SMB is rarely used these days. Under the covers, most modern storage systems no longer use CIFS, they use SMB 2 or SMB 3. In the Windows world, SMB 2 has been the standard as of Windows Vista (2006) and SMB 3 is part of Windows 8 and Windows Server 2012.

2.) CIFS has a negative connotation among pedants. SMB 2 and SMB 3 are massive upgrades over the CIFS dialect, and storage architects who are near and dear to file sharing protocols don’t appreciate the misnomer. It’s kind of like calling an executive assistant a secretary.

Samba and NFS

CIFS and SMB are far from the entirety of file sharing, and if you’re working to make legacy systems interoperate, it is quite likely that you’re also going to run into situations where other options are necessary. Two other prominent names you should know about are Samba, an implementation of SMB, and NFS, a separate protocol.

SAMBA

What is Samba? Samba is a collection of different applications which, when used together, let a Linux server perform network actions like file serving, authentication/authorization, name resolution and print services.

Like CIFS, Samba implements the SMB protocol which is what allows Windows clients to transparently access Linux directories, printers and files on a Samba server (just as if they were talking to a Windows server).

Crucially, Samba allows for a Linux server to act as a Domain Controller. By doing so, user credentials on the Windows domain can be used instead of needing to be recreated and then manually kept in sync on the Linux server.

NFS

The acronym NFS means “Network File System.” The NFS protocol was developed by Sun Microsystems and serves essentially the same purpose as SMB (i.e., to access file systems over a network as if they were local), but is entirely incompatible with CIFS/SMB. This means that NFS clients can’t speak directly to SMB servers.

So what does NFS mean in terms of your network communications toolkit? You should use NFS for dedicated Linux Client to Linux Server connections. For mixed Windows / Linux environments use Samba.

Wednesday, April 10, 2019

AWS CSAA - AWS Certified Solutions Architect Associate - 2019 - 4 Week Learning Path

Week 1:-

To get a good overview of the basic concepts and architecture, start with the official guide, "AWS Certified Solutions Architect Official Study Guide: Associate Exam". You can watch the videos from https://www.udemy.com/aws-architect/learn/ while reading the guide; the two are in sync, which helps you get through the written material faster.

Week 2:-

Go through the Linux Academy and BackSpace Academy videos, as these cover many more detailed scenarios with labs. I particularly recommend "The Orion Papers", which I found to be very useful. The concepts are very well explained with visual diagrams in which a single area, say AWS databases, is covered at a high level in one image. This aids in recalling the concepts and applying them correctly to scenario-specific questions in the exam.

Week 3:-

Go through the A Cloud Guru videos and the FAQs for all major AWS services.
Review the AWS Whitepapers

AWS Well-Architected Framework Whitepaper
AWS_Risk_and_Compliance_Whitepaper
AWS_Security_Whitepaper
AWS_Cloud_Best_Practices
AWS_Overview
AWS_Storage-Options

Week 4:-

Take Practice Tests

Braincert AWS Solutions Architect – Associate SAA-C01 Practice Exams, which provide extensive scenario based questions
Udemy AWS Solutions Architect – Associate SAA-C01 Practice Exams

Also refer to the cheat sheet before exam day.

Troubleshooting EC2 instances

References :-

https://www.udemy.com/aws-architect/learn/
https://www.udemy.com/linux-academy-aws-certified-solutions-architect-associate/learn/
https://www.udemy.com/aws-certified-associate-architect-developer-sysops-admin/learn/
https://www.udemy.com/aws-certified-solutions-architect-associate/learn/