Some WIP notes from a first watch of Google’s The Bits and Bytes of Computer Networking course.
This course appears to be targeted at a support engineer
Parallels with person-to-person communication and the rules that we follow.
TCP/IP 5-layer model (we’ll also touch on 7-layer OSI)
Copper twisted-pair: most common are Cat5, Cat5e, Cat6 - affects length / transfer rate, resistance to interference
Crosstalk: when an electrical pulse on one wire is accidentally detected on another wire
Fiber optic: longer distance than copper, not affected by interference, much more fragile
Hubs and switches are used on a single network (LAN). Router: a device that knows how to forward data between independent networks
Router is a Layer 3 (network layer) device. Router inspects IP data to determine where to send things.
(Routing tables?)
Home / office routers - forward traffic from home to ISP. There, a more sophisticated router takes over.
Core ISP routers deal with complexity in deciding where to send traffic.
Routers share routing information using BGP (Border Gateway Protocol) - lets them learn the optimal paths for forwarding traffic
(Aside: network engineer designs a network?)
Modulation - a way of varying the voltage of the charge moving across the cable. Aka line coding. Represents 1s and 0s.
Twisted pair cable: cat6 has 8 wires (4 twisted pairs). How many are in use? depends on transmission technology.
Allows for duplex communication: information can flow in both directions across the cable (in contrast with simplex)
1 or 2 pairs reserved for communicating in each direction
RJ45 plug / port: exposes the internal wires
RJ45 port LEDs:
Network connectors in walls usually end in patch panel: container for endpoints of many runs of cable. Cables then run from patch panel to a switch
Frames
Ethernet 1980, standardized 1983. A few changes since then for increasing bandwidth needs, but pretty much as it was then. Back then, switches hadn’t been invented yet. All devices shared a single collision domain (network segment where only one device can send data at a time). Ethernet solved this with carrier sense multiple access with collision detection (CSMA/CD): if no data currently being sent, node will feel free to send data. If both try and send data, they’ll detect and stop. Both wait a random time and then try again.
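The "wait a random time and try again" step can be sketched in a few lines. This is a toy model only: real Ethernet uses binary exponential backoff with specific slot timings, and the function name, the `max_slots` parameter and the node labels here are all made up for illustration.

```python
import random


def resolve_collision(nodes, rng, max_slots=4):
    """Toy CSMA/CD backoff: after a collision every node picks a random wait.

    If two nodes pick the same wait they collide again and everyone re-picks.
    Returns (rounds_needed, send_order).
    """
    rounds = 0
    while True:
        rounds += 1
        # Each node independently chooses how many slots to wait.
        waits = {node: rng.randrange(max_slots) for node in nodes}
        # No duplicates means nobody transmits at the same time.
        if len(set(waits.values())) == len(nodes):
            return rounds, sorted(nodes, key=waits.get)


rounds, order = resolve_collision(["A", "B"], random.Random())
print(f"resolved after {rounds} round(s); send order: {order}")
```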
On a collision domain, all nodes receive all signals. We need a way of identifying who data was intended for. Hence MAC address (globally unique ID attached to an individual network interface). 48-bit, written as 6 groupings of 2 hex digits. Each grouping is an octet: any number that can be represented in 8 bits (256 possible values, 2 hex digits).
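To make the "6 octets, 48 bits" point concrete, here is a small sketch that parses a MAC address string. The function name and the sample address are made up for illustration.

```python
def parse_mac(mac: str) -> bytes:
    """Parse a MAC address like '00:1A:2B:3C:4D:5E' into its 6 raw octets."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}")
    # Each grouping is 2 hex digits, i.e. one octet (0-255).
    return bytes(int(part, 16) for part in parts)


mac = parse_mac("00:1A:2B:3C:4D:5E")
print(len(mac) * 8)  # 48
print(list(mac))     # [0, 26, 43, 60, 77, 94]
```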
How are MAC addresses globally unique?
data packet: all encompassing term for any single set of binary data being sent across a network link (not tied to layer or technology - concept)
At Ethernet level they’re known as frames. Highly structured collection of information presented in specific order.
MAC addresses are not a good way of addressing computers on different networks - they have no useful organisation and tell us nothing about a physical location. We need something else.
32-bit, 4 octets. Dotted decimal notation
Distributed in large sections to various organisations. Hierarchical and easier to store data about.
e.g. IBM own all IP addresses starting with 9. So if a router wants to get data to a network starting with 9, it knows it just needs to get it to one of IBM’s routers
IP addresses belong to networks, not the devices attached to those networks
Packets in IP are called IP datagrams. Again, a highly structured series of fields. Header and payload.
Contains more data than an Ethernet frame header
An IP address can be split into network ID and host ID.
9.100.100.100: first octet is network ID, remainder is host ID
address class system: a way of defining how global IP address space split up:
We talk of e.g. a “Class C network”.
Class affects max # hosts addressable on network.
Can tell what type of network it is by first octet.
There are also
Mainly replaced by CIDR - classless inter-domain routing
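A rough sketch of the old class system: the class follows from the first octet, and the class decides how many octets are network ID. The ranges used are the standard classful ones (A: 0-127, B: 128-191, C: 192-223, with D and E above that); the function names are made up for illustration.

```python
def address_class(ip: str) -> str:
    """Return the classful address class (A-E) based on the first octet."""
    first_octet = int(ip.split(".")[0])
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    if first_octet < 240:
        return "D"
    return "E"


def split_classful(ip: str) -> tuple[str, str]:
    """Split an address into (network ID, host ID) under the class system."""
    network_octets = {"A": 1, "B": 2, "C": 3}.get(address_class(ip))
    if network_octets is None:
        raise ValueError("class D/E addresses have no network/host split")
    octets = ip.split(".")
    return ".".join(octets[:network_octets]), ".".join(octets[network_octets:])


print(address_class("9.100.100.100"))   # A
print(split_classful("9.100.100.100"))  # ('9', '100.100.100')
```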
A protocol used to discover the hardware address of a node with a certain IP address.
Transmitting device needs a destination MAC address.
All devices maintain a local list of IP - MAC addresses (ARP table)
If we want to send to an IP we don’t have a table entry for, the computer sends an ARP request to the MAC broadcast address (FF:FF:FF:FF:FF:FF). The computer with that IP address then responds with an ARP response containing its MAC address. The sender will then store this in its local ARP table.
ARP table entries usually expire after a short period of time.
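An ARP table is essentially an IP-to-MAC lookup with expiring entries. Here is a minimal sketch of that idea. The class name, the 60-second default lifetime and the sample addresses are assumptions, not how any particular OS implements it.

```python
import time


class ArpTable:
    """Toy ARP table: maps IP -> MAC and forgets entries after a short time."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # assumed lifetime, real values vary by OS
        self._clock = clock
        self._entries = {}  # ip -> (mac, expires_at)

    def learn(self, ip, mac):
        """Record the MAC from an ARP response."""
        self._entries[ip] = (mac, self._clock() + self._ttl)

    def lookup(self, ip):
        """Return the cached MAC, or None if unknown or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[ip]  # expired: a new ARP broadcast is needed
            return None
        return mac


table = ArpTable()
table.learn("10.0.0.5", "aa:bb:cc:dd:ee:ff")
print(table.lookup("10.0.0.5"))  # aa:bb:cc:dd:ee:ff
print(table.lookup("10.0.0.9"))  # None
```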
(Not sure what this has to do with the Internet - it seems to be talking about a LAN)
Taking a large network and splitting it up into smaller sub-networks
Incorrect subnetting setups are a common IT support problem
IP routers route to the “gateway router” - entry and exit path to a certain network e.g. to the 9.0.0.0 class A network, which is then responsible for getting it to the correct system by looking at the host ID
But on a class A network that’s 16 million host IDs - way too many to connect to one router
So split it up - multiple subnets each with their own gateway.
Subnet ID. In a network with subnetting, some of the bits that would otherwise be used for the host ID are used for the subnet ID.
Subnet IDs are calculated by subnet mask - 32-bit numbers, written as 4 octets in decimal.
9.100.100.100 is, in binary:
0000 1001 0110 0100 0110 0100 0110 0100
Subnet mask: 1111 1111 1111 1111 1111 1111 0000 0000 (255.255.255.0) - the 1 bits are what we can ignore when computing a host ID; the 0 bits mark the host ID
Tells a router what part of an IP address is the subnet ID
(where is this defined)
255.255.255.224
27 1’s followed by 5 0’s - 5 bits of host ID space (32 addresses)
Shorthand way of writing subnet masks (CIDR notation)
9.100.100.100 with subnet mask of 255.255.255.224
can be written as 9.100.100.100/27 (what is this telling us?)
and then there’s something about doing an AND…?
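The AND can be worked through by hand. This sketch uses the address and mask from these notes; the helper names are made up, and the standard library's `ipaddress` module is used at the end as a cross-check. ANDing the address with the mask keeps the network and subnet bits, and the leftover bits are the host ID. The /27 is just the count of 1 bits in the mask.

```python
import ipaddress


def to_int(dotted: str) -> int:
    """Convert dotted decimal notation to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


ip = to_int("9.100.100.100")
mask = to_int("255.255.255.224")

network = ip & mask              # keep only the bits under the mask's 1s
host = ip & ~mask & 0xFFFFFFFF   # keep only the bits under the mask's 0s
prefix_len = bin(mask).count("1")

print(to_dotted(network))  # 9.100.100.96
print(host)                # 4
print(prefix_len)          # 27

# Cross-check with the standard library.
iface = ipaddress.ip_interface("9.100.100.100/27")
print(iface.network)                # 9.100.100.96/27
print(iface.network.num_addresses)  # 32
```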
Address classes were the first attempt at dividing up the global Internet IP space. Subnetting was introduced when it became clear address classes themselves weren’t sufficient.
Not enough class A networks, possibly too many class Cs to put in a routing table. The number of hosts available is often too large or too small for the # hosts needed by an organisation. Orgs often ended up with multiple adjoining class Cs.
Demarcation point: where one system ends and another begins
Abandons concept of address classes
This:
If I’ve understood correctly, there’s no “subnet”, it’s a “netmask” and a “network”? “Routers only need to know one entry in routing table”
Most intensive routing issues are handled by ISPs and largest companies
Complex topic
Router: network device that forwards traffic depending on destination address. Has at least 2 network interfaces
Network A: 192.168.1.0/24 (router at 192.168.1.1) Network B: 10.0.0.0/24 (router at 10.0.0.254)
How it works:
Now let’s introduce a third network:
Network C: 172.16.1/23
Computer on Network A wants to send to computer on Network C
Core Internet routers usually connected in a mesh - multiple routes for a packet to take
Major OSs today still ship with a routing table. Routing tables vary a ton depending on make and class of router
4 columns:
Routing tables on Internet have millions of rows, and must be consulted for every packet
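A toy version of the per-packet lookup, using Networks A and B from these notes. The interface names and the default route are made up, and picking the most specific match (longest prefix) is the standard rule rather than something stated above. Real routers use much faster data structures than a linear scan.

```python
import ipaddress

# (destination network, outgoing interface) - interface names are hypothetical
ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "eth0"),  # Network A
    (ipaddress.ip_network("10.0.0.0/24"), "eth1"),     # Network B
    (ipaddress.ip_network("0.0.0.0/0"), "eth2"),       # assumed default route
]


def pick_interface(destination: str):
    """Return the interface for the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    if not matches:
        return None
    # Longest prefix wins: /24 beats the /0 default route.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(pick_interface("10.0.0.7"))     # eth1
print(pick_interface("192.168.1.9"))  # eth0
print(pick_interface("8.8.8.8"))      # eth2
```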
Magic of routing is how the routing tables are constantly updated
Routers use routing protocols to share information with other routers
Getting data to the edge router of an autonomous system is #1 goal of core Internet routers.
IANA helps manage things like IP address allocation. Also responsible for ASN (Autonomous System Number) allocation. 32-bit numbers (like IP) but written as single decimal numbers
There’s only 4 billion IP addresses, but 7.5 billion humans
Can’t account for data centres
RFC 1918, 1996. Outlined a number of networks that would be non-routable address space: ranges of IPs set aside for anyone to use, which cannot be routed to from outside. Not every computer connected to the internet needs to be able to communicate with every other computer. No gateway router would attempt to forward traffic to this type of network.
In future module we’ll cover NAT. Allows computers on non-routable address space to communicate with other computers on the Internet.
Primary 3 ranges are: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
interior gateway protocols will route these
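A quick way to check whether an address falls in the RFC 1918 non-routable space, written out explicitly with the three ranges from the RFC. The function name is made up for illustration.

```python
import ipaddress

# The three private ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip: str) -> bool:
    """True if the address is in one of the RFC 1918 private ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in RFC1918_NETWORKS)


print(is_rfc1918("192.168.1.1"))    # True
print(is_rfc1918("172.32.0.1"))     # False (just outside 172.16.0.0/12)
print(is_rfc1918("9.100.100.100"))  # False
```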
Allows traffic to be directed to specific network applications
an IP datagram encapsulates a TCP segment: TCP header + data section
Establishing a connection (three-way handshake)
Now operating in full duplex. Each segment sent in either direction should be responded to with an ACK.
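The three-way handshake can be sketched as a list of segments. This is a simulation only, no real sockets involved; the `Segment` type and the starting sequence numbers are made up. The pattern is standard TCP: SYN, then SYN/ACK, then ACK, with each ACK number being the other side's sequence number plus one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    sender: str
    flags: Tuple[str, ...]
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments exchanged to open a TCP connection."""
    return [
        # 1. Client asks to open a connection.
        Segment("client", ("SYN",), seq=client_isn),
        # 2. Server agrees and acknowledges the client's SYN.
        Segment("server", ("SYN", "ACK"), seq=server_isn, ack=client_isn + 1),
        # 3. Client acknowledges the server's SYN.
        Segment("client", ("ACK",), seq=client_isn + 1, ack=server_isn + 1),
    ]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment.sender, "/".join(segment.flags), segment.seq, segment.ack)
```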
Closing a connection (four-way handshake)
Socket: the instantiation of an endpoint in a potential TCP connection
TCP sockets can exist in lots of states, and understanding them will help troubleshooting:
The exact names of these states can vary between OSs
TCP is connection-oriented: establishes a connection, and uses this to check all data has been properly transmitted
Even minor crosstalk from another twisted pair in same cable can make a CRC fail - this causes a whole frame to be discarded
Congestion might cause a router to drop traffic in favour of higher-priority traffic, or a construction company might cut a cable between ISPs
IP and Ethernet use checksums, but they don’t re-send data that doesn’t pass the check, it just gets discarded
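A sketch of "check, then discard". It uses `zlib.crc32` as a stand-in for the check Ethernet performs on each frame; the frame layout here (payload followed by a 4-byte CRC) is simplified and the function names are made up. Note there is no retry logic: a bad frame is just dropped.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC of the payload, like a frame check sequence."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def receive_frame(frame: bytes):
    """Return the payload if the CRC matches, otherwise None (frame discarded)."""
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received_crc:
        return None  # no re-send is requested at this layer
    return payload


frame = make_frame(b"hello")
print(receive_frame(frame))  # b'hello'

# Flip a single bit, as minor crosstalk might.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(receive_frame(corrupted))  # None
```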
TCP sends all segments in sequential order, but they might not arrive in that order - this is why sequence numbers are important
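A minimal sketch of why sequence numbers matter: the receiver can put data back in order regardless of arrival order. Segments are modelled as hypothetical `(sequence_number, data)` pairs; real TCP also handles gaps, duplicates and retransmission, which this ignores.

```python
def reassemble(segments):
    """Rebuild the byte stream from (sequence_number, data) pairs in any order."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return b"".join(data for _seq, data in ordered)


# Segments arriving out of order; sequence numbers count bytes from 0.
arrived = [(6, b"world"), (0, b"hello ")]
print(reassemble(arrived))  # b'hello world'
```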
TCP has overhead (set up, tear down, acknowledge), only necessary if you really care about the data getting there
UDP - you just set destination port and send packet. For example, streaming video. Imagine each UDP datagram is a single frame of a video. Doesn’t really matter if a few get lost along the way
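How little ceremony UDP needs shows up in code: no connection setup, just a destination address and port. This sketch sends one datagram to itself over loopback using the standard `socket` module; the function name and payload are made up. Over a real network the datagram could simply be lost, which is why the receive has a timeout.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one UDP datagram over loopback and return what was received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # UDP gives no delivery guarantee
        port = receiver.getsockname()[1]

        # No handshake: just set the destination and send.
        sender.sendto(payload, ("127.0.0.1", port))

        data, _address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


print(udp_roundtrip(b"video-frame-1"))
```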
A device that blocks traffic that meets certain criteria.
Can operate at:
Example: small business network, with a server that serves the company website and also acts as a private file server. The firewall would only allow outside traffic to port 80 (the website).
Sometimes independent network devices. But best to think of them as a program that can run anywhere. For home users, the router and firewall are usually same device.
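The small-business example boils down to a port allow-list. A toy sketch: the `Packet` type and rule set are made up, and real firewalls match on far more than the destination port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dst_port: int
    protocol: str = "tcp"


# Only the company website (HTTP on port 80) is reachable from outside.
ALLOWED_TCP_PORTS = {80}


def firewall_allows(packet: Packet) -> bool:
    """Allow traffic only if it targets an explicitly allowed TCP port."""
    return packet.protocol == "tcp" and packet.dst_port in ALLOWED_TCP_PORTS


print(firewall_allows(Packet(dst_port=80)))   # True  (website)
print(firewall_allows(Packet(dst_port=445)))  # False (file sharing is blocked)
```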
Too many protocols to dive into here. Let’s briefly talk HTTP.
All web browsers and servers need to speak same protocol (HTTP).
Open Systems Interconnection - the most rigorously defined, often used in academic settings / certification orgs
Introduces 2 layers between transport and application:
There’s no additional encapsulation going on in these layers, which is why we usually focus on the 5-layer model
The whole thing here is just describing getting a single TCP segment with a SYN flag from computer 1 to computer 2
(This is very cool.)
DNS, DHCP, NAT, VPN, proxies
IP address is a 32-bit binary number, MAC address 48-bit binary number. Humans are better at remembering words. Furthermore, you’d have to remember changing IP addresses
Domain names might resolve to different things depending on where in the world you are
DNS servers need to be specifically configured at a node in a network
Alongside IP address, subnet mask, gateway – DNS server is the final piece of standard modern network configuration that needs to be put into a host
There are 5 primary types of DNS server, and a given server can fulfil many of these roles at once
All domain names in global DNS system have a TTL (in seconds), how long a name server is allowed to cache an entry before it should discard it and perform a full resolution again. In the past, these used to be huge, e.g. ~ 1 day, due to limited bandwidth. These days, they’ve dropped to ~few mins to ~few hours.
So, what does a full recursive resolution look like?
Each computer also usually has its own temporary DNS cache too.
A single DNS request and response can usually fit in a single UDP datagram.
DNS listens on port 53
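To get a feel for how small a DNS request is, this sketch builds the bytes of a basic query for an A record by hand, following the standard DNS message layout (12-byte header, then the question). It only constructs the message and sends nothing. The function name, the query ID and the hostname are made up for illustration.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query message asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # Question tail: QTYPE=1 (A record), QCLASS=1 (Internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


query = build_dns_query("www.example.com")
print(len(query))  # 33 bytes, easily within a single UDP datagram
```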
With TCP, a full DNS lookup would take the exchange of 44 packets. Whereas it’d be 8 UDP datagrams.
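One way to arrive at those numbers, assuming a full lookup involves 4 request/response exchanges (local machine to recursive server, then recursive server to root, TLD and authoritative servers) and that each TCP exchange pays for a handshake, ACKs and a close. The breakdown and the function names are my reconstruction, not something stated above.

```python
def tcp_packets(exchanges: int = 4) -> int:
    """Packets needed if every DNS exchange used its own TCP connection."""
    handshake = 3          # SYN, SYN/ACK, ACK
    request_and_ack = 2    # request + its ACK
    response_and_ack = 2   # response + its ACK
    close = 4              # FIN, ACK, FIN, ACK
    return exchanges * (handshake + request_and_ack + response_and_ack + close)


def udp_datagrams(exchanges: int = 4) -> int:
    """Datagrams needed with UDP: one request and one response per exchange."""
    return exchanges * 2


print(tcp_packets())    # 44
print(udp_datagrams())  # 8
```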
How does error recovery work? DNS resolver asks again if it doesn’t get a response, which is the same functionality that TCP provides at the transport layer. DNS over TCP also exists and is frequently used, especially since a single DNS req/res no longer always fits in a single UDP datagram. In that case, the DNS server responds saying the record is too large, and that the requester needs to open a TCP connection
2:57:36