3 Transport Layer
The transport layer provides logical communication between processes running on different hosts.
Transport-layer protocols are implemented only in the end systems.
The transport layer extends the host-to-host delivery service provided by the network layer to a process-to-process delivery service for applications running on the hosts.
Protocols: TCP and UDP
1. Multiplexing and Demultiplexing
Port range: 16 bits, 0~65535
Well-known port numbers: 0~1023
A UDP socket is identified by a two-tuple: (destination IP, destination port).
A TCP socket is identified by a four-tuple: (source IP, source port, destination IP, destination port).
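A minimal sketch of demultiplexing, assuming a UDP socket is looked up by (destination IP, destination port) and a TCP socket by (source IP, source port, destination IP, destination port). The tables, function names, addresses, and socket labels are hypothetical and only illustrate the lookup keys.

```python
from typing import Dict, Optional, Tuple

# Hypothetical socket tables keyed by the identifying tuples.
udp_sockets: Dict[Tuple[str, int], str] = {}
tcp_sockets: Dict[Tuple[str, int, str, int], str] = {}


def udp_demux(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
    # The source fields are ignored: only the destination pair selects the socket.
    return udp_sockets.get((dst_ip, dst_port))


def tcp_demux(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
    # All four fields select the socket, so each connection gets its own socket.
    return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))


udp_sockets[("10.0.0.1", 53)] = "dns-socket"
tcp_sockets[("10.0.0.2", 40000, "10.0.0.1", 80)] = "conn-A"
tcp_sockets[("10.0.0.3", 40000, "10.0.0.1", 80)] = "conn-B"
```

Two UDP senders with different source addresses reach the same socket, while two TCP clients talking to the same server port reach different sockets.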
2. UDP
A UDP socket is identified by the two-tuple (destination IP, destination port).
The source IP and source port are not part of the socket identifier, so the receiver delivers all UDP segments with the same destination IP and destination port to the same socket, even if their source IPs or source ports differ.
(The source IP is not in the UDP header; it is carried by the network layer.)
2.1 Checksum
Ref: UDP Checksum
The pseudo header is not transmitted. It is used only to compute the checksum.
If the sum overflows 16 bits, the carry must be "wrapped around" and added back to the low-order bits.
Also note the "reserved" or "padding" zeros.
If the data size is not an integral multiple of 16 bits, pad it with trailing zeros.
2.1.1 Checking by Receiver
The receiver adds all the data together as 16-bit numbers, including the checksum computed by the sender. If any bit of the result is 0, an error has occurred. Even if every bit of the result is 1, we cannot be sure that the segment is error-free, because some error patterns go undetected.
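The computation and the receiver check can be sketched as follows. This is an illustrative sketch, not a full UDP implementation: `data` is assumed to be the already assembled bytes (pseudo header, UDP header with a zero checksum field, and payload), and the function names are hypothetical.

```python
def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad with a trailing zero byte to a 16-bit boundary
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # wrap the carry around
    return total


def checksum(data: bytes) -> int:
    """Sender side: one's complement of the wrapped sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def verify(data: bytes, received_checksum: int) -> bool:
    """Receiver side: the sum including the checksum must be all ones."""
    total = ones_complement_sum(data) + received_checksum
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


words = bytes.fromhex("666055558f0c")  # three 16-bit words
print(hex(checksum(words)))
```

The second addition in this example overflows 16 bits, so it exercises the wrap-around step.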
3. Reliable Data Transfer
ARQ (Automatic Repeat reQuest): based on positive and negative acknowledgments
Error detection
Receiver Feedback: ACK and NAK
Retransmission
3.1 Stop and wait
The sender must wait for an ACK or NAK before it can leave the waiting state and accept more data from the upper layer.
3.1.1 rdt 2.0
Add ACK and NAK so that the sender knows whether the data was delivered correctly.
3.1.2 rdt 2.1
To cope with corrupted ACKs and NAKs, add a sequence number to data packets so that the receiver can recognize a retransmission as a duplicate.
3.1.3 rdt 3.0
To handle packet loss, introduce a countdown timer. When the timer expires, the sender resends the packet, which may produce duplicate data packets at the receiver.
3.2 Pipelining
Stop-and-wait is very inefficient because the sender spends most of its time waiting. Pipelining lets the sender transmit multiple packets before waiting for acknowledgments.
As a result, we have to:
1) Increase the range of sequence numbers.
2) (Optional) Buffer packets at the sender and/or the receiver.
3) Know how to respond to lost, corrupted, and overly delayed packets.
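To put a number on the inefficiency of stop-and-wait, sender utilization can be estimated as the fraction of time the sender is actually transmitting. The sketch below assumes a lossless link; the packet size, link rate, and RTT are illustrative values, not figures from these notes.

```python
def utilization(packet_bits: float, rate_bps: float, rtt_s: float, window: int = 1) -> float:
    """Fraction of time the sender is busy when `window` packets are sent per round trip."""
    t_trans = packet_bits / rate_bps  # time to push one packet onto the link
    return min(1.0, window * t_trans / (rtt_s + t_trans))


# Assumed values: 8000-bit packets, 1 Gbps link, 30 ms RTT.
print(utilization(8000, 1e9, 0.030))            # stop-and-wait (window = 1)
print(utilization(8000, 1e9, 0.030, window=3))  # pipelining three packets
```

With these assumed values stop-and-wait keeps the link busy for well under 0.1% of the time, and sending three packets per round trip triples that figure.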
3.2.1 Go-Back-N
GBN is a sliding-window protocol.
Animation: GBN
Features: cumulative ACK, base, nextseqnum
Typically, consider two cases of packet loss:
1) The sender's packet with sequence number \(i\) is lost before it reaches the receiver.
The receiver expects sequence number \(i\) but receives \(i + 1\) instead. So it drops that packet (does not deliver it to the upper layer) and sends an ACK with sequence number \(i - 1\) to the sender.
The sender receives ACK \(i - 1\), so \(base == i\) does not change (because the arrival of ACK \(i - 1\) gives \(base = (i - 1) + 1 = i\)). It waits for ACK \(i\) until the timer expires, and then resends packet \(i\) together with all later packets that have been sent but not yet acknowledged.
2) The receiver's ACK with sequence number \(i\) is lost before it reaches the sender.
The sender does not receive the ACK with sequence number \(i\). However, because ACKs are cumulative, the arrival of the ACK with sequence number \(i + 1\) tells the sender that data packet \(i\) was also delivered successfully. Thus, the loss of the ACK with sequence number \(i\) has no effect.
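Both cases can be traced with a small sender model. This is a simplified sketch: sequence numbers are unbounded integers (no wraparound), there is no real timer, and the class and method names are hypothetical; only `base` and `nextseqnum` come from the notes above.

```python
class GBNSender:
    def __init__(self, window_size: int):
        self.N = window_size
        self.base = 0          # oldest unacknowledged sequence number
        self.nextseqnum = 0    # next sequence number to use
        self.unacked = {}      # seqnum -> data, sent but not yet ACKed

    def send(self, data) -> bool:
        if self.nextseqnum >= self.base + self.N:
            return False       # window full: refuse data from the upper layer
        self.unacked[self.nextseqnum] = data
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum: int) -> None:
        # Cumulative ACK: everything up to and including acknum was received.
        if self.base <= acknum < self.nextseqnum:
            for seq in range(self.base, acknum + 1):
                self.unacked.pop(seq, None)
            self.base = acknum + 1
        # An ACK below base (a duplicate) leaves base unchanged.

    def on_timeout(self) -> list:
        # Go back N: resend every packet that was sent but not yet ACKed.
        return [self.unacked[s] for s in range(self.base, self.nextseqnum)]
```

A duplicate ACK leaves `base` where it is (case 1), while a later cumulative ACK moves `base` past a packet whose own ACK was lost (case 2).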
3.2.2 Selective Repeat
SR improves on GBN, which generates many unnecessary retransmissions when there are plenty of packets in the pipeline (which may be caused by a large window size or a large bandwidth-delay product).
The difference: the receiver acknowledges each correctly received packet individually, regardless of order (sequence number), so there is no cumulative ACK. This is a big difference from GBN: when the sender receives ACK \(i\), it cannot conclude that packet \(i - 1\) was received correctly. An unacknowledged packet is resent only when its own timer expires.
So SR needs buffering at both the sender and the receiver, and an independent timer for each packet.
The sender:
    When receiving ACK:
        if seqnum in window:
            mark the packet as ACKed
            if seqnum == send_base:
                move the window forward to min(seqnum that has not yet been ACKed)
The receiver:
    window: [rcv_base, rcv_base + N - 1]
    When receiving packet:
        if seqnum in window:
            send corresponding ACK
            if the packet has not been received before:
                cache it
            if seqnum == rcv_base:
                deliver {consecutively numbered cached packets starting at rcv_base} to the upper layer
                move rcv_base past the delivered packets
        elif seqnum in [rcv_base - N, rcv_base - 1]:
            send corresponding ACK!!!
        else:
            do nothing (ignore)
Note for "!!!": If the receiver did nothing there, the sender's window could not move forward, since moving forward is triggered only by seqnum == send_base. This case happens when an ACK is lost on its way back to the sender and the timeout for that seqnum then causes a retransmission. So the receiver must acknowledge the resent packet even though it has already received it.
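The receiver logic can be turned into a small runnable sketch. It assumes unbounded integer sequence numbers (no wraparound), and the class name, the `delivered` list, and the return convention (the ACK number to send, or `None`) are hypothetical choices for illustration.

```python
class SRReceiver:
    def __init__(self, window_size: int):
        self.N = window_size
        self.rcv_base = 0
        self.buffer = {}      # out-of-order packets waiting for delivery
        self.delivered = []   # stands in for the upper layer

    def on_packet(self, seqnum: int, data):
        """Return the ACK number to send, or None if the packet is ignored."""
        if self.rcv_base <= seqnum < self.rcv_base + self.N:
            self.buffer.setdefault(seqnum, data)  # cache unless already cached
            # Deliver the in-order run starting at rcv_base, if any.
            while self.rcv_base in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcv_base))
                self.rcv_base += 1
            return seqnum
        if self.rcv_base - self.N <= seqnum < self.rcv_base:
            return seqnum  # already delivered: re-ACK so the sender can advance
        return None
```

Receiving packet 1 before packet 0 buffers it; once packet 0 arrives both are delivered in order, and a later retransmission of packet 0 is still acknowledged.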
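Issue: For SR protocols, the window size must be less than or equal to half the size of the sequence number space. Otherwise the receiver cannot tell a retransmission of an old packet from a new packet that reuses the same sequence number.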
4. TCP
three-way handshake
maximum segment size (MSS): the maximum amount of application-layer data in the segment, not the maximum size of the TCP segment including headers.
maximum transmission unit (MTU): the length of the largest link-layer frame that can be sent by the local sending host. The MSS is typically set based on the MTU, or on the path MTU value.
How many flags ???
The sequence number for a segment is the byte-stream number of the first byte in the segment.
Note: the acknowledgment number in TCP is different from the sequence number of ACK packets in GBN or SR. The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B. (Cumulative acknowledgments)
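A short sketch of how byte-stream sequence numbers and cumulative acknowledgment numbers relate. It ignores sequence number wraparound and random initial sequence numbers; the function names and the example sizes are assumptions for illustration.

```python
def segment_seq_numbers(initial_seq: int, total_bytes: int, mss: int) -> list:
    """Sequence number of each segment = stream offset of its first byte."""
    return [initial_seq + offset for offset in range(0, total_bytes, mss)]


def cumulative_ack(initial_seq: int, received: list) -> int:
    """received: (seq, length) pairs that arrived, possibly out of order.

    Returns the next byte the receiver expects, which is what it
    places in the acknowledgment number field.
    """
    length_by_seq = dict(received)
    next_expected = initial_seq
    while next_expected in length_by_seq:
        next_expected += length_by_seq[next_expected]
    return next_expected


print(segment_seq_numbers(0, 3500, 1000))
print(cumulative_ack(0, [(0, 1000), (2000, 1000)]))
```

In the second call the segment starting at byte 1000 is missing, so the acknowledgment number stays at 1000 even though bytes 2000 to 2999 have arrived.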
4.1 Telnet
Note that the acknowledgment for client-to-server data is carried in a segment carrying server-to-client data; this acknowledgment is said to be piggybacked on the server-to-client data segment.
4.2 RTT Estimation and Timeout Determination
SampleRTT: the amount of time between when the segment is sent (that is, passed to IP) and when an acknowledgment for the segment is received.
Two rules: SampleRTT is measured for only one of the transmitted but currently unacknowledged segments at a time, and only for segments that have been transmitted once (never for retransmitted segments).
\(EstimatedRTT=(1−α)⋅EstimatedRTT+α⋅SampleRTT\); recommended \(\alpha = 0.125\)
EstimatedRTT puts more weight on recent samples than on old samples. This kind of average is called an exponential weighted moving average (EWMA).
\(DevRTT=(1−β)⋅DevRTT+β⋅|SampleRTT−EstimatedRTT|\); recommended \(\beta = 0.25\)
Determine \(TimeoutInterval=EstimatedRTT+4⋅DevRTT\)
(The initial TimeoutInterval is 1 s. When a timeout occurs, the value of TimeoutInterval is doubled. It is recomputed from the formula above whenever EstimatedRTT is updated.)
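The three formulas can be combined into a small estimator. This is a sketch under stated assumptions: the caller supplies the starting EstimatedRTT and DevRTT (the notes do not say how they are initialized), DevRTT is updated after EstimatedRTT in the order the formulas are listed, and the class and method names are hypothetical.

```python
class RttEstimator:
    def __init__(self, estimated_rtt: float, dev_rtt: float = 0.0,
                 alpha: float = 0.125, beta: float = 0.25):
        self.estimated_rtt = estimated_rtt  # assumed starting value
        self.dev_rtt = dev_rtt              # assumed starting value
        self.alpha = alpha
        self.beta = beta
        self.timeout_interval = 1.0         # initial timeout of 1 s

    def on_sample(self, sample_rtt: float) -> float:
        # EWMA of the RTT, then EWMA of its deviation, then the timeout.
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(sample_rtt - self.estimated_rtt)
        self.timeout_interval = self.estimated_rtt + 4 * self.dev_rtt
        return self.timeout_interval

    def on_timeout(self) -> float:
        self.timeout_interval *= 2  # back off after a timeout
        return self.timeout_interval


est = RttEstimator(estimated_rtt=0.100)
print(est.on_sample(0.200), est.on_timeout())
```

Starting from an estimate of 100 ms, a single 200 ms sample moves EstimatedRTT only to 112.5 ms, but the deviation term raises the timeout to about 200 ms.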
4.3 Fast retransmit
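With fast retransmit, the sender resends a segment as soon as it has received three duplicate ACKs for it, without waiting for the timer to expire. A minimal sketch of the duplicate-ACK counting, with hypothetical class and method names and no wraparound handling:

```python
class FastRetransmit:
    def __init__(self, send_base: int = 0):
        self.send_base = send_base  # oldest unacknowledged byte
        self.dup_acks = 0

    def on_ack(self, acknum: int):
        """Return the sequence number to retransmit now, or None."""
        if acknum > self.send_base:
            # New data acknowledged: advance and reset the duplicate counter.
            self.send_base = acknum
            self.dup_acks = 0
            return None
        # Duplicate ACK for data that is already acknowledged.
        self.dup_acks += 1
        if self.dup_acks == 3:
            return acknum  # resend the segment starting at this byte
        return None
```

Only the third duplicate triggers a retransmission; further duplicates for the same byte do not trigger it again.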
The reason that the sending side has to wait until the third duplicate ACK is described in RFC 2001 as follows:
" Since TCP does not know whether a duplicate ACK is caused by a lost segment or just a reordering of segments, it waits for a small number of duplicate ACKs to be received. It is assumed that if there is just a reordering of the segments, there will be only one or two duplicate ACKs before the reordered segment is processed, which will then generate a new ACK. If three or more duplicate ACKs are received in a row, it is a strong indication that a segment has been lost. "
4.4 Selective Acknowledgement
Ref: SACK
SACK aims to avoid unnecessary retransmissions.
TCP uses cumulative ACKs. With GBN-style retransmission, the loss of packet 2 leads to the retransmission of packets 3, 4, and so on, because the receiver has no way to tell the sender that it has already received those later packets correctly. So TCP adds a SACK option to ACK segments to carry information about the later packets that did arrive. (See ref. for an example.)
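A sketch of how a sender could use SACK information to find what is actually missing. It assumes each SACK block is a (left edge, right edge) pair of byte sequence numbers in which the right edge is the first byte after the block; the function name is hypothetical and wraparound is ignored.

```python
def sack_holes(cum_ack: int, sack_blocks: list) -> list:
    """Return the byte ranges [start, end) that lie below the highest
    SACKed byte but were not reported as received."""
    holes = []
    cursor = cum_ack  # everything below the cumulative ACK has arrived
    for left, right in sorted(sack_blocks):
        if left > cursor:
            holes.append((cursor, left))
        cursor = max(cursor, right)
    return holes


# Cumulative ACK 1000, receiver also holds bytes 2000..3999.
print(sack_holes(1000, [(2000, 4000)]))
```

Only the range from 1000 to 2000 needs to be resent here, instead of everything after byte 1000.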
4.5 Comparison with GBN and SR
TCP is a hybrid protocol of GBN and SR.
Ref: An article about comparison
4.6 Flow Control
Animation: Flow Control
For the receiver, \(rwnd=RcvBuffer−[LastByteRcvd−LastByteRead]\)
The receiver tells the sender how much spare room it has in the connection buffer by placing its current value of \(rwnd\) in the receive window field ("Window Size" in the TCP segment figure before) of every segment it sends to the sender.
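For the sender (Host A), \(LastByteSent - LastByteAcked\) is the amount of unacknowledged data that A has sent into the connection.
Thus, the sender makes sure throughout the connection's life that \(LastByteSent - LastByteAcked \le rwnd\)
Issue addressing: The TCP specification requires Host A to continue to send segments with one data byte when B's receive window is zero. Otherwise A would stop sending and would never learn that space has opened up in B's buffer; the ACKs for these one-byte segments eventually carry a nonzero \(rwnd\).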
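The receiver and sender rules can be written as two small functions. This is an illustrative sketch: the function names and byte counts are assumptions, and the zero-window case simply returns one byte to stand in for the one-data-byte probe segment.

```python
def rwnd(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised by the receiver."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(last_byte_sent: int, last_byte_acked: int, rwnd_value: int) -> int:
    """How many more bytes the sender may put into the connection."""
    if rwnd_value == 0:
        return 1  # keep probing with one data byte while the window is zero
    in_flight = last_byte_sent - last_byte_acked
    return max(rwnd_value - in_flight, 0)


window = rwnd(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window, sendable_bytes(5000, 4000, window))
```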
4.7 TCP Connection Management
1) SYN segment: SYN = 1; 2) SYNACK segment: SYN = 1; 3) ACK segment: SYN = 0
(SYN: synchronize)
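The three segments of the handshake can be listed with their flag, sequence number, and acknowledgment number fields. This is a sketch: the dictionary layout and the initial sequence numbers are made up for illustration, and real implementations choose the initial sequence numbers themselves.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SYN, SYNACK, and ACK segments as plain dictionaries."""
    syn = {"SYN": 1, "ACK": 0, "seq": client_isn}
    # The SYN consumes one sequence number, so the server expects client_isn + 1.
    synack = {"SYN": 1, "ACK": 1, "seq": server_isn, "ack": client_isn + 1}
    # The final segment has SYN = 0 and acknowledges the server's SYN.
    ack = {"SYN": 0, "ACK": 1, "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```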
When a host receives a TCP segment whose port number does not match any of its open sockets, it replies with a segment that has the RST (reset) flag set.
4.8 Congestion Control
Animation: Congestion Control
End-to-end congestion control
Network-assisted congestion control
Congestion window: \(cwnd\)
\(LastByteSent - LastByteAcked \le min\{cwnd, rwnd\}\)
1) limit: adjusting the value of \(cwnd\)
2) perceive: "loss event": timeout or 4 ACKs (1 original + 3 duplicate)
3) change: increase/decrease \(cwnd\) according to the rate at which ACKs arrive (self-clocking)
How to determine the rate? Guiding Principles:
1) Lost segments, lower rate
2) ACK segments, higher rate
3) Keep probing the bandwidth
4.8.1 Slow Start
ssthresh: slow start threshold
4.8.2 Congestion Avoidance
4.8.3 Fast Recovery
FSM Summary:
The additive-increase/multiplicative-decrease (AIMD) feedback control algorithm
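The slow start, congestion avoidance, and fast recovery states can be sketched as a small state machine. This is a simplified model in the style of TCP Reno, not a complete implementation: `cwnd` and `ssthresh` are counted in units of MSS, the initial `ssthresh` is an arbitrary assumption, and the class and method names are hypothetical.

```python
class RenoSketch:
    def __init__(self, ssthresh: float = 64.0):
        self.cwnd = 1.0            # congestion window, in MSS
        self.ssthresh = ssthresh   # slow start threshold, in MSS (assumed initial value)
        self.state = "slow_start"
        self.dup_acks = 0

    def on_new_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh           # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1.0                    # doubles cwnd every RTT
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd        # about +1 MSS per RTT (additive increase)
        self.dup_acks = 0

    def on_dup_ack(self) -> None:
        if self.state == "fast_recovery":
            self.cwnd += 1.0
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                  # loss event: triple duplicate ACK
            self.ssthresh = self.cwnd / 2       # multiplicative decrease
            self.cwnd = self.ssthresh + 3.0
            self.state = "fast_recovery"

    def on_timeout(self) -> None:               # loss event: timeout
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.state = "slow_start"
        self.dup_acks = 0
```

A timeout sends the connection back to slow start with a window of 1 MSS, while a triple duplicate ACK only halves the window, which is the multiplicative decrease in AIMD.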
Throughput: associated with loss rate
High-bandwidth paths: the average throughput is a function of the loss rate (\(L\)), the RTT, and the maximum segment size (MSS): \(\text{average throughput} = \frac{1.22 \cdot MSS}{RTT \sqrt{L}}\)
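A worked example of the throughput formula, solved in both directions. The segment size, RTT, and target rate below are illustrative assumptions, and the function names are hypothetical.

```python
import math


def avg_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Average throughput in bits per second: 1.22 * MSS / (RTT * sqrt(L))."""
    return 1.22 * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


def required_loss_rate(mss_bytes: float, rtt_s: float, target_bps: float) -> float:
    """Loss rate L that the same formula allows for a target throughput."""
    return (1.22 * mss_bytes * 8 / (rtt_s * target_bps)) ** 2


# Assumed values: 1500-byte segments, 100 ms RTT, 10 Gbps target.
print(required_loss_rate(1500, 0.100, 10e9))
```

Under these assumptions the tolerable loss rate is on the order of 2e-10, which shows why sustaining very high throughput over a single connection is hard with this behavior.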
4.8.4 Fairness
Ideal model with same MSS and RTT for a single associated TCP connection:
In the real world, sessions with a smaller RTT enjoy higher throughput, because they can open their congestion windows faster and grab available bandwidth more quickly.
It is possible for UDP sources to crowd out TCP traffic.
There is nothing to stop a TCP-based application from using multiple parallel connections.
4.8.5 ECN
Explicit Congestion Notification
It requires extensions to both IP and TCP.
Two bits in the "Type of Service" field of the IP datagram header are used for ECN.
One setting of these bits is used by a router to signal congestion; another is used by the sending host to tell routers that the sender and receiver are ECN-capable.
When the receiving host receives an ECN congestion indication, it sets the ECE (Explicit Congestion Notification Echo) bit in a TCP ACK segment.
The TCP sender then reacts to an ACK with the ECE bit set by halving the congestion window.
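The ECN signaling path can be sketched end to end. The two-bit codepoints below follow the standard ECN encoding (00 not ECN-capable, 01 and 10 ECN-capable, 11 congestion experienced); the function names are hypothetical, and a real router would drop rather than forward non-ECN-capable packets under congestion, which this sketch does not model.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def router_mark(ecn_bits: int, congested: bool) -> int:
    """A congested router marks ECN-capable packets instead of dropping them."""
    if congested and ecn_bits in (ECT_0, ECT_1):
        return CE
    return ecn_bits  # not ECN-capable or no congestion: bits left unchanged


def receiver_ece(ecn_bits: int) -> bool:
    """The receiver sets the ECE bit in its ACK when it sees a CE mark."""
    return ecn_bits == CE


def sender_react(cwnd: float, ece: bool) -> float:
    """The sender halves its congestion window on an ACK with ECE set."""
    return cwnd / 2 if ece else cwnd


bits = router_mark(ECT_0, congested=True)
print(sender_react(10.0, receiver_ece(bits)))
```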