Thinking further: digging into the root cause
Building on the packetdrill TCP three-way-handshake script, we construct simulated server-side scenarios to continue studying and testing TCP slow start.
Base script
# cat tcp_tcp_slow_start_000.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
#
Slow start
TCP slow start is the initial mechanism of TCP congestion control. Its core purpose is to probe the available network bandwidth quickly yet cautiously after a connection is established or a retransmission timeout occurs. The sender maintains a congestion window (cwnd) to bound the amount of unacknowledged data; during slow start, every acknowledgment (ACK) received grows cwnd by one maximum segment size (MSS). The amount of data that can be sent within one round-trip time (RTT) therefore roughly doubles each round, giving exponential window growth until the slow-start threshold (ssthresh) is reached or packet loss is detected, after which the connection moves into the linearly growing congestion-avoidance phase.
In the Linux implementation, slow start is triggered in three typical scenarios: at connection establishment (default cwnd=10, ssthresh set to a huge value), which guarantees the connection enters slow start unless overridden by routing-table configuration; on an RTO retransmission timeout, where cwnd is reset to 1 and ssthresh is recomputed by the congestion control algorithm before slow start restarts; and when the connection has been idle for longer than an RTO, where the congestion window is reinitialized and slow start probes the network state again. Together these three mechanisms let TCP adapt to the available network capacity in each scenario.
Basic test
Staying with the connection-establishment scenario: in the earlier experiments in this series, every data segment sent was exactly MSS-sized. A natural question is whether cwnd growth depends on the size of the data, i.e. whether segments must be a full MSS.
To answer it, we can simply shrink the data segments and observe. Based on the tcp_slow_start_1_003.pkt script, change the 1000-byte writes to 100 bytes.
# cat tcp_slow_start_1_004.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4

+0.01 %{print (tcpi_snd_cwnd, tcpi_snd_ssthresh)}%

+0.01 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 %{print (tcpi_snd_cwnd)}%

+0.01 <. 1:1(0) ack 101 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 201 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 301 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 401 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 501 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 601 win 10000
+0 %{print (tcpi_snd_cwnd)}%
#
After running the script, the cwnd value changes somewhat as the ACKs arrive: on ACK 101 it increases by 1 to 11, on ACK 201 it increases by 1 again to 12, but for ACKs 301 through 601 it no longer changes, holding at 12.
This matches the result of tcp_slow_start_1_003.pkt, showing that cwnd growth is unrelated to the size of the data sent, and unrelated to the MSS.
# packetdrill tcp_slow_start_1_004.pkt
10 2147483647
10
11
12
12
12
12
12
#

# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:16:39.127760 tun0  In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
21:16:39.127825 tun0  Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [S.], seq 923791363, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
21:16:39.138164 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [.], ack 1, win 10000, length 0
21:16:39.158721 ?     Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [P.], seq 1:101, ack 1, win 64240, length 100: HTTP
21:16:39.158786 ?     Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [P.], seq 101:201, ack 1, win 64240, length 100: HTTP
21:16:39.158835 ?     Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [P.], seq 201:301, ack 1, win 64240, length 100: HTTP
21:16:39.158888 ?     Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [P.], seq 301:401, ack 1, win 64240, length 100: HTTP
21:16:39.158943 ?     Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [P.], seq 401:501, ack 1, win 64240, length 100: HTTP
21:16:39.158990 ?     Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [P.], seq 501:601, ack 1, win 64240, length 100: HTTP
21:16:39.169249 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [.], ack 101, win 10000, length 0
21:16:39.169318 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [.], ack 201, win 10000, length 0
21:16:39.169363 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [.], ack 301, win 10000, length 0
21:16:39.169398 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [.], ack 401, win 10000, length 0
21:16:39.169434 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [.], ack 501, win 10000, length 0
21:16:39.169479 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [.], ack 601, win 10000, length 0
21:16:39.183199 ?     Out IP 192.168.74.165.8080 > 192.0.2.1.52837: Flags [F.], seq 601, ack 1, win 64240, length 0
21:16:39.183226 ?     In  IP 192.0.2.1.52837 > 192.168.74.165.8080: Flags [R.], seq 1, ack 601, win 10000, length 0
#
During the experiments there was a small detour: I forgot to disable the Nagle algorithm, which nearly skewed the conclusion. The following script leaves the Nagle algorithm enabled (the default).
# cat tcp_slow_start_1_005.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4

+0.01 %{print (tcpi_snd_cwnd, tcpi_snd_ssthresh)}%

+0.01 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 write(4,...,100) = 100
+0 %{print (tcpi_snd_cwnd)}%

+0.01 <. 1:1(0) ack 101 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 201 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 301 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 401 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 501 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 601 win 10000
+0 %{print (tcpi_snd_cwnd)}%
#
After running this script, the cwnd value does not change at all as the ACKs arrive; it stays at 10 throughout, with no increase whatsoever.
# packetdrill tcp_slow_start_1_005.pkt
10 2147483647
10
10
10
10
10
10
10
#

# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
21:26:56.792587 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
21:26:56.792616 tun0  Out IP 192.168.173.236.8080 > 192.0.2.1.50429: Flags [S.], seq 753209366, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
21:26:56.802707 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [.], ack 1, win 10000, length 0
21:26:56.822865 tun0  Out IP 192.168.173.236.8080 > 192.0.2.1.50429: Flags [P.], seq 1:101, ack 1, win 64240, length 100: HTTP
21:26:56.833053 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [.], ack 101, win 10000, length 0
21:26:56.833138 tun0  Out IP 192.168.173.236.8080 > 192.0.2.1.50429: Flags [P.], seq 101:601, ack 1, win 64240, length 500: HTTP
21:26:56.833200 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [.], ack 201, win 10000, length 0
21:26:56.833216 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [.], ack 301, win 10000, length 0
21:26:56.833232 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [.], ack 401, win 10000, length 0
21:26:56.833246 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [.], ack 501, win 10000, length 0
21:26:56.833258 tun0  In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [.], ack 601, win 10000, length 0
21:26:56.920009 ?     Out IP 192.168.173.236.8080 > 192.0.2.1.50429: Flags [F.], seq 601, ack 1, win 64240, length 0
21:26:56.920144 ?     In  IP 192.0.2.1.50429 > 192.168.173.236.8080: Flags [R.], seq 1, ack 601, win 10000, length 0
#
Why does this happen? The tcpdump capture alone reveals the cause: under the Nagle algorithm, the data from writes 2-6 (seq 101:601) cannot be sent immediately. It may only go out, merged into a single segment, once the first small segment that was sent (seq 1:101) has been ACKed.
As the earlier experiment analysis showed, when the first ACK arrives, tcp_is_cwnd_limited() is in the slow-start branch and evaluates return tcp_snd_cwnd(tp) < 2 * tp->max_packets_out. Since 10 is not less than 2 * 1, the result is false, so the if (!tcp_is_cwnd_limited(sk)) check in cubictcp_cong_avoid() succeeds and the function returns immediately, leaving cwnd at 10. The second through sixth ACKs are then handled just like the first, so cwnd never grows.
Combining the earlier observations: the statement that "every ACK received grows cwnd by one MSS" is really a macro-level description. In the current Linux implementation with the cubic congestion control algorithm, what matters most is the comparison tcp_snd_cwnd(tp) < 2 * tp->max_packets_out.
/* We follow the spirit of RFC2861 to validate cwnd but implement a more
 * flexible approach. The RFC suggests cwnd should not be raised unless
 * it was fully used previously. And that's exactly what we do in
 * congestion avoidance mode. But in slow start we allow cwnd to grow
 * as long as the application has used half the cwnd.
 * Example :
 *    cwnd is 10 (IW10), but application sends 9 frames.
 *    We allow cwnd to reach 18 when all frames are ACKed.
 * This check is safe because it's as aggressive as slow start which already
 * risks 100% overshoot. The advantage is that we discourage application to
 * either send more filler packets or data to artificially blow up the cwnd
 * usage, and allow application-limited process to probe bw more aggressively.
 */
static inline bool tcp_is_cwnd_limited(const struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);

	if (tp->is_cwnd_limited)
		return true;

	/* If in slow start, ensure cwnd grows to twice what was ACKed. */
	if (tcp_in_slow_start(tp))
		return tcp_snd_cwnd(tp) < 2 * tp->max_packets_out;

	return false;
}
For example, in the comment's scenario, cwnd is 10 and the application sends 9 frames; once all frames are ACKed, cwnd is allowed to grow to 18. In our experiment, cwnd is 10 and the application sends 6 frames; the observed behavior is that cwnd is allowed to grow to 12 once all frames are ACKed.