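Keep thinking, and dig into the root cause
Building on the packetdrill script for the TCP three-way handshake, we simulate the server side and continue testing TCP slow start.
Base script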
```
# cat tcp_tcp_slow_start_000.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1460>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
#
```
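Slow start
TCP slow start is the initial phase of TCP congestion control. Its core purpose is to probe the available network bandwidth quickly yet cautiously, either when a connection is established or after a retransmission timeout. The sender limits the amount of unacknowledged data with a congestion window (cwnd); during slow start, every acknowledgment (ACK) received increases cwnd by one maximum segment size (MSS). As a result, the amount of data that can be sent roughly doubles every round-trip time (RTT), giving exponential window growth until cwnd reaches the slow start threshold (ssthresh) or packet loss is detected. After that, the connection moves to the congestion avoidance phase, where the window grows linearly.
In the Linux implementation, slow start is triggered in three typical situations. When a connection is established, the defaults are cwnd = 10 and an effectively infinite ssthresh, so the connection always starts in slow start (unless the routing table overrides these values). When an RTO fires and data is retransmitted, cwnd is reset to 1 and ssthresh is recomputed by the congestion control algorithm before slow start begins again. When the connection has been idle for longer than the RTO, the congestion window is re-initialized as well, and slow start re-probes the network. Together, these three mechanisms let TCP adapt to the available network capacity in each of these situations.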
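The textbook description above can be checked with a toy model. This is a minimal sketch, not kernel code: it assumes every segment is ACKed individually, there is no loss, and a full window is sent at the start of each RTT. The function name `textbook_slow_start` is hypothetical.

```python
from typing import List


def textbook_slow_start(init_cwnd: int, ssthresh: int, rtts: int) -> List[int]:
    """Return cwnd (in segments) at the start of each RTT under the textbook model.

    Assumptions: one ACK per segment, no loss, a full window sent every RTT.
    Congestion avoidance is not modelled; growth simply stops at ssthresh.
    """
    cwnd = init_cwnd
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd            # a full window is in flight this RTT
        for _ in range(acks_this_rtt):  # each ACK covers exactly one segment
            if cwnd >= ssthresh:
                break                   # slow start ends at ssthresh
            cwnd += 1                   # +1 segment (one MSS) per ACK
        history.append(cwnd)
    return history


print(textbook_slow_start(init_cwnd=10, ssthresh=64, rtts=4))  # [10, 20, 40, 64, 64]
```

Per RTT the window doubles (10, 20, 40) until it is capped at ssthresh. The experiments below show where the Linux implementation departs from this per-ACK, per-RTT picture.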
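The three triggers can be summarized in a small state model. This is a simplified sketch under stated assumptions, not the kernel's code: `CongState` and its methods are hypothetical names, the RTO case uses a Reno-style `ssthresh = max(cwnd / 2, 2)` (CUBIC uses a factor of roughly 0.7 instead), and the idle case follows my reading of the kernel's `tcp_cwnd_restart()`, which applies only when `net.ipv4.tcp_slow_start_after_idle` is enabled (the default) and halves cwnd once per elapsed RTO rather than jumping straight back to the initial window.

```python
from dataclasses import dataclass

TCP_INIT_CWND = 10                    # default initial window, in segments
TCP_INFINITE_SSTHRESH = 0x7FFFFFFF    # 2147483647, as printed by the scripts below


@dataclass
class CongState:
    """Hypothetical, simplified congestion state (cwnd and ssthresh in segments)."""
    cwnd: int = TCP_INIT_CWND                 # trigger 1: a new connection
    ssthresh: int = TCP_INFINITE_SSTHRESH     # so it always starts in slow start

    def in_slow_start(self) -> bool:
        return self.cwnd < self.ssthresh

    def on_rto(self) -> None:
        """Trigger 2: retransmission timeout."""
        # Assumption: Reno-style ssthresh; CUBIC's ssthresh hook uses ~0.7 * cwnd.
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = 1

    def on_idle_restart(self, idle_ms: int, rto_ms: int) -> None:
        """Trigger 3: sending again after being idle for longer than the RTO.

        Assumption: modelled on tcp_cwnd_restart(); cwnd is halved once per
        elapsed RTO but never drops below min(initial window, cwnd).
        """
        if idle_ms <= rto_ms:
            return
        # Remember roughly 3/4 of the old window in ssthresh before shrinking.
        self.ssthresh = max(self.ssthresh, (self.cwnd >> 1) + (self.cwnd >> 2))
        restart_cwnd = min(TCP_INIT_CWND, self.cwnd)
        cwnd, delta = self.cwnd, idle_ms
        while True:
            delta -= rto_ms
            if delta <= 0 or cwnd <= restart_cwnd:
                break
            cwnd >>= 1
        self.cwnd = max(cwnd, restart_cwnd)


s = CongState()
print(s, s.in_slow_start())
s.cwnd = 40
s.on_rto()
print(s, s.in_slow_start())
```

In every branch the sketch leaves cwnd below ssthresh, so the next ACKs are processed by the slow-start path again.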
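Basic tests
We are still looking at the initial connection-establishment scenario. In the experiments of the earlier articles, you may have noticed that the number of ACK packets and the resulting cwnd values do not, strictly speaking, match the high-level description found in most references, namely that cwnd grows by one MSS for every ACK received. The implementation is more fine-grained than that.
To investigate, we take the tcp_slow_start_1_003.pkt script and modify it so that a single ACK packet acknowledges all of the data.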
```
# cat tcp_slow_start_1_006.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 %{print (tcpi_snd_cwnd, tcpi_snd_ssthresh)}%
+0.01 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 <. 1:1(0) ack 6001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
#
```
After running the script, cwnd changes once the ACK arrives: it grows from 10 to 16.
```
# packetdrill tcp_slow_start_1_006.pkt
10 2147483647
10
16
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:12:21.072581 tun0 In IP 192.0.2.1.47557 > 192.168.218.28.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
10:12:21.072669 tun0 Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [S.], seq 4132414984, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
10:12:21.082807 tun0 In IP 192.0.2.1.47557 > 192.168.218.28.8080: Flags [.], ack 1, win 10000, length 0
10:12:21.103004 tun0 Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
10:12:21.103019 tun0 Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
10:12:21.103025 tun0 Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [P.], seq 2001:3001, ack 1, win 64240, length 1000: HTTP
10:12:21.103030 tun0 Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [P.], seq 3001:4001, ack 1, win 64240, length 1000: HTTP
10:12:21.103035 tun0 Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [P.], seq 4001:5001, ack 1, win 64240, length 1000: HTTP
10:12:21.103040 tun0 Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [P.], seq 5001:6001, ack 1, win 64240, length 1000: HTTP
10:12:21.113079 tun0 In IP 192.0.2.1.47557 > 192.168.218.28.8080: Flags [.], ack 6001, win 10000, length 0
10:12:21.213575 ? Out IP 192.168.218.28.8080 > 192.0.2.1.47557: Flags [F.], seq 6001, ack 1, win 64240, length 0
10:12:21.213601 ? In IP 192.0.2.1.47557 > 192.168.218.28.8080: Flags [R.], seq 1, ack 6001, win 10000, length 0
#
```
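🤔 cwnd ends up neither 10 nor 12 but 16. Does that surprise you? Honestly, the 16 surprised me too, since I had said earlier that it would not exceed 12.
In the end, the code has the final say, and it is the same code discussed in the previous article. When the ACK arrives, tcp_is_cwnd_limited() is in slow start, so it returns tcp_snd_cwnd(tp) < 2 * tp->max_packets_out. Because 10 < 2 * 6, the result is true. cubictcp_cong_avoid(), also in slow start, then calls tcp_slow_start(), which performs the actual cwnd increase. Up to this point the process is the same as before.
The difference is the value of the acked argument passed to cubictcp_cong_avoid(). In this experiment, the single ACK acknowledges 6 segments, so acked is 6 and cwnd becomes 10 + 6 = 16. In the earlier experiments there were several ACKs, but each acknowledged only one segment, so acked was 1 and cwnd grew by 1 per ACK, giving 10, 11, 12 step by step.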
```
net.ipv4.tcp_allowed_congestion_control = reno cubic
net.ipv4.tcp_available_congestion_control = reno cubic
net.ipv4.tcp_congestion_control = cubic
```

```c
static void cubictcp_cong_avoid(struct sock *sk, u32 ack, u32 acked)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct bictcp *ca = inet_csk_ca(sk);

	if (!tcp_is_cwnd_limited(sk))
		return;

	if (tcp_in_slow_start(tp)) {
		acked = tcp_slow_start(tp, acked);
		if (!acked)
			return;
	}
	bictcp_update(ca, tcp_snd_cwnd(tp), acked);
	tcp_cong_avoid_ai(tp, ca->cnt, acked);
}

static inline bool tcp_is_cwnd_limited(const struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);

	if (tp->is_cwnd_limited)
		return true;

	/* If in slow start, ensure cwnd grows to twice what was ACKed. */
	if (tcp_in_slow_start(tp))
		return tcp_snd_cwnd(tp) < 2 * tp->max_packets_out;

	return false;
}

u32 tcp_slow_start(struct tcp_sock *tp, u32 acked)
{
	u32 cwnd = min(tcp_snd_cwnd(tp) + acked, tp->snd_ssthresh);

	acked -= cwnd - tcp_snd_cwnd(tp);
	tcp_snd_cwnd_set(tp, min(cwnd, tp->snd_cwnd_clamp));

	return acked;
}
EXPORT_SYMBOL_GPL(tcp_slow_start);
```
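To check the arithmetic in the two paragraphs above, the three functions can be transcribed into Python. This is a userspace sketch, not kernel code. It assumes cwnd is counted in segments, `tp->is_cwnd_limited` stays false (the scripts never fill the window), `snd_cwnd_clamp` is not in effect, and only the slow-start branch of `cubictcp_cong_avoid()` matters; `bictcp_update()`/`tcp_cong_avoid_ai()` are omitted. `TcpSock`, `cubic_cong_avoid_slow_start_part` and `replay` are hypothetical names.

```python
from dataclasses import dataclass

U32_MAX = 0xFFFFFFFF


@dataclass
class TcpSock:
    """Only the tcp_sock fields the three kernel functions read or write."""
    snd_cwnd: int = 10                 # initial window, in segments
    snd_ssthresh: int = 0x7FFFFFFF     # "infinite" ssthresh printed by the scripts
    snd_cwnd_clamp: int = U32_MAX      # assumption: no clamp in effect
    max_packets_out: int = 0
    is_cwnd_limited: bool = False      # assumption: the scripts never fill cwnd


def tcp_in_slow_start(tp: TcpSock) -> bool:
    return tp.snd_cwnd < tp.snd_ssthresh


def tcp_is_cwnd_limited(tp: TcpSock) -> bool:
    if tp.is_cwnd_limited:
        return True
    # If in slow start, ensure cwnd grows to twice what was ACKed.
    if tcp_in_slow_start(tp):
        return tp.snd_cwnd < 2 * tp.max_packets_out
    return False


def tcp_slow_start(tp: TcpSock, acked: int) -> int:
    cwnd = min(tp.snd_cwnd + acked, tp.snd_ssthresh)
    acked -= cwnd - tp.snd_cwnd        # segments left over for congestion avoidance
    tp.snd_cwnd = min(cwnd, tp.snd_cwnd_clamp)
    return acked


def cubic_cong_avoid_slow_start_part(tp: TcpSock, acked: int) -> None:
    """Slow-start path of cubictcp_cong_avoid(); the CUBIC growth part is omitted."""
    if not tcp_is_cwnd_limited(tp):
        return
    if tcp_in_slow_start(tp):
        acked = tcp_slow_start(tp, acked)
        if not acked:
            return
    # bictcp_update() / tcp_cong_avoid_ai() would run here; not modelled.


def replay(acked_per_ack, max_packets_out):
    tp = TcpSock(max_packets_out=max_packets_out)
    trace = []
    for acked in acked_per_ack:
        cubic_cong_avoid_slow_start_part(tp, acked)
        trace.append(tp.snd_cwnd)
    return trace


print(replay([6], max_packets_out=6))       # 006, one ACK for 6 segments: [16]
print(replay([1] * 6, max_packets_out=6))   # one segment per ACK: [11, 12, 12, 12, 12, 12]
```

The same cwnd limit (2 * max_packets_out = 12) applies in both runs; what differs is how much acked is allowed to add in a single call once the check passes.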
Let's verify this with another experiment in which every two segments are acknowledged by one ACK. The script is as follows.
```
# cat tcp_slow_start_1_007.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 %{print (tcpi_snd_cwnd, tcpi_snd_ssthresh)}%
+0.01 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 <. 1:1(0) ack 2001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 <. 1:1(0) ack 4001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 <. 1:1(0) ack 6001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
#
```
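After running the script, cwnd changes once the ACKs arrive: it grows from 10 to 12.
When the first ACK arrives, tcp_is_cwnd_limited() is in slow start and returns tcp_snd_cwnd(tp) < 2 * tp->max_packets_out. Because 10 < 2 * 6, the result is true, so cubictcp_cong_avoid() (also in slow start) calls tcp_slow_start(), which performs the cwnd increase. This ACK covers 2 segments, so cwnd grows from 10 to 12.
When the second and third ACKs arrive, cwnd is 12, which is not less than 2 * 6, so tcp_is_cwnd_limited() returns false and cwnd stays at 12.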
```
# packetdrill tcp_slow_start_1_007.pkt
10 2147483647
10
12
12
12
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
15:13:27.052506 tun0 In IP 192.0.2.1.45347 > 192.168.143.169.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
15:13:27.052536 tun0 Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [S.], seq 4183789466, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
15:13:27.062620 tun0 In IP 192.0.2.1.45347 > 192.168.143.169.8080: Flags [.], ack 1, win 10000, length 0
15:13:27.082738 tun0 Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
15:13:27.082753 tun0 Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
15:13:27.082759 tun0 Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [P.], seq 2001:3001, ack 1, win 64240, length 1000: HTTP
15:13:27.082764 tun0 Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [P.], seq 3001:4001, ack 1, win 64240, length 1000: HTTP
15:13:27.082768 tun0 Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [P.], seq 4001:5001, ack 1, win 64240, length 1000: HTTP
15:13:27.082775 tun0 Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [P.], seq 5001:6001, ack 1, win 64240, length 1000: HTTP
15:13:27.092838 tun0 In IP 192.0.2.1.45347 > 192.168.143.169.8080: Flags [.], ack 2001, win 10000, length 0
15:13:27.102958 tun0 In IP 192.0.2.1.45347 > 192.168.143.169.8080: Flags [.], ack 4001, win 10000, length 0
15:13:27.113084 tun0 In IP 192.0.2.1.45347 > 192.168.143.169.8080: Flags [.], ack 6001, win 10000, length 0
15:13:27.182834 ? Out IP 192.168.143.169.8080 > 192.0.2.1.45347: Flags [F.], seq 6001, ack 1, win 64240, length 0
15:13:27.182861 ? In IP 192.0.2.1.45347 > 192.168.143.169.8080: Flags [R.], seq 1, ack 6001, win 10000, length 0
#
```
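The three ACKs of tcp_slow_start_1_007.pkt can be replayed with just the arithmetic of the slow-start path. A minimal sketch, assuming `tp->is_cwnd_limited` is false, `max_packets_out` stays at the 6 used in the analysis above, and the cwnd clamp is ignored; `on_ack` is a hypothetical helper.

```python
SSTHRESH = 0x7FFFFFFF   # "infinite" initial ssthresh, as printed by the script
MAX_PACKETS_OUT = 6     # value used in the analysis above (6 segments in flight)


def on_ack(cwnd: int, acked: int) -> int:
    """Slow-start arithmetic of cubictcp_cong_avoid(), assuming is_cwnd_limited is false."""
    if not (cwnd < SSTHRESH and cwnd < 2 * MAX_PACKETS_OUT):
        return cwnd                     # tcp_is_cwnd_limited() -> false: no growth
    return min(cwnd + acked, SSTHRESH)  # tcp_slow_start(); clamp ignored


cwnd = 10
trace = []
for ack_num in (2001, 4001, 6001):      # each cumulative ACK covers 2 segments
    cwnd = on_ack(cwnd, acked=2)
    trace.append(cwnd)
    print(f"ack {ack_num}: cwnd={cwnd}")
```

The first ACK passes the `10 < 12` check and adds 2; from then on `12 < 12` fails, which matches the 10, 12, 12, 12 printed by packetdrill.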
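Next, consider the "within one RTT" part of the usual description. We change the timing between sending the data and receiving the ACKs. The script is as follows.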
```
# cat tcp_slow_start_1_008.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 %{print (tcpi_snd_cwnd, tcpi_snd_ssthresh)}%
+0.01 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 %{print (tcpi_snd_cwnd)}%
+0.02 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 <. 1:1(0) ack 4001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0 <. 1:1(0) ack 6001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
#
```
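After running the script, cwnd grows from 10 to 14 once the ACKs arrive. The growth follows the same path as before, so I won't repeat it. Note that this part of the implementation does not involve the RTT at all, so the timing does not affect how cwnd is computed.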
```
# packetdrill tcp_slow_start_1_008.pkt
10 2147483647
10
10
14
14
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
15:31:06.492496 tun0 In IP 192.0.2.1.52761 > 192.168.179.148.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
15:31:06.492526 tun0 Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [S.], seq 4186402169, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
15:31:06.502622 tun0 In IP 192.0.2.1.52761 > 192.168.179.148.8080: Flags [.], ack 1, win 10000, length 0
15:31:06.522759 tun0 Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
15:31:06.522779 tun0 Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
15:31:06.542830 tun0 Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [P.], seq 2001:3001, ack 1, win 64240, length 1000: HTTP
15:31:06.542849 tun0 Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [P.], seq 3001:4001, ack 1, win 64240, length 1000: HTTP
15:31:06.542854 tun0 Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [P.], seq 4001:5001, ack 1, win 64240, length 1000: HTTP
15:31:06.542859 tun0 Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [P.], seq 5001:6001, ack 1, win 64240, length 1000: HTTP
15:31:06.552909 tun0 In IP 192.0.2.1.52761 > 192.168.179.148.8080: Flags [.], ack 4001, win 10000, length 0
15:31:06.552981 tun0 In IP 192.0.2.1.52761 > 192.168.179.148.8080: Flags [.], ack 6001, win 10000, length 0
15:31:06.643379 ? Out IP 192.168.179.148.8080 > 192.0.2.1.52761: Flags [F.], seq 6001, ack 1, win 64240, length 0
15:31:06.643908 ? In IP 192.0.2.1.52761 > 192.168.179.148.8080: Flags [R.], seq 1, ack 6001, win 10000, length 0
#
```
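More precisely, what matters is the value of tp->max_packets_out between sending the N segments and receiving the first ACK. In the experiment above it is 6, so the comparisons are 10 vs 2 * 6 and 14 vs 2 * 6. The first one (ack 4001, which acknowledges 4 segments) is true, so cwnd grows from 10 to 14; the second one (ack 6001) is false, so cwnd stays at 14.
Let's adjust the timing between sending data and receiving ACKs once more. The script is as follows.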
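Written out with the values from tcp_slow_start_1_008.pkt (max_packets_out = 6 as stated above, is_cwnd_limited assumed false, ssthresh at its initial 2147483647), the two ACKs evaluate as follows. No RTT term appears anywhere in the computation:

$$
\begin{aligned}
\text{ACK } 4001:&\quad \text{acked} = 4,\quad 10 < 2 \times 6 \;\Rightarrow\; \text{cwnd} = \min(10 + 4,\ \text{ssthresh}) = 14 \\
\text{ACK } 6001:&\quad \text{acked} = 2,\quad 14 < 2 \times 6 \text{ is false} \;\Rightarrow\; \text{cwnd stays } 14
\end{aligned}
$$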
```
# cat tcp_slow_start_1_009.pkt
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 10000 <mss 1000,nop,nop,sackOK>
+0 > S. 0:0(0) ack 1 <...>
+0.01 < . 1:1(0) ack 1 win 10000
+0 accept(3, ..., ...) = 4
+0.01 %{print (tcpi_snd_cwnd, tcpi_snd_ssthresh)}%
+0.01 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 <. 1:1(0) ack 2001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 write(4,...,1000) = 1000
+0 %{print (tcpi_snd_cwnd)}%
+0.01 <. 1:1(0) ack 6001 win 10000
+0 %{print (tcpi_snd_cwnd)}%
#
```
After running the script, cwnd does not change when the ACKs arrive; it stays at 10 throughout.
When the first ACK arrives, tcp_is_cwnd_limited() is in slow start and returns tcp_snd_cwnd(tp) < 2 * tp->max_packets_out. Because 10 is not less than 2 * 2, the result is false, so cwnd does not grow and stays at 10.
When the second ACK arrives, tcp_is_cwnd_limited() again evaluates tcp_snd_cwnd(tp) < 2 * tp->max_packets_out. Although tp->max_packets_out has grown to 4 with the second batch of writes, 10 is still not less than 2 * 4, so the result is again false, cwnd does not grow, and it stays at 10.
```
# packetdrill tcp_slow_start_1_009.pkt
10 2147483647
10
10
10
10
#
# tcpdump -i any -nn port 8080
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
15:44:11.044511 tun0 In IP 192.0.2.1.53009 > 192.168.105.0.8080: Flags [S], seq 0, win 10000, options [mss 1000,nop,nop,sackOK], length 0
15:44:11.044544 tun0 Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [S.], seq 952991133, ack 1, win 64240, options [mss 1460,nop,nop,sackOK], length 0
15:44:11.054640 tun0 In IP 192.0.2.1.53009 > 192.168.105.0.8080: Flags [.], ack 1, win 10000, length 0
15:44:11.074780 tun0 Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [P.], seq 1:1001, ack 1, win 64240, length 1000: HTTP
15:44:11.074808 tun0 Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [P.], seq 1001:2001, ack 1, win 64240, length 1000: HTTP
15:44:11.084885 tun0 In IP 192.0.2.1.53009 > 192.168.105.0.8080: Flags [.], ack 2001, win 10000, length 0
15:44:11.095043 tun0 Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [P.], seq 2001:3001, ack 1, win 64240, length 1000: HTTP
15:44:11.095059 tun0 Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [P.], seq 3001:4001, ack 1, win 64240, length 1000: HTTP
15:44:11.095064 tun0 Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [P.], seq 4001:5001, ack 1, win 64240, length 1000: HTTP
15:44:11.095071 tun0 Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [P.], seq 5001:6001, ack 1, win 64240, length 1000: HTTP
15:44:11.105135 tun0 In IP 192.0.2.1.53009 > 192.168.105.0.8080: Flags [.], ack 6001, win 10000, length 0
15:44:11.267371 ? Out IP 192.168.105.0.8080 > 192.0.2.1.53009: Flags [F.], seq 6001, ack 1, win 64240, length 0
15:44:11.267432 ? In IP 192.0.2.1.53009 > 192.168.105.0.8080: Flags [R.], seq 1, ack 6001, win 10000, length 0
#
```
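The contrast between this run and tcp_slow_start_1_006.pkt can be reproduced by treating each batch of writes as a separate flight. A sketch under stated assumptions: `tp->is_cwnd_limited` is false, each flight is fully acknowledged by one cumulative ACK, and `max_packets_out` equals the size of the flight being ACKed (2, then 4, as described above). `replay_flights` is a hypothetical helper, not a kernel function.

```python
SSTHRESH = 0x7FFFFFFF
INIT_CWND = 10


def is_cwnd_limited(cwnd: int, max_packets_out: int) -> bool:
    # Slow-start branch of tcp_is_cwnd_limited(); tp->is_cwnd_limited assumed false.
    return cwnd < SSTHRESH and cwnd < 2 * max_packets_out


def replay_flights(flights):
    """Each flight is sent, then fully ACKed by one cumulative ACK.

    Assumption: max_packets_out equals the size of the flight being ACKed.
    """
    cwnd = INIT_CWND
    trace = []
    for flight in flights:
        if is_cwnd_limited(cwnd, max_packets_out=flight):
            cwnd = min(cwnd + flight, SSTHRESH)   # tcp_slow_start() with acked == flight
        trace.append(cwnd)
    return trace


print(replay_flights([2, 4]))   # tcp_slow_start_1_009.pkt: [10, 10]
print(replay_flights([6]))      # contrast, tcp_slow_start_1_006.pkt: [16]
```

Under these assumptions, cwnd = 10 only starts growing once a single flight reaches at least 6 segments, because only then does 10 < 2 * max_packets_out hold. Sending the same 6 segments in smaller flights keeps the window where it is.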