
Are our servers under-CPUed for their NICs?


Recently we got a few new servers, all with an identical configuration. Each has dual E5-2620 v3 2.4 GHz CPUs, 128 GiB RAM (8 x 16 GiB DDR4 DIMMs), one dual-port 40G XL710, and two dual-port 10G SFP+ mezzanine cards (i.e. 4 x 10G SFP+ ports). All of them run CentOS 7.1 x86_64. The XL710s are connected to 40G ports of QCT LY8 switches using genuine Intel QSFP+ DACs. All 10G SFP+ ports are connected to Arista 7280SE-68 switches, but using third-party DACs. So far, all systems have been only minimally tuned:

  • In each server's BIOS, the pre-defined "High Performance" profile is selected; furthermore, Intel I/OAT is enabled and VT-d is disabled (we don't need to run virtual machines; these servers are for HPC applications).
  • In CentOS on each server, the active tuned-adm profile is set to network-throughput (the exact commands are shown right after this list).
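
For reference, the profile is applied and verified roughly like this (a minimal sketch; the last line is what tuned-adm prints on our CentOS 7.1 systems):

$ tuned-adm profile network-throughput
$ tuned-adm active
Current active profile: network-throughput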

 

After the servers were set up, we have been using iperf3 to run long-running tests between them. So far, we have observed consistent packet drops on the receiving side. An example:


[root@sc2u1n0 ~]# netstat -i
Kernel Interface table
Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
ens10f0  9000 236406987      0      0 0      247785514      0      0      0 BMRU
ens1f0    9000 363116387      0  2391 0      2370529766      0      0      0 BMRU
ens1f1    9000 382484140      0  2248 0      2098335636      0      0      0 BMRU
ens20f0  9000 565532361      0  2258 0      1472188440      0      0      0 BMRU
ens20f1  9000 519587804      0  4225 0      5471601950      0      0      0 BMRU
lo      65536 19058603      0      0 0      19058603      0      0      0 LRU
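
For completeness, here is roughly how such RX-DRP counts can be narrowed down further (a sketch; the exact counter names differ between the i40e and ixgbe drivers):

# Driver-level statistics: look for rx_dropped / rx_missed / no_buffer style counters
$ ethtool -S ens1f0 | grep -iE 'drop|miss|no_buf'

# Current vs. maximum RX/TX ring sizes (small rings overflow easily at 10G/40G line rate)
$ ethtool -g ens1f0

# Kernel softirq backlog drops: the second hex column of each CPU line
$ cat /proc/net/softnet_stat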



We have also observed iperf3 retransmits (Retr) at the beginning of a test session and, less often, during a session. Two examples:


40G pairs:


$ iperf3 -c 192.168.11.100  -i 1 -t 10
Connecting to host 192.168.11.100, port 5201
[  4] local 192.168.11.103 port 59351 connected to 192.168.11.100 port 5201
[ ID] Interval          Transfer    Bandwidth      Retr  Cwnd
[  4]  0.00-1.00  sec  2.77 GBytes  23.8 Gbits/sec  54    655 KBytes
[  4]  1.00-2.00  sec  4.26 GBytes  36.6 Gbits/sec    0  1.52 MBytes
[  4]  2.00-3.00  sec  4.61 GBytes  39.6 Gbits/sec    0  2.12 MBytes
[  4]  3.00-4.00  sec  4.53 GBytes  38.9 Gbits/sec    0  2.57 MBytes
[  4]  4.00-5.00  sec  4.00 GBytes  34.4 Gbits/sec    7  1.42 MBytes
[  4]  5.00-6.00  sec  4.61 GBytes  39.6 Gbits/sec    0  2.01 MBytes
[  4]  6.00-7.00  sec  4.61 GBytes  39.6 Gbits/sec    0  2.47 MBytes
[  4]  7.00-8.00  sec  4.61 GBytes  39.6 Gbits/sec    0  2.88 MBytes
[  4]  8.00-9.00  sec  4.61 GBytes  39.6 Gbits/sec    0  3.21 MBytes
[  4]  9.00-10.00  sec  4.61 GBytes  39.6 Gbits/sec    0  3.52 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval          Transfer    Bandwidth      Retr
[  4]  0.00-10.00  sec  43.2 GBytes  37.1 Gbits/sec  61            sender
[  4]  0.00-10.00  sec  43.2 GBytes  37.1 Gbits/sec                  receiver

 

82599-powered 10G pairs:

 

$ iperf3 -c 192.168.15.100 -i 1 -t 10
Connecting to host 192.168.15.100, port 5201
[  4] local 192.168.16.101 port 53464 connected to 192.168.15.100 port 5201
[ ID] Interval          Transfer    Bandwidth      Retr  Cwnd
[  4]  0.00-1.00  sec  1.05 GBytes  9.05 Gbits/sec  722  1.97 MBytes
[  4]  1.00-2.00  sec  1.10 GBytes  9.42 Gbits/sec    0  2.80 MBytes
[  4]  2.00-3.00  sec  1.10 GBytes  9.42 Gbits/sec  23  2.15 MBytes
[  4]  3.00-4.00  sec  1.10 GBytes  9.42 Gbits/sec    0  2.16 MBytes
[  4]  4.00-5.00  sec  1.09 GBytes  9.41 Gbits/sec    0  2.16 MBytes
[  4]  5.00-6.00  sec  1.10 GBytes  9.42 Gbits/sec    0  2.17 MBytes
[  4]  6.00-7.00  sec  1.10 GBytes  9.42 Gbits/sec    0  2.18 MBytes
[  4]  7.00-8.00  sec  1.10 GBytes  9.42 Gbits/sec    0  2.22 MBytes
[  4]  8.00-9.00  sec  1.10 GBytes  9.42 Gbits/sec    0  2.27 MBytes
[  4]  9.00-10.00  sec  1.10 GBytes  9.42 Gbits/sec    0  2.34 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval          Transfer    Bandwidth      Retr
[  4]  0.00-10.00  sec  10.9 GBytes  9.38 Gbits/sec  745            sender
[  4]  0.00-10.00  sec  10.9 GBytes  9.37 Gbits/sec                  receiver


Looking around, I ran into a 40G NIC tuning article on the DOE Energy Sciences Network (ESnet) Fasterdata site, which says: "At the present time (February 2015), CPU clock rate still matters a lot for 40G hosts. In general, higher CPU clock rate is far more important than high core count for a 40G host. In general, you can expect it to be very difficult to achieve 40G performance with a CPU that runs more slowly than 3GHz per core." We don't have such fast CPUs: the E5-2620 v3 runs at only 2.4 GHz and is a mid-range CPU from the Basic category, not even the Performance category. So,

  • Are our servers too rich in NICs but under-powered CPU-wise?
  • Is there anything we can do to get these servers to behave at least reasonably, especially to stop dropping packets?
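
Related to the clock-rate point above, one quick check of whether a single core is the bottleneck would be to spread the load over several TCP streams and watch per-core load on the receiver (a sketch; the stream count and duration are just examples):

# Four parallel streams instead of one, so the receive work spreads over more cores
$ iperf3 -c 192.168.11.100 -P 4 -i 1 -t 30

# On the receiving host, watch per-core utilization (softirq time shows up under %soft)
$ mpstat -P ALL 1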


BTW, a few days ago we updated all servers to the most recent Intel stable i40e and ixgbe drivers, but we have not run the set_irq_affinity script yet. Nor have we tuned the NICs (e.g. adjusting the rx-usecs value, etc.). The reason is that each server runs two highly concurrent applications that tend to use all the cores, and we are afraid that using the set_irq_affinity script may negatively impact the performance of those applications. But if Intel folks consider running the script beneficial, we are willing to try.
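
If running the script is indeed recommended, this is roughly what we would try (a sketch based on the scripts shipped with the Intel driver sources and standard ethtool knobs; the exact set_irq_affinity options and safe rx-usecs/ring values are assumptions we would still need to confirm for our workload):

# irqbalance would otherwise redistribute the IRQs and undo the pinning
$ systemctl stop irqbalance

# Pin each queue's IRQ to cores on the NIC's local NUMA node
# (set_irq_affinity ships in the i40e/ixgbe source tarballs under scripts/)
$ ./set_irq_affinity local ens1f0 ens1f1

# Back off interrupt coalescing and enlarge the RX ring to absorb bursts
$ ethtool -C ens1f0 adaptive-rx off rx-usecs 100
$ ethtool -G ens1f0 rx 4096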

 

Regards,

 

-- Zack

