A Survey of High-Performance Interconnection Networks in High-Performance Computer Systems
Round 1
Reviewer 1 Report
Summary:
The paper begins with a survey on the network topologies used in the current Top500 supercomputer list. Following the survey there is a discussion of the challenges going forward when trying to further improve performance and some speculation on how those challenges could be addressed. I found the survey part very interesting, especially as a reader that is not that familiar with the interconnect technologies being used. This is the stronger part of the paper. It is based on data that has been collected and then summarized. The speculation on how to address the challenges is weak and largely lacks any concrete basis.
Overall, I like the structure and idea of this paper. However, it needs improvement as suggested in the detailed comments below. The remainder of this review will address the two parts of the paper separately.
Survey:
The survey portion was the most interesting to me because I am not an HPC person, but I have been interested to learn about what current HPC systems use for their interconnect. The information presented was helpful but still left me with several questions and I think the presentation could be organized much better. I thought that Figure 2 was very informative and a nice way to present that information.
I would like to see you address the following questions more precisely. I think this would help someone like me get much more out of this paper.
One of the questions I have always had is how much these interconnects use very customized hardware versus using more commodity components. By commodity, I mean, for example, Ethernet and Infiniband where I believe you could purchase such components from possibly several vendors and they would be compatible and interchangeable. This is versus what I understand from your paper for an interconnect such as Slingshot. Even with Slingshot, I understand that it has some compatibility with Ethernet. What about for systems such as Fugaku? You do not mention Ethernet, so I'm assuming there is no compatibility. Another way to think of this is whether custom/proprietary chips are used, especially in the switches.
Another question is what kind of protocols are used in the network. I guess you would say an Infiniband network uses Infiniband, but an Ethernet network could use some form of TCP (standard or modified?), UDP, or some other protocol running on top of Ethernet. Can you make some comparison about the properties of the protocols? What kind of latencies do they have for small or large transfers? Or are they intended specifically for small or large transfers? Characterizations like this would be interesting and possibly identify other opportunities for improvements.
How is reliable communications handled?
What is the available bandwidth? This could include both the raw network bandwidth and the effective bandwidth once all overheads, including software, are considered.
Does everyone run MPI on top of these networks without any adaptation required at the application level? Do the interconnects have their own communication protocols that might be more efficient than MPI?
Are there other such questions that you can think of that I should be asking, and you might then put in the paper? You are presumably the experts, so I rely on you to tell me what is important.
Survey organization:
I think a better organization of the survey could be made. I suggest that you first list the important properties and then identify all the properties that you want to discuss. Provide a list with a short discussion of what each property means. Then for each type of network, the discussion could be organized the same, which would make it easier to follow. If this could all be captured in a summary table, I think that would make it much easier to see a quick comparison of all systems.
Some of the properties to discuss could be the limitations, like what prevents that network from scaling, or improving in performance. This would then naturally segue to the next part of the paper.
Challenges part of the paper:
This part of the paper needs much improvement.
Given the survey, I appreciate that you try to understand the current state-of-the-art and try to discuss how to go forward. Again, starting with a short summary list of the challenges would help with the organization, which you can then expand with sections that you already have. You should make clear connections between discussion in this section and the limitations you identify from the survey.
I also found that a large part of the discussion in this part was very speculative and lacked the references required to substantiate your statements. For example, you make a recommendation for optical computers and silicon-optical integration based on carbon nanotubes. On what basis do you make this statement? Why this approach and not some other? For each challenge, rather than making such claims, it would be better to discuss the approaches that are currently being investigated to address the problem and provide the references. That way, you are not making statements that need to be defended. More so, you are just providing pointers to areas of relevant research that can be referenced and allow the reader to investigate further, or see in what areas they might be able to contribute that can impact HPC. For example, a reader, who is not an HPC person, might read this paper and find out that they could contribute to HPC because they know silicon photonics well and did not realize HPC was an important application space for that technology.
Some specific questions and points:
p. 2. ICN is first used in the Figure 1 caption, I think, but the acronym is never defined.
p. 4. "The HDR standard has a single link rate of 50Gbps and a single port bandwidth of 200Gbps, which can meet the requirements of current mainstream Exascale high-performance computing interconnection network." What are the requirements? Listing the requirements at the beginning, as suggested, would take care of this.
p. 4. What is the post-E-class era?
p. 4. QM8700 series products. What are these? You need a reference.
p. 4. NVM is not defined.
p. 5 at top. "The interconnection network between Summit and Sierra system cabinets adopts the Fat-tree topology as shown in Figure 3 and Figure 4." Figure 4 does not relate to Summit and Sierra.
p. 6. Is Fig. 5 the WFR referred to in the paragraph above the figure? Fig. 6 is the PRR?
p. 7. "In 2019, Cray gave up the new generation..." What do you mean by "gave up"?
p. 8. "As shown in Figure 8, A64FX integrates the Tofu ICN switches with the CPU..." I don't see what you are referring to in the figure. Where is the CPU in the figure? More explanation or better labeling is required.
p. 10. What are "Portals 4 communication primitives"?
p. 11. There are two figures labeled Figure 12 on this page.
p. 12. "According to statistics from the International Interconnection Forum, the doubling of the SerDes rate per generation only reduces single-bit performance by 20%." What does "single-bit performance" mean?
p. 12. "... Tianhe 2 system network router chip integrates 192 chan-nels of 14Gbps SerDes, and SerDes total power consumption About 90 watts, the total power consumption of the chip is about 120 watts, and the throughput rate is 5.376Tbps." This (non-)sentence cannot be parsed and understood.
192 x 14 = 2.688 Tbps. Where is the factor of 2 to achieve 5.376 Tbps?
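The arithmetic behind this question can be checked with a short sketch. The lane count, lane rate, and claimed throughput are taken from the quoted sentence; the variable names are illustrative only. One possible reading is that the factor of 2 counts both directions of full-duplex links, but that is an assumption the authors would need to confirm.

```python
# Figures taken from the quoted manuscript sentence (Tianhe-2 router chip).
SERDES_LANES = 192      # number of SerDes channels on the chip
LANE_RATE_GBPS = 14     # data rate per channel, in Gbps
CLAIMED_GBPS = 5376     # claimed throughput: 5.376 Tbps expressed in Gbps

# Aggregate rate if every lane is counted once (a single direction).
one_direction_gbps = SERDES_LANES * LANE_RATE_GBPS

# Ratio between the claimed throughput and the single-direction aggregate.
# ASSUMPTION: a ratio of 2 would be consistent with counting transmit and
# receive separately; the manuscript does not state this.
factor = CLAIMED_GBPS / one_direction_gbps

print(f"{SERDES_LANES} lanes x {LANE_RATE_GBPS} Gbps = {one_direction_gbps / 1000} Tbps")
print(f"claimed / computed = {factor}")
```

If the ratio is explained by bidirectional counting, stating that explicitly next to the throughput figure would resolve the ambiguity.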
p. 13. Last sentence of Section 2.2. Are you arguing for optical interconnects between boards and on the board? Can you justify with some numbers?
p. 13. "The international semi-conductor technology development route has predicted that the maximum number..." Needs a reference.
p. 14. What is a "scissor-like develpment trend"?
p. 14. "With the improvement of computing performance, the ratio of communication cabinets to computer cabinets has become the bottleneck of network engineering." Reference needed.
p. 14. Section 4. You list four trends. Where do these come from? Are there observations you have made to see these trends?
p. 14. "... aiming at 100G line speed to increase interconnection communication bandwidth." Why just 100G? I understand that 400G ethernet is already feasible.
p. 15. "... the three main directions of More Moore, More than Moore and Beyond CMOS," Needs a reference.
p. 15. Sect. 4.1 second paragraph is very long!!
p. 15. Section 4.22.5 Section number is strange
p. 15. "The EPAC1.0 test chip using the RISC-V architecture has been taped out and is scheduled to be launched in 2021." Needs reference.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The survey aims to show the latest trends for interconnection networks.
While there are some promising positive aspects, this paper is not yet meeting the quality standards expected from a survey paper.
== Pro ==
* Topic is important and beginners would benefit from a good technological overview
* Many important considerations are mentioned
== Cons ==
* The structure is very basic - Presentation is often just large blocks of facts
* Inter-comparison between technology is barely possible
* The image quality and reference style is not appropriate
Structure: The high-level overview/description of the paper structure is missing.
For example, there is little similarity in the content between sections in the network (e.g., Section 2.X). Text blocks often discuss each technology independently and in different order, which topics are discussed is unclear. A comparative analysis and contrast to other techs would be valuable instead of repeating (known) facts. For instance, creating a table with the most relevant characteristics would have reduced the text block and improved readability.
That would be one of the actual contributions of such a survey paper.
Good messages are hidden in large blocks that appear to be too little structured, often quite lengthy explanations with little content.
The assignment of facts to text block feels partially random.
For example, at the beginning of Section 4 "challenges" are mentioned, but Section 3 should have been about these.
The reference list is not compelling for a survey paper.
I believe this can be a good start toward a survey paper.
Unfortunately, at the moment, there is too little contribution and value for readers.
== Suggestions ==
high performance [noun] vs. high-performance [noun] => both are used, unify (high-performance [noun] appears correct).
Use proper subcaptions (Figure 1) -> a) Top 10, b) Top 100.
Figure 1 a) isn't that useful, IB + Cray... b) is more detailed but for the Top 100.
Basically, you could say, the ratio stays the same between Top 10 and Top 100 (of course, rounded)...
topXX vs. TopXX vs. TOPXX vs. YYY XX => unify
Serdes vs. SerDes => unify
"The interconnection network between Summit and Sierra system cabinets adopts the Fat-tree topology as shown in Figure 3 and Figure 4." => Figure 4 is labeled as JUWELS IB network topology. This is confusing and not explained.
I'm not convinced you are the original creators of all images in the article.
If you have used an existing figure, you must cite it properly (even in the caption).
Facts regarding the future of Omni-Path are not completely true: Cornelis acquired the IP and will continue the development. See, e.g.,
https://www.hpcwire.com/2021/07/23/with-new-owner-and-new-roadmap-an-independent-omni-path-is-staging-a-comeback/
The quality of Fig 7, Fig 10, Fig 12 is not acceptable.
Fig 12 y-axes unclear.
"is scheduled to be launched in 2021" => did it launch in 2021?
"Standards , Which"
consumption, For
"Exploration of disruptive technologies such as new devices and new materials" => This title doesn't feel quite right.
"technologies In terms"
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The paper has been significantly improved. Thank you for making the changes. I found the paper much better organized and much more informative about the various ICNs being used and the challenges going forward. With a few more minor changes and some editing for English, I recommend that this paper be published.
p. 2: "et al." at end of first paragraph. While this does mean "and others", it should really only be used in author lists. It's also better to be specific than to just be vague about other topologies. So, either list the others of importance, or leave the sentence as, "The most common ICN topologies used in the Top 500 are direct k-ary cubes, ... and dragonfly."
p. 2 Figure 1: For (a) and (b) use the same colors for the same categories. This makes the visual comparison easier, as you can see changes by looking for the same colors.
In (b), the "other" label is misplaced and should have a line to the corresponding wedge like the other labels.
p. 11 Figure 9: Thank you for the clearer explanation. You should also indicate that "c" in each of the CMGs is a processor core. At least that is my assumption. Are there really 13 cores as indicated by the figure?
p. 14: There is a problem with references 45 to 48. On this page, I think the references to 45 and 46 in the first paragraph of Sect. 3 should be to 47 and 48. In Sect. 3.1, the references to 47 and 48 should be to 45 and 46.
p. 17: "The performance of high-performance processors doubles every 18 months [45], ..." This is no longer true. Note that [45] is dated 2003, almost 20 years ago!! I've seen graphs of processor performance over the years, and I expect a similar one exists for SERDES, or you should be able to quickly produce one for SERDES rates over the years (look through ISSCC and JSSC, for example). This would actually be a nice addition to your paper. The graph should still show the widening gap you claim. Having such a graph would make your claim stronger.
p. 19: "By integrating ..." This paragraph is talking about what is being called a SmartNIC today, which can be implemented with ASICs, processor arrays and FPGAs. I see that Nvidia includes the ConnectX-7 you mention on page 5 as a SmartNIC: https://blogs.nvidia.com/blog/2021/10/29/what-is-a-smartnic/ You should introduce the SmartNIC terminology here.
p. 19: 4.2 D system integrated packaging... What is meant by "D system"?
Typos I detected:
p. 2: nodesmakes --> nodes makes
p. 4: swtich --> switch
p. 5: Siera --> Sierra
p. 9: Architectureis --> Architecture is
The paper requires editing to clean up some English grammar and style issues that would make it much easier to read and a much better paper.
Author Response
Point 1: p. 2: "et al." at end of first paragraph. While this does mean "and others", it should really only be used in author lists. It's also better to be specific than to just be vague about other topologies. So, either list the others of importance, or leave the sentence as, "The most common ICN topologies used in the Top 500 are direct k-ary cubes, ... and dragonfly."
Response 1: "et al." is deleted, and the sentence has been modified as “the most often used ICN topologies in Top 500 include direct k-ary n-cubes, fat trees, torus and mesh, and dragonfly” as suggested.
Point 2: p. 2 Figure 1: For (a) and (b) use the same colors for the same categories. This makes the visual comparison easier, as you can see changes by looking for the same colors. In (b), the "other" label is misplaced and should have a line to the corresponding wedge like the other labels.
Response 2: As suggested, Figure 1: (a) and (b) use the same colors for the same categories, including “Infiniband”, “Cray”, and “other”. In (b), the misplaced label of "other" has been placed beside the corresponding wedge like the other labels.
Point 3: p. 11 Figure 9: Thank you for the clearer explanation. You should also indicate that "c" in each of the CMGs is a processor core. At least that is my assumption. Are there really 13 cores as indicated by the figure?
Response 3: Each A64FX contains 48 computing cores and 4 assistant cores, and each CMG contains 12 computing cores and 1 assistant core. For a clearer explanation, the following sentence has been added to the manuscript: “Each CMG contains 12 computing cores and 1 assistant cores, and each core is marked as “c” in Figure 9.”
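The core counts in this response can be cross-checked with a short sketch. The four numbers are those stated in Response 3; the variable names are illustrative, and the number of CMGs per chip is derived here from those figures rather than quoted from the manuscript.

```python
# Core counts as stated in Response 3.
COMPUTE_CORES_PER_CHIP = 48
ASSISTANT_CORES_PER_CHIP = 4
COMPUTE_CORES_PER_CMG = 12
ASSISTANT_CORES_PER_CMG = 1

# Cores drawn in each CMG of Figure 9 (every one marked "c").
cores_per_cmg = COMPUTE_CORES_PER_CMG + ASSISTANT_CORES_PER_CMG

# Derived (not quoted): number of CMGs implied by each pair of figures.
cmgs_from_compute = COMPUTE_CORES_PER_CHIP // COMPUTE_CORES_PER_CMG
cmgs_from_assistant = ASSISTANT_CORES_PER_CHIP // ASSISTANT_CORES_PER_CMG

total_cores = cmgs_from_compute * cores_per_cmg

print(f"cores per CMG: {cores_per_cmg}")
print(f"CMGs per chip: {cmgs_from_compute} (compute), {cmgs_from_assistant} (assistant)")
print(f"total cores per chip: {total_cores}")
```

Both derivations give the same CMG count, so the 13 cores per CMG shown in the figure are consistent with the per-chip totals.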
Point 4: p. 14: There is a problem with references 45 to 48. On this page, I think the references to 45 and 46 in the first paragraph of Sect. 3 should be to 47 and 48. In Sect. 3.1, the references to 47 and 48 should be to 45 and 46.
Response 4: Thank you for checking this. The references to 45 and 46 should be references to 47 and 48, which refer to the work of Liao Xiangke and Lu Yutong respectively. The references to 47 and 48 should be references to 45 and 46, which refer to the work of Moore and Dennard respectively. The references have been corrected.
Point 5: p. 17: "The performance of high-performance processors doubles every 18 months [45], ..." This is no longer true. Note that [45] is dated 2003, almost 20 years ago!! I've seen graphs of processor performance over the years, and I expect a similar one exists for SERDES, or you should be able to quickly produce one for SERDES rates over the years (look through ISSCC and JSSC, for example). This would actually be a nice addition to your paper. The graph should still show the widening gap you claim. Having such a graph would make your claim stronger.
Response 5:
The description of Moore’s Law is modified as: “According to Moore’s Law, the performance of high-performance processors doubles every 18 months which was claimed in 1975 [73], although it began to slow sometime around 2000 and by 2018 showed a roughly 15-fold gap between Moore’s prediction and current capability, an observation Moore made in 2003 that was inevitable. The current expectation is that the gap will continue to grow as CMOS technology approaches fundamental limits [45]”.
In terms of SerDes data rates over the years, the data rate is mainly defined by four families of standards: OIF CEI, PCIe, USB and Ethernet. The development roadmap has been reorganized by Ohio State University as follows.
At ISSCC 2022, Mike Peng Li of Intel also presented the talk “Paving the Way to 200Gb/s Transceivers”, in which he proposed Moore’s Law for High-Speed I/O (HSIO) and the ITRS High-Speed I/O Speed Roadmap as follows:
From the above two figures, we can see that the SerDes data rate spans several families of standards. A figure showing the widening gap between the performance of high-performance processors and SerDes rates would therefore be very disorganized, so I have not drawn this figure, and I beg your pardon.
Point 6: p. 19: "By integrating ..." This paragraph is talking about what is being called a SmartNIC today, which can be implemented with ASICs, processor arrays and FPGAs. I see that Nvidia includes the ConnectX-7 you mention on page 5 as a SmartNIC: https://blogs.nvidia.com/blog/2021/10/29/what-is-a-smartnic/ You should introduce the SmartNIC terminology here.
Response 6: An introduction to the SmartNIC terminology has been added to the manuscript as follows: “, which is also called SmartNIC (Smart Network Interface Card) [68, 69] technology. SmartNICs offload from server CPUs an expanding array of jobs required to manage modern distributed applications.” Two references, numbered 68 and 69, have also been added to the manuscript.
[68] What is a SmartNIC? [EB/OL]. [2022-4-7]. https://blogs.nvidia.com/blog/2021/10/29/what-is-a-smartnic/
[69] ConnectX SMARTNICS. [2022-4-7]. https://www.nvidia.com/en-us/networking/ethernet-adapters
Point 7: p. 19: 4.2 D system integrated packaging... What is meant by "D system"?
Response 7: It was a typo in which “2.5” was missing. “D system integrated packaging based on optoelectronic fusion and Chiplet” has been corrected to “2.5D system integrated packaging based on optoelectronic fusion and Chiplet”.
Point 8: Typos I detected:
p. 2: nodesmakes --> nodes makes
p. 4: swtich --> switch
p. 5: Siera --> Sierra
p. 9: Architectureis --> Architecture is
The paper requires editing to clean up some English grammar and style issues that would make it much easier to read and a much better paper.
Response 8: The above-mentioned typos have been corrected, and the rest of the manuscript has also been checked.
Reviewer 2 Report
The article has been improved by the authors, who have addressed most of the easily achievable suggestions.
With the inclusion of the table, the article quality improves.
However, the structure of the individual technology sections still mostly read like they are unrelated, after the first paragraph, (random) facts are listed.
As a survey article, it would still benefit by refining information for the reader and ensuring that the technology sections follow a similar structure.
Overall, the contribution of this article remains low; e.g., figures are already existing and not structured similarly. However, there is some value in the article for readers new to network technology, therefore, I increase some ratings.
== Minor suggestions ==
28Gbps*2 lanes*10 ports => Format...
A- and C-axes => The axes A and C are
Table 1: Swtich
Author Response
Point 1: 28Gbps*2 lanes*10 ports => Format...
A- and C-axes => The axes A and C are
Table 1: Swtich.
Response 1:
“28Gbps*2 lanes*10 ports” has been deleted, and “one Tofu network router with 20 ports” has been changed to “one Tofu network router with 20 ports of 28Gbps data rate”.
“A- and C-axes” => “The axes A and C are”; “Each of X-, Y-, Z- and B- axes” => “Each axe of X, Y, Z and B”.
Point 2: The structure of the individual technology sections still mostly read like they are unrelated, after the first paragraph, (random) facts are listed. As a survey article, it would still benefit by refining information for the reader and ensuring that the technology sections follow a similar structure.
Response 2: Thank you for your suggestion. The structure of the individual interconnect technology sections has been reorganized. Sections 2.1 to 2.6 now follow a similar structure: first, introduce the ICN technology and its share of the Top 500 list of Nov. 2021; second, introduce the connotation and development roadmap of the ICN technology; finally, through a Top 10 machine (if one adopts the ICN), introduce the latest technology of the ICN. Consequently, the content of Sections 2.1 and 2.2 has been refined, and in Sections 2.1 to 2.6 only the single most relevant figure for each interconnect technology is retained.
Round 3
Reviewer 2 Report
An article that aims to compare different network architectures in a survey has plenty of opportunities to compare these technologies. The article has improved since the first version, but it also appears to have reached its limit. The overall merit for readers is low, but it might be helpful for readers new to interconnect technology. In that sense, the article is acceptable.