Cluster Computing and Applications
* Generic PC cluster, Diskless PC cluster
- Beowulf is the concept of clustering commodity computers to form a parallel, virtual supercomputer, and such a cluster is easy to build from readily available components. We have built and currently maintain an experimental Linux SMP cluster (SMP PC machines running the Linux operating system), named THPTB (TungHai Parallel TestBed), which serves as a computing resource for testing.
- THPTB is made up of 18 SMP PCs based on dual-socket ABIT VP6 motherboards, for a total of 36 Intel P-III processors. The nodes are connected with Fast Ethernet through three 24-port switches; with the channel bonding technique, the maximum bandwidth reaches 300 Mbps.
- Channel bonding is a method in which the data of each message is striped across the multiple network cards installed in each machine. THPTB is operated as a single system that shares networking, file servers, and other peripherals; it consists of one server node and 17 computing nodes.
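Conceptually, channel bonding deals the pieces of an outgoing message out to the network cards in round-robin order. The short C sketch below only illustrates that idea in user space; the real striping is done by the Linux bonding driver inside the kernel, and the chunk size and channel count here are illustrative values, not driver settings.

```c
#include <stdio.h>

#define NUM_CHANNELS 3    /* e.g. three Fast Ethernet NICs per node */
#define CHUNK_SIZE   1500 /* roughly one Ethernet frame payload     */

/* Toy illustration of channel bonding: split a message into chunks and
 * assign the chunks to the available channels in round-robin order.   */
int main(void)
{
    const size_t message_len = 9000; /* example message length in bytes */
    size_t offset = 0;
    int channel = 0;

    while (offset < message_len) {
        size_t chunk = message_len - offset;
        if (chunk > CHUNK_SIZE)
            chunk = CHUNK_SIZE;

        printf("bytes %5zu-%5zu -> eth%d\n",
               offset, offset + chunk - 1, channel);

        offset += chunk;
        channel = (channel + 1) % NUM_CHANNELS; /* next NIC */
    }
    return 0;
}
```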
- The server node has two Intel Pentium-III processors running at 1050 MHz (1 GHz parts over-clocked via a 140 MHz FSB) and 1 GB of shared local memory. Each Pentium-III has 32 KB of on-chip instruction and data caches (L1 cache) and a 256 KB on-chip four-way second-level cache running at full CPU speed.
- There are two kinds of computing nodes: dual2 ~ dual10 are dual P-III 1 GHz nodes with 768 MB of shared memory, and dual11 ~ dual18 are dual P-III 950 MHz nodes (866 MHz parts over-clocked via a 146 MHz FSB) with 512 MB of shared local memory. To measure the performance of the cluster, we run the HPL benchmark on top of the LAM/MPI library (a minimal LAM/MPI example follows this list).
- The LINPACK benchmark solves a dense system of linear equations. The TOP500 uses the version of the benchmark (HPL) that allows the user to scale the size of the problem and to optimize the software in order to achieve the best performance for a given machine.
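Before running HPL, it helps to confirm that a LAM/MPI job really spans all the nodes. Below is a minimal sketch, assuming LAM/MPI and its mpicc wrapper are installed; it is not part of HPL, and the process count mentioned afterwards is only an example.

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal LAM/MPI check: every process reports its rank and the node it
 * runs on, a quick way to verify that a job spans the server node and
 * the computing nodes before launching a long HPL run. */
int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &name_len);

    printf("process %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```

After lamboot has started the LAM daemons on all nodes, the program would be compiled with mpicc and launched with something like mpirun -np 36, matching the 36 processors of the testbed; the exact host file depends on the local setup.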
- The amount of memory used by HPL is essentially the size of the coefficient matrix. For example, 18 nodes with 512 MB of memory each correspond to roughly 9 GB in total, i.e., 1125M double-precision (8-byte) elements; the square root of that number is 33541. Some memory must be left for the OS and other processes, so a problem size of 32000 is likely to fit; as a rule of thumb, using about 80% of the total memory is a good guess. With channel bonding and all 36 P-III processors, our P-III SMP cluster achieves 17.38 GFlops for the problem size 32000 × 32000.
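The sizing arithmetic above can be reproduced with a few lines of C; the numbers are the ones from the example (18 nodes × 512 MB taken as roughly 9 GB), and the program is only the back-of-the-envelope calculation, not part of HPL.

```c
#include <stdio.h>
#include <math.h>

/* HPL problem-size estimate: the coefficient matrix holds N*N doubles
 * (8 bytes each), so N is bounded by the square root of the number of
 * doubles that fit in memory; keeping ~80% of memory gives a safer N. */
int main(void)
{
    const double total_bytes = 9.0e9;             /* ~9 GB total memory */
    const double elements    = total_bytes / 8.0; /* 1125M doubles      */

    double n_full = sqrt(elements);               /* fills all memory   */
    double n_80   = sqrt(0.8 * elements);         /* 80% rule of thumb  */

    printf("N filling all memory : %.0f\n", n_full); /* ~33541 */
    printf("N using 80%% of memory: %.0f\n", n_80);  /* ~30000 */
    return 0;
}
```

The problem size of 32000 used on THPTB sits between the 80% guess and the absolute memory limit.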
Diskless PC cluster