
HPC FAQ


  1. What is an HPC cluster?
    An HPC cluster is a multi-computer architecture used for parallel computation. It usually consists of one server (head) node and one or more client nodes connected via Ethernet or another network. It is built from server-class hardware components with standard Ethernet adapters and switches, and runs a Unix-like operating system such as Linux together with parallel software such as Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI). The head/master node controls the whole cluster and serves files to the client nodes; it is also the cluster's console and its gateway to the outside world. The client nodes are configured and controlled by the head node and do only what they are told to do.
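    The head-node/client-node relationship can be sketched as a master/worker pattern. This is a minimal illustration under an explicit assumption: Python threads inside one process stand in for cluster nodes, whereas a real cluster coordinates separate machines over the network with tools such as MPI or PVM. The "head" places tasks on a queue, and each worker does only what it is handed. The function name run_master_worker is hypothetical.

```python
import queue
import threading

def run_master_worker(tasks, num_workers=4):
    """Head hands out tasks; worker threads (stand-ins for compute nodes)
    process only what they are given and report results back."""
    task_q = queue.Queue()
    result_q = queue.Queue()

    def worker():
        while True:
            task = task_q.get()
            if task is None:              # sentinel: no more work for this worker
                task_q.task_done()
                return
            task_id, value = task
            result_q.put((task_id, value * value))  # the "computation"
            task_q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id, value in enumerate(tasks):  # head distributes the work
        task_q.put((task_id, value))
    for _ in threads:
        task_q.put(None)                      # one stop signal per worker
    for t in threads:
        t.join()

    # All workers have finished, so the result queue is complete.
    results = [None] * len(tasks)
    while not result_q.empty():
        task_id, result = result_q.get()
        results[task_id] = result
    return results

if __name__ == "__main__":
    print(run_master_worker([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

    Because the stop signals are queued after every task, each worker only stops once all tasks have been taken from the queue.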



  2. What is Parallel Computing?
    Parallel computing is a form of computation in which many instructions are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"). There are several different forms of parallel computing: bit-level parallelism, instruction-level parallelism, data parallelism, and task parallelism. It has been used for many years, mainly in high-performance computing, but interest in it has grown in recent years due to the physical constraints preventing frequency scaling.
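    Data parallelism, dividing a large problem into independent smaller ones and combining the partial results, can be sketched in a few lines. This is an illustrative sketch only: it uses threads for simplicity, and because of CPython's global interpreter lock, CPU-bound pure-Python work like this does not actually speed up with threads; real speedups would come from separate processes or, on a cluster, MPI ranks on different nodes. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(n, parts):
    """Divide range(n) into `parts` contiguous, nearly equal chunks."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread the remainder
        chunks.append(range(start, start + size))
        start += size
    return chunks

def partial_sum_of_squares(chunk):
    # Each sub-problem is independent, so chunks can be processed concurrently.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(n, workers=4):
    chunks = split_range(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Combine the partial results into the final answer.
        return sum(pool.map(partial_sum_of_squares, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(1000))  # 332833500
```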



  3. What is the EKA supercomputer?
    CRL has built a supercomputer to meet its requirements for the design and development of HPC hardware, software and applications. According to the top500.org list announced at SC07, this system is the fastest supercomputer in Asia and the 4th fastest in the world, with a LINPACK performance of 132.8 TF. The EKA supercomputer has the following key features:

    • Versatile and High Performance Systems
      CRL focused on large-scale, high-performance parallel computing on 64-bit systems running the Linux operating system. The system is capable of over 120 TeraFLOPS sustained LINPACK performance, with a theoretical peak CPU performance of 170 TeraFLOPS. Each server has 2 quad-core CPUs (8 cores) and 16 GB of shared memory, for a total aggregate memory of 28 TB.

    • Distributed Parallel Computing Environment
      The system provides a distributed parallel computing environment with support for the Pthreads library and the Message Passing Interface (MPI), including Open MPI. Combined with additional software tools, this environment provides effective job execution management for both interactive and batch jobs.


    • Storage and Backup
      With a wide range of users accessing the system, data storage was a key concern. Over 80 TB of SAS physical storage capacity is installed. A Lustre parallel file system with a total RAID I/O transfer rate of 5.2 GB/s serves over 1,800 nodes and is available over both the InfiniBand network and the Gigabit Ethernet network. To secure the data, 24 TB of backup capacity is provided.


    • Wide Accessibility
      Not content with sheer size, CRL is looking to bring supercomputing to everyday use. The new supercomputing architecture can run both commercial off-the-shelf and open-source applications, including structural analysis applications, computational chemistry tools and statistical analysis packages.


    • Short Development and Deployment Cycle
      Many supercomputing environments are based on proprietary systems, and are designed for dedicated use. Typically, such systems take years to develop and deploy. EKA was made operational in a very short time with off-the-shelf hardware.
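    The figures quoted under "Versatile and High Performance Systems" can be cross-checked with a little arithmetic. This sketch assumes all 1,800 nodes (the node count given elsewhere in this FAQ) are identical 2-socket, quad-core servers with 16 GB each; the per-core peak it prints is derived from the quoted totals, not a published specification.

```python
# Figures quoted in this FAQ; the per-core number below is derived, not published.
NODES = 1800            # node count quoted elsewhere in this FAQ
CPUS_PER_NODE = 2
CORES_PER_CPU = 4
MEM_PER_NODE_GB = 16
PEAK_TFLOPS = 170.0     # theoretical peak
LINPACK_TFLOPS = 132.8  # TOP500 (SC07) LINPACK result

cores = NODES * CPUS_PER_NODE * CORES_PER_CPU      # 14,400 cores in total
total_mem_gb = NODES * MEM_PER_NODE_GB              # 28,800 GB of memory
total_mem_tib = total_mem_gb / 1024                 # ~28.1 in binary units
gflops_per_core = PEAK_TFLOPS * 1000 / cores        # implied ~11.8 GFLOPS per core
linpack_efficiency = LINPACK_TFLOPS / PEAK_TFLOPS   # ~0.78 of peak

if __name__ == "__main__":
    print(f"cores={cores}, memory={total_mem_gb} GB ({total_mem_tib:.1f} TiB)")
    print(f"peak per core ~{gflops_per_core:.1f} GFLOPS, "
          f"LINPACK efficiency ~{linpack_efficiency:.0%}")
```

    The 28,800 GB total matches the quoted 28 TB when counted in binary units, and the 132.8 TF LINPACK result is roughly 78% of the 170 TF theoretical peak.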




  4. How do I submit my job using the scheduler? What schedulers are available?
    Production runs are submitted to EKA through a scheduler and resource manager; EKA incorporates Platform LSF for this. CRL has developed a front end that lets users submit their production runs and monitor their jobs until completion.
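    For reference, a direct LSF submission goes through the bsub command. This is a minimal sketch that assumes users are permitted to call bsub directly instead of using CRL's front end; the queue name "normal", the job name and the program are hypothetical, and the helper only builds the command line without running it.

```python
import shlex

def build_bsub_command(job_name, slots, queue, wall_minutes, command,
                       slots_per_node=8):
    """Assemble an LSF bsub command line (not executed here)."""
    hours, minutes = divmod(wall_minutes, 60)
    return [
        "bsub",
        "-J", job_name,                          # job name
        "-n", str(slots),                        # number of job slots
        "-q", queue,                             # queue (hypothetical name)
        "-W", f"{hours}:{minutes:02d}",          # run limit as hours:minutes
        "-R", f"span[ptile={slots_per_node}]",   # slots placed on each host
        "-o", f"{job_name}.%J.out",              # stdout file; %J = job ID
        *command,                                # program and its arguments
    ]

if __name__ == "__main__":
    cmd = build_bsub_command("md_run", 64, "normal", 90, ["mpirun", "./my_app"])
    print(shlex.join(cmd))  # shell-quoted so it can be pasted into a terminal
```

    The default of 8 slots per host mirrors the 2 quad-core CPUs per EKA server, so a 64-slot job would span 8 nodes.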



  5. How do I submit my job directly to a set of nodes?
    After you receive your node list, prepare your code to run on the allotted set of nodes and then submit your jobs.
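    One common way to target an allotted node list is an MPI hostfile. This sketch assumes Open MPI (one of the MPI libraries listed in question 7), whose hostfiles use one "hostname slots=N" line per node and whose mpirun accepts -np and --hostfile; Intel MPI, HP MPI and MVAPICH use their own launcher syntax. The node names and file path are hypothetical.

```python
import os
import tempfile

def write_hostfile(nodes, slots_per_node, path):
    """Write an Open MPI-style hostfile: one 'host slots=N' line per node."""
    with open(path, "w") as f:
        for node in nodes:
            f.write(f"{node} slots={slots_per_node}\n")
    return len(nodes) * slots_per_node  # total MPI processes that fit

def mpirun_command(hostfile, nprocs, program):
    """Build (but do not run) an Open MPI launch command."""
    return ["mpirun", "-np", str(nprocs), "--hostfile", hostfile, program]

if __name__ == "__main__":
    allotted = ["node0101", "node0102"]  # hypothetical node names
    path = os.path.join(tempfile.gettempdir(), "my_hosts")
    total = write_hostfile(allotted, 8, path)  # 8 cores per EKA server
    print(" ".join(mpirun_command(path, total, "./my_app")))
```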



  6. What is the architecture of EKA supercomputer?
    Unlike traditional SMP machines and mainframe systems, the EKA supercomputer is built from the building blocks of the XC architecture: high-performance, off-the-shelf nodes. The architecture spans 1,800 nodes and operates at 170 TF peak and 120 TF sustained performance. All nodes in the cluster, including the SFS20 storage, are interconnected via InfiniBand and can access 40 TB of hard disk storage in parallel. Incorporating technology from Platform LSF and Voltaire, as well as the Nagios system management software, EKA runs the 64-bit Red Hat AS 4 Update 4 operating system to deliver applications to users and speed up scientific algorithms and data processing.

    A typical configuration consists of a head (control) node plus optional service nodes that share tasks with the head node, such as login, LSF and file serving. The compute nodes are dedicated to job computation. All nodes share the same high-performance interconnect and are also connected via separate administration, console and management networks. The head node's external network connections link the cluster to the user network outside.
    Figure: EKA system architecture with high-speed InfiniBand interconnect and SFS storage.



  7. What compilers, libraries and software are available on EKA?
    GNU C++ Compiler
    Intel® C++ Compiler
    Intel® Fortran Compiler
    Intel® Math Kernel Library
    Intel® Thread Checker
    Intel® MPI, HP MPI, Open MPI, MVAPICH
    Intel® Threading Building Blocks
    Intel® VTune™ Performance Analyzer



  8. How should I register for compute power? Whom should I approach for that? What are the charges?
    Contact the EKA facility by email at hpcsupport@crlindia.com; they will reply with all the details.



  9. Can I log in to EKA remotely, from my office or home? What is the process?
    Yes. EKA is accessible over the Internet, so you can log in from your office or home. Users who have a valid VPN account can use CRL's VPN facility to log in to EKA; other users need to apply for access to CRL's VPN facility. After approval, you will receive VPN login credentials and the VPN user guide.
    Refer to the VPN user guide for connecting to EKA remotely.
    There are five simple steps:

    1. Using any Java-enabled web browser, connect to the IP address you were given.

    2. Enter your VPN login ID and password.

    3. Click the Start Access link.

    4. Note the user login details shown in the child window.

    5. Open any SSH client and enter the IP address, login name and port number to connect to EKA.
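    Step 5 with the OpenSSH command-line client amounts to passing the port with -p and the login name before the host. The values below are placeholders (192.0.2.10 is a documentation-only address); use the IP address, login name and port shown in your child window.

```python
def ssh_command(login, host, port):
    """Build the OpenSSH invocation from the details shown in the child window."""
    return ["ssh", "-p", str(port), f"{login}@{host}"]

if __name__ == "__main__":
    # Placeholder values, not real EKA connection details.
    print(" ".join(ssh_command("jdoe", "192.0.2.10", 2222)))  # ssh -p 2222 jdoe@192.0.2.10
```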


  10. How much data can I copy to the machine? What are the limits and charges?
    Storage is allocated once the project is initiated, based on your requirements.



  11. What is the process for taking my data from EKA, and which media (tape/DVD/CD) can be used?
    Data can be transferred in the following ways:

    1. Secure FTP, if the data is not too large

    2. Removable media (tape/DVD)

    3. Data replication: the specific folder that holds the results is replicated to the customer's site.
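    Whether data is "not too large" for Secure FTP depends mostly on the link speed. This back-of-the-envelope helper is a sketch: the link rate and the 80% efficiency factor (allowing for protocol and encryption overhead) are assumptions, not EKA figures.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Estimate hours to move size_gb gigabytes (10**9 bytes) over a link of
    link_mbps megabits/s, assuming only `efficiency` of the nominal rate is
    achieved (an assumed overhead factor)."""
    bits = size_gb * 8e9                      # bytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency
    return bits / usable_bps / 3600

if __name__ == "__main__":
    # 500 GB over an assumed 100 Mbps link at 80% efficiency.
    print(f"{transfer_hours(500, 100):.1f} h")  # 13.9 h
```

    At these assumed rates, several hundred gigabytes already take more than half a day, which is where shipping tape or DVD media becomes attractive.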


  12. What are the charges for setting up user space at the facility? What facilities do I get with the setup?
    You can contact hpcsupport@crlindia.com and they will send you the details after initial discussions.



  13. What software can I install on EKA? Can I install my own application on EKA, and if so, what is the procedure?
    CRL has expertise in parallel programming and can assist with porting and modifying your code.



  14. What is the storage setup on EKA? Which file system is used on the cluster? What is the storage performance?

    • Primary
      Lustre-based scalable, secure, robust, highly available cluster file system
      Capacity: 72 TB
      Number of OSS (object storage server) pairs: 5
      Bandwidth: 5.2 GBps
      HDD interface: SATA

    • Secondary
      Network File Share
      Capacity: 8 TB
      Bandwidth: 640 Mbps
      HDD interface: SATA

    • Performance: the file system operates at 5.6 GBps bandwidth.
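    Note the units: the primary Lustre bandwidth is in gigabytes per second, while the secondary share is quoted in megabits per second. This sketch puts them on the same footing; it assumes decimal units, ideal aggregate bandwidth and an even split of load across the 5 OSS pairs, so the results are upper-bound estimates rather than measured figures.

```python
LUSTRE_BW_GBYTES = 5.2     # GBps (gigabytes/s), primary Lustre file system
OSS_PAIRS = 5
LUSTRE_CAPACITY_TB = 72
NFS_BW_MBITS = 640         # Mbps (megabits/s), secondary file share

per_pair_gbytes = LUSTRE_BW_GBYTES / OSS_PAIRS                 # ~1.04 GB/s per pair
nfs_mbytes = NFS_BW_MBITS / 8                                  # 80 MB/s
speed_ratio = LUSTRE_BW_GBYTES * 1000 / nfs_mbytes             # primary ~65x faster
full_read_hours = LUSTRE_CAPACITY_TB * 1000 / LUSTRE_BW_GBYTES / 3600  # ~3.8 h

if __name__ == "__main__":
    print(f"per OSS pair: {per_pair_gbytes:.2f} GB/s, NFS: {nfs_mbytes:.0f} MB/s")
    print(f"Lustre/NFS speed ratio ~{speed_ratio:.0f}x, "
          f"reading all 72 TB takes ~{full_read_hours:.1f} h at full bandwidth")
```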


  15. Is there a visualization setup available for EKA? What are the usage charges?
    The visualization setup is in progress and will be ready soon. For charges and other commercial details, contact hpcsupport@crlindia.com.



  16. Is there any backup available for data on EKA? What are the available options for backup?
    A permanent tape backup solution is available for EKA, with LTO-3 and LTO-4 options; other tape formats can be provided on request.
    Backup tape library:
       - Capacity: 120 TB
       - Bandwidth: 120 MBps
    Other media, such as CD, DVD and external disks, are also available.
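    The tape library figures imply how long a full backup takes. This sketch assumes the quoted 120 MBps is a sustained rate in decimal megabytes per second and ignores tape changes and verification passes, so real backup windows would be longer.

```python
def backup_days(data_tb, rate_mbytes_per_s=120):
    """Days to stream data_tb terabytes (10**12 bytes) to tape at a
    sustained rate in MB/s (10**6 bytes/s)."""
    seconds = data_tb * 1e12 / (rate_mbytes_per_s * 1e6)
    return seconds / 86400

if __name__ == "__main__":
    print(f"fill the 120 TB library: {backup_days(120):.1f} days")       # 11.6 days
    print(f"back up the 72 TB Lustre system: {backup_days(72):.1f} days")  # 6.9 days
```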



  17. How secure is my remote login? What are the security methods employed?
    The following options are available for a remote user to access the EKA HPC facility:

    1. Site-to-site VPN: used between two IPsec security gateways, such as security appliances, VPN concentrators or other devices that support site-to-site IPsec connectivity.

    2. VPN client: provides secure remote access for VPN clients such as mobile users, letting remote users securely access centralized network resources.

    3. WebVPN access: establishes secure communication over the Internet through a web browser. Because WebVPN is simply secure web traffic, it is virtually platform- and browser-independent, so it works with a broad array of platforms and browsers.

    4. Direct connectivity from the customer site to the CRL facility. Depending on feasibility, options such as leased circuits, E1, E3, DS3 and OC3 can be worked out.

    Users log in to their allocated login node through any of the options above. After login, a user is restricted to that login node until a job is submitted; once the job is submitted, the user gets access to the allocated resources. A user with a dedicated partition can access the nodes in that partition without submitting a job, and access to a dedicated partition is restricted to its respective users.


    Data can be transferred via Secure FTP over the secure VPN tunnel.
    Data security is handled in the following ways:

    1. When data is transferred via VPN or direct connectivity, it is encrypted with algorithms such as 3DES or AES, and its integrity is protected with hash algorithms such as MD5, depending on customer needs. The Internet Key Exchange (IKE) and IP Security (IPsec) protocols are used to establish and manage the secure connection.

    2. In the file system, data is protected using the standard Linux security model: users cannot read, write or execute other users' data, and each user controls the security of their own data.
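    Both mechanisms can be illustrated with Python's standard library. In IPsec, MD5 serves as a keyed hash (HMAC) that detects tampering, while 3DES or AES provide confidentiality; the first two functions sketch the keyed-hash idea only and are not an IPsec implementation (SHA-256 is used by default here, and hashlib.md5 can be passed to mirror HMAC-MD5). The last function shows the Linux permission model: mode 0700 lets only the owner read, write or enter a directory. All function names are hypothetical, and the permission check assumes a Linux or other POSIX system.

```python
import hashlib
import hmac
import os
import stat
import tempfile

def make_tag(key, data, digest=hashlib.sha256):
    """Keyed hash (HMAC) that a sender attaches to the data."""
    return hmac.new(key, data, digest).digest()

def verify_tag(key, data, received_tag, digest=hashlib.sha256):
    """Receiver recomputes the HMAC; compare_digest avoids timing leaks."""
    return hmac.compare_digest(make_tag(key, data, digest), received_tag)

def make_private_dir(path):
    """Create a directory only its owner can read, write or enter (mode 0700)."""
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)                 # rwx for owner, nothing for group/others
    return stat.S_IMODE(os.stat(path).st_mode)

if __name__ == "__main__":
    key = os.urandom(32)                  # shared secret (negotiated by IKE in IPsec)
    tag = make_tag(key, b"results.dat contents")
    print(verify_tag(key, b"results.dat contents", tag))  # True
    print(verify_tag(key, b"tampered contents", tag))     # False
    mode = make_private_dir(os.path.join(tempfile.mkdtemp(), "private"))
    print(oct(mode))                                      # 0o700
```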


  18. Can you give me a dedicated set of nodes for a certain time period? What is the process?
    Yes, subject to a commercial agreement.

  © 2008 Computational Research Laboratories
