Monday, April 2, 2012

Low Latency 2.0 and Big Data by Andy Bechtolsheim

At the 2012 HPC Linux for Wall Street conference, Andy Bechtolsheim gave two presentations:
Low Latency 2.0
Big Data
I have included some pictures from Andy's talks at this link.
Both are very interesting talks.



Friday, March 30, 2012

China's Super Computer WSJ

The WSJ article by Bob Davis, titled "China's Not-So-Super Computers". Key points:
  • Nebulae (星雲) is at the National Supercomputing Center in Shenzhen, Guangdong, China.
  • How the supercomputers are used is determined by local politicians, not by breakthrough technology
  • Software development projects are underfunded
  • The goal is to catch up with the USA in fields from health care to automotive design to aviation
  • The strategy: "never to lead but to follow"
  • Researchers are rewarded according to the number of academic papers they publish rather than the quality and novelty of their work
  • The battle for software dollars is so intense in China that researchers rarely work as a team on long-term software projects
HPCwire followed up with an article titled "Is Chinese SC a Paper Tiger?" by Robert Gelber
  • Takeaway
    Davis paints a picture of China that shows a far less competitive HPC industry than their hardware prowess may suggest. However, this doesn’t mean the country is uncompetitive. China has 74 computers on the Top500 list and is the second highest investor of HPC in the world. Changes in how that money is spent could make the country a much more formidable supercomputing power.
 The INQUIRER article by Nebojsa Novakovic, titled "China's SC are super, just not all of them":
  • Yes, China has less experience than the USA, Europe or Japan in coding software for large machines like this, but they are learning fast. So, they disagree with the WSJ's suggestion that software 'maturity' will remain a problem, since they are gaining ground there by leaps and bounds
  • By the end of 2014, China will have at least three 100PFLOP machines that I know of installed - one each in the brand new supercomputer centres of Guangzhou, Changsha and Chongqing. The latter one, in the new industrial powerhouse city of 34 million in western China, where one fifth of all world's motorbikes and soon similar proportion of all laptops are made, will be fully based on Chinese 'Loongson' MIPS processors with massive SIMD extensions. That doesn't even cover what the northern provinces will have.
  • In summary, the WSJ article isn't right. China is rapidly maturing its supercomputing technology, especially from the hardware point of view, with its own Alpha, MIPS and SPARC processors, plus a dozen ARM licensees and a couple of its own advanced instruction set architectures out there as well, plus its own interconnects and I/O controllers. The Middle Kingdom can control its complete 'vertical stack' of hardware and avoid security risks or technological dependence on a potentially hostile foreign power.
  • Also, there's nothing wrong in having 'mundane, every day' uses of supercomputers. For instance, a detailed traffic simulation of a gigantic city like that, where every person, vehicle and road, and their behavioural patterns, are fully simulated, can help save billions in correct positioning or construction of new highways or rail lines. Do all the supercomputers have to simulate nuclear bombs? I guess no, so in that sense the civilian use benefit of Chinese supercomputers is just fine.
 My Observations:
  • HPC projects are big-money government projects
  • "All politics is local" may be even more true in China
  • A key factor in China's HPC effort is the push for "made in China" CPU chips; some are licensed from MIPS and some are copied from Alpha and OpenSPARC
  • China does not yet have a good fab that can make 28nm or 22nm chips, but Taiwan does
  • Cloud will be part of the HPC projects and will become big in China and Taiwan
  • Yes, today these supercomputers are underutilized, but that could change if someone finds a way to provide funding for using them (as the NSF in the USA has been doing recently)
  • Much more research money is wasted in the USA on users building small and medium-size HPC systems, but those centers do train more Ph.D.s and postdocs, many of whom come from India, China, Russia, etc.
  • Hopefully China will find more uses for these big machines, but it will take time
  • Most trained scientists still prefer to stay in the USA for various reasons; I hope they can help China make better use of its supercomputers

Thursday, March 29, 2012

rocks+6 , MPI, OFED and SGE test drive

StackIQ has released Rocks+ 6, and it contains some very interesting rolls.
Rocks+ is free for up to 16 nodes.

rocks+ 6.0.1 Complete Stack

Click the below link to register and download Rocks+ as a complete bootable Big Infrastructure stack (free for up to 16-nodes). The stack includes the following modules (“Rolls”) for Rocks:

Rocks+ 6.0.1 ISO

  • Rocks Base
  • Rocks Core
  • Cassandra (beta)
  • CentOS 6.2
  • CUDA
  • Ganglia
  • Grid Engine
  • Hadoop
  • HPC
  • Kernel
  • MongoDB (beta)
  • OFED
  • Web Server

6.0.1 Modules “a la carte”:

-All “Rocks+” Rolls require the “Rocks+ Core Roll” to be installed
-Rocks+ requires a license file for systems larger than 16-nodes

After you register, you will receive an email explaining how to download the ISO.

I did a test drive with SGE, OFED and Intel.
It also includes a version of Environment Modules.

Observations:
  • The Rocks+ 6 installation is almost the same as installing Rocks
  • It includes CentOS 6.2
  • It includes the OFED and CUDA rolls
  • It includes the Hadoop roll
  • It is free for up to 16 nodes
  • Some open source rolls are hosted on GitHub:
    • roll-base
    • roll-web-server
    • roll-sge
    • roll-os
    • roll-hpc
    • roll-ganglia
    • roll-kernel
  • These rolls seem to be a fork of open source Rocks??
  • The MPI stack only has openmpi; it does not include mpich2 or mpich1
It turns out that the MPI stack still needs to be compiled for Intel and IB (I could be wrong).
I downloaded the mpi roll from the triton GitHub repository (e.g., by getting a snapshot).
The triton repository also contains the ofed, intel, envmodules, hadoop, moab, myrinet_mx, myri10gbe and other rolls.

The README contains very important info:
  • This roll source supports building specified flavors of MPI with different compilers and for different network fabrics.  
  • By default, it builds mpich, mpich2, mvapich2, and openmpi using the gnu compilers for ethernet.  
  • To build for a different configuration, use the ROLLMPI, ROLLCOMPILER and ROLLNETWORK make variables, e.g.,
  • make ROLLMPI='mpich2 openmpi' ROLLCOMPILER=intel ROLLNETWORK=mx
  • The build process currently supports one or more of the values "intel" and  "gnu" for the ROLLCOMPILER variable, defaulting to "gnu".  
  • It uses any ROLLNETWORK variable value(s) to load appropriate openmpi modules, assuming that there are modules named openmpi_$(ROLLNETWORK) available (e.g., openmpi_ib, openmpi_mx, etc.).
  • Set up the envmodule intel/2011-sp1 first
  • The ROLLMPI, ROLLCOMPILER, and ROLLNETWORK variables values are incorporated into the names of the produced roll and rpms, e.g., 
  • make ROLLMPI=openmpi ROLLCOMPILER=intel ROLLNETWORK=ib produces a roll with a name that begins "mpi_intel_ib_openmpi"; it contains and installs similarly-named rpms 
  • e.g. mpi_intel_ib_openmpi-6.0.1-0.x86_64.disk1.iso
  • Now, on the frontend (you need to set up and load the module intel/2011_sp1):
    • rocks add roll <path>/mpi_intel_ib_openmpi-6.0.1-0.x86_64.disk1.iso
    • rocks enable roll  mpi_intel_ib_openmpi
    • cd /export/rocks/install
    • rocks create distro
    • rocks run roll mpi_intel_ib_openmpi|bash
    • (install intel_ib_openmpi ... rpm)
    • init 6
  • reinstall all compute nodes
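
The naming rule and frontend steps above can be sketched in a few lines of Python. This is only an illustration: the name pattern is inferred from the single README example (openmpi, intel, ib), so how the real Makefile names a roll built with several ROLLMPI values is not covered, and the /tmp ISO path is a hypothetical placeholder.

```python
from typing import List


def roll_name(mpi: str, compiler: str, network: str) -> str:
    """Roll name prefix, assuming the pattern mpi_<compiler>_<network>_<mpi>.

    Inferred from one README example only:
    ROLLMPI=openmpi ROLLCOMPILER=intel ROLLNETWORK=ib -> mpi_intel_ib_openmpi
    """
    return f"mpi_{compiler}_{network}_{mpi}"


def iso_name(roll: str, version: str = "6.0.1-0", arch: str = "x86_64") -> str:
    # File-name pattern copied from the one ISO name quoted in the post.
    return f"{roll}-{version}.{arch}.disk1.iso"


def frontend_commands(roll: str, iso_path: str) -> List[str]:
    # Same command sequence as in the list above, parameterised by roll name.
    return [
        f"rocks add roll {iso_path}",
        f"rocks enable roll {roll}",
        "cd /export/rocks/install",
        "rocks create distro",
        f"rocks run roll {roll} | bash",
        "init 6",
    ]


if __name__ == "__main__":
    roll = roll_name("openmpi", "intel", "ib")
    iso = iso_name(roll)
    # /tmp is a hypothetical location for the built ISO.
    for cmd in frontend_commands(roll, f"/tmp/{iso}"):
        print(cmd)
```

The script only prints the commands; it does not run them, so it is safe to try on any machine before touching a real frontend.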
Observations:
  • mpich(1) is broken and cannot be built, so one needs to use ROLLMPI="mpich2 mvapich2 openmpi" or just ROLLMPI=openmpi
  • The openmpi Makefile comes with the option --with-tm=/opt/torque; in my case I needed to replace it with --with-sge=/opt/gridengine
  • One needs to set up envmodules for gnu and intel so that one can build for eth or ib with either gnu or intel
  • I am not sure whether there is any difference between the Rocks+ and Rocks build processes
  • When one builds the various rolls' ISOs on the frontend, the build also installs the rpms onto the frontend
  • One should therefore try to build these ISOs on a development appliance





Wednesday, March 28, 2012

co-existence of vxvm with ZFS

This is an interesting link that talks about the co-existence of VxVM with ZFS.


To reuse a ZFS disk as a VxVM disk

Remove the disk from the zpool, or destroy the zpool.

See the Oracle documentation for details.

Clear the signature block using the dd command:

# dd if=/dev/zero of=/dev/rdsk/c#t#d#s# oseek=16 bs=512 count=1

Where c#t#d#s# is the disk slice on which the ZFS device is configured. If the whole disk is used as the ZFS device, clear the signature block on slice 0.

You can now initialize the disk as a VxVM device using the vxdiskadm command or the vxdisksetup command.
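
To make the dd operands concrete, here is a minimal Python sketch of the arithmetic, assuming oseek counts output blocks of bs bytes (as the Solaris dd man page describes). The in-memory bytearray stands in for the disk slice and is purely illustrative; nothing here touches a real device.

```python
from typing import Tuple


def dd_byte_range(oseek: int, bs: int, count: int) -> Tuple[int, int]:
    """Return the (start, end) byte offsets that dd writes for these operands."""
    start = oseek * bs
    return start, start + count * bs


def zero_block(image: bytearray, oseek: int, bs: int, count: int) -> None:
    """Mimic 'dd if=/dev/zero' on an in-memory image: zero the selected range."""
    start, end = dd_byte_range(oseek, bs, count)
    image[start:end] = bytes(end - start)


if __name__ == "__main__":
    # A fake 32-sector "slice" filled with 0xff so the zeroed sector stands out.
    image = bytearray(b"\xff" * (32 * 512))
    zero_block(image, oseek=16, bs=512, count=1)
    print(dd_byte_range(16, 512, 1))  # (8192, 8704)
```

With oseek=16, bs=512 and count=1, exactly one 512-byte sector starting at byte offset 8192 is overwritten; everything before and after it is left alone.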


To reuse a VxVM disk as a ZFS disk
If the disk is in a disk group, remove the disk from the disk group or destroy the disk group.

To remove the disk from the disk group:

# vxdg [-g diskgroup] rmdisk diskname

To destroy the disk group:

# vxdg destroy diskgroup

Remove the disk from VxVM control

# /usr/lib/vxvm/bin/vxdiskunsetup diskname

You can now initialize the disk as a ZFS device using ZFS tools.

See the Oracle documentation for details.

You must perform the first two steps (removing the disk from its disk group, then removing it from VxVM control) in order for VxVM to recognize a disk as a ZFS device.

Monday, March 5, 2012

oracle Ldom license

This link details the Oracle license and virtualization (v12n) key points.
Starting with LDoms 2.1, one can assign full cores and memory blocks to a logical domain:
  • ldm add-core cid=cid[,cid,...] ldom (likewise set-core and rm-core)
  • ldm add-mem mblock=PA-start:size[,PA-start:size,...] ldom (likewise set-mem and rm-mem)
One can also set the physical-bindings constraint for cores and memory.

Wednesday, February 29, 2012

Raspberry PI $2k TFLOPS HPC/GPGPU cluster

The Raspberry Pi computer has launched in the UK.
The Raspberry Pi is a credit-card sized computer board that plugs into a TV and a keyboard. It’s a miniature ARM-based PC which can be used for many of the things that a desktop PC does, like spreadsheets, word-processing and games. It also plays High-Definition video.
  • Unit will use a Chinese manufacturer.
  • distribution by
  • features
    • Broadcom BCM2835 700MHz ARM1176JZFS processor with FPU and Videocore 4 GPU
    • GPU provides Open GL ES 2.0, hardware-accelerated OpenVG, and 1080p30 H.264 high-profile decode
    • GPU is capable of 1Gpixel/s, 1.5Gtexel/s or 24GFLOPs with texture filtering and DMA infrastructure
    • 256MB RAM
    • Boots from SD card, running the Fedora version of Linux
    • 10/100 BaseT Ethernet socket
    • HDMI socket
    • USB 2.0 socket
    • RCA video socket
    • SD card socket
    • Powered from microUSB socket
    • 3.5mm audio out jack
    • Header footprint for camera connection
    • Size: 85.6 x 53.98 x 17mm
  • The price is about $35
  • It runs the Fedora OS
  • Maybe one can build a small diskless or diskfull HPC GPGPU cluster out of these boards
  • 1 TFLOPS ≈ 42 × 24 GFLOPS, and the boards cost about $35 × 42 = $1470. One will also need a head node, a 48-port Ethernet switch, some Ethernet cables and microUSB power cables, so the total cost is around $2000
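
The back-of-the-envelope estimate can be checked with a short Python sketch. It assumes the 24 GFLOPS GPU peak figure from the feature list is fully usable by every board, which is optimistic, and it treats the $2000 total as a budget from which the board cost is subtracted.

```python
import math

GFLOPS_PER_BOARD = 24   # GPU peak figure from the feature list
PRICE_PER_BOARD = 35    # US dollars, approximate
TARGET_GFLOPS = 1000    # 1 TFLOPS
TOTAL_BUDGET = 2000     # US dollars, the rough total from the post


def boards_needed(target_gflops: float, gflops_per_board: float) -> int:
    """Smallest number of boards whose combined peak reaches the target."""
    return math.ceil(target_gflops / gflops_per_board)


if __name__ == "__main__":
    n = boards_needed(TARGET_GFLOPS, GFLOPS_PER_BOARD)
    board_cost = n * PRICE_PER_BOARD
    print(f"boards: {n}")
    print(f"aggregate peak: {n * GFLOPS_PER_BOARD} GFLOPS")
    print(f"board cost: ${board_cost}")
    print(f"left for head node, switch and cables: ${TOTAL_BUDGET - board_cost}")
```

This gives 42 boards, 1008 GFLOPS of aggregate peak and $1470 in boards, leaving about $530 for the head node, the switch and the cabling.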