Installation notes for ifp-32
From SpeechWiki
To add users
useradd -u <uid> -g <gid> <userName>
so that the uid and gid match the existing ones, then run
rocks sync users
cluster-fork '/sbin/service autofs restart'
to replicate the login info immediately; otherwise it is sent out once an hour. This copies their password/login info and their autofs entries in /etc/auto.home to the compute nodes. DO NOT EDIT /etc/auto.home
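To find the right values for `-u` and `-g`, look the user up in the passwd file of a machine where the account already exists. A minimal Python sketch, assuming you have that machine's /etc/passwd contents at hand; the helper `useradd_command` and the sample user `jdoe` are hypothetical, and the sketch assumes a group with that gid already exists on ifp-32 (otherwise create it first with `groupadd -g <gid>`).

```python
from typing import Optional


def useradd_command(passwd_text: str, user_name: str) -> Optional[str]:
    """Return a useradd command reusing the uid/gid found in passwd_text.

    passwd_text is /etc/passwd content from a machine where the account
    already exists (fields: name:password:uid:gid:gecos:home:shell).
    Returns None if the user is not listed.
    """
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) >= 4 and fields[0] == user_name:
            uid, gid = int(fields[2]), int(fields[3])
            return "useradd -u {} -g {} {}".format(uid, gid, user_name)
    return None


if __name__ == "__main__":
    # Hypothetical passwd content copied from the existing machine
    sample = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "jdoe:x:1234:500:Jane Doe:/home/jdoe:/bin/bash\n"
    )
    print(useradd_command(sample, "jdoe"))  # useradd -u 1234 -g 500 jdoe
```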
Allow access to WordPress and Ganglia from everywhere via HTTPS
cd /etc/sysconfig/
chmod u+w iptables
emacs iptables
add line
-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT
Hash known hosts, so that an attacker who compromises an account on one system can't simply read its known_hosts file to find other systems to attack
cd /etc/ssh/
chmod u+w ssh_config
emacs ssh_config
add line under Host *
HashKnownHosts yes
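With HashKnownHosts yes, ssh writes each newly added host name to ~/.ssh/known_hosts as a salted HMAC-SHA1 rather than in clear text, in the form `|1|base64(salt)|base64(hash)`. Entries that are already in the file are not converted automatically; `ssh-keygen -H` hashes an existing file. A Python sketch of that format, for illustration only (it is not OpenSSH's code, and the helper names are hypothetical):

```python
import base64
import hashlib
import hmac
import os
from typing import Optional


def hash_known_host(host: str, salt: Optional[bytes] = None) -> str:
    """Return a hashed host field: |1|base64(salt)|base64(HMAC-SHA1(salt, host))."""
    if salt is None:
        salt = os.urandom(20)  # 20-byte random salt, the size of a SHA-1 digest
    digest = hmac.new(salt, host.encode(), hashlib.sha1).digest()
    return "|1|{}|{}".format(base64.b64encode(salt).decode(),
                             base64.b64encode(digest).decode())


def matches_known_host(entry: str, host: str) -> bool:
    """Check whether a hashed host field corresponds to host (recompute the HMAC)."""
    _, magic, salt_b64, hash_b64 = entry.split("|")
    if magic != "1":
        return False
    salt = base64.b64decode(salt_b64)
    expected = hmac.new(salt, host.encode(), hashlib.sha1).digest()
    return hmac.compare_digest(expected, base64.b64decode(hash_b64))


if __name__ == "__main__":
    entry = hash_known_host("compute-1-1.local")
    print(entry)
    print(matches_known_host(entry, "compute-1-1.local"))  # True
    print(matches_known_host(entry, "compute-1-2.local"))  # False
```

Someone reading the hashed file can still test whether a guessed host name is present, but can no longer just list the hosts.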
To share the directories over NFS
cd /etc/
chmod u+w exports
emacs exports
add lines
/ws/ifp-32-1 10.0.1.0/255.255.255.0(rw)
/ws/ifp-32-2 10.0.1.0/255.255.255.0(rw)
then restart NFS
/etc/rc.d/init.d/nfs restart
You have to do the analogous thing on a compute node if you want to share something from it.
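In the exports lines, `10.0.1.0/255.255.255.0` is a network address plus netmask, the same as 10.0.1.0/24 in CIDR notation: any client whose address starts with 10.0.1. may mount the directory read-write (`rw`). A quick Python check of which addresses that covers; the sample addresses are made up, and we are assuming from the exports lines that the machines mounting these directories have addresses in that range.

```python
import ipaddress

# Network/netmask pair used in the exports lines (equivalent to 10.0.1.0/24)
EXPORT_NET = ipaddress.ip_network("10.0.1.0/255.255.255.0")


def can_mount(client_ip: str) -> bool:
    """True if client_ip falls inside the network the directories are exported to."""
    return ipaddress.ip_address(client_ip) in EXPORT_NET


if __name__ == "__main__":
    print(EXPORT_NET, EXPORT_NET.num_addresses)  # 10.0.1.0/24 256
    for ip in ("10.0.1.1", "10.0.1.254", "10.0.2.1"):
        print(ip, can_mount(ip))
```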
To set up the automounts under /cworkspace
cd /etc/
chmod u+w auto.*
emacs auto.master
add line
/cworkspace /etc/auto.share --timeout=1200
emacs auto.share
add lines
apps ifp-32.local:/export/&
install ifp-32.local:/export/home/&
c1-1 compute-1-1.local:/ws/c1-1
c1-2 compute-1-2.local:/ws/c1-2
c1-3 compute-1-3.local:/ws/c1-3
c1-4 compute-1-4.local:/ws/c1-4
c1-5 compute-1-5.local:/ws/c1-5
c1-6 compute-1-6.local:/ws/c1-6
c1-7 compute-1-7.local:/ws/c1-7
c1-8 compute-1-8.local:/ws/c1-8
c1-9 compute-1-9.local:/ws/c1-9
c1-10 compute-1-10.local:/ws/c1-10
c1-11 compute-1-11.local:/ws/c1-11
c1-12 compute-1-12.local:/ws/c1-12
c1-13 compute-1-13.local:/ws/c1-13
c1-14 compute-1-14.local:/ws/c1-14
c1-15 compute-1-15.local:/ws/c1-15
c1-16 compute-1-16.local:/ws/c1-16
ifp-32-1 ifp-32.ifp.uiuc.edu:/ws/ifp-32-1
ifp-32-2 ifp-32.ifp.uiuc.edu:/ws/ifp-32-2
ifp-32-3 ifp-32.ifp.uiuc.edu:/ws/ifp-32-3
usr_local_linux_cluster ifp-32.local:/export/usr_local_linux_cluster
cluster-fork '/sbin/service autofs restart'
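In an autofs map, `&` in the location is replaced by the key, so `apps ifp-32.local:/export/&` mounts ifp-32.local:/export/apps at /cworkspace/apps. The 16 compute-node lines follow a fixed pattern, so they can be generated rather than typed. A hypothetical Python helper that mirrors the map text above (it is not part of Rocks or autofs):

```python
from typing import List, Tuple


def compute_node_entries(cabinet: int = 1, count: int = 16) -> List[str]:
    """Generate the compute-node lines of auto.share, e.g. 'c1-1 compute-1-1.local:/ws/c1-1'."""
    entries = []
    for rank in range(1, count + 1):
        key = "c{}-{}".format(cabinet, rank)
        entries.append("{} compute-{}-{}.local:/ws/{}".format(key, cabinet, rank, key))
    return entries


def expand_entry(line: str) -> Tuple[str, str]:
    """Split a map line into (key, location), replacing '&' with the key as autofs does."""
    key, location = line.split(None, 1)
    return key, location.replace("&", key)


if __name__ == "__main__":
    print("\n".join(compute_node_entries()))
    print(expand_entry("apps ifp-32.local:/export/&"))
    # ('apps', 'ifp-32.local:/export/apps')
```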
To change scheduling to equal CPU allocation among waiting users
qconf -mconf
change lines
enforce_user auto
auto_user_fshare 100
qconf -msconf
change line
weight_tickets_functional 10000
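Roughly, `enforce_user auto` creates a user entry for everyone who submits jobs, `auto_user_fshare 100` gives each of those users the same 100 functional shares, and `weight_tickets_functional 10000` is the pool of functional tickets handed out according to those shares, so a user with one waiting job is not starved by a user with many. The Python below is a deliberately simplified model of that split (equal shares per user, then an even split among that user's jobs); it is not Grid Engine's exact algorithm, which has further parameters, and the job owners are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def functional_tickets(job_owners: List[str], total_tickets: int = 10000,
                       user_share: int = 100) -> Dict[int, float]:
    """Simplified model: split tickets among users by share, then evenly among each user's jobs.

    job_owners[i] is the owner of job i. Every user has the same share
    (auto_user_fshare), so every user with jobs gets the same total.
    """
    jobs_per_user = Counter(job_owners)
    total_shares = user_share * len(jobs_per_user)
    tickets = {}
    for job_id, owner in enumerate(job_owners):
        user_tickets = total_tickets * user_share / total_shares
        tickets[job_id] = user_tickets / jobs_per_user[owner]
    return tickets


if __name__ == "__main__":
    # Hypothetical queue: alice has 3 waiting jobs, bob has 1
    owners = ["alice", "alice", "alice", "bob"]
    for job, t in functional_tickets(owners).items():
        print(job, owners[job], round(t))
```

In this model bob's single job gets 5000 tickets and each of alice's three jobs about 1667, so bob's job is dispatched ahead of alice's backlog.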
To make qstat show jobs of all users
cd /opt/gridengine/default/common
emacs sge_qstat
add line
-u *
To limit RAM usage for each process
nano /etc/pam.d/login
add lines
#arthur's ulimits change
session required pam_limits.so
nano /etc/security/limits.conf
add lines
#Arthur's change to keep people from gobbling up RAM on the head node 2GB max
* hard rss 2000000
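limits.conf gives `rss` in KB (units of 1024 bytes), so 2000000 works out to about 1.9 GiB, which is the "2GB max" in the comment. Also note that the setrlimit(2) man page says RLIMIT_RSS only has an effect on Linux 2.4.x kernels (x < 30), so on the 2.6 kernel of a CentOS 5 system this line may not actually cap memory; it is worth testing before relying on it. A small Python sketch of the unit conversion that also prints the RSS limit the current process sees (the helper name is hypothetical):

```python
import resource

KIB = 1024  # limits.conf memory values are in KB of 1024 bytes


def limits_conf_kb_to_bytes(value_kb: int) -> int:
    """Convert a limits.conf memory value (KB) to bytes."""
    return value_kb * KIB


if __name__ == "__main__":
    n = limits_conf_kb_to_bytes(2000000)
    print(n, "bytes =", round(n / 1024 ** 3, 2), "GiB")  # 2048000000 bytes = 1.91 GiB
    # Limit as seen by this process (after logging in through PAM)
    soft, hard = resource.getrlimit(resource.RLIMIT_RSS)
    print("RLIMIT_RSS soft/hard:", soft, hard)
```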
To add compute nodes
insert-ethers --cabinet 1 --rank 1
turn on machines in order, bottommost first
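insert-ethers names each new compute node compute-<cabinet>-<rank>, which is why auto.share refers to compute-1-1 through compute-1-16. As far as we understand it, the ranks are handed out consecutively, starting at `--rank`, in the order the nodes are discovered, so powering the machines on bottommost first makes compute-1-1 the bottom machine in the rack. A toy Python model of that assignment, not insert-ethers' own code; the MAC addresses are made up:

```python
from typing import Dict, List


def node_names(macs_in_boot_order: List[str], cabinet: int = 1,
               first_rank: int = 1, appliance: str = "compute") -> Dict[str, str]:
    """Toy model: give consecutive ranks to nodes in the order they are discovered."""
    return {
        mac: "{}-{}-{}".format(appliance, cabinet, first_rank + i)
        for i, mac in enumerate(macs_in_boot_order)
    }


if __name__ == "__main__":
    # Hypothetical MAC addresses, listed bottommost machine first
    boot_order = ["00:11:22:33:44:01", "00:11:22:33:44:02", "00:11:22:33:44:03"]
    for mac, name in node_names(boot_order).items():
        print(mac, name)
```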
get a latish subversion from rpmforge (the one from centos is too old):
wget http://packages.sw.be/subversion/subversion-1.6.6-0.1.el5.rf.x86_64.rpm
rpm -ivh subversion-1.6.6-0.1.el5.rf.x86_64.rpm
get a latish version of python
this will install /usr/local/bin/python2.6 (./configure defaults to the /usr/local prefix) but won't touch the /usr/bin/python that was there originally
wget http://www.python.org/ftp/python/2.6.2/Python-2.6.2.tar.bz2
tar -xjvf Python-2.6.2.tar.bz2
cd Python-2.6.2
./configure
yum install readline-devel
make -j 4
make altinstall
reduce the rate of brute-force ssh attacks
emacs /etc/sysconfig/iptables
Change the line
-A INPUT -m state --state NEW -p tcp --dport ssh -j ACCEPT
to
#don't allow too many ssh connections from the same IP address
#this reduces brute-force attacks and keeps the secure logs clean. Arthur
#-A INPUT -m state --state NEW -p tcp --dport ssh -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport ssh -j SSH_CHECK
and add above that
#Arthur's change to limit the number of brute-force ssh attacks
#allows only 3 logins from the same ip address every 60 sec.
-N SSH_CHECK
-A SSH_CHECK -m recent --set --name SSH
-A SSH_CHECK -m recent --update --seconds 60 --hitcount 3 --name SSH -j DROP
-A SSH_CHECK -j ACCEPT
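The SSH_CHECK chain records every new ssh connection per source address (`--set`) and drops it once that address has `--hitcount` hits within `--seconds` (`--update`, which also records a hit when it matches). Below is a toy Python model of those rules, written from our reading of the iptables `recent` match rather than from its source; it ignores the module's cap on how many timestamps it keeps per address. If that reading is right, the connection being checked has already been counted by `--set`, so with `--hitcount 3` only two new connections per 60 seconds get through and the third is dropped (`--hitcount 4` would allow three). It counts TCP connections, not logins, and dropped attempts keep refreshing the counter while they continue.

```python
from collections import defaultdict
from typing import Dict, List


class RecentSshCheck:
    """Toy model of: -m recent --set, then --update --seconds 60 --hitcount 3 -j DROP."""

    def __init__(self, seconds: float = 60, hitcount: int = 3):
        self.seconds = seconds
        self.hitcount = hitcount
        self.stamps: Dict[str, List[float]] = defaultdict(list)

    def new_connection(self, src_ip: str, now: float) -> bool:
        """Return True if the connection is accepted, False if it is dropped."""
        stamps = self.stamps[src_ip]
        stamps.append(now)  # --set: record this connection
        recent = [t for t in stamps if now - t <= self.seconds]
        if len(recent) >= self.hitcount:
            stamps.append(now)  # --update records another hit when it matches
            return False  # -j DROP
        return True  # falls through to -j ACCEPT


if __name__ == "__main__":
    chk = RecentSshCheck()
    for t in (0, 10, 20, 30, 100):
        print(t, chk.new_connection("192.0.2.7", t))
```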
Power considerations
typical power draw
The following measurements were taken with an instruments MA120 AC/DC clamp meter and a modified powerstrip in which one conductor was pulled out of the power cord, so that we could clamp the meter around it. The measured current draw is much lower than the nameplate current draw in the hardware specs.
- Dell PowerVault MD-1000 with 12 7200 RPM disks
  - with one of the two redundant power supplies turned off:
    - 1.6 amps idle
    - 1.9 amps while running the "hdparm -t" benchmark
    - 3.7 amps at startup, while the disks spin up
  - with both redundant power supplies turned on and the tray disconnected from the computer, measuring the sum of the current from both power supplies:
    - 1.4 amps idle
    - 3.7 amps at startup, while the disks spin up
- Dell PowerVault MD-1000 with 8 7200 RPM disks
  - with one of the two redundant power supplies turned off:
    - 1.5 amps under a typical workload
    - 1.7 amps under a typical workload while simultaneously running the "hdparm -t" benchmark
- Dell PowerEdge 2850
  - 1.8 amps with a light workload
  - 2.7 amps with cpuburn running on each of the 4 CPUs (full CPU load)
  - 0.5 amps plugged in with the power off
  - 2.5 amps at startup
- Dell PowerConnect RPS-600
  - 0.4 amps idle (nothing plugged into it)
  - 0.5 amps with one PowerConnect 5324 plugged into it and the 5324's own power unplugged
  - ~0.5 amps for the RPS-600 and one PowerConnect 5324 together, with the 5324 plugged into the RPS-600 and the 5324's own power also plugged in
- PowerConnect 5324 with its own power plugged in and no DC power supply (PowerConnect RPS-600) plugged in
  - 0.2 amps
- Dell PowerEdge SC1425
  - 1.1 amps at idle
  - 1.5 amps with cpuburn on 1 CPU
  - 1.9 amps at full load (cpuburn on 2 CPUs)
  - 1.9 amps immediately after boot
- Dell PowerEdge Rack Console 15FP
  - 0.2 amps
- A powerstrip with 5 Dell PowerEdge SC1425s
  - 0.3-0.5 amps with the machines plugged in but powered off
  - 9.8 amps maximum when all are booted simultaneously (similar to full load), i.e. 1.96 amps per machine
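The two MD-1000 measurements give a rough per-disk figure: 12 disks drew 1.9 amps and 8 disks 1.7 amps (both with one power supply on and an hdparm load), i.e. (1.9 - 1.7) / 4 = 0.05 amps per disk. A Python sketch of the linear extrapolation used for the tray estimates; treating the draw as linear in the number of disks is an assumption, checked only against these two points, and the helper names are hypothetical.

```python
def amps_per_disk(disks_a: int, amps_a: float, disks_b: int, amps_b: float) -> float:
    """Slope of current draw vs. number of disks, from two measurements."""
    return (amps_a - amps_b) / (disks_a - disks_b)


def extrapolate(disks_known: int, amps_known: float, slope: float, disks_new: int) -> float:
    """Linear extrapolation of a tray's current draw to a new disk count."""
    return amps_known + slope * (disks_new - disks_known)


if __name__ == "__main__":
    slope = amps_per_disk(12, 1.9, 8, 1.7)
    print(round(slope, 3))                            # 0.05
    print(round(extrapolate(12, 1.9, slope, 14), 2))  # 2.0
```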
Power distribution
We have three 20-amp circuits for the clusters, and our powerstrips and cables are rated for a maximum of 15 amps. The practical (non-nameplate) estimated maximum current draw for each device or group of devices is given in parentheses. We estimate it by taking the measured maximum and adding a reasonable amount for components that might be added later. For example, 8 disks in a tray draw 1.7 amps and 12 disks draw 1.9 amps, or about 0.05 amps per disk, so if we add 2 more disks of roughly the same kind (7200 RPM) to a tray that already has 12, its draw would go from 1.9 to 2.0 amps. For now, a safe arrangement is as follows:
- Circuit 1 (14.8 amps)
  - Powerstrip 1-1 (7.4 amps)
    - compute nodes 1-4
  - Powerstrip 1-2 (7.4 amps)
    - compute nodes 5-8
- Circuit 2 (15.3 amps)
  - Powerstrip 2-1 (7.9 amps)
    - compute nodes 9-12
    - switch and switch DC power supply (0.5 amps)
  - Powerstrip 2-2 (7.4 amps)
    - compute nodes 13-16
- Circuit 3 (~6.9 amps)
  - Powerstrip 3-1 (~6.9 amps)
    - head node, both power supplies (2.7 amps)
    - disk tray 1, both power supplies (2.0 amps)
    - disk tray 2, both power supplies (2.0 amps)
    - console (0.2 amps)
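The arrangement can be double-checked by re-adding the estimates and comparing them with the 15-amp powerstrip/cable rating and the 20-amp circuit rating; for instance, the two strips on Circuit 2 add up to 7.9 + 7.4 = 15.3 amps. A small Python sketch that does this; the dictionary simply transcribes the powerstrip estimates, and no derating for continuous load is applied.

```python
from typing import Dict, List

STRIP_LIMIT_AMPS = 15.0    # powerstrips and cables are rated for 15 amps
CIRCUIT_LIMIT_AMPS = 20.0  # each cluster circuit is a 20-amp circuit

# Estimated maximum draw per powerstrip, transcribed from the arrangement
CIRCUITS: Dict[str, Dict[str, float]] = {
    "Circuit 1": {"Powerstrip 1-1": 7.4, "Powerstrip 1-2": 7.4},
    "Circuit 2": {"Powerstrip 2-1": 7.4 + 0.5, "Powerstrip 2-2": 7.4},  # 2-1 includes the switch
    "Circuit 3": {"Powerstrip 3-1": 2.7 + 2.0 + 2.0 + 0.2},  # head node, 2 disk trays, console
}


def circuit_totals(circuits: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum the powerstrip estimates on each circuit, rounded to 0.1 amp."""
    return {name: round(sum(strips.values()), 1) for name, strips in circuits.items()}


def check_budget(circuits: Dict[str, Dict[str, float]]) -> List[str]:
    """List every powerstrip or circuit whose estimate exceeds its rating."""
    problems = []
    for circuit, strips in circuits.items():
        for strip, amps in strips.items():
            if amps > STRIP_LIMIT_AMPS:
                problems.append("{}: {:.1f} A > {} A".format(strip, amps, STRIP_LIMIT_AMPS))
        total = sum(strips.values())
        if total > CIRCUIT_LIMIT_AMPS:
            problems.append("{}: {:.1f} A > {} A".format(circuit, total, CIRCUIT_LIMIT_AMPS))
    return problems


if __name__ == "__main__":
    print(circuit_totals(CIRCUITS))  # {'Circuit 1': 14.8, 'Circuit 2': 15.3, 'Circuit 3': 6.9}
    print(check_budget(CIRCUITS) or "all strips and circuits within their ratings")
```

Swapping in 1.96 amps per compute node (the simultaneous-boot figure measured above) gives 7.8-amp strips of four nodes and at most 16.2 amps per circuit, which still fits both ratings.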