Installation notes for ifp-32

From SpeechWiki


To add users

useradd -u <uid> -g <gid> <userName>

so that the uid and gid match the user's existing ones

rocks sync users 
cluster-fork '/sbin/service autofs restart'

to replicate the login info immediately; otherwise it gets pushed out once an hour. This copies the password/login info and autofs entries into /etc/auto.home for the compute nodes. DO NOT EDIT /etc/auto.home by hand.
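The whole add-user procedure above can be sketched as one script. The uid/gid/username values here are placeholders, not real accounts; run as root on the head node:

```shell
#!/bin/sh
# Sketch of the add-user procedure; UID_NUM/GID_NUM/USERNAME are
# placeholders -- substitute the values that match the user's
# existing accounts elsewhere.
UID_NUM=1234
GID_NUM=1234
USERNAME=newuser

useradd -u "$UID_NUM" -g "$GID_NUM" "$USERNAME"

# Push the new account out to the compute nodes immediately,
# instead of waiting for the hourly sync.
rocks sync users
cluster-fork '/sbin/service autofs restart'
```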


Allow access to wordpress and ganglia from everywhere via https

cd /etc/sysconfig/
chmod u+w iptables
emacs iptables

add line

-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT
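The new rule does nothing until the firewall is reloaded. A follow-up check, assuming the stock Red Hat iptables init script:

```shell
# Reload the firewall after editing /etc/sysconfig/iptables,
# then confirm the https (port 443) rule is active.
/sbin/service iptables restart
/sbin/iptables -L INPUT -n | grep 443
```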


Hash known hosts, so a hacked account on one system won't propagate to others so easily

cd /etc/ssh/
chmod u+w ssh_config 
emacs ssh_config

add line under "Host *"

	HashKnownHosts yes
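This only affects entries added from now on. Existing known_hosts files can be hashed in place with ssh-keygen's standard -H option:

```shell
# Hash the hostnames already stored in a user's known_hosts.
# ssh-keygen -H rewrites the file in place and keeps a .old backup.
ssh-keygen -H -f ~/.ssh/known_hosts
rm ~/.ssh/known_hosts.old
```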


Share the /cworkspace among all the cluster nodes

This shares the directories via NFS:

cd /etc/
chmod u+w exports
emacs exports

add lines

/ws/ifp-32-1 10.0.1.0/255.255.255.0(rw)
/ws/ifp-32-2 10.0.1.0/255.255.255.0(rw)

then restart NFS:

/etc/rc.d/init.d/nfs restart

You have to do the analogous thing on the compute nodes, if you want to share something from them.
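A quick way to confirm the export took effect (showmount is part of the standard NFS utilities):

```shell
# List what the head node is currently exporting;
# /ws/ifp-32-1 and /ws/ifp-32-2 should appear in the output.
showmount -e ifp-32.local
```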

This sets up the automounts on the /cworkspace

cd /etc/
chmod u+w auto.*
emacs auto.master

add line

/cworkspace /etc/auto.share --timeout=1200
emacs auto.share

add lines

apps ifp-32.local:/export/&
install ifp-32.local:/export/home/&
c1-1 compute-1-1.local:/ws/c1-1
c1-2 compute-1-2.local:/ws/c1-2
c1-3 compute-1-3.local:/ws/c1-3
c1-4 compute-1-4.local:/ws/c1-4
c1-5 compute-1-5.local:/ws/c1-5
c1-6 compute-1-6.local:/ws/c1-6
c1-7 compute-1-7.local:/ws/c1-7
c1-8 compute-1-8.local:/ws/c1-8
c1-9 compute-1-9.local:/ws/c1-9
c1-10 compute-1-10.local:/ws/c1-10
c1-11 compute-1-11.local:/ws/c1-11
c1-12 compute-1-12.local:/ws/c1-12
c1-13 compute-1-13.local:/ws/c1-13
c1-14 compute-1-14.local:/ws/c1-14
c1-15 compute-1-15.local:/ws/c1-15
c1-16 compute-1-16.local:/ws/c1-16
ifp-32-1 ifp-32.ifp.uiuc.edu:/ws/ifp-32-1
ifp-32-2 ifp-32.ifp.uiuc.edu:/ws/ifp-32-2
ifp-32-3 ifp-32.ifp.uiuc.edu:/ws/ifp-32-3
usr_local_linux_cluster ifp-32.local:/export/usr_local_linux_cluster
cluster-fork '/sbin/service autofs restart'
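Automounts are created on first access, so a simple ls is enough to verify one of the map entries above:

```shell
# Touching the path triggers the automount; 'mount' then shows it.
ls /cworkspace/ifp-32-1
mount | grep cworkspace
```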


To change scheduling to equal cpu allocation among waiting users

qconf -mconf

change lines

enforce_user auto
auto_user_fshare 100

qconf -msconf

change lines

weight_tickets_functional 10000
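The changes can be confirmed without reopening the editor, using qconf's show options (-sconf for the global config, -ssconf for the scheduler config):

```shell
# Verify the functional-share scheduling settings.
qconf -sconf | grep -E 'enforce_user|auto_user_fshare'
qconf -ssconf | grep weight_tickets_functional
```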


To make qstat show jobs of all users

cd /opt/gridengine/default/common
emacs sge_qstat

add line

-u * 

To limit RAM usage for each process

nano /etc/pam.d/login

add lines

#arthur's ulimits change
session    required     pam_limits.so

nano /etc/security/limits.conf

add lines

#Arthur's change to keep people from gobbling up RAM on the head node 2GB max
*		 hard    rss		 2000000
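The rss value in limits.conf is in kilobytes, so 2000000 works out to just under the intended 2 GB cap; the arithmetic can be checked directly:

```shell
# limits.conf rss values are in kilobytes; confirm that 2000000 KB
# is roughly the intended 2 GB cap.
awk 'BEGIN { printf "%.2f GB\n", 2000000 / 1024 / 1024 }'
# prints "1.91 GB"
```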

To add compute nodes

insert-ethers --cabinet 1 --rank 1

turn on machines in order, bottommost first

get a recent Subversion from RPMforge (the one from CentOS is too old):

wget http://packages.sw.be/subversion/subversion-1.6.6-0.1.el5.rf.x86_64.rpm
rpm -ivh subversion-1.6.6-0.1.el5.rf.x86_64.rpm

get a recent version of Python

this will install /usr/bin/python2.6 but won't touch the /usr/bin/python that was there originally

wget http://www.python.org/ftp/python/2.6.2/Python-2.6.2.tar.bz2
tar -xjvf Python-2.6.2.tar.bz2 
cd Python-2.6.2
yum install readline-devel
./configure
make -j 4
make altinstall
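After make altinstall the new interpreter lives alongside the system one; a quick sanity check (paths assume the default /usr/local prefix):

```shell
# altinstall leaves /usr/bin/python untouched and installs the new
# interpreter as python2.6 under /usr/local/bin by default.
/usr/local/bin/python2.6 -V
which python    # should still be the original system python
```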

reduce the rate of brute-force attack

emacs /etc/sysconfig/iptables

Change the line

-A INPUT -m state --state NEW -p tcp --dport ssh -j ACCEPT

to

#don't allow too many ssh connections from the same IP address
#this reduces brute-force attacks and keeps the secure logs clean. Arthur 
#-A INPUT -m state --state NEW -p tcp --dport ssh -j ACCEPT
-A INPUT  -m state --state NEW -p tcp --dport ssh -j SSH_CHECK

and add above that

#Arthur's change to limit the number of brute-force ssh attacks
#allows only 3 logins from the same ip address every 60 sec.
-N SSH_CHECK
-A SSH_CHECK -m recent --set --name SSH
-A SSH_CHECK -m recent --update --seconds 60 --hitcount 3 --name SSH -j DROP
-A SSH_CHECK -j ACCEPT
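After reloading, the new chain and its packet counters can be inspected with the standard iptables listing flags:

```shell
# Reload the rules and watch the SSH_CHECK chain in action;
# the DROP counter increases when an IP exceeds 3 connections/minute.
/sbin/service iptables restart
/sbin/iptables -L SSH_CHECK -n -v
```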

Power considerations

typical power draw

The following measurements were taken with an MA120 AC/DC clamp meter and a modified power strip, where one conductor was pulled out of the power cord so the meter could be clamped around it. The measured current draw is much lower than the nameplate current draw in the hardware specs.

  • Dell powervault MD-1000 with 12 7200RPM disks
    • with one of the two redundant power supplies turned off:
      • 1.6 Amps idle,
      • 1.9 Amps while doing "hdparm -t" benchmark.
      • 3.7 Amps At start up while disks are spinning up
    • with both of the two redundant power supplies turned on, disconnected from the computer, measuring the sum of current from both power supplies:
      • 1.4 Amps idle
      • 3.7 Amps At start up while disks are spinning up
  • Dell powervault MD-1000 with 8 7200RPM disks
    • with one of the two redundant power supplies turned off:
      • 1.5 Amps under a typical workload
      • 1.7 Amps under a typical workload simultaneously with "hdparm -t" benchmark.
  • Dell Poweredge 2850
      • 1.8 Amps with a light workload
      • 2.7 Amps with cpuburn running on each of the 4 cpus (full cpu load)
      • .5 amps with power off plugged in
      • 2.5 amps on startup
  • Dell powerconnect RPS-600
      • .4 amps idle (nothing plugged into it)
      • .5 amps with one powerconnect 5324 plugged into it, and the powerconnect 5324's own power unplugged
      • ~.5 amps for both powerconnect RPS-600 and one powerconnect 5324, with 5324 plugged into RPS-600 and the 5324's own power plugged in
  • powerconnect 5324 with own power plugged in, and no DC power supply (powerconnect RPS-600) plugged in
      • .2 amps
  • Dell poweredge sc1425
      • 1.1 amps, at idle
      • 1.5 amps, with cpuburn on 1 cpu
      • 1.9 amps, at full load (with cpuburn on 2 cpus)
      • 1.9 amps, immediately after boot
  • Dell powerEdge rack console 15FP
      • .2 amps
  • A powerstrip with 5 Dell poweredge sc1425
      • .3-.5 Amp with the machines plugged in, but the power off
      • 9.8 amps maximum when all are booted simultaneously (similar to full load). This is 1.96 amps per machine.
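The per-machine figure quoted above can be reproduced directly (simple division; awk is used just for the arithmetic):

```shell
# 9.8 A total across 5 simultaneously booting sc1425s:
awk 'BEGIN { printf "%.2f\n", 9.8 / 5 }'
# prints "1.96"
```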

Power distribution

We have 3 20-amp circuits for the clusters, and our power strips and cables are rated for a maximum of 15 amps. For now a safe arrangement is as follows; the practical (non-nameplate) estimated maximum current draw is given in parentheses for each device. We estimate the maximum by extrapolating the measured maximum upward by a reasonable margin for extra components: e.g., if the 14-disk tray already holds 12 disks and we add 2 more of roughly the same kind (7200 RPM), the estimated draw goes from 1.9 to 2.0 amps, since 8 disks in a tray consume 1.7 amps.

  • Circuit 1 (14.8 amps)
    • Powerstrip 1-1 (7.4 amps)
      • compute nodes 1-4
    • Powerstrip 1-2 (7.4 amps)
      • compute nodes 5-8
  • Circuit 2 (14.8 amps)
    • Powerstrip 2-1 (7.9 amps)
      • compute nodes 9-12
      • switch and switch DC power supply (.5 amps)
    • Powerstrip 2-2 (7.4 amps)
      • compute nodes 13-16
  • Circuit 3
    • Powerstrip 3-1 (~6.9 amps)
      • head node both power supplies (2.7 amps)
      • disk tray 1 both power supplies (2.0 amps)
      • disk tray 2 both power supplies (2.0 amps)
      • console (.2 amps)