OSG

From UF HPC Wiki


Installing/Upgrading the OSG Compute Element Software Stack

This document is current as of the OSG-CE-1.0 release.

Docs are at https://twiki.grid.iu.edu/twiki/bin/view/ReleaseDocumentation/WebHome

  • Set OLD_VDT_LOCATION to the current OSG-CE install area

This lets the new install preserve some or all of the configuration details from your previous install.

[root@iogw1 osg]# export OLD_VDT_LOCATION=/scratch/ufhpc/osg/osg-ce-0.8

  • Shut down existing OSG services

[root@iogw1 osg]# . $OLD_VDT_LOCATION/setup.sh
[root@iogw1 osg]# vdt-control --off
disabling cron service gratia-pbs... ok
disabling init service syslog-ng... ok
disabling init service tomcat-5... ok
disabling init service osg-rsv... ok
disabling init service apache... ok
disabling init service condor-devel... ok
disabling cron service vdt-update-certs... ok
disabling init service MLD... FAILED! (see vdt-install.log)
   found conflicting, non-VDT init service /etc/rc.d/init.d/MLD
   use the --force option to stop the init service
disabling cron service gums-host-cron... ok
disabling cron service edg-mkgridmap... ok
disabling init service globus-ws... ok
disabling init service mysql... ok
disabling inetd service gsiftp... ok
disabling inetd service globus-gatekeeper... ok
disabling init service gris... FAILED! (see vdt-install.log)
   found conflicting, non-VDT init service /etc/rc.d/init.d/gris
   use the --force option to stop the init service
disabling cron service vdt-rotate-logs... ok
disabling cron service fetch-crl... ok
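To see at a glance which services need attention (here MLD and gris, which clash with non-VDT init scripts), a small helper can pull the FAILED names out of saved vdt-control output. This is a sketch, not part of the VDT: the function name is made up, and it assumes vdt-control prints one service per line in the format shown above, so it should work for both --off and --on runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the services that vdt-control reported as FAILED.
# Reads saved "vdt-control --off" or "--on" output on stdin.
vdt_failed_services() {
    # Match lines such as
    #   "disabling init service MLD... FAILED! (see vdt-install.log)"
    # and print only the service name (the word before the "...").
    sed -n -E 's/^(en|dis)abling [a-z]+ service ([^ ]+)\.\.\. FAILED.*/\2/p'
}
```

For example, capture the output with `vdt-control --off 2>&1 | tee /tmp/vdt-off.log` (the log path is arbitrary), then run `vdt_failed_services < /tmp/vdt-off.log`.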

  • Install pacman and put it into your environment:

[root@iogw1 ~]# cd /scratch/ufhpc/osg/pacman
[root@iogw1 pacman]# wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-3.26.tar.gz
--10:20:20-- http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-3.26.tar.gz
          => `pacman-3.26.tar.gz'
Resolving physics.bu.edu... 128.197.41.42
Connecting to physics.bu.edu|128.197.41.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 852,424 (832K) [application/x-gzip]

100%[==========================================================================================================================================================>] 852,424 2.58M/s

10:20:23 (2.58 MB/s) - `pacman-3.26.tar.gz' saved [852424/852424]

[root@iogw1 pacman]# tar --no-same-owner -xzf pacman-3.26.tar.gz
[root@iogw1 pacman]# cd pacman-3.26
[root@iogw1 pacman-3.26]# . setup.sh

  • Make an installation area and install:

Notes: Use a shared filesystem path for this. It must be a real directory; use the automounted path at all times. I also always use a brand-new directory for each install and leave any prior installation area alone, both because it is sometimes useful to look at the configuration of the old installation area and because pacman upgrades never used to work (maybe they do now).

See the Troubleshooting section below: the OSG:ce 1.0 install did not go smoothly.

[root@iogw1 pacman-3.26]# mkdir /scratch/ufhpc/osg/osg-ce-1.0
[root@iogw1 pacman-3.26]# cd /scratch/ufhpc/osg/osg-ce-1.0
[root@iogw1 osg-ce-1.0]# pacman -get OSG:ce

Answer 'yall' to 'Do you want to add [blah] to [trusted.caches]? (y/n/yall):' or use the '-trust-all-caches' option.

A lot of packages will be downloaded. If you run into any problems (historically, things like a missing mysql group or expired system accounts), it is easiest to delete the installation directory and start over from scratch. If everything looks like it is going smoothly, go get coffee; it will take a while.
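The notes above (a real, non-symlinked path and a brand-new directory every time) are easy to forget, so a pre-flight check before `pacman -get` can help. A minimal sketch with a made-up function name; it cannot tell whether a path is the automounted one, only that it is not a symlink and is either absent or empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a new OSG-CE installation area.
# Refuses symlinked paths and non-empty directories, then creates the area.
check_new_install_area() {
    local dir=$1
    if [ -L "$dir" ]; then
        echo "refusing: $dir is a symlink; use the real (automounted) path" >&2
        return 1
    fi
    if [ -e "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
        echo "refusing: $dir already has contents; use a brand new directory" >&2
        return 1
    fi
    mkdir -p "$dir"
}
```

Usage: `check_new_install_area /scratch/ufhpc/osg/osg-ce-1.0 && cd /scratch/ufhpc/osg/osg-ce-1.0 && pacman -get OSG:ce`.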

  • Post-Install
    • Add in local configuration files

We want the following in vdt/etc/vdt-local-setup.sh and vdt/etc/vdt-local-setup.csh:

[root@iogw1 osg-ce-1.0]# cat vdt/etc/vdt-local-setup.sh
export GLOBUS_HOSTNAME=`hostname -s`.hpc.ufl.edu
export GLOBUS_TCP_PORT_RANGE=40000,50000
export PBS_HOME=/var/spool/torque

[root@iogw1 osg-ce-1.0]# cat vdt/etc/vdt-local-setup.csh
setenv GLOBUS_HOSTNAME `hostname -s`.hpc.ufl.edu
setenv GLOBUS_TCP_PORT_RANGE "40000,50000"
setenv PBS_HOME /var/spool/torque
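If you would rather generate these two files than edit them by hand, something like the following writes both into a given install area. It is a sketch with a hypothetical function name; the quoted heredoc delimiter keeps the backquotes literal, so `hostname -s` runs each time the file is sourced rather than once when it is written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the sh and csh local-setup files into $1/vdt/etc.
write_vdt_local_setup() {
    local etc="$1/vdt/etc"
    mkdir -p "$etc" || return 1
    # 'EOF' is quoted, so the backquoted hostname call is written literally.
    cat > "$etc/vdt-local-setup.sh" <<'EOF'
export GLOBUS_HOSTNAME=`hostname -s`.hpc.ufl.edu
export GLOBUS_TCP_PORT_RANGE=40000,50000
export PBS_HOME=/var/spool/torque
EOF
    cat > "$etc/vdt-local-setup.csh" <<'EOF'
setenv GLOBUS_HOSTNAME `hostname -s`.hpc.ufl.edu
setenv GLOBUS_TCP_PORT_RANGE "40000,50000"
setenv PBS_HOME /var/spool/torque
EOF
}
```

For this install that would be `write_vdt_local_setup /scratch/ufhpc/osg/osg-ce-1.0`.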

Set up the environment for the newly-installed software:

[root@iogw1 osg-ce-1.0]# . setup.sh

Since we run Torque, we need to get the associated jobmanager software:

[root@iogw1 osg-ce-1.0]# pacman -get OSG:Globus-PBS-Setup

This will grab a bunch of packages.

  • Configuration
    • Enable the 'edg-mkgridmap' service:

[root@iogw1 osg-ce-1.0]# vdt-register-service --name edg-mkgridmap --enable
vdt-register-service: updated cron service 'edg-mkgridmap'
vdt-register-service: desired state = enable
vdt-register-service: cron time = '4 2,8,14,20 * * *'
vdt-register-service: cron command = '/scratch/ufhpc/osg/osg-ce-1.0/edg/sbin/edg-mkgridmap >> /scratch/ufhpc/osg/osg-ce-1.0/edg/log/edg-mkgridmap.log 2>&1'

This creates a cron job that regenerates the grid-mapfile four times a day (at 02:04, 08:04, 14:04 and 20:04, per the cron time above).

    • Update the /etc/grid-security/certificates link:

[root@iogw1 osg-ce-1.0]# cd /etc/grid-security/
[root@iogw1 grid-security]# rm -f certificates
[root@iogw1 grid-security]# ln -s /scratch/ufhpc/osg/osg-ce-1.0/globus/TRUSTED_CA certificates
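The rm + ln pair above can be collapsed into a single `ln -sfn`, with a guard so that a real /etc/grid-security/certificates directory (as opposed to a symlink) is never clobbered. A sketch under those assumptions; the function name is hypothetical, and the optional second argument exists only so it can be tried somewhere other than /etc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the certificates symlink at a new TRUSTED_CA dir.
relink_certificates() {
    local target=$1 link=${2:-/etc/grid-security/certificates}
    if [ ! -d "$target" ]; then
        echo "no such CA directory: $target" >&2
        return 1
    fi
    # Only replace a symlink (or nothing); leave a real directory alone.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link is not a symlink; not touching it" >&2
        return 1
    fi
    # -f removes the old link; -n treats an existing symlink to a directory
    # as the link itself instead of creating the new link inside it.
    ln -sfn "$target" "$link"
}
```

Usage: `relink_certificates /scratch/ufhpc/osg/osg-ce-1.0/globus/TRUSTED_CA`.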

    • Turn on all the OSG services:

[root@iogw1 osg-ce-1.0]# vdt-control --on
enabling cron service fetch-crl... ok
enabling cron service vdt-rotate-logs... ok
skipping init service 'gris' -- marked as disabled
enabling inetd service globus-gatekeeper... ok
enabling inetd service gsiftp... ok
enabling init service mysql... ok
enabling init service globus-ws... ok
enabling cron service edg-mkgridmap... ok
skipping cron service 'gums-host-cron' -- marked as disabled
skipping init service 'MLD' -- marked as disabled
skipping cron service 'vdt-update-certs' -- marked as disabled
enabling init service condor-devel... ok
enabling init service apache... ok
skipping init service 'osg-rsv' -- marked as disabled
enabling init service tomcat-5... ok
enabling init service syslog-ng... FAILED! (see vdt-install.log)
enabling cron service gratia-pbs... ok

Fix any service that fails, such as syslog-ng above (in this case it was a simple config file problem).

    • Add environment settings to the xinetd service files

Add these lines into /etc/xinetd.d/globus-gatekeeper and /etc/xinetd.d/gsiftp:

   env         = GLOBUS_HOSTNAME=iogw1.hpc.ufl.edu
   env        += GLOBUS_TCP_PORT_RANGE=40000,50000

Restart xinetd.
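Editing both xinetd files by hand is fine, but if you redo the install often, a helper that inserts the two env lines before the closing brace (and does nothing if they are already there) saves some typing. This is a hypothetical sketch that assumes each file holds a single service block whose closing `}` sits on a line by itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add the GLOBUS env lines to one xinetd service file.
# $1 = file (e.g. /etc/xinetd.d/gsiftp), $2 = fully qualified host name.
add_xinetd_globus_env() {
    local file=$1 host=$2 tmp rc
    # Idempotent: skip files that already set the port range.
    if grep -q 'GLOBUS_TCP_PORT_RANGE=' "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Print the env lines just before the first line holding only "}".
    awk -v host="$host" '
        /^[ \t]*[}][ \t]*$/ && !done {
            print "    env         = GLOBUS_HOSTNAME=" host
            print "    env        += GLOBUS_TCP_PORT_RANGE=40000,50000"
            done = 1
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file's permissions
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Run it for /etc/xinetd.d/globus-gatekeeper and /etc/xinetd.d/gsiftp with iogw1.hpc.ufl.edu, then restart xinetd (`service xinetd restart` on a Red Hat style system).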

    • Configure OSG

Run monitoring/configure-osg.sh. It will ask lots of questions.

    • Post-configuration

A few things still have to be fixed by hand.

      • In gratia/probe/pbs-lsf/urCollector.conf, fix the path to the Torque accounting logs:

pbsAcctLogDir = "/var/spool/torque/server_priv/accounting/"

      • In gratia/probe/pbs/ProbeConfig, fix the MeterName:

MeterName="pbs:iogw1.hpc.ufl.edu"
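Both Gratia fixes are one-line substitutions, so they can be scripted with sed. A sketch with a made-up function name; it assumes urCollector.conf has a line beginning with `pbsAcctLogDir` and that ProbeConfig carries the meter as a `MeterName="..."` attribute, as the lines above suggest. The originals are kept with a .bak suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the two Gratia fixes.
# $1 = OSG-CE install area, $2 = gatekeeper host name.
fix_gratia_pbs_config() {
    local osg=$1 host=$2
    local ur="$osg/gratia/probe/pbs-lsf/urCollector.conf"
    local probe="$osg/gratia/probe/pbs/ProbeConfig"
    # Point the collector at Torque's accounting logs.
    sed -i.bak 's|^pbsAcctLogDir *=.*|pbsAcctLogDir = "/var/spool/torque/server_priv/accounting/"|' "$ur" || return 1
    # Replace whatever MeterName the installer wrote.
    sed -i.bak "s|MeterName=\"[^\"]*\"|MeterName=\"pbs:$host\"|" "$probe"
}
```

Usage: `fix_gratia_pbs_config /scratch/ufhpc/osg/osg-ce-1.0 iogw1.hpc.ufl.edu`.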

      • Fix vdt/setup/configure_gip to allow our routing queue to be advertised:

Change line 774 from:

           if($queue_type eq "Execution"){

to:

           if($queue_type eq "Route"){
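Because this change is tied to a line number, it is worth checking that line 774 really is the Execution test before editing, since a different configure_gip version may have moved it. A hedged sketch (function name hypothetical; the original is saved as configure_gip.orig):

```shell
#!/usr/bin/env bash
# Hypothetical helper: switch the queue-type test on one line of configure_gip,
# but only if that line still holds the expected "Execution" comparison.
patch_configure_gip() {
    local file=$1 line=${2:-774}
    if ! sed -n "${line}p" "$file" | grep -qF 'if($queue_type eq "Execution"){'; then
        echo "line $line of $file is not the expected Execution test" >&2
        return 1
    fi
    sed -i.orig "${line}s/\"Execution\"/\"Route\"/" "$file"
}
```

Usage: `patch_configure_gip vdt/setup/configure_gip` from the install area; a second run fails harmlessly because the line no longer matches.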


      • Re-run monitoring/configure-osg-gip.sh

Take all the defaults when prompted.


      • Fix osg-rsv init script

--- osg-rsv.orig 2008-06-30 23:17:11.000000000 -0400
+++ osg-rsv 2008-06-30 23:23:56.000000000 -0400
@@ -117,7 +117,7 @@
   for file in `find $OSG_RSV_PROBES -name "*.sub"`; do
      if [ `id -u` == 0 ]; then
         su -c "chown -R $RUN_AS_USER /scratch/ufhpc/osg/osg-ce-0.8/osg-rsv/logs/" > /dev/null 2>&1
-        su -c "$CONDOR_EXE_SUBMIT $file" $RUN_AS_USER > /dev/null 2>&1
+        su - -c "CONDOR_CONFIG=$CONDOR_LOCATION/etc/condor_config $CONDOR_EXE_SUBMIT $file" $RUN_AS_USER > /dev/null 2>&1
      else
         $CONDOR_EXE_SUBMIT $file > /dev/null 2>&1
      fi
@@ -130,7 +130,7 @@
   for file in `find $OSG_RSV_CONSUMERS -name "*.sub"`; do
      if [ `id -u` == 0 ]; then
         su -c "chown -R $RUN_AS_USER /scratch/ufhpc/osg/osg-ce-0.8/osg-rsv/output/" > /dev/null 2>&1
-        su -c "$CONDOR_EXE_SUBMIT $file" $RUN_AS_USER > /dev/null 2>&1
+        su - -c "CONDOR_CONFIG=$CONDOR_LOCATION/etc/condor_config $CONDOR_EXE_SUBMIT $file" $RUN_AS_USER > /dev/null 2>&1
      else
         $CONDOR_EXE_SUBMIT $file > /dev/null 2>&1
      fi


      • Modify pbs.pm

Apply the following patch:

--- /scratch/ufhpc/osg/osg-ce-1.0/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm.orig 2008-03-03 15:58:52.000000000 -0500
+++ /scratch/ufhpc/osg/osg-ce-1.0/globus/lib/perl/Globus/GRAM/JobManager/pbs.pm 2008-03-03 16:10:02.000000000 -0500
@@ -261,7 +261,8 @@
     elsif($cluster && $cpu_per_node != 0)
     {
         print JOB '#PBS -l nodes=',
-            myceil($description->count() / $cpu_per_node), "\n";
+            myceil($description->count() / $cpu_per_node), ":serial\n";
+        print JOB "#PBS -l pmem=1800mb\n";
     }
 
     ### SoftEnv extension ###
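To sanity-check what the patched branch should now write into a job script, here is the same arithmetic in shell. This is only an illustration, not the jobmanager: it assumes `myceil` rounds up, as its name suggests, and the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: the PBS directives the patched pbs.pm branch
# should emit for a job asking for $1 CPUs on nodes with $2 CPUs each.
pbs_directives() {
    local count=$1 cpu_per_node=$2
    # Integer ceiling of count / cpu_per_node, mirroring myceil().
    local nodes=$(( (count + cpu_per_node - 1) / cpu_per_node ))
    printf '#PBS -l nodes=%d:serial\n' "$nodes"
    printf '#PBS -l pmem=1800mb\n'
}
```

For a 10-CPU request on 4-CPU nodes this gives `#PBS -l nodes=3:serial` followed by the pmem line. After editing pbs.pm itself, `perl -c` on the file (with setup.sh sourced so the Globus modules can be found) should catch syntax slips before real jobs hit it.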


    • Reboot
    • Validate
      • run a trivial job
      • run site_verify.pl
    • Test installation

Then check that edg-mkgridmap produces a reasonable grid-mapfile:

[root@iogw1 osg-ce-1.0]# /scratch/ufhpc/osg/osg-ce-1.0/edg/sbin/edg-mkgridmap --output=/var/tmp/foo.out
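One quick way to judge whether the output is reasonable is to count the entries and flag any line that is not a quoted certificate DN followed by an account name. A sketch with a hypothetical function name; it assumes the usual grid-mapfile layout and ignores blank lines and # comments.

```shell
#!/usr/bin/env bash
# Hypothetical check for a generated grid-mapfile.
# Prints the number of entries; reports malformed lines on stderr and
# returns 1 if there are any.
check_gridmap() {
    awk '
        BEGIN { n = 0; bad = 0 }
        /^[ \t]*$/ { next }                       # blank line
        /^[ \t]*#/ { next }                       # comment
        /^"[^"]+"[ \t]+[^ \t]+/ { n++; next }     # "DN" account
        { printf "malformed line %d: %s\n", NR, $0 > "/dev/stderr"; bad = 1 }
        END { print n; exit bad }
    ' "$1"
}
```

Usage: `check_gridmap /var/tmp/foo.out`.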

    • Troubleshooting

[root@iogw1 osg-ce-1.0]# pacman -pretend-platform linux-rhel-4 -get OSG:ce
Do you want to add [1] to [trusted.caches]? (y/n/yall): yall
Beginning VDT prerequisite checking script vdt-common/vdt-prereq-check...

All prerequisite checks are satisfied.
Package [/scratch/ufhpc/osg/osg-ce-1.0:OSG:ce] not [installed]:
   Package [/scratch/ufhpc/osg/osg-ce-1.0:http://vdt.cs.wisc.edu/vdt_1101_cache:OSG-CE] not [installed]:
       Package [/scratch/ufhpc/osg/osg-ce-1.0:http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-Server] not [installed]:
           Package [/scratch/ufhpc/osg/osg-ce-1.0:http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-Base-RM-Server] not [installed]:
               Package [/scratch/ufhpc/osg/osg-ce-1.0:http://vdt.cs.wisc.edu/vdt_1101_cache:Globus-Base-Jobmanager-Common] not [installed]:
                   can't find file to patch at input line 3
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
|--- globus/lib/perl/Globus/GRAM/JobManager/fork.pm 2007-08-23 13:57:02.000000000 -0500
|+++ globus/lib/perl/Globus/GRAM/JobManager/fork.pm 2007-09-06 14:27:29.000000000 -0500
File to patch:
Skip this patch? [y]
Skipping patch.
1 out of 1 hunk ignored

Getting help with VDT installation Failures

Apparently you just had a failure installing the VDT. Our sincere apologies!

If you would like help with your problem, please collect whatever information you can from the list below and mail it to us. If you are doing an Open Science Grid installation, mail it to goc@opensciencegrid.org. If you are not doing an Open Science Grid installation, or if you prefer to contact the VDT team directly, please mail it to vdt-support@opensciencegrid.org.

Information to collect:

1) If your VDT installation got far enough to create
   $VDT_LOCATION/vdt/bin/vdt-system-profiler, please run it and send
   us the resulting vdt-profile.txt file. You might need to set
   $VDT_LOCATION in your environment first.

2) If you can't do that, send us your vdt-install.log, and any
   supporting information: what Pacman command you ran, what OS and
   architecture you are installing on, etc.

We hope that we can help you fix your problem quickly.


Ugh. It turns out the file being patched (globus/lib/perl/Globus/GRAM/JobManager/fork.pm) did not exist yet.

I created a fresh installation area, pulled the Globus-Base-Jobmanager-Common .pacman file and tarball into a local cache directory, and edited the .pacman file to remove the patch command and also the 'mv' command for globus/libexec/globus-job-manager-script.pl (that command failed too, because this file didn't exist either).

Then I installed from my .pacman file:

[root@iogw1 osg-ce-1.0]# pacman -get ../cache:Globus-Base-Jobmanager-Common

After that, I sourced the setup and installed OSG:ce on top of it:

[root@iogw1 osg-ce-1.0]# . setup.sh
[root@iogw1 osg-ce-1.0]# pacman -get OSG:ce
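The root cause here was a patch whose target file did not exist yet. Before hand-editing a .pacman file, a pre-check like this can confirm that diagnosis by reading the `---` header of a saved unified diff and looking for that path under the install area. A sketch only: the function name is hypothetical, and it assumes you have the diff saved locally and that the path contains no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: does the file a unified diff patches exist yet?
# $1 = saved diff, $2 = install area the diff's paths are relative to.
patch_target_exists() {
    local diff=$1 area=$2 target
    # First "--- " header line; field 2 is the path (timestamps follow it).
    target=$(awk '/^--- / { print $2; exit }' "$diff")
    if [ -z "$target" ]; then
        echo "no --- header in $diff" >&2
        return 2
    fi
    if [ -f "$area/$target" ]; then
        echo "present: $target"
    else
        echo "missing: $target" >&2
        return 1
    fi
}
```

Usage: `patch_target_exists fork.diff /scratch/ufhpc/osg/osg-ce-1.0`, where fork.diff stands for wherever you saved the patch.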