Royal Institute of Technology
School of Biotechnology, Division of Theoretical Chemistry & Biology
Getting started on Lenngren

Getting an account

You need an ordinary user account at PDC. Go to the PDC accounts page and follow the instructions.

Kerberos

You need to have Kerberos installed on your computer to be able to log into Lenngren. Kerberos is available as packages in many Linux distributions, and all theochem workstations have it installed. If you want to be able to run X-client applications (e.g. xterm, emacs windows, graphical debuggers), you should use KTH's own implementation of Kerberos, known as Heimdal. A selection of pre-compiled binaries is available as Kerberos "travel kits" at http://www.pdc.kth.se/support/kerberos/travelkit.html

For Windows there is a travel kit as well, but a nicer solution is Cygwin, which gives you a complete Unix-style interface with X Windows and all necessary Kerberos components: http://support/kerberos/cygwin-install.html

Login process

Make sure the Kerberos executable directory is in your path (at theochem: /usr/kerberos/bin).

Obtain "tickets":
kinit username@NADA.KTH.SE
Note that tickets have a finite lifetime; the default can be changed with the "-l" option.

Check your tickets:
klist

Log in:
telnet [-ax] -F juliana.pdc.kth.se
The options -ax should be used from theochem workstations, with the Red Hat/Fedora Kerberos distribution. The -F option makes sure that your tickets are forwarded and that they are reforwardable (necessary when you submit jobs).

Comment: Kerberos provides a mechanism for both authentication (who you are) and authorization (what you are allowed to do). Your home directory at PDC is part of the global file system AFS, which is controlled by Kerberos as well. Therefore, once you have logged in at PDC, you must have valid tickets to have access to your own files! The -F option takes care of this.

With your account there are special scripts that are executed when you log in. These are called .login in csh and .profile in sh/bash (.bash_profile works for bash too).
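The login steps described above can be collected into a small shell function for your local workstation. This is only a sketch, not a PDC-provided tool: the function name pdc_login and the PDC_USER variable are hypothetical, and it assumes the MIT (Red Hat/Fedora) klist, whose -s option prints nothing and only sets the exit status.

```shell
#!/bin/sh
# Sketch of the PDC login sequence for a theochem workstation.
# Hypothetical: the function name pdc_login and the PDC_USER variable.
# Assumes MIT (Red Hat/Fedora) Kerberos, whose "klist -s" prints nothing
# and exits non-zero when there are no valid (unexpired) tickets.

# Kerberos executables at theochem; adjust the directory elsewhere.
PATH=/usr/kerberos/bin:$PATH
export PATH

pdc_login() {
    user=${PDC_USER:-$USER}
    # Ask for a password only if no valid tickets are cached
    if ! klist -s; then
        kinit "$user@NADA.KTH.SE" || return 1
    fi
    klist    # show the tickets and when they expire
    # -a -x as recommended from theochem workstations; -F forwards
    # reforwardable tickets, needed for AFS access and batch jobs
    telnet -a -x -F juliana.pdc.kth.se
}
```

With PDC_USER unset, your local user name is used for the principal.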
To be on the safe side you can add the following line to your login script:
/usr/heimdal/bin/kinit -l 30d
This makes sure that long batch jobs do not fail because the tickets expire (see the note on the Kerberos ticket system below). Users concerned with security may run kdestroy before logging out. This does not interfere with your batch jobs: it is the tickets valid at submit time that are copied to the batch system.

To summarize: whether you forward tickets properly with -F or regenerate them after you have logged in is your choice. Make a habit of checking your tickets before submitting batch jobs. Even experienced users make the mistake of not having tickets that last for the whole batch job.

Note on the Kerberos ticket system

Kerberos never sends your password to the server. Somewhat simplified, Kerberos works in this way: during authentication at login, the server sends an encrypted package to your computer. If you can open the package using your password (on your local computer, never transmitting the password anywhere), you gain access to the server as requested. This means that you can create a ticket on your local machine without sending your PDC password to anyone. Then you forward the ticket to PDC and use it there.
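One way to turn the advice about checking tickets into a habit is to wrap esubmit in a shell function that refuses to submit when no valid ticket-granting ticket is present. A minimal sketch, assuming the Heimdal klist (loaded with module add heimdal), whose -t option only sets the exit status; the wrapper name safe_esubmit is hypothetical. It checks only that tickets exist, not that they outlive the job.

```shell
#!/bin/sh
# Hypothetical wrapper around esubmit that refuses to submit without
# valid Kerberos tickets. Assumes the Heimdal klist (module add heimdal),
# whose -t option prints nothing and only sets the exit status.
# It does NOT check that the tickets outlive the job: compare the expiry
# printed by a plain "klist" with the -t value you request.
safe_esubmit() {
    if ! klist -t; then
        echo "safe_esubmit: no valid Kerberos tickets, run kinit first" >&2
        return 1
    fi
    # All arguments are passed unchanged, e.g. -n <nodes> -t <minutes>
    esubmit "$@"
}
```

It takes exactly the same arguments as esubmit.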

File transfer

Use the ftp of your Kerberos distribution. The Heimdal version of ftp seems to work only in passive mode (-p option):
ftp [-p] -f ftp.pdc.kth.se
Use "prot" at the ftp command prompt to obtain encrypted communication. With MIT Kerberos this is obtained with:
/usr/kerberos/bin/ftp -x -f ftp.pdc.kth.se

There are some firewall issues with rcp/rsh. The only working rsh is the Heimdal version, and then only with the "-e" option, which disables output to stderr. Unfortunately rcp does not have such a flag. In this case one has to resort to writing a script that does the copy over rsh, containing e.g.
dd if=localfile | rsh -e juliana.pdc.kth.se dd of=remotefile
With Red Hat Kerberos you can only use ftp. Remember to use the -f flag.
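The copy over rsh needs dd running on both ends: one dd reads the local file, and a second dd, started on the remote host by rsh, writes it. The sketch below wraps this in a function; it assumes the Heimdal rsh with -e as described above, and the function name rsh_put and the RSH variable (which lets you substitute another remote-shell command) are hypothetical additions.

```shell
#!/bin/sh
# Copy a local file to Lenngren over rsh, for when rcp is blocked.
# Hypothetical: the function name rsh_put and the RSH variable, which
# defaults to the Heimdal "rsh -e" that works through the firewall.
RSH=${RSH:-"rsh -e"}

rsh_put() {
    local_file=$1
    remote_file=$2
    host=${3:-juliana.pdc.kth.se}
    # A local dd reads the file; a second dd, started on the remote host
    # by rsh, writes the same byte stream to the remote file.
    # The remote name is wrapped in single quotes for the remote shell,
    # so names containing a single quote are not supported.
    dd if="$local_file" bs=65536 2>/dev/null |
        $RSH "$host" "dd of='$remote_file' bs=65536 2>/dev/null"
}
```

Usage: rsh_put input.com input.com copies the local file input.com to juliana.pdc.kth.se; a third argument selects another host.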

Modules

The "module" system is used for setting up paths and environment variables. I recommend at least the following in your login script:
module add heimdal easy
for Kerberos and batch job commands, respectively. Some online help is available with
module help
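Putting the login-script recommendations together (the modules above and the long-lived kinit from the Kerberos section), a ~/.profile fragment for sh/bash might look like the following; csh users would put the equivalent in .login. This is a sketch: the guards that skip missing commands and the check that only prompts for a password on an interactive terminal are additions, not part of the original recommendation.

```shell
# ~/.profile fragment (sh/bash) for Lenngren: a sketch, see the text above.
# Load Kerberos (heimdal) and batch-job (easy) commands if the
# module system is available in this shell.
if command -v module >/dev/null 2>&1; then
    module add heimdal easy
fi

# Renew tickets with a 30-day lifetime so long batch jobs do not lose
# access to AFS. Only prompt when logged in on a terminal, so
# non-interactive shells never block waiting for a password.
if [ -t 0 ] && [ -x /usr/heimdal/bin/kinit ]; then
    /usr/heimdal/bin/kinit -l 30d
fi
```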

Disk quota

Your home directory is mounted on an AFS volume with the name "H.user". Your current quota and disk usage are shown with:
module add afsws
fs lq ~
Note that there is a backup directory called ~/OldFiles (with volume name H.user.backup). This normally contains the backup of your home directory from the previous night: if you accidentally remove a file ~/yo, it is easy to repair the damage by recovering it from ~/OldFiles/yo. It is also possible to recover older versions of a particular file, but then you need assistance from PDC staff. If your disk space is too small for your normal activities, the quota can be raised (within reasonable limits); contact vahtras@theochem.kth.se.

Submitting jobs

A normal submit is:
esubmit -n <nodes> -t <minutes> [program program_arguments]
Comment: in order to submit jobs you have to belong to a cac (charge account category). Check with the command:
cac members <username>
On the chemistry portions of the cluster you will have a personal cac if the group leaders agree that you are allowed to run. On the SNIC part you may belong to additional cacs, which may be specified with the esubmit command-line option -c. You will receive an email from the batch system when the job starts and when it finishes.

If you leave out the program and program_arguments, the nodes will be reserved for the time you have specified. You then have sole access to the nodes: you can log in and try things interactively. The email that announces that the job has started contains the list of nodes that are available to you.

To list jobs in the batch queue:
spq
To delete a job in the queue:
sprelease -j JID
where JID is the job id number printed by the "spq" command.

To find out the current queue limit settings there are a few commands that can help. The end of the output from spstatus -s gives the division of the cluster into job classes, i.e. the number of nodes that are available for jobs of different lengths, e.g.
----- Space Information -----
D: 61 of 61 available for 4h jobs.
D: 4 of 61 excluded for 15h jobs, [2006-05-10 13:00:00, 2006-05-10 18:00:00].
D: 4 of 61 excluded for 60h jobs, [2006-05-08 13:00:00, 2006-05-12 18:00:00].
D: 16 of 61 excluded for 240h jobs, [2006-05-07 02:00:00, 2006-05-14 02:00:00].
D: 48 of 61 excluded for 960h jobs, [2006-05-07 02:00:00, 2006-05-14 02:00:00].
This means that all nodes can run short jobs (up to 4h), 57 nodes accept jobs in the 4-60h range, 45 nodes accept jobs in the 60-240h range and, finally, 13 nodes accept jobs in the 240-960h range. The consequence is that for long jobs fewer nodes are available, and it will take longer for such a job to start at all. Note that these limitations are fixed with respect to nodes: certain nodes are set aside for long jobs, so if they are busy you have to wait until they are released, no matter whether their jobs have just started or have only a few hours left. Also, when allocating nodes, the system picks the available ones allowing the longest reservation first, even when this is not needed. The rationale is that, given some variation among the jobs in line, this gives shorter (possibly parallel) jobs (slightly) better turnaround time at the expense of longer jobs having to wait slightly longer. (A three-week job doesn't "suffer" as much from waiting a few more days as a two-day job does from waiting three weeks.)

Another command that shows limits is
d10n03$ spq -L
INTERVAL NICKNAME NJOB WALLTIME NODETIME
- ]960h,8760h] no_no_no - - - - -
- ]4h,960h] n_other - - 8 - -
- ]4h,960h] nodetime - - - - 480h01
- ]0m01s,4h] n_4h - - 4 - -
which means that requests over 960h are not considered, requests in the
4-960h bracket are tested against both the number of queued jobs and
the total queued nodetime (obtained by summing, over all jobs a user has
in the queue, the number of requested nodes times the requested time),
and, finally, short jobs are tested against the number of queued jobs.
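The limits from spq -L can be turned into a rough self-check before submitting. The sketch below is a hypothetical helper, predict_hold, not a PDC command: it reads the jobs you already have queued (not running) as "nodes hours" pairs on standard input and prints whether a new job with the given walltime in hours would be queued, held or rejected. The limits (8 long jobs, 480 node-hours, 4 short jobs, 960h maximum) are copied from the output above and may change; reading 480h01 as 480 node-hours and the exact boundary handling are assumptions.

```shell
#!/bin/sh
# predict_hold HOURS: hypothetical helper, limits copied from "spq -L".
# stdin: one line "nodes hours" per job you already have QUEUED
# (running jobs do not count). Prints queued, held or rejected.
predict_hold() {
    awk -v newh="$1" '
        NF >= 2 {
            if ($2 <= 4) {
                nshort++                 # ]0m01s,4h] bracket (n_4h)
            } else if ($2 <= 960) {
                nlong++                  # ]4h,960h] bracket (n_other)
                nodetime += $1 * $2      # nodes x hours (nodetime)
            }
        }
        END {
            if (newh > 960)     verdict = "rejected"   # ]960h,8760h]: not considered
            else if (newh > 4)  verdict = (nlong >= 8 || nodetime > 480) ? "held" : "queued"
            else                verdict = (nshort >= 4) ? "held" : "queued"
            print verdict
        }'
}
```

For example, printf '4 100\n4 100\n' | predict_hold 24 prints held, because the two queued jobs already amount to 800 node-hours.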
Note that these tests are applied to a submitted job against previously queued jobs (not already running jobs). For normal load situations this distinction is not that important, but in the rare case that the machine is empty one should be able to use the whole machine. Example: if a person has 8 long jobs in the queue, a ninth submit will be put in "held" state and not queued for execution (until the first of the other queued jobs starts running). A job can be held even with fewer queued jobs if the total number of node-hours for the queued jobs exceeds 480h. Similarly, if a user submits many short jobs, only 4 will be queued for execution and the rest will be held.

Chemistry software status
Comments to vahtras@theochem.kth.se
Olav Vahtras