LSF
LSF has a bug where jobs drop into a dangling state: they execute nothing but keep holding their resources
LSF machines need a reboot every 2 to 4 weeks to clean tmp and clear zombie processes
LSF batch mode
LSF Interactive mode
LSF allots resources through a slot mechanism
LSF keeps submitting jobs to the first machine even though all the remaining machines are free. This increases the load on one machine, resulting in slower builds and ineffective resource utilization
Filers used across multiple machines face the "No such File access" problem: heavy read/write access from different machines increases read times, and requests from other instances time out.
Use of faster filers.
Leverage more local workspace
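One way to lean on local workspace is to copy the inputs from the shared filer to local disk once, then run the build against the copy. The sketch below is a minimal illustration in Python, not an LSF feature: the function names `stage_to_local` and `cleanup_local`, the `lsf_ws_` prefix, and the fake filer directory in the demo are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_to_local(shared_dir, local_root=None):
    """Copy shared_dir into a fresh local directory and return the copy's path.

    local_root is a hypothetical local scratch location; None means the
    system default temporary directory.
    """
    source = Path(shared_dir)
    workspace = Path(tempfile.mkdtemp(prefix="lsf_ws_", dir=local_root))
    local_copy = workspace / source.name
    shutil.copytree(source, local_copy)
    return local_copy


def cleanup_local(local_copy):
    """Remove the workspace created by stage_to_local so tmp does not fill up."""
    shutil.rmtree(Path(local_copy).parent, ignore_errors=True)


# Demo with a stand-in for the shared filer (an assumption for illustration).
shared = Path(tempfile.mkdtemp(prefix="fake_filer_")) / "project"
shared.mkdir()
(shared / "input.txt").write_text("hello")

local = stage_to_local(shared)
print((local / "input.txt").read_text())  # this read is served from the local copy

cleanup_local(local)
shutil.rmtree(shared.parent)
```

Removing the workspace at the end of the job matters here, since leftover tmp data is one of the reasons the machines need periodic reboots.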
bjobs to list your jobs (bjobs -u all to list the jobs of all users)
bkill to kill the job
A cleanup job to remove unnecessary jobs.
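A cleanup job could be sketched roughly as follows. This is an illustration under stated assumptions, not an established procedure: the `JobRecord` fields, the function names, and the thresholds are hypothetical, and the heuristic (a job that has been in the RUN state for a long time with almost no CPU time is treated as dangling) is an assumption that should be checked against real job data. The sketch only builds `bkill <job_id>` commands; collecting the job data from `bjobs` and executing the commands are left out.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    job_id: str
    state: str          # LSF job state, e.g. "RUN" or "PEND"
    run_seconds: int    # wall-clock time since the job started
    cpu_seconds: float  # CPU time consumed so far


def find_dangling(jobs, min_run_seconds=3600, max_cpu_seconds=1.0):
    """Return IDs of running jobs that hold a slot but have used almost no CPU."""
    return [
        job.job_id
        for job in jobs
        if job.state == "RUN"
        and job.run_seconds >= min_run_seconds
        and job.cpu_seconds <= max_cpu_seconds
    ]


def bkill_commands(job_ids):
    """Build one `bkill <job_id>` argument list per job, for review before running."""
    return [["bkill", job_id] for job_id in job_ids]


jobs = [
    JobRecord("101", "RUN", 7200, 0.2),     # long-running but idle: flagged
    JobRecord("102", "RUN", 7200, 5400.0),  # busy build: kept
    JobRecord("103", "PEND", 0, 0.0),       # not started yet: kept
]
print(bkill_commands(find_dangling(jobs)))  # [['bkill', '101']]
```

Keeping the selection logic separate from the kill step makes it possible to log or review the candidate list first, which is safer than killing jobs directly on a heuristic.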