🖥️LSF

LSF has a bug that leaves jobs in a dangling state: they do not execute anything but keep holding their resources.

LSF machines need a reboot every 2 to 4 weeks to clean up tmp and clear zombie processes.

LSF batch mode

LSF Interactive mode
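As a rough illustration of the two modes, the sketch below builds the bsub command lines for a batch submission and an interactive one. bsub, -q, -o, -I and the %J job-ID placeholder are standard LSF; the queue name "normal", the log file name and the make commands are placeholders, not values from this setup.

```python
import shlex


def batch_command(command, queue=None, output_file=None):
    """Build the argv for a batch submission: the job is queued and runs detached."""
    argv = ["bsub"]
    if queue:
        argv += ["-q", queue]          # target queue (placeholder name in the demo)
    if output_file:
        argv += ["-o", output_file]    # LSF expands %J to the job ID
    return argv + shlex.split(command)


def interactive_command(command):
    """Build the argv for an interactive submission: output streams to the terminal."""
    return ["bsub", "-I"] + shlex.split(command)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works without LSF.
    print(shlex.join(batch_command("make all", queue="normal",
                                   output_file="build.%J.log")))
    print(shlex.join(interactive_command("make test")))
```

The functions only assemble arguments; passing the result to subprocess.run would submit the job on a machine where LSF is installed.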

LSF uses a slot mechanism for allotment: each host offers a number of job slots, and a running job occupies one or more of them.

LSF keeps submitting jobs to the first machine even though all the remaining machines are free. This piles load onto one machine, which slows builds and leaves the other resources underused.
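One way to confirm this imbalance is to tally running slots per execution host. The sketch below assumes its input lines come from something like bjobs -u all -o "jobid stat exec_host" -noheader, with three whitespace-separated columns and multi-slot hosts written as 2*hostA:1*hostB. That layout is an assumption to check against the local LSF version, and the host names are made up.

```python
from collections import Counter


def running_slots_per_host(bjobs_lines):
    """Count slots used by RUN jobs on each execution host.

    Each line is assumed to be "<jobid> <stat> <exec_host>".
    """
    counts = Counter()
    for line in bjobs_lines:
        fields = line.split()
        if len(fields) < 3:
            continue                      # blank or malformed line
        _job_id, stat, exec_host = fields[:3]
        if stat != "RUN":
            continue                      # pending jobs have no host yet
        for part in exec_host.split(":"):
            slots, sep, host = part.partition("*")
            if sep and slots.isdigit():
                counts[host] += int(slots)   # "2*hostA" means two slots on hostA
            else:
                counts[part] += 1            # plain "hostA" means one slot
    return counts


if __name__ == "__main__":
    sample = [
        "101 RUN hostA",
        "102 RUN hostA",
        "103 PEND -",
        "104 RUN 2*hostA:1*hostB",
    ]
    for host, slots in running_slots_per_host(sample).most_common():
        print(host, slots)
```

A host that dominates the tally while the others stay near zero matches the behaviour described above.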

Filers shared across multiple machines run into the "No such File access" problem: heavy read/write traffic from different machines increases read times, and requests from another instance time out.

Use faster filers.

Make more use of local workspaces.
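A minimal sketch of the local-workspace idea: copy the sources off the filer once, build on local disk, then copy only the outputs back. The function name build_locally, the directory layout and the build callable are all hypothetical, and scratch_root stands in for whatever local disk the machines actually provide.

```python
import shutil
import tempfile
from pathlib import Path


def build_locally(shared_src, shared_out, build, scratch_root=None):
    """Run build(local_src, local_out) on local scratch instead of on the filer."""
    shared_src, shared_out = Path(shared_src), Path(shared_out)
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        local_src = Path(scratch) / "src"
        local_out = Path(scratch) / "out"
        shutil.copytree(shared_src, local_src)   # one bulk read from the filer
        local_out.mkdir()
        build(local_src, local_out)              # all build I/O stays local
        # One bulk write back to the filer (dirs_exist_ok needs Python 3.8+).
        shutil.copytree(local_out, shared_out, dirs_exist_ok=True)
    return shared_out                            # scratch is removed on exit
```

The filer then sees two bulk transfers per build rather than many small reads and writes from every machine.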

bjobs lists jobs (by default your own unfinished jobs; bjobs -u all shows every user's).

bkill followed by a job ID kills that job.

A cleanup job can remove unnecessary jobs.
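A cleanup job along these lines could be sketched as follows. It reads the default bjobs table, picks jobs in states treated as stale, and calls bkill on them. Which states count as "unnecessary" is an assumption here (ZOMBI and UNKWN are used only as an example), and killing other users' jobs normally needs LSF administrator rights.

```python
import subprocess

# Assumption: states treated as stale. Adjust to whatever the dangling jobs show.
STALE_STATES = {"ZOMBI", "UNKWN"}


def parse_bjobs(text):
    """Extract job ID, user and state from default bjobs output."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines and wrapped continuation lines.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        jobs.append({"jobid": fields[0], "user": fields[1], "stat": fields[2]})
    return jobs


def select_stale(jobs, states=STALE_STATES):
    """Return the IDs of jobs whose state is in the stale set."""
    return [job["jobid"] for job in jobs if job["stat"] in states]


def cleanup(dry_run=True):
    """List all users' jobs and kill the stale ones (only prints when dry_run)."""
    result = subprocess.run(["bjobs", "-u", "all"],
                            capture_output=True, text=True, check=False)
    stale = select_stale(parse_bjobs(result.stdout))
    for job_id in stale:
        if dry_run:
            print("would run: bkill", job_id)
        else:
            subprocess.run(["bkill", job_id], check=False)
    return stale
```

Running cleanup() with the default dry_run=True first shows what would be killed before anything is touched.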
