[Archive] queue management / cluster

Message by Gregorio Suino:
Hello to everyone,
I have been using SPIS for about a month. Because of computing power requirements, I need to run simulations on a cluster that uses the “qsub” queue management system.
Does anyone know how to launch simulations using “qsub”?
Thank you

Message by EW:
I am using the sbatch queue system, so it may be different, but I have found it easiest to define the project on my PC up through the global parameters, without hitting the “Finalize Run and Save Project” button. Then I move the project folder to the supercomputer system and run this command in my sbatch script: “./Spis.sh -b ./mySpisTrack.py”. The file “mySpisTrack.py” is a script I created by modifying the spisTrack.py script in the resources/defaults/scripts folder to include the project path and name for my project; it was an easy edit to make. I don’t know if this is helpful to you, but I wanted to chime in just in case. It took me a while to figure out how to make it work this way. Another note: I don’t know if I have something set wrong in my simulation, but even with a Java memory allocation of 60GB I seem to be limited to around 200,000 tetrahedra (Full PIC, 5 particles per cell).
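
Editor's sketch: the sbatch workflow described in this post could be wrapped as shown here. This is a minimal Python sketch, not EW's actual script. The function name, the install directory, the job name, and the memory and walltime values are all placeholders chosen for illustration; only the “./Spis.sh -b ./mySpisTrack.py” command comes from the post. The #SBATCH options used are standard Slurm options, but the values a given cluster accepts may differ.

```python
import shlex


def make_sbatch_script(spis_dir, track_script, job_name="spis_run",
                       mem_gb=64, walltime="24:00:00"):
    """Return the text of a Slurm batch script that runs SPIS in -b mode.

    Every default value here is a placeholder, not a recommendation.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={walltime}",
        "#SBATCH --output=spis-%j.out",  # %j expands to the Slurm job id
        "",
        # Run from the SPIS install directory so ./Spis.sh resolves.
        f"cd {shlex.quote(spis_dir)}",
        f"./Spis.sh -b {shlex.quote(track_script)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths; replace with the real install and script locations.
    print(make_sbatch_script("/path/to/spis", "./mySpisTrack.py"))
```

The returned text can be saved to a file and submitted with sbatch. The memory request should be at least as large as the Java heap configured in Spis.sh, otherwise the scheduler may kill the job before the JVM reaches its limit.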

Message by EW:
Also, check out the thread called “batch mode -p, why x11” for additional edits you have to make in order to run in headless mode. You have to make the same edit to the -b portion of the Spis.sh script to run the way I described above.

Message by ruard:
Dear all,
I agree with EW’s recommendations.
First create a shell script (.sh) in which you launch SPIS in the desired mode:
./Spis.sh -b /path/to/your/SpisTrack.py
./Spis.sh -p /path/to/your/spisProject.spis5
Some sample Python scripts can be found in the directory {spisDirectoryRoot}/resources/scripts/. The {spisDirectoryRoot}/resources/defaults/tracks/spisTrack.py Python script is the one used when SPIS is run with the -p option (./Spis.sh -p /path/to/your/spisProject.spis5).
Be careful: SPIS version 5.1.8 has a bug in offscreen mode. As EW mentioned, please check the recommendation in the “batch mode -p, why x11” topic in this forum.
Finally, submit your job with qsub.
Just a comment about the memory allocation. It seems strange to have 60GB and a crash with 200 000 tetrahedra (Full PIC, 5 particles per cell). Are you sure that you have changed the Xmx value in the Spis.sh file?
benjamin jeanty ruard
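
Editor's sketch: the steps in this post (write a shell script that calls Spis.sh, then submit it with qsub) could be automated roughly as follows. This is a minimal Python sketch that assumes a Torque-style PBS scheduler; PBS Pro and Grid Engine also provide a qsub command but use different resource directives. The function names, job name, resource values, and paths are hypothetical placeholders.

```python
import shlex
import subprocess


def make_pbs_script(spis_dir, spis_args, job_name="spis_run",
                    mem_gb=64, walltime="24:00:00"):
    """Return the text of a Torque-style PBS job script that runs Spis.sh."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",
        "#PBS -l nodes=1:ppn=1",        # placeholder resource request
        f"#PBS -l mem={mem_gb}gb",
        f"#PBS -l walltime={walltime}",
        "#PBS -j oe",                   # merge stderr into stdout
        "",
        f"cd {shlex.quote(spis_dir)}",
        "./Spis.sh " + " ".join(shlex.quote(a) for a in spis_args),
    ]
    return "\n".join(lines) + "\n"


def submit(script_path, dry_run=True):
    """Submit a job script with qsub, or just return the command if dry_run."""
    cmd = ["qsub", script_path]
    if dry_run:
        return cmd
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout.strip()  # qsub prints the job identifier


if __name__ == "__main__":
    # Hypothetical paths; -p runs a saved project, -b would run a track script.
    print(make_pbs_script("/path/to/spis",
                          ["-p", "/path/to/your/spisProject.spis5"]))
```

Calling submit with dry_run=False only makes sense on a machine where qsub is installed; the default simply returns the command that would be run.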

Message by EW:
Yes, I have the Xmx value set to 60000. Something happens in the “finalize run configuration and save project” step, before proceeding to the simulation running window. I can see the memory use climb after I start the finalize run step. Then the memory use goes back down once that step is complete. It does not take as much memory to actually run the simulation. For example, I am running around 200,000 tetrahedra. It takes only around 20GB to run the simulation, but it took almost the full 60GB to do the “finalize run configuration and save project” step.

Message by ruard:
The step where the simulation takes around 60GB is the UI-to-NUM (UI2NUM) step, in which all preprocessing data are converted into objects used by the numerical core.
However, if you allocate less memory, do you get a Java “OutOfMemoryError”? Memory management in Java differs from that of many other programming languages. The virtual machine can appear to use a lot of memory simply because that memory has been allocated to it: as long as allocated memory is available, the virtual machine may consider a call to the garbage collector unnecessary.
Maybe you can try to allocate less memory.
benjamin jeanty ruard

Message by Gregorio Suino:
Thank you very much,
I’m working on the .py files, and I am running into some errors.

Message by EW:
The specific error I get is “java.lang.OutOfMemoryError: Java heap space”. I tried different -Xmx allocations and always end up with the same message.
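
Editor's sketch: one way to double-check which heap limit a launcher script actually passes to Java. The JVM option is case-sensitive (-Xmx), and a value without a k, m, or g suffix is interpreted as bytes. This minimal Python sketch scans text for -Xmx settings and converts them to bytes; the function name is hypothetical, and how the option is written inside Spis.sh is an assumption, so the sample line below is illustrative only.

```python
import re

# Multipliers for the size suffixes accepted by -Xmx (no suffix means bytes).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def find_xmx_bytes(launcher_text):
    """Return every -Xmx value found in the text, converted to bytes."""
    found = []
    for value, unit in re.findall(r"-Xmx(\d+)([kKmMgG]?)", launcher_text):
        found.append(int(value) * _UNITS[unit.lower()])
    return found


if __name__ == "__main__":
    # Illustrative line only; the real content of Spis.sh may look different.
    sample = "java -Xms512m -Xmx60000m -jar launcher.jar"
    for size in find_xmx_bytes(sample):
        print(f"{size / 1024 ** 3:.1f} GiB")
```

If the launcher contains more than one -Xmx setting, the list has several entries, which is worth checking because the edited value may not be the one used for the chosen mode.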

Message by EW:
The log console says:
40000 Thu Apr 21 10:28:50 CDT 2016 Error in UI2NUM
Java heap space
at spis.Util.Matrix.SparseMatrix.getValues(SparseMatrix.java:374)
at spis.Util.Matrix.SparseMatrix.matMult(SparseMatrix.java:282)
at spis.Util.Matrix.SparseMatrix.matMult(SparseMatrix.java:239)
at spis.Circ.Circ.RCCabsCirc.setPotSources(RCCabsCirc.java:317)
at spis.Circ.Circ.RCCabsCirc.buildRedMatrices(RCCabsCirc.java:186)
at spis.Circ.Circ.RCCabsCirc.<init>(RCCabsCirc.java:161)
at spis.Top.SC.RCCabsSC.deriveCircuitAndMap(RCCabsSC.java:759)
at spis.Top.SC.RLCSC.deriveCircuitAndMap(RLCSC.java:189)
at spis.Top.SC.RCCabsSC.deriveCircuitAndMap(RCCabsSC.java:1029)
at spis.Top.SC.RCCabsSC.init(RCCabsSC.java:283)
at spis.Top.SC.RCCabsSC.<init>(RCCabsSC.java:272)
at spis.Top.SC.RLCSC.<init>(RLCSC.java:149)
at spis.Top.Simulation.SimulationFromUIParams.init(SimulationFromUIParams.java:1078)
at spis.Top.Simulation.SimulationFromUIParams.<init>(SimulationFromUIParams.java:438)
at spis.Top.Top.NumTopFromUI.<init>(NumTopFromUI.java:157)
at org.spis.ui.ui2num.util.Ui2Num.buildNumTopFromUI(Ui2Num.java:355)
at org.spis.ui.ui2num.command.NumTopFromUICommand.run(NumTopFromUICommand.java:162)
at org.spis.ui.ui2num.command.NumTopFromUICommand.run(NumTopFromUICommand.java:52)
at org.keridwen.core.messaging.AbstractCommand.execute(AbstractCommand.java:200)
at org.keridwen.core.messaging.DefaultBundleController$CallableCommand.call(DefaultBundleController.java:260)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
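
Editor's sketch: when runs are submitted through a queue, it can help to scan the job output for this kind of failure automatically. The minimal Python sketch below looks for a “Java heap space” line and returns the first few stack frames that follow it, which in the trace above point to sparse-matrix work in the circuit setup during UI2NUM. The function name is hypothetical, and the assumed log layout (frames on lines starting with “at ”) is taken from the excerpt above.

```python
def summarize_oom(log_text, max_frames=3):
    """Return (error_line, first frames) for a heap-space error, or None."""
    lines = [ln.strip() for ln in log_text.splitlines()]
    for i, line in enumerate(lines):
        if "Java heap space" in line:
            frames = []
            # Collect the stack frames that directly follow the error line.
            for nxt in lines[i + 1:]:
                if not nxt.startswith("at "):
                    break
                frames.append(nxt[3:])
                if len(frames) == max_frames:
                    break
            return line, frames
    return None


if __name__ == "__main__":
    excerpt = (
        "40000 Thu Apr 21 10:28:50 CDT 2016 Error in UI2NUM\n"
        "Java heap space\n"
        "at spis.Util.Matrix.SparseMatrix.getValues(SparseMatrix.java:374)\n"
        "at spis.Util.Matrix.SparseMatrix.matMult(SparseMatrix.java:282)\n"
    )
    print(summarize_oom(excerpt))
```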

Message by EW:
I thought my script was working, but it turns out it isn’t. It doesn’t appear to be loading the groups and circuit properly. I think I need to add more commands to my script, but I get errors. Can you explain what I need to add to the script to make sure everything is loaded properly? I have all the files defined up to the point of hitting “Finalize run configuration and save project”. I do not hit this button in the GUI. I save the project, transfer it to the supercomputer, and try to do the rest with scripts. My script must be missing some steps. The simulation will run, but not with all the inputs I gave it.

Message by Fredrik Johansson:
Hello EW, you seem to have done similar work to mine with sbatch on a supercomputer. As you know from my thread “batch mode -p, why x11”, I’m using the -p mode, and I detail there how I do it. Does that work for you? I also found a bug using my workaround.

Message by EW:
Yes, thank you, that did allow me to use the -p option. But I also really need to run a script command for “Finalize run configuration and save project” prior to running the project. That is the step that takes too much memory for my personal computer to handle. So I need to run a script that includes that step in addition to running the project, and I still haven’t figured that part out. I can do it with the GUI, but it just takes a long time.