WIEN2k performs electronic structure calculations of solids using density functional theory (DFT). It is based on the full-potential (linearized) augmented plane-wave ((L)APW) + local orbitals (lo) method, one of the most accurate schemes for band-structure calculations. WIEN2k is an all-electron scheme that includes relativistic effects.
We have two versions of WIEN2k. We recommend you use the latest.
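You can check which versions are installed with the module command; for example (the md/ prefix matches our module naming below):

$ module avail md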
To load WIEN2k v9 type the following:
$ module load compilers/pathscale-3.2 \
    blas/acml-4.1.0-pathscale \
    mpi/openmpi-1.2.6-pathscale-3.2 \
    math/fftw-2.1.5 \
    md/wien2k-09
To load WIEN2k v8 type the following:
$ module load compilers/pathscale-3.2 \
    blas/acml-4.1.0-pathscale \
    mpi/openmpi-1.2.6-pathscale-3.2 \
    md/wien-2k
This software is commercial and requires a license. Currently one faculty member holds a license, and the software is installed on one of our clusters.
We have tested both the serial and parallel versions of WIEN2k on urdarbrunnr. Here is a sample serial submit script (parallel examples follow below):
#!/bin/bash
#$ -S /bin/bash
#
#$ -N wien2k
#
#$ -cwd
#$ -o job.out
#$ -e job.err

module load compilers/pathscale-3.2 \
    blas/acml-4.1.0-pathscale \
    mpi/openmpi-1.2.6-pathscale-3.2 \
    math/fftw-2.1.5 \
    md/wien2k-09

cd ~/BaBr2
runsp_lapw -ec 0.00001
In WIEN2k only lapw0, lapw1, and lapw2 are parallelized. There are plans to parallelize other parts of WIEN2k, but that requires funding and development effort. There are three ways to parallelize WIEN2k: k-point parallelization, hybrid mode, and MPI-only. We support all three. We build generic machines files for each run, but if those don't work you can always generate your own by writing a script to process $TMPDIR/machines, as in the sketch below.
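As an illustration, here is a minimal sketch of such a script. It assumes Grid Engine's $TMPDIR/machines lists one hostname per allocated slot (the same assumption our startup script makes, see below) and emits one group per node; adapt the layout to the parallelization mode you want:

#!/bin/bash
# Minimal sketch: build a .machines file from Grid Engine's
# $TMPDIR/machines (one hostname per allocated slot).
# Emits one "1:host:ncpus" group per node; adjust to taste.
for n in `cat $TMPDIR/machines | sort -u`; do
    cpucnt=`grep -c "^$n$" $TMPDIR/machines`
    echo "1:$n:$cpucnt"
done > .machines
echo "granularity:1" >> .machines
echo "extrafine:1" >> .machines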
For k-point parallel runs, we build a generic machines file for you and put it in $TMPDIR/machines.wien2k-kpoint.
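For illustration (this layout is an assumption, not a dump of the generated file), a k-point style .machines for a 4-cpu job on two nodes contains one line per k-point group:

1:compute-0-0
1:compute-0-0
1:compute-0-1
1:compute-0-1
granularity:1
extrafine:1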
Here is a sample submit script for a k-point parallel job:
#!/bin/bash
#$ -S /bin/bash
#
#$ -N w2k_kpnt
#
#$ -cwd
#$ -o job.out
#$ -e job.err
#$ -pe mpi 16

module load compilers/pathscale-3.2 \
    blas/acml-4.1.0-pathscale \
    mpi/openmpi-1.2.6-pathscale-3.2 \
    math/fftw-2.1.5 \
    md/wien2k-09

cd ~/BaBr2

# copy the machines file in
cp $TMPDIR/machines.wien2k-kpoint .machines

# make sure there is a -p for parallel runs
runsp_lapw -p -ec 0.00001
Hybrid parallelization does both k-point parallelization and MPI parallelization. The WIEN2k developers claim that this can be faster in some cases, although we haven't seen evidence for that claim in our benchmarks. We have built a generic machines file for you and put it in $TMPDIR/machines.wien2k-hybrid. We break the MPI jobs up by node so that the interconnect matters less; this can be useful on machines with slow interconnects.
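For example (hostnames are illustrative), a hybrid .machines for two 8-core nodes has one MPI group per node:

lapw0: compute-0-0:8 compute-0-1:8
1: compute-0-0:8
1: compute-0-1:8
granularity:1
extrafine:1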
Here is a sample submit script for a hybrid parallel job:
#!/bin/bash
#$ -S /bin/bash
#
#$ -N w2k_hybrid
#
#$ -cwd
#$ -o job.out
#$ -e job.err
#$ -pe mpi 16

module load compilers/pathscale-3.2 \
    blas/acml-4.1.0-pathscale \
    mpi/openmpi-1.2.6-pathscale-3.2 \
    math/fftw-2.1.5 \
    md/wien2k-09

cd ~/BaBr2

# copy the machines file in
cp $TMPDIR/machines.wien2k-hybrid .machines

# make sure there is a -p for parallel runs
runsp_lapw -p -ec 0.00001
MPI-only parallelization uses MPI exclusively: the tasks execute one at a time, spread across all CPUs. We have built a generic machines file for you and put it in $TMPDIR/machines.wien2k-mpi. This type of parallelization can be useful when you have a fast interconnect; use hybrid mode if your interconnect is slow.
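For example (hostnames are illustrative), an MPI-only .machines for two 8-core nodes puts all the nodes in a single MPI group:

lapw0: compute-0-0:8 compute-0-1:8
1: compute-0-0:8 compute-0-1:8
granularity:1
extrafine:1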
Here is a sample submit script for an MPI-only parallel job:
#!/bin/bash
#$ -S /bin/bash
#
#$ -N w2k_mpi
#
#$ -cwd
#$ -o job.out
#$ -e job.err
#$ -pe mpi 16

module load compilers/pathscale-3.2 \
    blas/acml-4.1.0-pathscale \
    mpi/openmpi-1.2.6-pathscale-3.2 \
    math/fftw-2.1.5 \
    md/wien2k-09

cd ~/BaBr2

# copy the machines file in
cp $TMPDIR/machines.wien2k-mpi .machines

# make sure there is a -p for parallel runs
runsp_lapw -p -ec 0.00001
w2web is an untested feature on our systems. Here is a note from the user config:
Start "w2web", define "user/password" and select a port. Then point your web-browser to the proper address:PORT.
For proper usage of scfmonitor, please add the following line to ~/.Xdefaults:

gnuplot*raise: off
Here are the results of our parallel benchmarks on urdarbrunnr.
We've tested a couple of runs: the first uses MPI-only parallelization; the second uses hybrid (k-point plus MPI) parallelization.
For the MPI-only runs, the .machines file for a 32-cpu job looks like the following:
lapw0: compute-0-0:8 compute-0-1:8 compute-0-2:8 compute-0-3:8
1: compute-0-0:8 compute-0-1:8 compute-0-2:8 compute-0-3:8
granularity:1
extrafine:1
Here are the results for a few job sizes:
output-1nodes-2cpus:  TIME HAMILT (CPU) = 315.3, HNS = 191.4, HORB = 0.0, DIAG = 1058.6
output-1nodes-2cpus:  ===> TOTAL CPU TIME: 1568.1 (INIT = 2.7 + K-POINTS = 1565.3)
output-1nodes-4cpus:  TIME HAMILT (CPU) = 165.8, HNS = 120.5, HORB = 0.0, DIAG = 877.7
output-1nodes-4cpus:  ===> TOTAL CPU TIME: 1166.8 (INIT = 2.7 + K-POINTS = 1164.1)
output-1nodes-8cpus:  TIME HAMILT (CPU) = 77.5, HNS = 55.5, HORB = 0.0, DIAG = 376.0
output-1nodes-8cpus:  ===> TOTAL CPU TIME: 511.8 (INIT = 2.7 + K-POINTS = 509.1)
output-2nodes-16cpus: TIME HAMILT (CPU) = 41.6, HNS = 30.7, HORB = 0.0, DIAG = 240.9
output-2nodes-16cpus: ===> TOTAL CPU TIME: 316.0 (INIT = 2.7 + K-POINTS = 313.3)
output-4nodes-32cpus: TIME HAMILT (CPU) = 23.8, HNS = 15.6, HORB = 0.0, DIAG = 115.4
output-4nodes-32cpus: ===> TOTAL CPU TIME: 157.6 (INIT = 2.7 + K-POINTS = 154.9)
For the hybrid runs, the .machines file for a 32-cpu job looks like the following:
lapw0: compute-0-0:8 compute-0-1:8 compute-0-2:8 compute-0-3:8
1: compute-0-0:8
1: compute-0-1:8
1: compute-0-2:8
1: compute-0-3:8
granularity:1
extrafine:1
Here are the results for a few job sizes:
output-1nodes-1cpus:  TIME HAMILT (CPU) = 586.3, HNS = 232.3, HORB = 0.0, DIAG = 1368.7
output-1nodes-1cpus:  ===> TOTAL CPU TIME: 2189.8 (INIT = 2.6 + K-POINTS = 2187.3)
output-1nodes-2cpus:  TIME HAMILT (CPU) = 317.7, HNS = 217.1, HORB = 0.0, DIAG = 1250.2
output-1nodes-2cpus:  ===> TOTAL CPU TIME: 1787.8 (INIT = 2.7 + K-POINTS = 1785.1)
output-1nodes-4cpus:  TIME HAMILT (CPU) = 164.9, HNS = 117.7, HORB = 0.0, DIAG = 876.9
output-1nodes-4cpus:  ===> TOTAL CPU TIME: 1162.3 (INIT = 2.7 + K-POINTS = 1159.6)
output-1nodes-8cpus:  TIME HAMILT (CPU) = 77.8, HNS = 56.0, HORB = 0.0, DIAG = 377.2
output-1nodes-8cpus:  ===> TOTAL CPU TIME: 513.8 (INIT = 2.7 + K-POINTS = 511.1)
output-2nodes-16cpus: TIME HAMILT (CPU) = 41.2, HNS = 30.8, HORB = 0.0, DIAG = 247.3
output-2nodes-16cpus: ===> TOTAL CPU TIME: 322.2 (INIT = 2.7 + K-POINTS = 319.5)
output-4nodes-32cpus: TIME HAMILT (CPU) = 24.1, HNS = 15.9, HORB = 0.0, DIAG = 115.0
output-4nodes-32cpus: ===> TOTAL CPU TIME: 157.8 (INIT = 2.7 + K-POINTS = 155.1)
From our tests, MPI-only parallelization is sufficient; hybrid parallelization doesn't speed things up much. According to the WIEN2k developers this may not hold for "real world" runs, so your mileage may vary. We encourage folks to use whatever works for them, though this may require building a custom machines file.
WIEN2k was built using the following procedure; we had to make a few changes to get it to work in our environment.
$ cd /opt/src/WIEN2K_08/build
$ tar xvf ../WIEN2K_08.tar
$ gunzip *.gz
$ ./expand_lapw
WIEN2k 8.3 had a bug (a missing continuation ampersand in SRC_tetra/tetra.f) that prevented compilation with Pathscale. Here is the fix as a diff:
$ diff ../orig/SRC_tetra/tetra.f SRC_tetra/tetra.f
265c265
<      " for Atom",i4," col=",i3," Energy=",2f8.4,/)') &
---
>    & " for Atom",i4," col=",i3," Energy=",2f8.4,/)') &
$
The following patch adds support for the Pathscale environment at CSE:
$ diff ../orig/siteconfig_lapw siteconfig_lapw
258a259
>     P Linux (Pathscale compiler)
313a315,318
>   case [p|P]:
>     set system = linuxpath
>     set cpfile = generic
>     breaksw
1572a1578,1589
> # Linux system with Pathscale compiler
> linuxpath:FC:pathf90
> linuxpath:MPF:mpif90
> linuxpath:CC:pathcc
> linuxpath:FOPT:-O2 -OPT:Ofast -freeform
> linuxpath:FPOPT:-O2 -OPT:Ofast -freeform
> linuxpath:LDFLAGS:-L../SRC_lib -L/share/apps/acml-4.1.0/pathscale64/lib
> linuxpath:R_LIBS:-lacml
> linuxpath:DPARALLEL:'-DParallel'
> linuxpath:RP_LIBS:-L/share/apps/openmpi-1.2.6/pathscale-3.2/lib64 -L/share/apps/acml-4.1.0/pathscale64/lib -lacml /share/apps/SCALAPACK/openmpi-1.2.6/pathscale-3.2/lib/libscalapack.a /share/apps/SCALAPACK/openmpi-1.2.6/pathscale-3.2/lib/blacsF77.a /share/apps/SCALAPACK/openmpi-1.2.6/pathscale-3.2/lib/blacs.a /share/apps/SCALAPACK/openmpi-1.2.6/pathscale-3.2/lib/blacsF77.a -lmpi
> linuxpath:MPIRUN:mpirun _EXEC_
>
$
The parallel_options file gets sourced by the WIEN2k parallel scripts and must look like this:
setenv USE_REMOTE 0
setenv WIEN_GRANULARITY 1
setenv WIEN_MPIRUN "mpirun _EXEC_"
USE_REMOTE should be disabled (set to 0) because we use Grid Engine's tight integration and the module command to load the environment.
To configure and build WIEN2k, run the siteconfig script; it will walk you through the configuration process.
$ ./siteconfig
The modulefile will load the necessary environment variables and paths. The person installing the software should run userconfig_lapw once so each subsequent user doesn't have to. The output of that command can be used to install a global environment configuration. Here is what our modulefile looks like:
#%Module1.0 ## wien-2k ## by Scott Beardsley proc ModulesHelp { } { puts stderr "loads the environment for WIEN 2K" } module-whatis "loads the environment for WIEN 2K" prereq compilers/pathscale-3.2 prereq mpi/openmpi-1.2.6-pathscale-3.2 prereq blas/acml-4.1.0-pathscale setenv WIENROOT /share/apps/wien-2k_08 setenv SCRATCH ./ setenv EDITOR vi setenv OMP_NUM_THREADS 1 setenv W2WEB_CASE_BASEDIR $env(HOME)/WIEN2k setenv STRUCTEDIT_PATH $env(WIENROOT)/SRC_structeditor/bin setenv PDFREADER evince setenv OCTAVE_EXEC_PATH $env(WIENROOT):$env(STRUCTEDIT_PATH):$env(PATH):: setenv OCTAVE_PATH $env(STRUCTEDIT_PATH):: prepend-path PATH $env(STRUCTEDIT_PATH) prepend-path PATH $env(WIENROOT)
OpenMPI is tightly integrated with Grid Engine and doesn't require a machines file. WIEN2k, on the other hand, does, so we must generate one. This is done automatically by the startup script for the parallel environment; we have added the following to that script:
#
# create a machines file for WIEN2k
#
mach=`cat $machines|sort -u`
echo -n "lapw0: " >$TMPDIR/machines.wien2k
for n in $mach; do
    cpucnt=`grep "^$n$" $machines|wc -l`
    echo -n "$n:$cpucnt " >>$TMPDIR/machines.wien2k
done
echo >>$TMPDIR/machines.wien2k
for n in $mach; do
    cpucnt=`grep "^$n$" $machines|wc -l`
    echo "1:$n:$cpucnt" >>$TMPDIR/machines.wien2k
done
echo granularity:1 >>$TMPDIR/machines.wien2k
echo extrafine:1 >>$TMPDIR/machines.wien2k
This will generate a $TMPDIR/machines.wien2k file that can be copied to the current directory and used.
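For example, in a job script:

cd ~/BaBr2
cp $TMPDIR/machines.wien2k .machines
runsp_lapw -p -ec 0.00001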