Validation of MaPHyS thread parallelism

This document validates the thread-parallelism level of MaPHyS through tests performed on the plafrim2 cluster.

The validation is done with two builds of MaPHyS:

  • one compiled with the Intel compilers (maphys%intel in Spack),
  • one compiled with GCC (maphys%gcc in Spack).

We assume these builds of MaPHyS are already installed on plafrim2 by the user. See http://morse.gforge.inria.fr/maphys/install-maphys-cluster.html#sec-7 for the installation instructions.

The test also exercises MaPHyS with different preconditioning strategies.

The test is performed on a complex matrix named Chevron4, available on plafrim2 at: /projets/matrix/Chevron/Chevron4/Chevron4.mtx

This page has been generated from the emacs org-mode file http://morse.gforge.inria.fr/maphys/maphys-threads-validation.org, also available via Subversion:

svn checkout svn://scm.gforge.inria.fr/svnroot/morse/tutorials/maphys maphys_morse/ # or
svn checkout https://scm.gforge.inria.fr/anonscm/svn/morse/tutorials/maphys maphys_morse/

1 Useful links

See also http://morse.gforge.inria.fr/maphys/maphys-test-cases.html for more test cases.

2 Performance report

2.1 Time

The following figure plots the time performance obtained on plafrim2 for the runs set up in section 3 and executed in section 4. The analysis time is not taken into account. The figure is produced by executing the code of section 5.

For more details, click on the figure.

[Figure: mt_Total.png — total time]

2.2 Memory

[Figure: mt_Memtot.png — total memory]

3 Test case setup

We recall that we assume these builds of MaPHyS are already installed on plafrim2 by the user. See http://morse.gforge.inria.fr/maphys/install-maphys-cluster.html#sec-7 for the installation instructions.

Connect to plafrim2:

ssh plafrim2

Set the path to the top directory of Spack:

#export SPACK_ROOT=path/to/your/spack/install
export SPACK_ROOT=${HOME}/spack

Sourcing the following script makes the spack command available from anywhere:

. ${SPACK_ROOT}/share/spack/setup-env.sh

You should also update your MODULEPATH so that the modules generated by Spack are visible and the spack load command can be used, e.g.

export MODULEPATH=$MODULEPATH:${SPACK_ROOT}/share/spack/modules/linux-x86_64

Create a fresh working directory:

rm -rf maphys_mt
mkdir maphys_mt
cd maphys_mt

Create symbolic links to the test matrices:

ln -s /projets/matrix/Chevron/Chevron4/Chevron4.mtx
ln -s /projets/hiepacs/matrices.save/Flan1565.mtx

Create an input file for the MaPHyS complex example driver:

export INFILE=complex_template.in
echo "MATFILE = Chevron4.mtx" > $INFILE
echo "SYM = 0" >> $INFILE
echo "ICNTL(4) = 5" >> $INFILE
echo "ICNTL(5) = 1" >> $INFILE
echo "ICNTL(6) = 1" >> $INFILE
echo "ICNTL(7) = 4" >> $INFILE
echo "ICNTL(13) = SDS" >> $INFILE
echo "ICNTL(20) = 3" >> $INFILE
echo "ICNTL(21) = PCDSTRAT" >> $INFILE
echo "ICNTL(24) = 500" >> $INFILE
echo "ICNTL(26) = 500" >> $INFILE
echo "ICNTL(22) = 3" >> $INFILE
echo "RCNTL(21) = 1.0e-8" >> $INFILE
echo "RCNTL(11) = 1.0e-2" >> $INFILE
echo "RCNTL(9) = 1.0e-2" >> $INFILE
echo "" >> $INFILE
echo "# Thread parallelism" >> $INFILE
echo "ICNTL(42) = 1" >> $INFILE
echo "ICNTL(37) = NNODES" >> $INFILE
echo "ICNTL(38) = 24" >> $INFILE
echo "" >> $INFILE
echo "ICNTL(39) = NTHREADS # nth" >> $INFILE
echo "ICNTL(40) = NDOMS # ndom" >> $INFILE
echo "" >> $INFILE
echo "ICNTL(36) = BINDSTRAT # Binding" >> $INFILE
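For reference, the same template can be produced in one step with a here-document; this sketch is strictly equivalent to the echo sequence above:

```shell
# Equivalent here-document version of the template generation above.
export INFILE=complex_template.in
cat > "$INFILE" <<'EOF'
MATFILE = Chevron4.mtx
SYM = 0
ICNTL(4) = 5
ICNTL(5) = 1
ICNTL(6) = 1
ICNTL(7) = 4
ICNTL(13) = SDS
ICNTL(20) = 3
ICNTL(21) = PCDSTRAT
ICNTL(24) = 500
ICNTL(26) = 500
ICNTL(22) = 3
RCNTL(21) = 1.0e-8
RCNTL(11) = 1.0e-2
RCNTL(9) = 1.0e-2

# Thread parallelism
ICNTL(42) = 1
ICNTL(37) = NNODES
ICNTL(38) = 24

ICNTL(39) = NTHREADS # nth
ICNTL(40) = NDOMS # ndom

ICNTL(36) = BINDSTRAT # Binding
EOF
```

The quoted 'EOF' delimiter prevents any shell expansion, so the uppercase placeholders are written verbatim for the sed substitutions of the next step.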

Generate all input files from the template:

#!/bin/bash
for bindst in 2; do
    for sds in 1 2; do    
        for ipcd in 1 2; do
            for ndom in 2 4 8 16; do
                for nth in 1 2 4 6 8 12; do
                    outfile=complex_${sds}_${ipcd}_${ndom}_${nth}_${bindst}.in
                    nnodes=$((ndom/2))
                    sed "s/PCDSTRAT/${ipcd}/g" complex_template.in > $outfile
                    sed -i "s/NTHREADS/${nth}/g" $outfile
                    sed -i "s/NDOMS/${ndom}/g" $outfile
                    sed -i "s/NNODES/${nnodes}/g" $outfile
                    sed -i "s/SDS/${sds}/g" $outfile
                    sed -i "s/BINDSTRAT/${bindst}/g" $outfile 
                done
            done
        done
    done
done
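As a sanity check, the substitutions of the loop can be traced for a single combination; this sketch uses a reduced toy template containing only the placeholder lines (a stand-in for complex_template.in):

```shell
# Toy template with only the placeholder lines (stand-in for complex_template.in).
printf 'ICNTL(13) = SDS\nICNTL(21) = PCDSTRAT\nICNTL(37) = NNODES\nICNTL(39) = NTHREADS\nICNTL(40) = NDOMS\nICNTL(36) = BINDSTRAT\n' > toy_template.in

sds=1; ipcd=2; ndom=8; nth=4; bindst=2
outfile=complex_${sds}_${ipcd}_${ndom}_${nth}_${bindst}.in
nnodes=$((ndom/2))                    # 2 MPI processes (domains) per node
sed "s/PCDSTRAT/${ipcd}/g" toy_template.in > $outfile
sed -i "s/NTHREADS/${nth}/g" $outfile
sed -i "s/NDOMS/${ndom}/g" $outfile
sed -i "s/NNODES/${nnodes}/g" $outfile
sed -i "s/SDS/${sds}/g" $outfile
sed -i "s/BINDSTRAT/${bindst}/g" $outfile
cat $outfile
```

The resulting complex_1_2_8_4_2.in carries its parameters in its name, which is what the job generation of section 4 relies on.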

Copy the MaPHyS example drivers of both builds, in real (d) and complex (z) double precision arithmetic:

cp `spack location -i maphys%intel`/bin/dmph_examplethreadkv ddriver_icc
cp `spack location -i maphys%gcc`/bin/dmph_examplethreadkv ddriver_gcc
cp `spack location -i maphys%intel`/bin/zmph_examplethreadkv zdriver_icc
cp `spack location -i maphys%gcc`/bin/zmph_examplethreadkv zdriver_gcc

Exit from plafrim2:

exit

4 Running the test case

The following block tangles the slurm jobs of the next two subsections on plafrim2:

emacs maphys-threads-validation.org --batch -f org-babel-tangle --kill

The following block details the slurm job template, to be specialized according to the input files *.in generated in the previous section.

#!/bin/bash -l
#SBATCH -p special
#SBATCH -N NNODES
#SBATCH --ntasks-per-node=NTASKPERNODE
#SBATCH --cpus-per-task=NTHREADS
#SBATCH --exclusive
#SBATCH -J JOBNAME
#SBATCH --mail-type=ALL
#SBATCH --output=JOBOUT
#SBATCH --error=JOBERR

source ../env-spack.sh
source SPACKMODULEFILE
EXPMPIPMI
file=INFILE
echo "Case $file"
export ndom=`echo ${file} |tr _ " "|awk '{print $4}'`
export nth=`echo ${file} |tr _. " "|awk '{print $5}'`
date 
echo "" 
echo "Compiler: COMPILER" 
echo "" 
echo "Loaded modules: $LOADEDMODULES" 
echo ""
echo "Working directory: ${PWD}" 
echo "" 
echo "Command: COMMAND ./zdriver_COMPILER $file " 
echo ""
echo ""
export KMP_AFFINITY=verbose,disabled
export OMP_NUM_THREADS=${nth}
COMMAND ./zdriver_COMPILER $file
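The ndom and nth exports in the template above recover the parameters from the input file name; this can be checked in isolation (complex_1_2_8_4_2.in is a hypothetical example name):

```shell
# File names follow complex_<sds>_<ipcd>_<ndom>_<nth>_<bindst>.in
file=complex_1_2_8_4_2.in
ndom=`echo ${file} | tr _ " "  | awk '{print $4}'`   # 4th "_"-separated field
nth=`echo ${file}  | tr _. " " | awk '{print $5}'`   # "." also split, to drop ".in"
echo "ndom=${ndom} nth=${nth}"                       # ndom=8 nth=4
```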

Generate job files from the template and input files:

#!/bin/bash
# to be run on plafrim2 (after: ssh plafrim2)
cd maphys_mt
for file  in complex_*_*.in; do
    for compiler in icc gcc; do
        export nth=`echo ${file} |tr _ " "|awk '{print $5}'`
        export ndom=`echo ${file} |tr _ " "|awk '{print $4}'`
        nnodes=$((ndom/2))
        ntaskpernode=2
        export bindst=`echo ${file} |tr _. " "|awk '{print $6}'`
        if [[  $compiler == icc ]]; then
            cmd='srun -n ${ndom}'
            exportmpi='export I_MPI_PMI_LIBRARY=/cm/shared/apps/slurm/14.11.11/lib64/libpmi.so'
        else
            cmd='mpirun -mca mtl psm -np ${ndom}'
            exportmpi=''
        fi
        if [[  $bindst == 0 ]]; then
            bindstr="no-binding"
        elif [[  $bindst == 1 ]]; then
            bindstr="thread-binding"
        else
            bindstr="grouped-binding"
        fi
        export sds=`echo ${file} |tr _. " "|awk '{print $2}'`
        if [[  $sds == 1 ]]; then
            sdsname="mumps"
        else
            sdsname="pastix"
        fi      
        echo $cmd
        export OUTFILE=`echo job-${compiler}_${file}|tr . " "|awk '{print $1}'`.sh
        export OUTJOB=`echo ${sdsname}-${compiler}-${bindstr}_${file}|tr . " "|awk '{print $1}'`.out
        export ERRJOB=`echo ${sdsname}-${compiler}-${bindstr}_${file}|tr . " "|awk '{print $1}'`.err
        sed "s/JOBNAME/${compiler}mphs$ndom/g" job.sh > ${OUTFILE}
        sed -i "s/NNODES/${nnodes}/g" ${OUTFILE}
        sed -i "s/NTHREADS/${nth}/g" ${OUTFILE}
        sed -i "s/NTASKPERNODE/${ntaskpernode}/g" ${OUTFILE}
        sed -i "s/SPACKMODULEFILE/..\/load_maphys_${compiler}.sh/g" ${OUTFILE}
        sed -i "s/INFILE/${file}/g" ${OUTFILE}
        sed -i "s/COMMAND/${cmd}/g" ${OUTFILE}
        sed -i "s=COMPILER=${compiler}=g" ${OUTFILE}
        sed -i "s=BINDSTRAT=${bindstr}=g" ${OUTFILE}
        sed -i "s:EXPMPIPMI:${exportmpi}:g" ${OUTFILE}
        sed -i "s:JOBOUT:${OUTJOB}:g" ${OUTFILE}
        sed -i "s:JOBERR:${ERRJOB}:g" ${OUTFILE}
        cat $OUTFILE
    done
done
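The job, output, and error file names are derived from the input file name by stripping its extension with the same tr/awk idiom; for one hypothetical combination this gives:

```shell
# Hypothetical combination: mumps (sds=1), icc, grouped binding (bindst=2).
file=complex_1_2_8_4_2.in
compiler=icc
sdsname=mumps
bindstr=grouped-binding
OUTFILE=`echo job-${compiler}_${file} | tr . " " | awk '{print $1}'`.sh
OUTJOB=`echo ${sdsname}-${compiler}-${bindstr}_${file} | tr . " " | awk '{print $1}'`.out
echo "$OUTFILE"   # job-icc_complex_1_2_8_4_2.sh
echo "$OUTJOB"    # mumps-icc-grouped-binding_complex_1_2_8_4_2.out
```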

Then, we connect to plafrim2, move to the work directory, and submit the jobs:

module load slurm
cmpt=10
while [[ $cmpt -gt 0 ]] 
do
    cmpt=`squeue |grep mphs|wc -l`
    echo "Waiting for a last grouped run to finish, ${cmpt} jobs to terminate"
    sleep 5
done
cmpt=0
totjob=`ls job-* |wc -l`
for file in job-*; do
    sbatch $file
    ((cmpt=cmpt+1))
    ((totjob=totjob-1))
    if [[ $cmpt == 5 ]]; then
        while [[ $cmpt == 5 ]] 
        do
            cmpt=`squeue |grep mphs|wc -l`
            ((remjob=cmpt+totjob))
            echo "Waiting for one job to terminate, $remjob remaining"
            sleep 2
        done
    fi
done
while [[ $cmpt -gt 0 ]] 
do
    cmpt=`squeue |grep mphs|wc -l`
    echo "Waiting for last ${cmpt} jobs to terminate"
    sleep 5
done
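The loop above keeps at most 5 jobs in flight. Its control flow can be dry-run anywhere with stub sbatch and squeue commands (a simplified sketch: the stubs are hypothetical stand-ins, the stub queue always reports an empty queue, and the polling sleep is omitted):

```shell
# Stub scheduler commands placed first on PATH.
mkdir -p stubbin
printf '#!/bin/sh\necho "Submitted $1" >> submitted.log\n' > stubbin/sbatch
printf '#!/bin/sh\nexit 0\n' > stubbin/squeue
chmod +x stubbin/sbatch stubbin/squeue
PATH=$PWD/stubbin:$PATH

touch job-1 job-2 job-3 job-4 job-5 job-6 job-7   # fake job scripts
rm -f submitted.log

cmpt=0
totjob=`ls job-* | wc -l`
for file in job-*; do
    sbatch $file
    cmpt=$((cmpt+1))
    totjob=$((totjob-1))
    if [ "$cmpt" -eq 5 ]; then
        # With the real scheduler this would sleep and poll until a slot frees up.
        cmpt=`squeue | grep mphs | wc -l`
    fi
done
cat submitted.log
```

All 7 fake jobs are submitted, with the counter reset after the 5th, mirroring the throttling of the real loop.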

Disconnect from plafrim2:

exit

5 Plot the results

5.1 Extract results and create database

Here, we create a database from all the output logs produced by the runs of the previous section.

We collect the timers and the overall memory metrics.

First, we create the directories and copy the results from plafrim2:

mkdir -p testdir
mkdir -p testdir/mttest
mkdir -p testdir/mttest/icc
mkdir -p testdir/mttest/gcc
mkdir -p testdir/mttest/input
mkdir -p testdir/mttest/job

scp plafrim2:maphys_mt/{pastix,mumps}-icc*.* testdir/mttest/icc
scp plafrim2:maphys_mt/{pastix,mumps}-gcc*.* testdir/mttest/gcc
scp plafrim2:maphys_mt/complex* testdir/mttest/input
scp plafrim2:maphys_mt/job* testdir/mttest/job

Finally, we create the database from the log files:

export OUTFILE=testdir/mttest/maphys_mt_test.csv
echo "Compiler,Subdomains,Threads,Precond,Analysis,Factorisation,Preconditioning,Solve,Niter,Memtot,FactoLcSys,FactoSchur,PrecondAssembly,PrecondFacto,SolveDistRhs,SolveGenRhs,SolveIterative,SolveDirect,SolveGatherRhs,MemLocSys,MemFac,MemSolve,MemNodePeak" > "${OUTFILE}"
for file in testdir/mttest/{icc,gcc}/*.out; do
    compiler=`echo $file |tr /_ " " |awk '{print $4}'`
    Precond=`echo $file |tr _ " "|awk '{print $4}'`
    if [[ $Precond == 1 ]]; then
        Precond="Dense"
    elif [[ $Precond == 2 ]]; then
        Precond="Sparse"
    else
        Precond="Sparse+PILUT"
    fi
    NP=`echo $file |tr _ " "|awk '{print $5}'`
    if [[ $NP == 1 ]]; then
        subdom="domain"
    else
        subdom="domains"
    fi

    Nth=`echo $file |tr _ " "|tr . " "|awk '{print $6}'`
    Tanalyze=`cat $file|grep "RINFO     4"|tail -n 1|awk '{print $5}'`
    
    Tfactorise=`cat $file|grep "RINFO     5"|tail -n 1|awk '{print $5}'`
    Tfactolc=`cat $file|grep "RINFO    11"|tail -n 1|awk '{print $5}'`
    Tfactoschur=`cat $file|grep "RINFO    12"|tail -n 1|awk '{print $5}'`
    
    Tprecond=`cat $file|grep "RINFO     6"|tail -n 1|awk '{print $5}'`
    Tpcdassbl=`cat $file|grep "RINFO    13"|tail -n 1|awk '{print $5}'`
    Tpcdfacto=`cat $file|grep "RINFO    14"|tail -n 1|awk '{print $5}'`
    
    Tsolve=`cat $file|grep "RINFO     7"|tail -n 1|awk '{print $5}'`
    Tsolvedistrhs=`cat $file|grep "RINFO    15"|tail -n 1|awk '{print $5}'`
    Tsolvegenrhs=`cat $file|grep "RINFO    16"|tail -n 1|awk '{print $5}'`
    Tsolveits=`cat $file|grep "RINFO    17"|tail -n 1|awk '{print $5}'`
    Tsolvesds=`cat $file|grep "RINFO    18"|tail -n 1|awk '{print $5}'`
    Tsolvegathrhs=`cat $file|grep "RINFO    19"|tail -n 1|awk '{print $5}'`

    Niter=`cat $file|grep "IINFOG    5"|tail -n 1|awk '{print $3}'`

    Memtot=`cat $file|grep "IINFO    24"|awk '{print $5}'|tail -n 1`
    Memtot=`echo ${Memtot} | sed -e 's/[eE]+*/\\*10\\^/'`
    Memtot=$(echo "$Memtot * 1."|bc)

    Memlocsys=`cat $file|grep "IINFO     5"|awk '{print $5}'|tail -n 1`
    Memlocsys=`echo ${Memlocsys} | sed -e 's/[eE]+*/\\*10\\^/'`
    Memlocsys=$(echo "$Memlocsys * 1."|bc)

    Memfac=`cat $file|grep "IINFO    22"|awk '{print $5}'|tail -n 1`
    Memfac=`echo ${Memfac} | sed -e 's/[eE]+*/\\*10\\^/'`
    Memfac=$(echo "$Memfac * 1."|bc)

    Memsolve=`cat $file|grep "IINFO    37"|awk '{print $5}'|tail -n 1`
    Memsolve=`echo ${Memsolve} | sed -e 's/[eE]+*/\\*10\\^/'`
    Memsolve=$(echo "$Memsolve * 1."|bc)

    MemNodePeak=`cat $file|grep "IINFO    35"|awk '{print $4}'|tail -n 1`

    echo "$compiler,$NP $subdom,$Nth,Preconditioner $Precond,$Tanalyze,$Tfactorise,$Tprecond,$Tsolve,$Niter,$Memtot,$Tfactolc,$Tfactoschur,$Tpcdassbl,$Tpcdfacto,$Tsolvedistrhs,$Tsolvegenrhs,$Tsolveits,$Tsolvesds,$Tsolvegathrhs,$Memlocsys,$Memfac,$Memsolve,$MemNodePeak" >> "${OUTFILE}"
done

5.2 Plot results from database

The results are plotted with R. The following R code blocks require these libraries:

  • methods
  • ggplot2
  • scales
  • reshape

To plot results, we first load libraries:

library('methods')
library('ggplot2')
library('scales')
library('reshape')

Then, we load the database and add a column summing the factorisation, preconditioning and solve phases:

df <- read.csv('testdir/mttest/maphys_mt_test.csv')
df$Total <- df$Factorisation + df$Preconditioning + df$Solve
df <- df[order(df$Subdomains,df$Compiler,df$Precond,df$Threads),]
print(df)
cols <- c("darkred","red","darkblue","blue")
df$Subdomains <- factor(df$Subdomains, levels = c("2 domains", "4 domains", "8 domains", "16 domains"))

5.2.1 Time

In this subsection we only plot time performance; note that the database created in the previous subsection also contains the memory metrics.

Run the following R code block to plot time performances:

meas = c("Total","Analysis","Factorisation","Preconditioning","Solve","FactoLcSys", "FactoSchur", "PrecondAssembly", "PrecondFacto", "SolveDistRhs", "SolveGenRhs", "SolveIterative", "SolveDirect", "SolveGatherRhs")
compiler=unique(df$Compiler)
for(step in meas){
    title <- paste("MaPHyS multithread: ",step,sep="")
    title <- paste(title," time","")
    dataord <- df
    dataord <- melt(dataord,id=c("Subdomains","Compiler","Precond","Threads"),measure=c(step))
    print(dataord)
    p <- ggplot(data=dataord, aes(x=factor(Threads),y=value,color=factor(Compiler),group=factor(Compiler))) 
    p <- p + geom_point() + geom_line()
    p <- p + facet_grid(Subdomains ~ Precond, scales="free")
    p <- p + scale_x_discrete(breaks=unique(dataord$Threads))
    p <- p + xlab("Number of threads")
    p <- p + ylab("Time (seconds)")
    p <- p + scale_color_manual(values=cols,name=paste("MaPHyS version-compiler"),label=compiler)
    p <- p + ggtitle(title) + theme_bw(base_size=11) + theme(legend.position="right")    
    filename <- paste("figures/mt_",step,sep="")
    filename <- paste(filename,".png",sep="")
    ggsave(filename, width=9,height=6)
}

5.2.2 Memory

Run the following R code block to plot memory performances:

wbar = .9
meas = c("Memtot","MemLocSys","MemFac","MemSolve","MemNodePeak")
compiler=unique(df$Compiler)
for(step in meas){
    title <- paste("MaPHyS multithread: ",step,sep="")
    title <- paste(title," average memory by subdomain","")
    dataord <- df
    dataord <- dataord[order(dataord$Subdomains,dataord$Compiler,dataord$Precond,dataord$Threads),]
    dataord <- melt(dataord,id=c("Subdomains","Compiler","Precond","Threads"),measure=c(step))
    p <- ggplot(data=dataord, aes(x=factor(Threads),y=value,fill=factor(Compiler))) 
    p <- p + geom_bar(position="dodge",stat="identity",width=wbar) + facet_grid(Subdomains ~ Precond, scales="free")
    p <- p + scale_x_discrete(breaks=unique(dataord$Threads))
    p <- p + xlab("Number of threads")
    p <- p + ylab("Memory (MB)")
    p <- p + scale_fill_manual(values=cols,name=paste("MaPHyS version-compiler"),label=compiler)
    p <- p + ggtitle(title) + theme_bw(base_size=11) + theme(legend.position="right")    
    filename <- paste("figures/mt_",step,sep="")
    filename <- paste(filename,".png",sep="")
    ggsave(filename, width=9,height=6)
}

5.2.3 Other statistics

Run the following R code block to plot the number of iterations:

meas = c("Niter")
compiler=unique(df$Compiler)
for(step in meas){
    title <- paste("MaPHyS multithread: ",step,sep="")
    dataord <- df
    dataord <- melt(dataord,id=c("Subdomains","Compiler","Precond","Threads"),measure=c(step))
    print(dataord)
    p <- ggplot(data=dataord, aes(x=factor(Threads),y=value,fill=factor(Compiler),group=factor(Compiler))) 
    p <- p + geom_bar(position="dodge",stat="identity",width=wbar) + facet_grid(Subdomains ~ Precond, scales="free")
    p <- p + scale_x_discrete(breaks=unique(dataord$Threads))
    p <- p + xlab("Number of threads")
    p <- p + ylab("Number of iterations")
    p <- p + scale_fill_manual(values=cols,name=paste("MaPHyS version-compiler"),label=compiler)
    p <- p + ggtitle(title) + theme_bw(base_size=11) + theme(legend.position="right")    
    filename <- paste("figures/mt_",step,sep="")
    filename <- paste(filename,".png",sep="")
    ggsave(filename, width=9,height=6)
}

Author: HiePACS

Created: 2016-08-29 lun. 10:41

Emacs 24.4.1 (Org mode 8.2.10)
