Epiphany

edytuj

Technologia Epiphany[1] i przykładowy program z użyciem OpenCl[2]

Koprocesor Xeon Phi

edytuj

Wstęp do programowania procesorów Xeon i koprocesora Xeon Phi [3]


OpenMP

edytuj
 
Obraz i kod z użyciem OpenMP

testy wstępne

edytuj

wersja

edytuj
  • openmp_version : "integer parameter openmp_version with a value yyyymm where yyyy and mm are 19 the year and month designations of the version of the OpenMP API that the implementation supports"
  • OMP_DISPLAY_ENV


# bash
export  OMP_DISPLAY_ENV="TRUE"


Wywołanie programu korzystajaćego z OpenMP spowoduje wyświetlenie dodatkowego komunikatu:

OPENMP DISPLAY ENVIRONMENT BEGIN
  _OPENMP = '201511'
  OMP_DYNAMIC = 'FALSE'
  OMP_NESTED = 'FALSE'
  OMP_NUM_THREADS = '8'
  OMP_SCHEDULE = 'DYNAMIC'
  OMP_PROC_BIND = 'FALSE'
  OMP_PLACES = ''
  OMP_STACKSIZE = '0'
  OMP_WAIT_POLICY = 'PASSIVE'
  OMP_THREAD_LIMIT = '4294967295'
  OMP_MAX_ACTIVE_LEVELS = '2147483647'
  OMP_CANCELLATION = 'FALSE'
  OMP_DEFAULT_DEVICE = '0'
  OMP_MAX_TASK_PRIORITY = '0'
OPENMP DISPLAY ENVIRONMENT END


procesor

edytuj

Sprawdzamy ile rdzeni (wątków) ma procesor:[4]

grep 'processor.*:' /proc/cpuinfo | wc -l 

Im więcej tym lepiej. Jeśli 1 to nie ma potrzeby używania OpenMP.

kompilator

edytuj

W konsoli sprawdzamy wersję:[5]

 gcc --version ## get compiler version

lub [6]

 echo |cpp -fopenmp -dM |grep -i open

przykładowy wynik:

 #define _OPENMP 201307

biblioteki

edytuj

Biblioteki:

  • libgomp (GNU Offloading and Multi Processing Runtime Library)[7]

Wyszukujemy pakiety (uwaga na openmpi - to inna biblioteka)

 sudo apt-cache search openmp


Stan biblioteki:

  dpkg --status libgomp1

wyniki:

Package: libgomp1
Status: install ok installed
Priority: optional
Section: libs
Installed-Size: 156
Maintainer: Ubuntu Core developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Multi-Arch: same
Source: gcc-5
Version: 5.4.0-6ubuntu1~16.04.4
Depends: gcc-5-base (= 5.4.0-6ubuntu1~16.04.4), libc6 (>= 2.17)
Breaks: gcc-4.3 (<< 4.3.6-1), gcc-4.4 (<< 4.4.6-4), gcc-4.5 (<< 4.5.3-2)
Description: GCC OpenMP (GOMP) support library
 GOMP is an implementation of OpenMP for the C, C++, and Fortran compilers
 in the GNU Compiler Collection.
Homepage: http://gcc.gnu.org/
Original-Maintainer: Debian GCC Maintainers <debian-gcc@lists.debian.org>


lub:

  ldconfig -p | grep gomp

przykładowy wynik:

 libgomp.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libgomp.so.1

Instalacja

edytuj

Nie instalujemy biblioteki, ponieważ OpenMP jest to opcja kompilatora[8]

ale:

 apt-get install libgomp1
 sudo apt-get install gcc-multilib

użycie

edytuj

Użycie OpenMP[9][10] polega na dodaniu do istniejącego kodu (tu przykład z kodem w C):

  • dyrektywy preprocesora: #include <omp.h>
  • Tworzenie wątków za pomocą dyrektywy: #pragma omp parallel
  • kompilacji z dodaną opcję: -fopenmp[11]

Ta niewielka ingerencja w istniejący kod[12] pozwala na przyspieszenie wykonywania n-razy (n jest liczbą możliwych wątków = liczba rdzeni * liczba wątków_na_rdzeń).

Przykłady

edytuj

Hello world!

edytuj
/*
export  OMP_DISPLAY_ENV="TRUE"
gcc.c.c -fopenmp -Wall
./a.out

*/

#include <omp.h>
#include <stdio.h>

int main() {
 #pragma omp parallel
 printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());


}

Zagnieżdzone pętle ( ang. nested loops)

edytuj
/*
 C program that uses nested for loops
 gcc o.c -Wall -fopenmp
 ./a.out
 
 
 i = 10000 	 xMax*yMax = 10000

 
 i = 444514 = 0.444514 *xMax*yMax 	 where xMax*yMax = 1000000 // no reduction
i = 437097 = 0.437097*(xMax*yMax) 	 where xMax*yMax = 1000000


 i = 1000000 = 1.000000*(xMax*yMax) 	 where xMax*yMax = 1000000 // reduction

*/ 


#include <stdio.h>
#include <stdlib.h>
#include <omp.h>		// OpenMP


int i= 0; 

int main()
{

	int x;
	int xMax = 1000;
	int y;
	int yMax = 1000;
	int all = xMax*yMax;
	
	
	
	

	#pragma omp parallel for collapse(2) schedule(dynamic) reduction(+:i)
	for (x = 0; x < xMax; x++) {

		for (y = 0; y < yMax; y++)
			{i++;}
		}

	printf("i = %d = %f*(xMax*yMax) \t where xMax*yMax = %d\n", i, (double)i/all,  all);
	return 0;
}

Wypełnianie tablicy

edytuj
// fill array using symmetry of image 
// uses global var :  ...
int FillArraySymmetric(unsigned char data[] )
{
   
 unsigned char Color; // gray from 0 to 255 

printf("axis of symmetry \n"); 
iy = iyAxisOfSymmetry; 
#pragma omp parallel for schedule(dynamic) private(ix,Color) shared(ixMin,ixMax, iyAxisOfSymmetry)
for(ix=ixMin;ix<=ixMax;++ix) PlotPoint(ix, iy, GiveColor(ix, iy)); 


/*
The use of ‘shared(variable, variable2) specifies that these variables should be shared among all the threads.
The use of ‘private(variable, variable2)’ specifies that these variables should have a seperate instance in each thread.
*/

#pragma omp parallel for schedule(dynamic) private(iyAbove,ix,iy,Color) shared(iyAboveMin, iyAboveMax,ixMin,ixMax, iyAxisOfSymmetry)

// above and below axis 
for(iyAbove = iyAboveMin; iyAbove<=iyAboveMax; ++iyAbove) 
  {printf(" %d from %d\n", iyAbove, iyAboveMax); //info 
  for(ix=ixMin; ix<=ixMax; ++ix) 

  { // above axis compute color and save it to the array
    iy = iyAxisOfSymmetry + iyAbove;
    Color = GiveColor(ix, iy);
    PlotPoint(ix, iy, Color ); 
    // below the axis only copy Color the same as above without computing it 
    PlotPoint(ixMax-ix, iyAxisOfSymmetry - iyAbove , Color ); 
   } 
}  
 return 0;
}

zbiór Mandelbrota

edytuj
 
Mandelbrot set with field lines

Etapy:

  • przydziel 1 tablicę w pamięci dla całego obrazu przed pętlą
  • wykonuj renderowanie równolegle z każdą iteracją zapisując wynik w tablicy
  • zapisz całą tablicę na dysk po pętli renderowania
   

#include <complex.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <omp.h> //OpenM

/*
fork of 
mandelbrot-book	how to write a book about the Mandelbrot set by Claude Heiland-Alle
https://code.mathr.co.uk/mandelbrot-book/blob/HEAD:/book/

-----------------------------------------------------
hobold: Let's remove the iteration bands and change the relative sizes of black and white checkerboard areas. Now the structural similarity is more emphasized.
https://fractalforums.org/fractal-mathematics-and-new-theories/28/plotting-field-lines-during-iteration/4233

The white curve in my first image is a polyline connecting the (numerically approximated) corners of the checkers of some specific iteration band. I believe the limit was 27 iterations before the smallest "square" became too tiny to distinguish adjacent vertices with double precision.

The pattern in the 2nd image was done by looking at the final value of the iterated Z_n, just after it escapes. The usual checkerboarded "binary decomposition" looks at just the sign of the imaginary part. But you really can choose any axis through the origin and color based on what side of the axis you end up on. Or you can choose a sector smaller than 180 degrees like I did:
Code: [Select]
return (fabs(z.x)*0.1 < fabs(z.y));

The bailout radius needs to be reasonably large for those field lines to align nicely between iteration bands. Because this analogy with field lines is only strictly true for an infinite bailout. So with a minimal bailout radius of 2.0, the binary decomposition ends up being very visibly distorted and misaligned.
------------------------------------------------------------------------





gcc m.c -lm -Wall -Wextra -fopenmp

./a.out >f.pgm

*/


 double cnorm(double _Complex z)
{
  return creal(z) * creal(z) + cimag(z) * cimag(z);
}

int main()
{
  //int aa = 4;
  int w = 2000 ; // width in piels
  int h = 2000 ; // height in pixels
  int kMax = 1024; // iteration max
  
  double r = 2; // plane radius: https://en.wikibooks.org/wiki/Fractals/Computer_graphic_techniques/2D/plane#radius
  // The bailout radius needs to be reasonably large for those field lines to align nicely between iteration bands
  double ER = 2000; // Escape Radius = b
  double ER2 = ER * ER; // escape_radius^2
  
  unsigned char *img = malloc(w * h);
  #pragma omp parallel for
  for (int j = 0; j < h; ++j)
  {
    double y = (h/2 - (j + 0.5)) / (h/2) * r;
    for (int i = 0; i < w; ++i)
    {
      double x = (i + 0.5 - w/2) / (h/2) * r;
      double _Complex c = x + I * y;
      double _Complex z = 0;
      int k;
      for (k = 0; k < kMax; ++k)
      {
        z = z * z + c;
        if (cnorm(z) > ER2)
          break;
      }
      
      //  fabs(z.x)*0.1 < fabs(z.y)   
      img[j * w + i] = (k < kMax && fabs(creal(z))*0.1 < fabs(cimag(z))) ? 255 : 0;
    }
  }
  printf("P5\n%d %d\n255\n", w, h);
  fwrite(img, w * h, 1, stdout);
  free(img);
  return 0;
}


Wersja 2

   
// https://people.sc.fsu.edu/~jburkardt/c_src/c_src.html 
// mandelbrot_openmp, a C code which generates an ASCII Portable Pixel Map (PPM) image of the Mandelbrot fractal set, using OpenMP for parallel execution;
// by John Burkardt
# include <stdlib.h>
# include <stdio.h>
# include <math.h>
# include <time.h>

# include <omp.h>

int main ( );
int i4_min ( int i1, int i2 );
void timestamp ( );

/******************************************************************************/

int main ( )

/******************************************************************************/
/*
  Purpose

    MAIN is the main program for MANDELBROT_OPENMP.

  Discussion:

    MANDELBROT_OPENMP computes an image of the Mandelbrot set.

  Licensing:

    This code is distributed under the GNU LGPL license. 

  Modified:

    03 September 2012

  Author:

    John Burkardt

  Local Parameters:

    Local, int COUNT_MAX, the maximum number of iterations taken
    for a particular pixel.
*/
{
  int m = 500;
  int n = 500;

  int b[m][n];
  int c;
  int count[m][n];
  int count_max = 2000;
  int g[m][n];
  int i;
  int j;
  int jhi;
  int jlo;
  int k;
  char *output_filename = "mandelbrot_openmp.ppm";
  FILE *output_unit;
  int r[m][n];
  double wtime;
  double x_max =   1.25;
  double x_min = - 2.25;
  double x;
  double x1;
  double x2;
  double y_max =   1.75;
  double y_min = - 1.75;
  double y;
  double y1;
  double y2;

  timestamp ( );
  printf ( "\n" );
  printf ( "MANDELBROT_OPENMP\n" );
  printf ( "  C/OpenMP version\n" );
  printf ( "\n" );
  printf ( "  Create an ASCII PPM image of the Mandelbrot set.\n" );
  printf ( "\n" );
  printf ( "  For each point C = X + i*Y\n" );
  printf ( "  with X range [%g,%g]\n", x_min, x_max );
  printf ( "  and  Y range [%g,%g]\n", y_min, y_max );
  printf ( "  carry out %d iterations of the map\n", count_max );
  printf ( "  Z(n+1) = Z(n)^2 + C.\n" );
  printf ( "  If the iterates stay bounded (norm less than 2)\n" );
  printf ( "  then C is taken to be a member of the set.\n" );
  printf ( "\n" );
  printf ( "  An ASCII PPM image of the set is created using\n" );
  printf ( "    M = %d pixels in the X direction and\n", m );
  printf ( "    N = %d pixels in the Y direction.\n", n );

  wtime = omp_get_wtime ( );
/*
  Carry out the iteration for each pixel, determining COUNT.
*/
# pragma omp parallel \
  shared ( b, count, count_max, g, r, x_max, x_min, y_max, y_min ) \
  private ( i, j, k, x, x1, x2, y, y1, y2 )
{
# pragma omp for

  for ( i = 0; i < m; i++ )
  {
    y = ( ( double ) (     i - 1 ) * y_max   
        + ( double ) ( m - i     ) * y_min ) 
        / ( double ) ( m     - 1 );

    for ( j = 0; j < n; j++ )
    {
      x = ( ( double ) (     j - 1 ) * x_max   
          + ( double ) ( n - j     ) * x_min ) 
          / ( double ) ( n     - 1 );

      count[i][j] = 0;

      x1 = x;
      y1 = y;

      for ( k = 1; k <= count_max; k++ )
      {
        x2 = x1 * x1 - y1 * y1 + x;
        y2 = 2 * x1 * y1 + y;

        if ( x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2 )
        {
          count[i][j] = k;
          break;
        }
        x1 = x2;
        y1 = y2;
      }

      if ( ( count[i][j] % 2 ) == 1 )
      {
        r[i][j] = 255;
        g[i][j] = 255;
        b[i][j] = 255;
      }
      else
      {
        c = ( int ) ( 255.0 * sqrt ( sqrt ( sqrt ( 
          ( ( double ) ( count[i][j] ) / ( double ) ( count_max ) ) ) ) ) );
        r[i][j] = 3 * c / 5;
        g[i][j] = 3 * c / 5;
        b[i][j] = c;
      }
    }
  }
}

  wtime = omp_get_wtime ( ) - wtime;
  printf ( "\n" );
  printf ( "  Time = %g seconds.\n", wtime );
/*
  Write data to an ASCII PPM file.
*/
  output_unit = fopen ( output_filename, "wt" );

  fprintf ( output_unit, "P3\n" );
  fprintf ( output_unit, "%d  %d\n", n, m );
  fprintf ( output_unit, "%d\n", 255 );

  for ( i = 0; i < m; i++ )
  {
    for ( jlo = 0; jlo < n; jlo = jlo + 4 )
    {
      jhi = i4_min ( jlo + 4, n );
      for ( j = jlo; j < jhi; j++ )
      {
        fprintf ( output_unit, "  %d  %d  %d", r[i][j], g[i][j], b[i][j] );
      }
      fprintf ( output_unit, "\n" );
    }
  }

  fclose ( output_unit );
  printf ( "\n" );
  printf ( "  Graphics data written to \"%s\".\n", output_filename );
/*
  Terminate.
*/
  printf ( "\n" );
  printf ( "MANDELBROT_OPENMP\n" );
  printf ( "  Normal end of execution.\n" );
  printf ( "\n" );
  timestamp ( );

  return 0;
}
/******************************************************************************/

int i4_min ( int i1, int i2 )

/******************************************************************************/
/*
  Purpose:

    I4_MIN returns the smaller of two I4's.

  Licensing:

    This code is distributed under the GNU LGPL license.

  Modified:

    29 August 2006

  Author:

    John Burkardt

  Parameters:

    Input, int I1, I2, two integers to be compared.

    Output, int I4_MIN, the smaller of I1 and I2.
*/
{
  int value;

  if ( i1 < i2 )
  {
    value = i1;
  }
  else
  {
    value = i2;
  }
  return value;
}
/******************************************************************************/

void timestamp ( )

/******************************************************************************/
/*
  Purpose:

    TIMESTAMP prints the current YMDHMS date as a time stamp.

  Example:

    31 May 2001 09:45:54 AM

  Modified:

    24 September 2003

  Author:

    John Burkardt

  Parameters:

    None
*/
{
# define TIME_SIZE 40

  static char time_buffer[TIME_SIZE];
  const struct tm *tm;
  time_t now;

  now = time ( NULL );
  tm = localtime ( &now );

  strftime ( time_buffer, TIME_SIZE, "%d %B %Y %I:%M:%S %p", tm );

  printf ( "%s\n", time_buffer );

  return;
# undef TIME_SIZE
}
#! /bin/bash
#
gcc -c -Wall -fopenmp mandelbrot_openmp.c
if [ $? -ne 0 ]; then
  echo "Compile error."
  exit
fi
#
gcc -fopenmp mandelbrot_openmp.o -lm
if [ $? -ne 0 ]; then
  echo "Load error."
  exit
fi
rm mandelbrot_openmp.o
mv a.out $HOME/binc/mandelbrot_openmp
#
echo "Normal end of execution."

Liczniki

edytuj

Tutaj korzystamy z liczników będących zmiennymi globalnymi :

/*
  2 basins, both superattracting
 
  - exterior
  - interior
  - unknown ( possibly empty set )

  pixel counters work not good with OpenMP !!!

*/


unsigned char ComputeColor_FatouBasins_superattracting (complex double z)
{

  int i;			// number of iteration
  for (i = 0; i < IterMax; ++i)
    {

      // 2 basins, both superattracting
      cabs2z = cabs2(z);
      // infinity is superattracting here !!!!!	
      if ( cabs2z > ER2 ){ uExterior +=1; return iColorOfExterior;}
	
      // second superAttraction basins. Here one of superattracting point is zp= 0
      if ( cabs2z < AR2 ){ uInterior += 1;return iColorOfInterior;}
		
      			
     
      z = f(z);		//  iteration: z(n+1) = f(zn)
	

    }

  uUnknown += 1;
  return iColorOfUnknown;
}

jest to spowodowane przeplataniem operacji odczytu/modyfikacji/zapisu różnych wątków, na przykład jednym możliwym przeplataniem:

wątek 1: odczytaj zmienną z pamięci do rejestru
wątek 2: odczytaj zmienną z pamięci do rejestru
wątek 1: rejestr przyrostu
wątek 1: zapisz zmienną z rejestru do pamięci
wątek 2: rejestr inkrementacji
wątek 2: zapisz zmienną z rejestru do pamięci

efektem końcowym jest to, że niektóre przyrosty mogą zostać pominięte, w sposób niedeterministyczny

Rozwiązania:

  • Użyj operacji atomowych (najłatwiejsza poprawka kodu)
  • Lub dla większej wydajności (mniej rywalizacji o operacje atomowe), miej lokalne liczniki i połącz je na końcu


#pragma omp atomic
uInterior += 1;


void some_function(int *myExterior)
{
  *myExterior += 1;
}

int main(int argc, char **argv)
{
  int exterior = 0;
  #pragma omp parallel for reduction(+:exterior)
  for (int j = 0; j < 9002; ++j)
  {
     int myExterior = 0;
     for (int i = 0; i < 9002; ++i)
     {
       some_function(&myExterior);
     }
     exterior += myExterior;
  }
  printf("%d\n", exterior);
  return 0;
}

Błędy

edytuj

Najczęstsze błędy[13][14]

Poradniki

edytuj

Źródła

edytuj
  1. Technologia Epiphany - superkomputer dla każdego
  2. Mandelbrot set using OpenCL on Epiphany
  3. Tutorial: Parallel programming for Xeon and Phi at goparallel
  4. OpenMP in 30 Minutes By Joe Landman
  5. gcc wiki: openmp
  6. stackoverflow question how-to-check-the-version-of-openmp-on-linux
  7. libgomp
  8. askubuntu question how-can-i-install-openmp-in-ubuntu
  9. OpenMP w wikipedii
  10. OpenMP in 30 Minutes By Joe Landman
  11. OpenMP - kompilatory
  12. Kurs openMP - Paweł Przybyłowicz - asystent na Wydziale Matematyki Stosowanej AGH.
  13. 32 OpenMP Traps For C++ Developers 20.11.2009 by Alexey Kolosov, Andrey Karpov, Evgeniy Ryzhkov
  14. Common Mistakes in OpenMP and How To Avoid Them A Collection of Best Practices by Michael Suß and Claudia Leopold