Programowanie w systemie UNIX/CPU
CPU
edytujEpiphany
edytujTechnologia Epiphany[1] i przykładowy program z użyciem OpenCl[2]
Koprocesor Xeon Phi
edytujWstęp do programowania procesorów Xeon i koprocesora Xeon Phi [3]
OpenMP
edytujtesty wstępne
edytujwersja
edytuj- openmp_version : "integer parameter openmp_version with a value yyyymm where yyyy and mm are 19 the year and month designations of the version of the OpenMP API that the implementation supports"
- OMP_DISPLAY_ENV
# bash
export OMP_DISPLAY_ENV="TRUE"
Wywołanie programu korzystajaćego z OpenMP spowoduje wyświetlenie dodatkowego komunikatu:
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP = '201511'
OMP_DYNAMIC = 'FALSE'
OMP_NESTED = 'FALSE'
OMP_NUM_THREADS = '8'
OMP_SCHEDULE = 'DYNAMIC'
OMP_PROC_BIND = 'FALSE'
OMP_PLACES = ''
OMP_STACKSIZE = '0'
OMP_WAIT_POLICY = 'PASSIVE'
OMP_THREAD_LIMIT = '4294967295'
OMP_MAX_ACTIVE_LEVELS = '2147483647'
OMP_CANCELLATION = 'FALSE'
OMP_DEFAULT_DEVICE = '0'
OMP_MAX_TASK_PRIORITY = '0'
OPENMP DISPLAY ENVIRONMENT END
procesor
edytujSprawdzamy ile rdzeni (wątków) ma procesor:[4]
grep 'processor.*:' /proc/cpuinfo | wc -l
Im więcej tym lepiej. Jeśli 1 to nie ma potrzeby używania OpenMP.
kompilator
edytujW konsoli sprawdzamy wersję:[5]
gcc --version ## get compiler version
lub [6]
echo |cpp -fopenmp -dM |grep -i open
przykładowy wynik:
#define _OPENMP 201307
biblioteki
edytujBiblioteki:
- libgomp (GNU Offloading and Multi Processing Runtime Library)[7]
Wyszukujemy pakiety (uwaga na openmpi - to inna biblioteka)
sudo apt-cache search openmp
Stan biblioteki:
dpkg --status libgomp1
wyniki:
Package: libgomp1 Status: install ok installed Priority: optional Section: libs Installed-Size: 156 Maintainer: Ubuntu Core developers <ubuntu-devel-discuss@lists.ubuntu.com> Architecture: amd64 Multi-Arch: same Source: gcc-5 Version: 5.4.0-6ubuntu1~16.04.4 Depends: gcc-5-base (= 5.4.0-6ubuntu1~16.04.4), libc6 (>= 2.17) Breaks: gcc-4.3 (<< 4.3.6-1), gcc-4.4 (<< 4.4.6-4), gcc-4.5 (<< 4.5.3-2) Description: GCC OpenMP (GOMP) support library GOMP is an implementation of OpenMP for the C, C++, and Fortran compilers in the GNU Compiler Collection. Homepage: http://gcc.gnu.org/ Original-Maintainer: Debian GCC Maintainers <debian-gcc@lists.debian.org>
lub:
ldconfig -p | grep gomp
przykładowy wynik:
libgomp.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libgomp.so.1
Instalacja
edytujNie instalujemy biblioteki, ponieważ OpenMP jest to opcja kompilatora[8]
ale:
apt-get install libgomp1 sudo apt-get install gcc-multilib
użycie
edytujUżycie OpenMP[9][10] polega na dodaniu do istniejącego kodu (tu przykład z kodem w C):
- dyrektywy preprocesora:
#include <omp.h>
- Tworzenie wątków za pomocą dyrektywy:
#pragma omp parallel
- kompilacji z dodaną opcję:
-fopenmp
[11]
Ta niewielka ingerencja w istniejący kod[12] pozwala na przyspieszenie wykonywania n-razy (n jest liczbą możliwych wątków = liczba rdzeni * liczba wątków_na_rdzeń).
Przykłady
edytujHello world!
edytuj/*
export OMP_DISPLAY_ENV="TRUE"
gcc.c.c -fopenmp -Wall
./a.out
*/
#include <omp.h>
#include <stdio.h>
int main() {
#pragma omp parallel
printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
Zagnieżdzone pętle ( ang. nested loops)
edytuj/*
C program that uses nested for loops
gcc o.c -Wall -fopenmp
./a.out
i = 10000 xMax*yMax = 10000
i = 444514 = 0.444514 *xMax*yMax where xMax*yMax = 1000000 // no reduction
i = 437097 = 0.437097*(xMax*yMax) where xMax*yMax = 1000000
i = 1000000 = 1.000000*(xMax*yMax) where xMax*yMax = 1000000 // reduction
*/
#include <stdio.h>
#include <stdlib.h>
#include <omp.h> // OpenMP
int i= 0;
int main()
{
int x;
int xMax = 1000;
int y;
int yMax = 1000;
int all = xMax*yMax;
#pragma omp parallel for collapse(2) schedule(dynamic) reduction(+:i)
for (x = 0; x < xMax; x++) {
for (y = 0; y < yMax; y++)
{i++;}
}
printf("i = %d = %f*(xMax*yMax) \t where xMax*yMax = %d\n", i, (double)i/all, all);
return 0;
}
Wypełnianie tablicy
edytuj// fill array using symmetry of image
// uses global var : ...
int FillArraySymmetric(unsigned char data[] )
{
unsigned char Color; // gray from 0 to 255
printf("axis of symmetry \n");
iy = iyAxisOfSymmetry;
#pragma omp parallel for schedule(dynamic) private(ix,Color) shared(ixMin,ixMax, iyAxisOfSymmetry)
for(ix=ixMin;ix<=ixMax;++ix) PlotPoint(ix, iy, GiveColor(ix, iy));
/*
The use of ‘shared(variable, variable2) specifies that these variables should be shared among all the threads.
The use of ‘private(variable, variable2)’ specifies that these variables should have a seperate instance in each thread.
*/
#pragma omp parallel for schedule(dynamic) private(iyAbove,ix,iy,Color) shared(iyAboveMin, iyAboveMax,ixMin,ixMax, iyAxisOfSymmetry)
// above and below axis
for(iyAbove = iyAboveMin; iyAbove<=iyAboveMax; ++iyAbove)
{printf(" %d from %d\n", iyAbove, iyAboveMax); //info
for(ix=ixMin; ix<=ixMax; ++ix)
{ // above axis compute color and save it to the array
iy = iyAxisOfSymmetry + iyAbove;
Color = GiveColor(ix, iy);
PlotPoint(ix, iy, Color );
// below the axis only copy Color the same as above without computing it
PlotPoint(ixMax-ix, iyAxisOfSymmetry - iyAbove , Color );
}
}
return 0;
}
zbiór Mandelbrota
edytujEtapy:
- przydziel 1 tablicę w pamięci dla całego obrazu przed pętlą
- wykonuj renderowanie równolegle z każdą iteracją zapisując wynik w tablicy
- zapisz całą tablicę na dysk po pętli renderowania
#include <complex.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <omp.h> //OpenM
/*
fork of
mandelbrot-book how to write a book about the Mandelbrot set by Claude Heiland-Alle
https://code.mathr.co.uk/mandelbrot-book/blob/HEAD:/book/
-----------------------------------------------------
hobold: Let's remove the iteration bands and change the relative sizes of black and white checkerboard areas. Now the structural similarity is more emphasized.
https://fractalforums.org/fractal-mathematics-and-new-theories/28/plotting-field-lines-during-iteration/4233
The white curve in my first image is a polyline connecting the (numerically approximated) corners of the checkers of some specific iteration band. I believe the limit was 27 iterations before the smallest "square" became too tiny to distinguish adjacent vertices with double precision.
The pattern in the 2nd image was done by looking at the final value of the iterated Z_n, just after it escapes. The usual checkerboarded "binary decomposition" looks at just the sign of the imaginary part. But you really can choose any axis through the origin and color based on what side of the axis you end up on. Or you can choose a sector smaller than 180 degrees like I did:
Code: [Select]
return (fabs(z.x)*0.1 < fabs(z.y));
The bailout radius needs to be reasonably large for those field lines to align nicely between iteration bands. Because this analogy with field lines is only strictly true for an infinite bailout. So with a minimal bailout radius of 2.0, the binary decomposition ends up being very visibly distorted and misaligned.
------------------------------------------------------------------------
gcc m.c -lm -Wall -Wextra -fopenmp
./a.out >f.pgm
*/
double cnorm(double _Complex z)
{
return creal(z) * creal(z) + cimag(z) * cimag(z);
}
int main()
{
//int aa = 4;
int w = 2000 ; // width in piels
int h = 2000 ; // height in pixels
int kMax = 1024; // iteration max
double r = 2; // plane radius: https://en.wikibooks.org/wiki/Fractals/Computer_graphic_techniques/2D/plane#radius
// The bailout radius needs to be reasonably large for those field lines to align nicely between iteration bands
double ER = 2000; // Escape Radius = b
double ER2 = ER * ER; // escape_radius^2
unsigned char *img = malloc(w * h);
#pragma omp parallel for
for (int j = 0; j < h; ++j)
{
double y = (h/2 - (j + 0.5)) / (h/2) * r;
for (int i = 0; i < w; ++i)
{
double x = (i + 0.5 - w/2) / (h/2) * r;
double _Complex c = x + I * y;
double _Complex z = 0;
int k;
for (k = 0; k < kMax; ++k)
{
z = z * z + c;
if (cnorm(z) > ER2)
break;
}
// fabs(z.x)*0.1 < fabs(z.y)
img[j * w + i] = (k < kMax && fabs(creal(z))*0.1 < fabs(cimag(z))) ? 255 : 0;
}
}
printf("P5\n%d %d\n255\n", w, h);
fwrite(img, w * h, 1, stdout);
free(img);
return 0;
}
Wersja 2
// https://people.sc.fsu.edu/~jburkardt/c_src/c_src.html
// mandelbrot_openmp, a C code which generates an ASCII Portable Pixel Map (PPM) image of the Mandelbrot fractal set, using OpenMP for parallel execution;
// by John Burkardt
# include <stdlib.h>
# include <stdio.h>
# include <math.h>
# include <time.h>
# include <omp.h>
int main ( );
int i4_min ( int i1, int i2 );
void timestamp ( );
/******************************************************************************/
int main ( )
/******************************************************************************/
/*
Purpose
MAIN is the main program for MANDELBROT_OPENMP.
Discussion:
MANDELBROT_OPENMP computes an image of the Mandelbrot set.
Licensing:
This code is distributed under the GNU LGPL license.
Modified:
03 September 2012
Author:
John Burkardt
Local Parameters:
Local, int COUNT_MAX, the maximum number of iterations taken
for a particular pixel.
*/
{
int m = 500;
int n = 500;
int b[m][n];
int c;
int count[m][n];
int count_max = 2000;
int g[m][n];
int i;
int j;
int jhi;
int jlo;
int k;
char *output_filename = "mandelbrot_openmp.ppm";
FILE *output_unit;
int r[m][n];
double wtime;
double x_max = 1.25;
double x_min = - 2.25;
double x;
double x1;
double x2;
double y_max = 1.75;
double y_min = - 1.75;
double y;
double y1;
double y2;
timestamp ( );
printf ( "\n" );
printf ( "MANDELBROT_OPENMP\n" );
printf ( " C/OpenMP version\n" );
printf ( "\n" );
printf ( " Create an ASCII PPM image of the Mandelbrot set.\n" );
printf ( "\n" );
printf ( " For each point C = X + i*Y\n" );
printf ( " with X range [%g,%g]\n", x_min, x_max );
printf ( " and Y range [%g,%g]\n", y_min, y_max );
printf ( " carry out %d iterations of the map\n", count_max );
printf ( " Z(n+1) = Z(n)^2 + C.\n" );
printf ( " If the iterates stay bounded (norm less than 2)\n" );
printf ( " then C is taken to be a member of the set.\n" );
printf ( "\n" );
printf ( " An ASCII PPM image of the set is created using\n" );
printf ( " M = %d pixels in the X direction and\n", m );
printf ( " N = %d pixels in the Y direction.\n", n );
wtime = omp_get_wtime ( );
/*
Carry out the iteration for each pixel, determining COUNT.
*/
# pragma omp parallel \
shared ( b, count, count_max, g, r, x_max, x_min, y_max, y_min ) \
private ( i, j, k, x, x1, x2, y, y1, y2 )
{
# pragma omp for
for ( i = 0; i < m; i++ )
{
y = ( ( double ) ( i - 1 ) * y_max
+ ( double ) ( m - i ) * y_min )
/ ( double ) ( m - 1 );
for ( j = 0; j < n; j++ )
{
x = ( ( double ) ( j - 1 ) * x_max
+ ( double ) ( n - j ) * x_min )
/ ( double ) ( n - 1 );
count[i][j] = 0;
x1 = x;
y1 = y;
for ( k = 1; k <= count_max; k++ )
{
x2 = x1 * x1 - y1 * y1 + x;
y2 = 2 * x1 * y1 + y;
if ( x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2 )
{
count[i][j] = k;
break;
}
x1 = x2;
y1 = y2;
}
if ( ( count[i][j] % 2 ) == 1 )
{
r[i][j] = 255;
g[i][j] = 255;
b[i][j] = 255;
}
else
{
c = ( int ) ( 255.0 * sqrt ( sqrt ( sqrt (
( ( double ) ( count[i][j] ) / ( double ) ( count_max ) ) ) ) ) );
r[i][j] = 3 * c / 5;
g[i][j] = 3 * c / 5;
b[i][j] = c;
}
}
}
}
wtime = omp_get_wtime ( ) - wtime;
printf ( "\n" );
printf ( " Time = %g seconds.\n", wtime );
/*
Write data to an ASCII PPM file.
*/
output_unit = fopen ( output_filename, "wt" );
fprintf ( output_unit, "P3\n" );
fprintf ( output_unit, "%d %d\n", n, m );
fprintf ( output_unit, "%d\n", 255 );
for ( i = 0; i < m; i++ )
{
for ( jlo = 0; jlo < n; jlo = jlo + 4 )
{
jhi = i4_min ( jlo + 4, n );
for ( j = jlo; j < jhi; j++ )
{
fprintf ( output_unit, " %d %d %d", r[i][j], g[i][j], b[i][j] );
}
fprintf ( output_unit, "\n" );
}
}
fclose ( output_unit );
printf ( "\n" );
printf ( " Graphics data written to \"%s\".\n", output_filename );
/*
Terminate.
*/
printf ( "\n" );
printf ( "MANDELBROT_OPENMP\n" );
printf ( " Normal end of execution.\n" );
printf ( "\n" );
timestamp ( );
return 0;
}
/******************************************************************************/
int i4_min ( int i1, int i2 )
/******************************************************************************/
/*
Purpose:
I4_MIN returns the smaller of two I4's.
Licensing:
This code is distributed under the GNU LGPL license.
Modified:
29 August 2006
Author:
John Burkardt
Parameters:
Input, int I1, I2, two integers to be compared.
Output, int I4_MIN, the smaller of I1 and I2.
*/
{
int value;
if ( i1 < i2 )
{
value = i1;
}
else
{
value = i2;
}
return value;
}
/******************************************************************************/
void timestamp ( )
/******************************************************************************/
/*
Purpose:
TIMESTAMP prints the current YMDHMS date as a time stamp.
Example:
31 May 2001 09:45:54 AM
Modified:
24 September 2003
Author:
John Burkardt
Parameters:
None
*/
{
# define TIME_SIZE 40
static char time_buffer[TIME_SIZE];
const struct tm *tm;
time_t now;
now = time ( NULL );
tm = localtime ( &now );
strftime ( time_buffer, TIME_SIZE, "%d %B %Y %I:%M:%S %p", tm );
printf ( "%s\n", time_buffer );
return;
# undef TIME_SIZE
}
#! /bin/bash
#
gcc -c -Wall -fopenmp mandelbrot_openmp.c
if [ $? -ne 0 ]; then
echo "Compile error."
exit
fi
#
gcc -fopenmp mandelbrot_openmp.o -lm
if [ $? -ne 0 ]; then
echo "Load error."
exit
fi
rm mandelbrot_openmp.o
mv a.out $HOME/binc/mandelbrot_openmp
#
echo "Normal end of execution."
Liczniki
edytujTutaj korzystamy z liczników będących zmiennymi globalnymi :
/*
2 basins, both superattracting
- exterior
- interior
- unknown ( possibly empty set )
pixel counters work not good with OpenMP !!!
*/
unsigned char ComputeColor_FatouBasins_superattracting (complex double z)
{
int i; // number of iteration
for (i = 0; i < IterMax; ++i)
{
// 2 basins, both superattracting
cabs2z = cabs2(z);
// infinity is superattracting here !!!!!
if ( cabs2z > ER2 ){ uExterior +=1; return iColorOfExterior;}
// second superAttraction basins. Here one of superattracting point is zp= 0
if ( cabs2z < AR2 ){ uInterior += 1;return iColorOfInterior;}
z = f(z); // iteration: z(n+1) = f(zn)
}
uUnknown += 1;
return iColorOfUnknown;
}
jest to spowodowane przeplataniem operacji odczytu/modyfikacji/zapisu różnych wątków, na przykład jednym możliwym przeplataniem:
wątek 1: odczytaj zmienną z pamięci do rejestru wątek 2: odczytaj zmienną z pamięci do rejestru wątek 1: rejestr przyrostu wątek 1: zapisz zmienną z rejestru do pamięci wątek 2: rejestr inkrementacji wątek 2: zapisz zmienną z rejestru do pamięci
efektem końcowym jest to, że niektóre przyrosty mogą zostać pominięte, w sposób niedeterministyczny
Rozwiązania:
- Użyj operacji atomowych (najłatwiejsza poprawka kodu)
- Lub dla większej wydajności (mniej rywalizacji o operacje atomowe), miej lokalne liczniki i połącz je na końcu
#pragma omp atomic
uInterior += 1;
void some_function(int *myExterior)
{
*myExterior += 1;
}
int main(int argc, char **argv)
{
int exterior = 0;
#pragma omp parallel for reduction(+:exterior)
for (int j = 0; j < 9002; ++j)
{
int myExterior = 0;
for (int i = 0; i < 9002; ++i)
{
some_function(&myExterior);
}
exterior += myExterior;
}
printf("%d\n", exterior);
return 0;
}
Błędy
edytujPoradniki
edytuj- commons:Category:OpenMP - kod uzywajacy OpenMP do tworzenia obrazów
- http://www.cmmsigma.eu/labs/parallel_comp/openmp_tutorial/index.pl.html
- https://gcc.gnu.org/wiki/Offloading
- https://computing.llnl.gov/tutorials/openMP/exercise.html
- https://gcc.gnu.org/wiki/openmp
- https://www.ibm.com/developerworks/aix/library/au-aix-openmp-framework/index.html
- https://docs.loni.org/wiki/Introduction_to_OpenMP
- Guide into OpenMP: Easy multithreading programming for C++. By Joel Yliluoma
- Fractal - Mandelbrot Set using OpenGL C++ OpenMP video by Marek Bruchatý
- beej blog : parallel-programming-openmp
- https://github.com/clang-omp/clang_trunk/tree/master/test/OpenMP
Źródła
edytuj- ↑ Technologia Epiphany - superkomputer dla każdego
- ↑ Mandelbrot set using OpenCL on Epiphany
- ↑ Tutorial: Parallel programming for Xeon and Phi at goparallel
- ↑ OpenMP in 30 Minutes By Joe Landman
- ↑ gcc wiki: openmp
- ↑ stackoverflow question how-to-check-the-version-of-openmp-on-linux
- ↑ libgomp
- ↑ askubuntu question how-can-i-install-openmp-in-ubuntu
- ↑ OpenMP w wikipedii
- ↑ OpenMP in 30 Minutes By Joe Landman
- ↑ OpenMP - kompilatory
- ↑ Kurs openMP - Paweł Przybyłowicz - asystent na Wydziale Matematyki Stosowanej AGH.
- ↑ 32 OpenMP Traps For C++ Developers 20.11.2009 by Alexey Kolosov, Andrey Karpov, Evgeniy Ryzhkov
- ↑ Common Mistakes in OpenMP and How To Avoid Them A Collection of Best Practices by Michael Suß and Claudia Leopold