
TOPIC: Blog post "OpenMP 4.0 on NVIDIA CUDA GPUs"

Blog post "OpenMP 4.0 on NVIDIA CUDA GPUs" 3 years 10 months ago #32

  • Alex
Hello,

I am reading your blog post above (from
parallel-computing.pro/index.php/2-uncat...-on-nvidia-cuda-gpus).
Many thanks for this write-up, I didn't know at all that this is
actually possible...

Anyway, I tried to follow your instructions. However, when I run
"make" to compile the example.c test program, it complains
that:
LIBRARY_PATH=/opt/openmp4/llvm/build/bin/../lib clang-3.8 -fopenmp
-omptargets=nvptx64sm_20-nvidia-linux -g -O3 -std=c99 example.c -o
example
ptxas warning : Too big maxrregcount value specified 64, will be ignored
nvlink error   : Undefined reference to '__kmpc_kernel_init' in
'/tmp/example-2ef739-69142a.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_init_4' in
'/tmp/example-2ef739-69142a.cubin'
nvlink error   : Undefined reference to '__kmpc_for_static_fini' in
'/tmp/example-2ef739-69142a.cubin'
clang-3.8: error: nvlink command failed with exit code 255 (use -v to
see invocation)
make: *** [example] Error 255

I tried to use "nm" to find where these symbols may be referenced, but
to no avail. So, any suggestions here? Note that my CUDA SDK is at
version 7.5, and that I've added "20" to "-DOMPTARGET_NVPTX_SM=30,35"
when running CMake for "libomptarget", as the NVIDIA GPU on my laptop
is CC 2.0 only.

Kind regards,
Alex
The administrator has disabled public write access.

Blog post "OpenMP 4.0 on NVIDIA CUDA GPUs" 3 years 10 months ago #33

  • dmikushin
  • Administrator
Hi Alex,

First of all, please add the "-v" option to the clang invocation:
LIBRARY_PATH=/opt/openmp4/llvm/build/bin/../lib clang-3.8 -fopenmp -omptargets=nvptx64sm_20-nvidia-linux -g -O3 -std=c99 example.c -o example -v

Somewhere close to the bottom of the lengthy output you should see a line similar to:
"/opt/cuda/bin/nvlink" -o /tmp/example-1de069.so -g -v -arch sm_30 -lomptarget-nvptx /tmp/example-47cdfb-36af06.cubin -L/home/marcusmae/forge/openmp4/llvm/install/bin/../lib

This is where linking fails in your case. It is the device-side linking of the GPU cubin produced by the OpenMP 4 backend against the runtime functions that live in the static library libomptarget-nvptx.a (you should be able to see the missing symbols defined in it). This library is part of github.com/clang-omp/libomptarget, which you should have installed. I'd first make sure this library really exists in the path specified by LIBRARY_PATH. Apparently, nvlink silently skips a linked library if it is not found.
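
A minimal sketch of that check, assuming a POSIX shell. The helper name find_in_library_path is hypothetical (it is not part of clang or CUDA); it simply walks the colon-separated LIBRARY_PATH and reports the first directory that actually contains the library, after which "nm" can be used to look for one of the missing symbols:

```shell
#!/bin/sh
# find_in_library_path NAME (hypothetical helper): print the first directory
# in the colon-separated LIBRARY_PATH that contains the file NAME.
# Returns 1 if no directory contains it.
find_in_library_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in ${LIBRARY_PATH:-}; do
        # Skip empty entries such as the one produced by a trailing colon.
        if [ -n "$dir" ] && [ -f "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage: locate the device runtime library, then look for a missing symbol.
if dir=$(find_in_library_path libomptarget-nvptx.a); then
    echo "found in: $dir"
    nm "$dir/libomptarget-nvptx.a" | grep __kmpc_kernel_init \
        || echo "symbol not listed by nm"
else
    echo "libomptarget-nvptx.a not found in LIBRARY_PATH" >&2
fi
```

If the helper reports nothing, LIBRARY_PATH most likely points at the wrong directory for the library.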

Best,
- D.

Blog post "OpenMP 4.0 on NVIDIA CUDA GPUs" 3 years 10 months ago #34

  • Alex
Hi Dmitry,

Thanks for your prompt reply. I think I did all the steps right, but
there seem to be two issues with the procedure you described in your
blog post:

1. If the libomptarget.* libs get copied into the
$HOME/forge/openmp4/llvm/install/lib/ directory, I think you should
add $HOME/forge/openmp4/llvm/install/bin to PATH, so that "$(shell
dirname $(shell which clang-3.8))/../lib" in the makefile can find
the appropriate lib directory. You have "export PATH=..." twice in
your post, but the same directory gets added to PATH both times, so I
guess you meant to add $HOME/forge/openmp4/llvm/install/bin on the
second invocation.

2. When I add "-v" to the "clang-3.8" invocation in the makefile, I
can see that the offending command is (as you noticed from my previous
message, instead of $HOME/forge/openmp4, I'm doing everything in
/opt/openmp4):
"/opt/cuda/bin/nvlink" -o /tmp/example-d63ff6.so -g -v -arch sm_20
-lomptarget-nvptx /tmp/example-ffa1bb-b73a50.cubin
-L/opt/openmp4/llvm/install/bin/../lib

Here, I think the problem is that "-lomptarget-nvptx" should be
at the end, as otherwise "-L..." has no effect. However, even if I
execute the commands from the log generated by the make invocation
step by step, and change the link command so that
"-lomptarget-nvptx" is moved to the end, the same errors that I
mentioned in my first message are still reported. I checked with "nm"
and indeed libomptarget-nvptx.a has these symbols defined, so at the
moment I have no further ideas about what else to try...

Regards,
Alex

Blog post "OpenMP 4.0 on NVIDIA CUDA GPUs" 3 years 10 months ago #35

  • dmikushin
Hi Alex,

1. Yes, both are exported after the LLVM+Clang installation:
$ export PATH=$PATH:$HOME/forge/openmp4/llvm/build/bin/
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/forge/openmp4/llvm/install/lib

The second time I add build/bin (note: build):
export PATH=$PATH:$HOME/forge/openmp4/llvm/build/bin/
- this is solely for llvm-lit to be found during openmp runtime compilation.

2. The order of -L's (uppercase) does not matter. The order of -l's (lowercase) does matter. Please feel free to send me your object file and libomptarget-nvptx.a so that I can track down the problem on my local machine. I'd expect that libomptarget-nvptx.a is either not in the path or contains no code for sm_20. I think you are missing the "install" part, i.e. instead of
LIBRARY_PATH=/opt/openmp4/llvm/build/bin/../lib
you should have
LIBRARY_PATH=/opt/openmp4/llvm/install/bin/../lib
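
The "contains no code for sm_20" possibility can be checked without sending any files. The sketch below assumes that cuobjdump from the CUDA toolkit is on PATH and that its "-lelf" listing names each embedded cubin with its architecture (for example "...sm_30.cubin"); that listing format is an assumption, and the helper name has_arch and the LIB default path are hypothetical:

```shell
#!/bin/sh
# has_arch ARCH (hypothetical helper): read a cubin listing on stdin and
# succeed if any line mentions ARCH, e.g. sm_20.
has_arch() {
    grep -q -- "$1"
}

# Path to the device runtime library; adjust to your installation.
LIB=${LIB:-/opt/openmp4/llvm/install/lib/libomptarget-nvptx.a}

# Only run the real check when both the tool and the library are available.
if command -v cuobjdump >/dev/null 2>&1 && [ -f "$LIB" ]; then
    if cuobjdump -lelf "$LIB" | has_arch sm_20; then
        echo "sm_20 code present in $LIB"
    else
        echo "no sm_20 code in $LIB"
    fi
fi
```

If the listing shows only sm_30 and sm_35 entries, the library was built without sm_20 device code, and nvlink would then have nothing to resolve the __kmpc_* symbols against for that architecture.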

Blog post "OpenMP 4.0 on NVIDIA CUDA GPUs" 3 years 10 months ago #36

  • dmikushin
Hi Alex,

I removed the second export line, as it was creating problems for the which-based path retrieval. Everything should work fine without it.
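
To see why PATH order affects that retrieval, here is a small sketch that mimics the makefile's "$(shell dirname $(shell which clang-3.8))/../lib" expression in plain shell. The function name resolve_lib_dir is hypothetical, and "command -v" is used as a stand-in for "which":

```shell
#!/bin/sh
# resolve_lib_dir TOOL (hypothetical helper): print "<dir of TOOL>/../lib",
# where TOOL is looked up on PATH. Returns 1 if TOOL is not found.
resolve_lib_dir() {
    tool_path=$(command -v "$1") || return 1
    printf '%s/../lib\n' "$(dirname "$tool_path")"
}

# Whichever directory holding clang-3.8 comes first on PATH decides the
# library directory that ends up in LIBRARY_PATH.
resolve_lib_dir clang-3.8 || echo "clang-3.8 is not on PATH" >&2
```

With build/bin ahead of install/bin on PATH, this prints a path ending in build/bin/../lib, which matches the LIBRARY_PATH seen in the failing make output above.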

Sorry about this and thanks for the report!
- D.

Blog post "OpenMP 4.0 on NVIDIA CUDA GPUs" 3 years 10 months ago #37

  • Alex
Hi Dmitry,

The problem was indeed that, for some reason, there was no
sm_20 code in libomptarget-nvptx.a. Namely, when I reverted to
building the test program for sm_30, the build went fine, and I was
then able to transfer the program to a machine with a CC 3.0 capable
card and run it successfully.

Thanks for your help, and thanks again for the blog post - this is
really an interesting development.

Best,
Alex