Numpyro with GPU works on Windows

Hi Folks,

The Numpyro documentation warns:
Limited Windows Support: Note that NumPyro is untested on Windows, and might require building jaxlib from source. See this JAX issue for more details. Alternatively, you can install Windows Subsystem for Linux and use NumPyro on it as on a Linux system. See also CUDA on Windows Subsystem for Linux if you want to use GPUs on Windows.

I just wanted to share my experience with Numpyro on Windows. I hope it may come in useful to someone.

  1. The CPU-only version works perfectly with Windows Subsystem for Linux. This is by far the easiest option.
  2. Getting GPU to work on WSL is possible, but I do not recommend it. At the time of writing (March 2021) there is a known issue that makes the performance of CUDA on WSL very bad. If you want to try nonetheless, follow the CUDA on WSL User Guide and make sure you are enrolled in the Dev channel of the Windows Insider Program.
  3. GPU-backed Numpyro works on Windows! You will have to build jax and jaxlib from source but jax has useful instructions.

I have used this release: Jaxlib v0.1.61, Jax v0.2.9. Then I was able to install numpyro==0.5.0 via pip and everything just worked.
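For anyone repeating this, a quick way to confirm that the freshly built jaxlib actually sees the GPU is to list the devices JAX reports. This is only a minimal sketch: the helper name `describe_backend` is made up for illustration, and the exact platform string for a CUDA device may differ between JAX versions, so the check only tests for "anything other than cpu".

```python
try:
    import jax
except ImportError:  # lets the script report a missing install instead of crashing
    jax = None


def describe_backend():
    """Summarise the devices JAX can see (hypothetical helper, not part of JAX)."""
    if jax is None:
        return {"jax_installed": False, "platforms": [], "has_accelerator": False}
    # Each device object exposes a `platform` string such as "cpu" or "gpu".
    platforms = sorted({device.platform for device in jax.devices()})
    return {
        "jax_installed": True,
        "platforms": platforms,
        "has_accelerator": any(p != "cpu" for p in platforms),
    }


if __name__ == "__main__":
    print(describe_backend())
```

If `has_accelerator` comes back `False` on a machine with a CUDA build, JAX has most likely fallen back to the CPU backend.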

I have a rudimentary gaming GPU (GeForce 960M) and a relatively good CPU (8-core, 16-thread i7). Despite that, I am getting a 5-10x speedup on matrix multiplication with jax+GPU vs numpy+CPU, and comparable speedups on fitting a matrix factorization model with Numpyro.
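A comparison like the matrix multiplication one can be sketched roughly as below. This is not the exact script behind the numbers above; the matrix size, the repeat count, and the helper names `best_time` and `compare_matmul` are assumptions chosen for illustration. The JAX half is skipped if JAX is not installed, and it calls `block_until_ready()` because JAX dispatches work asynchronously, so without it the timer would stop before the GPU has finished.

```python
import time

import numpy as np

try:
    import jax
    import jax.numpy as jnp
except ImportError:  # JAX is optional here so the NumPy half still runs
    jax = None


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare_matmul(n=2000, repeats=5):
    """Time an n x n float32 matrix product with NumPy and, if present, JAX."""
    a = np.random.default_rng(0).standard_normal((n, n), dtype=np.float32)
    results = {"numpy_cpu": best_time(lambda: a @ a, repeats)}
    if jax is not None:
        a_dev = jnp.asarray(a)  # copy to the default JAX device once, outside the timer
        matmul = jax.jit(lambda x: x @ x)
        matmul(a_dev).block_until_ready()  # warm-up call so compile time is excluded
        results["jax_" + jax.devices()[0].platform] = best_time(
            lambda: matmul(a_dev).block_until_ready(), repeats
        )
    return results


if __name__ == "__main__":
    for name, seconds in compare_matmul().items():
        print(f"{name}: {seconds * 1e3:.1f} ms")
```

The ratio you see will depend heavily on matrix size and dtype, so treat any single number as indicative only.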

So, to sum up, if you use Windows and want CPU-only Numpyro, WSL will work just fine. For GPU support I recommend investing time in building jax from source on Windows.

Thanks to the amazing Jax and Numpyro teams.


Thank you for sharing the tips, @Elchorro! Could you help us enhance the Limited Windows Support section in the README? It would be nice to have a link there to point to this post. Edit: the README is updated now.

Hi, I am very interested in running Numpyro in Windows.
I already have Visual Studio, CUDA, MSYS2, Bazel and Anaconda installed.
In MSYS2 I was able to use pacman install “patch” but for “realpath” I got: “error: target not found”.
Any suggestions? Thanks

Hi there.

I believe realpath is shipped with MSYS by default. Did you check whether it is available from the MSYS console (rather than the Anaconda prompt or PowerShell)? For me it is located in msys64\usr\bin, and I remember I needed to add this folder to the PATH to complete the installation of jaxlib.
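If it helps, here is a small sketch for checking from the same prompt you build in whether these tools are actually on the PATH. The helper name `find_tools` is made up for illustration; it only uses `shutil.which` from the standard library, which returns `None` for anything it cannot find.

```python
import shutil


def find_tools(names=("realpath", "patch")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

Run it from the Anaconda prompt: if `realpath` shows as not found there but works in the MSYS console, the msys64\usr\bin folder is probably missing from the PATH of that prompt.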

Let me know how you are getting on.

Hi,
You are correct. “realpath --help” worked from MSYS.
I will continue and let everyone know my results. Thanks

Hi,
I got the following ERROR during compilation:
“ERROR: C:/users/inter/_bazel_inter/iofngzaw/external/com_google_protobuf/BUILD:301:11: Compiling src/google/protobuf/compiler/objectivec/objectivec_helpers.cc failed: (Exit 2): python.exe failed: error executing command”
Do you have any suggestions? Thanks!

The full output is the following:
(base) C:\Apps\jax>python .\build\build.py --enable_cuda --cuda_path="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1" --cudnn_path="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1" --cuda_compute_capabilities="7.5" --cuda_version="10.1" --cudnn_version="7.6.3"

     _   _  __  __
    | | / \ \ \/ /
 _  | |/ _ \ \  /
| |_| / ___ \/  \
 \___/_/   \/_/\_\


Bazel binary path: C:\Apps\Bazel\bazel.EXE
Python binary path: C:/Users/inter/AppData/Local/Continuum/anaconda3/python.exe
Python version: 3.7
MKL-DNN enabled: yes
Target CPU features: release
CUDA enabled: yes
CUDA toolkit path: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1
CUDNN library path: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1
CUDA compute capabilities: 7.5
CUDA version: 10.1
CUDNN version: 7.6.3
TPU enabled: no
ROCm enabled: no

Building XLA and installing it in the jaxlib source tree...
C:\Apps\Bazel\bazel.EXE run --verbose_failures=true --config=short_logs --config=mkl_open_source_only --config=cuda --define=xla_python_enable_gpu=true :build_wheel -- --output_path=C:\Apps\jax\dist
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=80
INFO: Reading rc options for 'run' from c:\apps\jax\.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Options provided by the client:
  Inherited 'build' options: --python_path=C:/Users/inter/AppData/Local/Continuum/anaconda3/python.exe
INFO: Reading rc options for 'run' from c:\apps\jax\.bazelrc:
  Inherited 'build' options: --repo_env PYTHON_BIN_PATH=C:/Users/inter/AppData/Local/Continuum/anaconda3/python.exe --action_env=PYENV_ROOT --python_path=C:/Users/inter/AppData/Local/Continuum/anaconda3/python.exe --repo_env TF_NEED_CUDA=1 --action_env TF_CUDA_COMPUTE_CAPABILITIES=7.5 --repo_env TF_NEED_ROCM=0 --action_env TF_ROCM_AMDGPU_TARGETS=gfx803,gfx900,gfx906,gfx1010 --distinct_host_configuration=false -c opt --apple_platform_type=macos --macos_minimum_os=10.9 --announce_rc --define open_source_build=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true --spawn_strategy=standalone --strategy=Genrule=standalone --enable_platform_specific_config --action_env CUDA_TOOLKIT_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1 --action_env CUDNN_INSTALL_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1 --action_env TF_CUDA_VERSION=10.1 --action_env TF_CUDNN_VERSION=7.6.3
INFO: Found applicable config definition build:short_logs in file c:\apps\jax\.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:mkl_open_source_only in file c:\apps\jax\.bazelrc: --define=tensorflow_mkldnn_contraction_kernel=1
INFO: Found applicable config definition build:cuda in file c:\apps\jax\.bazelrc: --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:windows in file c:\apps\jax\.bazelrc: --copt=/D_USE_MATH_DEFINES --host_copt=/D_USE_MATH_DEFINES --copt=-DWIN32_LEAN_AND_MEAN --host_copt=-DWIN32_LEAN_AND_MEAN --copt=-DNOGDI --host_copt=-DNOGDI --copt=/Zc:preprocessor --cxxopt=/std:c++14 --host_cxxopt=/std:c++14 --linkopt=/DEBUG --host_linkopt=/DEBUG --linkopt=/OPT:REF --host_linkopt=/OPT:REF --linkopt=/OPT:ICF --host_linkopt=/OPT:ICF --experimental_strict_action_env=true
DEBUG: Rule 'io_bazel_rules_docker' indicated that a canonical reproducible form can be obtained by modifying arguments shallow_since = "1556410077 -0400"
DEBUG: Repository io_bazel_rules_docker instantiated at:
  C:/apps/jax/WORKSPACE:34:10: in <toplevel>
  C:/users/inter/_bazel_inter/iofngzaw/external/org_tensorflow/tensorflow/workspace0.bzl:105:34: in workspace
  C:/users/inter/_bazel_inter/iofngzaw/external/bazel_toolchains/repositories/repositories.bzl:37:23: in repositories
Repository rule git_repository defined at:
  C:/users/inter/_bazel_inter/iofngzaw/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //build:build_wheel (183 packages loaded, 16215 targets configured).
INFO: Found 1 target...
ERROR: C:/users/inter/_bazel_inter/iofngzaw/external/com_google_protobuf/BUILD:301:11: Compiling src/google/protobuf/compiler/objectivec/objectivec_helpers.cc failed: (Exit 2): python.exe failed: error executing command
  cd C:/users/inter/_bazel_inter/iofngzaw/execroot/__main__
  SET CUDA_TOOLKIT_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1
    SET CUDNN_INSTALL_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.1
    SET INCLUDE=C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared;C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um;C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt
    SET LIB=C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64;C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64
    SET PATH=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\Current\bin\Roslyn;C:\Program Files (x86)\Windows Kits\10\bin\10.0.17763.0\x64;C:\Program Files (x86)\Windows Kits\10\bin\x64;C:\Windows\Microsoft.NET\Framework64\v4.0.30319;C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\Tools\;;C:\WINDOWS\system32
    SET PWD=/proc/self/cwd
    SET RUNFILES_MANIFEST_ONLY=1
    SET TEMP=C:\Users\inter\AppData\Local\Temp
    SET TF_CUDA_COMPUTE_CAPABILITIES=7.5
    SET TF_CUDA_VERSION=10.1
    SET TF_CUDNN_VERSION=7.6.3
    SET TF_ROCM_AMDGPU_TARGETS=gfx803,gfx900,gfx906,gfx1010
    SET TMP=C:\Users\inter\AppData\Local\Temp
  C:/Users/inter/AppData/Local/Continuum/anaconda3/python.exe -B external/local_config_cuda/crosstool/windows/msvc_wrapper_for_nvcc.py /nologo /DCOMPILER_MSVC /DNOMINMAX /D_WIN32_WINNT=0x0600 /D_CRT_SECURE_NO_DEPRECATE /D_CRT_SECURE_NO_WARNINGS /D_SILENCE_STDEXT_HASH_DEPRECATION_WARNINGS /bigobj /Zm500 /J /Gy /GF /EHsc /wd4351 /wd4291 /wd4250 /wd4996 /Iexternal/com_google_protobuf /Ibazel-out/x64_windows-opt/bin/external/com_google_protobuf /Iexternal/com_google_protobuf/src /Ibazel-out/x64_windows-opt/bin/external/com_google_protobuf/src /showIncludes /MD /O2 /DNDEBUG /D_USE_MATH_DEFINES -DWIN32_LEAN_AND_MEAN -DNOGDI /Zc:preprocessor /std:c++14 /DHAVE_PTHREAD /wd4018 /wd4065 /wd4146 /wd4244 /wd4251 /wd4267 /wd4305 /wd4307 /wd4309 /wd4334 /wd4355 /wd4506 /wd4514 /wd4800 /wd4996 /Fobazel-out/x64_windows-opt/bin/external/com_google_protobuf/_objs/protoc_lib/objectivec_helpers.obj /c external/com_google_protobuf/src/google/protobuf/compiler/objectivec/objectivec_helpers.cc
Execution platform: @local_execution_config_platform//:platform
Target //build:build_wheel failed to build
INFO: Elapsed time: 523.971s, Critical Path: 1.11s
INFO: 22 processes: 22 internal.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
  File ".\build\build.py", line 521, in <module>
    main()
  File ".\build\build.py", line 516, in main
    shell(command)
  File ".\build\build.py", line 51, in shell
    output = subprocess.check_output(cmd)
  File "C:\Users\inter\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "C:\Users\inter\AppData\Local\Continuum\anaconda3\lib\subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['C:\\Apps\\Bazel\\bazel.EXE', 'run', '--verbose_failures=true', '--config=short_logs', '--config=mkl_open_source_only', '--config=cuda', '--define=xla_python_enable_gpu=true', ':build_wheel', '--', '--output_path=C:\\Apps\\jax\\dist']' returned non-zero exit status 1.

Hi, if I build without CUDA using:

python .\build\build.py

I get another error:

ERROR: C:/users/inter/_bazel_inter/iofngzaw/external/zlib/BUILD.bazel:5:11: Compiling trees.c failed: (Exit 2): cl.exe failed: error executing command

Hi Elchorro:
Can you make JaxLib (for Windows) available for me to download?
You could start with a version without CUDA.
Thanks!

@caxelrud I think you can also find some help in this topic.

@caxelrud, I tried building the CPU version just now and I am also having problems building it. I am not a Jax developer, so I apologize, I can't help you. As fahiepsi mentioned, Jax's GitHub is probably a better forum to seek help.

A quick update a year on.

Issues with GPU support in WSL have now been resolved (at least for those using Windows 11), see here. The loss of performance (compared to a native Linux distribution) is negligible.

For those wanting to try it, follow this link to enable GPU. Critically:

Normally, CUDA toolkit for Linux will have the device driver for the GPU packaged with it. On WSL 2, the CUDA driver used is part of the Windows driver installed on the system, and, therefore, care must be taken not to install this Linux driver.

My code now runs faster on WSL than on Windows so I can recommend it.
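As a rough sanity check before installing anything CUDA-related inside WSL, something like the sketch below can tell you whether you are in WSL and whether the Windows-provided driver files are visible. The helper name `wsl_gpu_report` is made up, and the path `/usr/lib/wsl/lib` is an assumption about where WSL 2 usually exposes the Windows driver libraries; please verify it on your own system.

```python
import os
import platform
import shutil

# Assumption: WSL 2 usually exposes the Windows GPU driver libraries here.
WSL_DRIVER_DIR = "/usr/lib/wsl/lib"


def wsl_gpu_report():
    """Collect a few hints about the WSL GPU setup (hypothetical helper)."""
    release = platform.uname().release.lower()
    return {
        # WSL kernels normally carry "microsoft" in their release string.
        "is_wsl": "microsoft" in release,
        "wsl_driver_dir_present": os.path.isdir(WSL_DRIVER_DIR),
        # Path to nvidia-smi if it is on PATH, else None.
        "nvidia_smi": shutil.which("nvidia-smi"),
    }


if __name__ == "__main__":
    for key, value in wsl_gpu_report().items():
        print(f"{key}: {value}")
```

If `is_wsl` is true and the driver directory is present, the driver side is already handled by Windows, which matches the warning above about not installing the Linux driver.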

Hi there,

Thanks for linking this. May I ask, does this also mean that I can now install jax with CUDA on WSL in the same way as on a native Linux system? Or do I still need to build jax from source to get it working on WSL?

Hey @Elchorro,
The updates you provided about the improved GPU support in Windows Subsystem for Linux (WSL) are particularly exciting. It’s great to hear that it has become a viable option for running Numpyro with GPU acceleration on Windows, especially with the negligible performance loss compared to native Linux distributions. Thanks for sharing with us!