For me, WSL1 always worked fine on Windows for ARM. But Windows . This thread is about nested hypervisors. Windows without its own hypervisor features is crippled; this is not just about running a separate Ubuntu VM in Parallels (which is easy). It's about the hypervisor-based security features in Windows, VS Code WSL2 integration, and so on. That is why I still keep a separate physical Windows machine (currently a Surface Laptop 7), but I want to get rid of it and replace my current two machines with a single M4 Pro MacBook Pro, upgrading from my M2 MacBook Air.
Also, the performance of WSL1 is bad. Here is a comparison of llama.cpp running llama-2 CPU-only on the M2 (4 P-cores) vs. the Snapdragon X Elite (12 cores). The M2 runs macOS natively, Windows in Parallels, Ubuntu directly in Parallels, and Ubuntu in WSL1 on that Windows VM; the Snapdragon runs Windows natively and Ubuntu in WSL2.
TL;DR: WSL2/Ubuntu runs as fast as native Windows on the Snapdragon, while Parallels has a massive performance impact, and Windows + WSL1 inside Parallels is the worst case.
llama-bench numbers, run as in llama.cpp GitHub issue #4167 but with the current version of llama.cpp:
| system                                                                | model         |     size | params | backend    | ngl/CPU |  test |          t/s |
| --------------------------------------------------------------------- | ------------- | -------: | -----: | ---------- | ------: | ----: | -----------: |
| M2, MacOS 15.1 native                                                 | llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS |   0 / 4 | pp512 | 58.12 ± 2.41 |
| M2, MacOS 15.1 native                                                 | llama 7B Q4_0 | 3.56 GiB | 6.74 B | Metal,BLAS |   0 / 4 | tg128 | 14.99 ± 0.14 |
| M2, MacOS 15.1, Parallels 20.1.1, Ubuntu 24.04.1                      | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |       4 | pp512 | 22.60 ± 0.58 |
| M2, MacOS 15.1, Parallels 20.1.1, Ubuntu 24.04.1                      | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |       4 | tg128 | 11.20 ± 0.92 |
| M2, MacOS 15.1, Parallels 20.1.1, Windows 11 24H2                     | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |       4 | pp512 | 22.18 ± 0.50 |
| M2, MacOS 15.1, Parallels 20.1.1, Windows 11 24H2                     | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |       4 | tg128 | 12.40 ± 0.32 |
| M2, MacOS 15.1, Parallels 20.1.1, Windows 11 24H2, WSL1, Ubuntu 24.04 | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |       4 | pp512 |  9.39 ± 0.36 |
| M2, MacOS 15.1, Parallels 20.1.1, Windows 11 24H2, WSL1, Ubuntu 24.04 | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |       4 | tg128 |  6.33 ± 0.68 |
| Snapdragon X Elite, Windows 11 24H2                                   | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |      12 | pp512 | 63.53 ± 6.79 |
| Snapdragon X Elite, Windows 11 24H2                                   | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |      12 | tg128 | 20.60 ± 2.40 |
| Snapdragon X Elite, Windows 11 24H2, WSL2, Ubuntu 24.01               | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |      12 | pp512 | 61.75 ± 8.94 |
| Snapdragon X Elite, Windows 11 24H2, WSL2, Ubuntu 24.01               | llama 7B Q4_0 | 3.56 GiB | 6.74 B | CPU        |      12 | tg128 | 21.48 ± 3.99 |
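
To make the size of the gaps easier to see, here is a small Python sketch that turns the mean t/s values from the table into ratios. The short configuration labels and the `relative` helper are my own naming for illustration, not anything llama-bench produces, and the ± spreads are ignored, so treat the ratios as rough. Also keep in mind that the native macOS run used the Metal,BLAS build (with ngl 0), so it is not a like-for-like comparison with the plain CPU builds.

```python
# Mean throughput in tokens per second, copied from the table above.
# The short labels are illustrative names, not llama-bench output.
RESULTS = {
    "m2-macos-native":      {"pp512": 58.12, "tg128": 14.99},
    "m2-parallels-ubuntu":  {"pp512": 22.60, "tg128": 11.20},
    "m2-parallels-windows": {"pp512": 22.18, "tg128": 12.40},
    "m2-parallels-wsl1":    {"pp512": 9.39,  "tg128": 6.33},
    "sdx-windows-native":   {"pp512": 63.53, "tg128": 20.60},
    "sdx-windows-wsl2":     {"pp512": 61.75, "tg128": 21.48},
}


def relative(config: str, baseline: str, test: str) -> float:
    """Throughput of `config` as a fraction of `baseline` for one test."""
    return RESULTS[config][test] / RESULTS[baseline][test]


def report() -> list[str]:
    """Build one line per (config, baseline, test) comparison."""
    pairs = [
        ("m2-parallels-ubuntu", "m2-macos-native"),
        ("m2-parallels-windows", "m2-macos-native"),
        ("m2-parallels-wsl1", "m2-parallels-windows"),
        ("sdx-windows-wsl2", "sdx-windows-native"),
    ]
    lines = []
    for config, baseline in pairs:
        for test in ("pp512", "tg128"):
            pct = 100.0 * relative(config, baseline, test)
            lines.append(f"{config} vs {baseline} [{test}]: {pct:.0f}%")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Going by the means alone, WSL1 inside the Parallels Windows VM reaches a bit over 40% of that same VM's native Windows throughput for pp512, while WSL2 on the Snapdragon stays at roughly 97% of native Windows, which is well within the reported spread.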