Mac x86 with Apple Hypervisor Rails CPU & Becomes Unresponsive

Discussion in 'macOS Virtual Machine' started by TreyB2, Feb 18, 2022.

  1. TreyB2

    We set up Parallels for Mac Business a few months ago and have been working out the optimal settings for 24/7/365 uptime of the software build agents in R&D.

    After hitting uptime-related performance issues with the Parallels Hypervisor, plus rare QEMU display crashes in VMs, we recently found that the same configuration running on the Apple Hypervisor performs much better and keeps that performance over long uptimes.

    However, across our 20 VMs, we seem to hit a situation each week in which a Mac VM uses 100% of all its allocated CPUs and is effectively unresponsive and dead. I have been resolving this by stopping and restarting the VM. According to our build history in Azure DevOps, no builds were running any of the times it has happened so far. As you can see from the other VMs, idle VMs normally use very little CPU. So something seems to randomly starve all of the allocated threads and effectively kill the VM.
    [Screenshot: upload_2022-2-18_15-22-31.png]

    Is there any logging from Parallels or Apple's hypervisor that could help narrow this down? We have both 10.14 Mojave and 12.x Monterey VMs. We're in the process of migrating everything to Monterey, and I'm waiting for it to reproduce on Monterey before I bother reporting it to Apple. But it would be nice to know where things are going wrong: on the Parallels side or in Apple's Hypervisor.

    I understand this market is incredibly niche and Apple is changing things rapidly, so I doubt there will be a true fix. But at least I'm doing my due diligence by reporting this and narrowing it down before I script Ansible to detect it and restart the VM, and update Azure Pipelines to auto-retry on Mac disconnect, which is a bit ridiculous but is what I suspect will be needed.
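    The detection half of that plan can be sketched in a few lines. This is a Python sketch rather than an Ansible playbook, and the 95% threshold, the five-sample streak, the VM name, and however the CPU samples and build status get collected are all assumptions. The only real commands are `prlctl stop <vm> --kill` and `prlctl start <vm>`, which the sketch only builds as argument lists and never runs; check them against the `prlctl` help for your Parallels version. Samples taken while a build is running reset the streak, since a busy VM during a build is expected.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VmWatch:
    """Hypothetical watchdog state for one VM."""
    name: str
    threshold_pct: float = 95.0   # assumed: "pegged" means >= 95% of allocated CPU
    required_samples: int = 5     # assumed: consecutive pegged samples before acting
    _streak: int = field(default=0, init=False)

    def observe(self, cpu_pct: float, build_running: bool) -> bool:
        """Record one CPU sample; return True once the VM looks hung.

        A sample only counts toward the streak if the CPU is pegged AND no
        build is running; any other sample resets the streak to zero.
        """
        if cpu_pct >= self.threshold_pct and not build_running:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self.required_samples


def restart_commands(vm_name: str) -> List[List[str]]:
    """prlctl commands to force-stop and then start a VM (built, not executed)."""
    return [
        ["prlctl", "stop", vm_name, "--kill"],
        ["prlctl", "start", vm_name],
    ]


if __name__ == "__main__":
    watch = VmWatch("build-agent-07")  # hypothetical VM name
    samples = [(12.0, False), (99.0, False), (100.0, False),
               (100.0, False), (99.5, False), (100.0, False)]
    for cpu, busy in samples:
        if watch.observe(cpu, busy):
            print("hung:", restart_commands(watch.name))
            break
```

    Passing the commands to `subprocess.run(cmd, check=True)` would actually perform the restart; keeping that step separate makes the detection logic easy to test without a Parallels host.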
     
  2. TreyB2

    Update:
    So we discovered that the Apple Hypervisor is far more brittle than we realized. I developed pipelines that spin up VMs and cache a couple hundred GiB of source control files, which takes about 3 hours. Out of 10 runs, the Apple Hypervisor froze 8 times. I then repeated the test 5 times with the Parallels Hypervisor, and it passed all 5. So my assumption that it freezes randomly was incorrect; we simply hadn't noticed how unstable it was because build activity was low.

    During this investigation, Parallels Support suggested enabling Adaptive Hypervisor. We did, and it did not change how often the Apple Hypervisor froze.

    For anyone trying to optimize uptime performance, here is what we found:
    • There is irrecoverable performance decay in VMs after weeks of Parallels uptime. We resolved this by scheduling a pipeline that gracefully shuts down the VMs and restarts Parallels during our already planned IT maintenance window.
      • The performance decay is non-trivial. The execution time of snippets of one of our larger build jobs gradually increased to about 7x. After that, Parallels becomes unstable, and Activity Monitor sometimes shows it as not responding. Clearly some sort of thread starvation is occurring.
      • I disabled the update check because it appeared to show a popup requiring user attention when Parallels started unattended.
      • The host that originally had the trial license of Parallels was missing the plist needed to start Parallels unattended. I tried creating it manually, and macOS deleted it within minutes each time. I resolved this by reinstalling Parallels. Fortunately, the Control Center inventory of VMs was intact.
    • There is ephemeral performance decay with 8+ VMs on a single host under heavy use
      • The same benchmark mentioned above took 3-4x longer with 10 VMs
      • But once the VMs became idle or we shut down a couple, performance was restored.
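    One way to decide when the scheduled restart is due, rather than relying only on the calendar, is to compare a fixed benchmark job against its duration right after a fresh Parallels restart. The Python sketch below assumes a 2x threshold and the median of a few recent runs; both, along with the function names and example timings, are illustrative assumptions, not values from the runs described above. The benchmark should run while the host is otherwise quiet, because the 3-4x slowdown with 10 busy VMs is ephemeral and would otherwise trigger false alarms.

```python
from statistics import median
from typing import Sequence


def decay_ratio(baseline_s: float, recent_s: Sequence[float]) -> float:
    """Median of recent benchmark durations divided by the fresh-restart baseline."""
    if baseline_s <= 0 or not recent_s:
        raise ValueError("need a positive baseline and at least one recent run")
    return median(recent_s) / baseline_s


def restart_due(baseline_s: float, recent_s: Sequence[float],
                max_ratio: float = 2.0) -> bool:
    """True once the quiet-host benchmark is max_ratio times slower than baseline.

    max_ratio=2.0 is an assumed threshold; the decay reported above reached about
    7x before Parallels became unstable, so the aim is to act well before that.
    """
    return decay_ratio(baseline_s, recent_s) >= max_ratio


if __name__ == "__main__":
    baseline = 600.0                        # example: seconds right after a Parallels restart
    recent = [1200.0, 1350.0, 1500.0]       # example: quiet-host runs weeks later
    print(decay_ratio(baseline, recent))    # 2.25
    print(restart_due(baseline, recent))    # True
```

    Using the median rather than the latest run keeps a single slow outlier from triggering a restart on its own.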
    Hopefully this is useful to someone. My impression is that type 2 hypervisors (i.e., ones that live within the host's OS) are fundamentally more error-prone because, in this case, Parallels cannot tightly control the stability or the version of the host OS. Type 1 hypervisors like Proxmox or ESXi are much more stable, but no hypervisor of that class is available on macOS right now.
     
  3. TreyB2

    Just to follow up: the workaround above mostly works, but we found that 6 VMs per host is the limit for 99.99% uptime. It seems that something in a shared part of the Parallels implementation can get fixated on something in one VM and thread-starve the others.
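    If VMs are started by a pipeline, the 6-VM limit can be enforced before anything is launched. This is a minimal Python sketch; the function name is hypothetical, and how the set of running VMs is obtained (for example, from `prlctl list` output) is left out as an assumption.

```python
from typing import Iterable, List, Set

# Per-host cap; 6 is the limit reported above for 99.99% uptime on these hosts.
MAX_RUNNING_VMS = 6


def vms_to_start(running: Set[str], requested: Iterable[str],
                 cap: int = MAX_RUNNING_VMS) -> List[str]:
    """Return the requested VMs that can be started without exceeding cap.

    Duplicates and VMs that are already running are skipped, and request order
    is kept, so callers can list the most important build agents first.
    """
    free_slots = max(cap - len(running), 0)
    pending = [vm for vm in dict.fromkeys(requested) if vm not in running]
    return pending[:free_slots]


if __name__ == "__main__":
    running = {"agent-1", "agent-2", "agent-3", "agent-4", "agent-5"}
    print(vms_to_start(running, ["agent-1", "agent-6", "agent-7"]))  # ['agent-6']
```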

    Fortunately, we were able to upgrade our Mac Pro hosts from Catalina to macOS Monterey 12.4, and we found that the Apple Hypervisor freeze bug was fixed. I'm assuming this hypervisor is implemented by Apple, or is somehow conditionally versioned in the Parallels software, because we only updated the host OS, not Parallels. I bootstrapped and cached our git/p4 repos 20 times. This process takes about 3 hours, and with Catalina on the host it would freeze 80% of the time before completing. Not a single freeze with Monterey. Furthermore, after running a day-long torture test on all 10 VMs to verify stability, we've had all 10 running in production without issue for over a week.
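    A quick back-of-the-envelope check shows why 20 clean runs is strong evidence that something changed, while still not proving the freeze rate is zero. The Python sketch below assumes each run freezes independently; the first function gives the chance of 20 clean runs if the Catalina rate of 8 in 10 still applied, and the second gives the exact one-sided 95% upper bound on the freeze rate after zero freezes in 20 runs.

```python
import math


def p_all_pass(freeze_rate: float, runs: int) -> float:
    """Chance that `runs` independent runs all complete, given a per-run freeze rate."""
    return (1.0 - freeze_rate) ** runs


def freeze_rate_upper_bound(runs: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on the rate after 0 freezes in `runs`.

    Solves (1 - p) ** runs == alpha for p.
    """
    return 1.0 - alpha ** (1.0 / runs)


if __name__ == "__main__":
    catalina_rate = 8 / 10  # 8 freezes in 10 runs on the Catalina host
    print(f"{p_all_pass(catalina_rate, 20):.2e}")   # 1.05e-14
    print(f"{freeze_rate_upper_bound(20):.3f}")     # 0.139
```

    In other words, 20 clean runs all but rule out the old 80% rate, but under these assumptions a freeze rate of up to roughly 14% per run is still statistically consistent with what was observed.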

    Now if only Apple silicon had all of the features.
     
