I have a performance problem in (professional) video decoding software that still uses a 32-bit media decoding process. I have had some interaction with Gemini and I wonder whether there is any value in what it said. I am comparing xVMS (64-bit native) with yVMS (32-bit media processes with a 64-bit UI process).
I am mainly curious about the following statements:
yVMS uses multiple 32-bit processes (VideoPlayer.exe) to decode video. On a high-core-count Xeon, this creates a massive management overhead:
- The Bottleneck: To keep the UI responsive, Windows must constantly swap focus between these dozens of 32-bit processes and the 64-bit main application. This is called Context Switching.
- Xeon Penalty: Xeons have high "inter-core latency." Moving data between cores (across the Mesh fabric) to sync all these tiny 32-bit processes is slow.
- xVMS Advantage: Because xVMS is a single 64-bit process, it keeps all its "worker threads" in one memory space. It doesn't have to ask the Windows Kernel to move data between different processes, which is significantly more efficient on Xeon’s many-core architecture.
- yVMS (32-bit processes): 32-bit applications generally cannot utilize the advanced instruction sets of modern Xeons, specifically AVX-512. They are often stuck using older, narrower instructions (SSE2 or AVX). This means for every "tick" of the CPU, yVMS might only process 128 bits of video data.
- xVMS (64-bit native): A 64-bit process can natively leverage AVX-512 (Advanced Vector Extensions). This allows the Xeon to process 512 bits of data in a single clock cycle.
- The Result: Even if the Xeon has a lower clock speed than the i9, xVMS is doing 4x more work per clock cycle than a 32-bit process can. This is why xVMS doesn't "choke" on the Xeon like yVMS does.
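As far as I can tell, the "4x" figure in these statements is just the ratio of the register widths (512 bits versus 128 bits). Below is a minimal sketch of that arithmetic. The 8-bit sample size and the helper names `lanes` and `width_ratio` are my own assumptions for illustration, not something Gemini provided. It shows only a theoretical upper bound per instruction, not a measured speedup, and it says nothing about whether a 32-bit process is really limited to the narrower instructions.

```python
# Register widths in bits for the instruction sets named in the statements.
REGISTER_BITS = {"SSE2": 128, "AVX2": 256, "AVX-512": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements a single vector instruction can operate on."""
    return register_bits // element_bits


def width_ratio(wide: str, narrow: str) -> float:
    """Ratio of register widths, i.e. the theoretical per-instruction gain."""
    return REGISTER_BITS[wide] / REGISTER_BITS[narrow]


if __name__ == "__main__":
    # Assumption: 8-bit pixel samples as a stand-in for video data.
    for name, bits in REGISTER_BITS.items():
        print(f"{name}: {lanes(bits, 8)} samples per instruction")
    print("AVX-512 vs SSE2:", width_ratio("AVX-512", "SSE2"))
```

The ratio comes out at 4.0, which matches the "4x more work" wording, so the claim seems to rest on register width alone rather than on any benchmark.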
For a CPU to "talk" to a GPU, it maps a portion of the GPU's memory into its own address space.
- 32-bit Limit: A 32-bit process has a total addressable range of only 4 GB. If your GPU has 12 GB or 24 GB of VRAM, a 32-bit process can only "see" a tiny fraction (usually a 256MB window called an "aperture") at one time.
- The Xeon Penalty: To process 50+ cameras, the Xeon has to constantly shift this tiny 256MB window around to see different parts of the GPU memory. This requires "interrupts" and kernel calls that are much slower on a Xeon’s high-latency Mesh architecture than on an i9’s Ring bus.
- xVMS (64-bit): It can map the entire GPU memory into its address space simultaneously. No shifting, no interrupts, just direct high-speed access.
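To make the numbers in these statements concrete, here is a small sketch of the arithmetic they imply: the size of a 32-bit address space, and how many 256 MB windows it would take to cover 12 GB or 24 GB of VRAM. The function names are hypothetical and the sizes are taken as binary units (GiB/MiB), which is my assumption. Whether a video decoder actually accesses GPU memory through such a window is exactly what I am unsure about.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def address_space_bytes(pointer_bits: int) -> int:
    """Total number of bytes addressable with pointers of the given width."""
    return 2 ** pointer_bits


def windows_needed(vram_bytes: int, window_bytes: int) -> int:
    """Number of fixed-size windows required to cover all VRAM (rounded up)."""
    return -(-vram_bytes // window_bytes)  # ceiling division


if __name__ == "__main__":
    print("32-bit address space (GiB):", address_space_bytes(32) // GIB)
    for vram_gib in (12, 24):
        count = windows_needed(vram_gib * GIB, 256 * MIB)
        print(f"{vram_gib} GiB VRAM needs {count} windows of 256 MiB")
```

So under the statement's own assumptions, a single 256 MB window exposes roughly 1/48 of a 12 GB card and 1/96 of a 24 GB card at any one time.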