I still had this saved on my handheld:
Hardware programming specifics for 3D/OpenGL video accelerators are usually very difficult to find documented anywhere. Hardware manufacturers typically treat such information as confidential, since knowledge of the internal operations could give competing manufacturers insight into the card's operations and proprietary acceleration techniques.
All modern operating systems restrict direct hardware access to kernel-level processes/drivers. Writing a kernel-mode dnetc client is not something we are prepared to do because of the extensive design differences between supported operating systems.
Of course, if low-level direct register manipulation is going to be done, it will be extremely video card specific.
Additionally, your video card would presumably already have a driver bound to it for normal display purposes. Interfacing in any way that allowed direct access to video card registers would require you not to load your conventional video driver and instead use a "dnetc video driver", likely eliminating the ability to use that video card for normal display purposes.
It is doubtful that any 3D accelerator card would even contain functionality that could be utilized in a meaningful way for accurate mathematical computation. Presumably almost all of the operations provided by video cards are designed for eventual display rendering, and not for general purpose math output.
For example, it might be necessary to use each pixel in a hidden buffer to represent an accumulator register. To multiply or divide by a scalar, an ambient light source of appropriate intensity might have to be created, taking into account the expected environment algorithms that the card might be applying. To average an arbitrary set of values with another set of values, you might need to create an alpha blended texture map of the second set of values and be sure to achieve 1-to-1 pixel alignment by disabling perspective, texture map interpolation, and any smoothing or output enhancements.
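To make the accuracy concern concrete, here is a minimal sketch that imitates the two tricks above entirely in software. Nothing in it talks to a real video card or driver: the function names blend_average and modulate are hypothetical, and the 8-bit channel depth, the alpha value of 128 out of 255, and the truncating arithmetic are all assumptions about how a fixed-function pipeline might behave, not documented behavior of any particular card.

```python
# Pure software simulation; every name and constant here is an assumption
# made for illustration, not a real graphics API.

MAX_CHANNEL = 255  # assumed 8-bit colour channel


def blend_average(dst, src, alpha=128):
    """Imitate alpha blending src over dst with integer, truncating math."""
    return [
        (s * alpha + d * (MAX_CHANNEL - alpha)) // MAX_CHANNEL
        for d, s in zip(dst, src)
    ]


def modulate(pixels, intensity):
    """Imitate scaling pixels by a light intensity, clamped to 8 bits."""
    return [min(MAX_CHANNEL, int(p * intensity)) for p in pixels]


def exact_average(a, b):
    """Reference result computed with ordinary CPU arithmetic."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# Each list entry stands in for one pixel used as an accumulator register.
accumulators = [10, 101, 200, 255]
second_set = [20, 100, 201, 0]

blended = blend_average(accumulators, second_set)
exact = exact_average(accumulators, second_set)

for got, want in zip(blended, exact):
    print(f"blended={got:3d}  exact={want:6.1f}  error={want - got:+.1f}")

# "Multiplying by a scalar" saturates as soon as the result exceeds 8 bits.
print(modulate([100, 200], 2.0))
```

With these inputs, three of the four blended averages lose their fractional half, and doubling 200 clamps to 255 rather than yielding 400. Under these assumptions a single blend or lighting pass cannot carry the exact integer arithmetic a key-search core depends on.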
In any case, the main CPU has to perform additional computations to determine all of the parameters that might have to be supplied to the video card, and the video card does a lot of wasted matrix transformations or other operations that are targeted strictly at visual output. It is also entirely unclear what level of interfacing would be possible or necessary. To complicate the issue, no technical documentation is publicly available for most of the commercial 3D accelerators.
If you believe you have some information regarding the register-level operations that can be done, you are welcome to attempt to put together your own core. Our client source code is publicly available for download.