Wow, that sounds like some black magic wizardry. How can a hardware-independent, multiplatform API give the kind of low-level optimizations and performance of a hardware-specific API? Sounds like something that breaks the very definition of the thing. And is there still an incentive for console makers to develop proprietary APIs?
Others would be much better qualified to explain this, but my understanding comes down to a few different points:
- Older APIs like OpenGL and DX11 had a lot of overhead (e.g. expensive draw calls) and limitations (e.g. being restricted to submitting work from a single thread) that have been dropped with Vulkan. Both OpenGL and DirectX originated in a time of single-core CPUs and fixed-function GPUs, so Vulkan benefits a lot from being built from the ground up with modern hardware in mind.
- Modern GPU architectures aren't actually all that different from each other. Fundamentally, all GPUs from AMD, Nvidia, Intel, ARM, etc., are built around a collection of arbitrarily programmable SIMD arrays, with an assortment of relatively standard fixed-function hardware for handling texture sampling, rasterisation, etc. The specifics of each of these blocks will change a bit from one GPU to the next, but the core paradigm is the same. As such, a lot of what we may think of as "low-level" optimisation is actually independent of any particular hardware.
- Much hardware-specific optimisation can be done within a hardware-agnostic API. For example, optimising your workloads around the wavefront/warp sizes, or the LDS capacities of the GPU in question.
- For anything which isn't properly exposed by the API, the hardware designer can just implement it as an extension to the API. I'd expect Switch's NVN to have a lot of these, such as providing support for conservative rasterisation (not that I expect much use of it on Switch), and perhaps more explicit control over how the GPU performs tiling.
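To make the multithreading point concrete: Vulkan lets each worker thread record into its own command buffer and then submit everything in one go, whereas GL funnels every call through a single context. Here's a rough stand-in sketch in plain C (arrays of ints playing the role of command buffers; the thread count and sizes are made-up numbers, not anything from a real engine):

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4
#define CMDS_PER_THREAD 1000

/* Stand-in for a per-thread command buffer: in Vulkan each worker
   records into its own VkCommandBuffer, so no locking is needed
   during recording. */
typedef struct {
    int commands[CMDS_PER_THREAD];
    size_t count;
} CommandBuffer;

static void *record_commands(void *arg) {
    CommandBuffer *cb = (CommandBuffer *)arg;
    for (size_t i = 0; i < CMDS_PER_THREAD; i++) {
        cb->commands[i] = (int)i;  /* "record" one draw call */
        cb->count++;
    }
    return NULL;
}

/* Record a frame's worth of work across NUM_THREADS workers in
   parallel, then "submit" once; returns the total command count. */
size_t record_frame(void) {
    static CommandBuffer buffers[NUM_THREADS];
    pthread_t threads[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++) {
        buffers[t].count = 0;
        pthread_create(&threads[t], NULL, record_commands, &buffers[t]);
    }
    size_t total = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += buffers[t].count;  /* single submission point */
    }
    return total;
}
```

The key property is that the per-thread buffers never touch shared state while recording, which is exactly what lets Vulkan scale across cores where the older single-context APIs couldn't.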
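On the wavefront/warp point: that kind of tuning is mostly arithmetic the application does before dispatching work, so it needs nothing hardware-specific from the API, just knowledge of the GPU. A tiny sketch (the 32/64 figures in the comment are the usual Nvidia warp and GCN-era AMD wavefront sizes, used here as assumptions):

```c
/* Round a thread count up to a multiple of the hardware's
   wavefront/warp size (e.g. 32 on Nvidia, 64 on GCN-era AMD),
   so no SIMD lanes sit idle in a partially filled wave. */
unsigned round_up_to_wave(unsigned threads, unsigned wave_size) {
    return ((threads + wave_size - 1) / wave_size) * wave_size;
}

/* How many waves a workgroup of the given size occupies. */
unsigned waves_per_group(unsigned group_size, unsigned wave_size) {
    return (group_size + wave_size - 1) / wave_size;
}
```

Nothing in there is Vulkan- or NVN-specific; the API just has to let you pick your dispatch sizes, which any of them do.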
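And on extensions: the usual pattern is that the driver reports the extension names it supports, and the application checks for the ones it wants before enabling them. A toy version of that check (the supported list here is invented for illustration, though the extension strings themselves are real Vulkan names):

```c
#include <string.h>
#include <stddef.h>

/* Pretend driver-reported extension list. In real Vulkan you'd get
   this from vkEnumerateDeviceExtensionProperties; this array is a
   made-up stand-in. */
static const char *supported[] = {
    "VK_KHR_swapchain",
    "VK_EXT_conservative_rasterization",
};

/* Returns 1 if the named extension is in the supported list. */
int has_extension(const char *name) {
    size_t n = sizeof(supported) / sizeof(supported[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(supported[i], name) == 0)
            return 1;
    return 0;
}
```

That's the whole mechanism: hardware features the core API doesn't cover just show up as extra names in that list, which is presumably how something like NVN would expose Maxwell-specific features.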
I'd liken it to the difference between high-level and low-level programming languages. C++ is a much lower-level language than, say, Visual Basic, but it doesn't need to limit itself to just one CPU to achieve that. It requires a better programmer to utilise it properly, and to really get the most out of a fixed platform you would want to understand the CPU you're working with, but it still does its job right across different CPUs and ISAs.
I was talking about id Software giving a keynote. They like to talk about the magic they pull off. If Doom on Switch is a solid 60fps and looks comparable to the bigger consoles (like the reveal video), they'll want to brag about how they did it...
Ah, I misinterpreted you. I still wouldn't expect Nintendo to allow it; I don't think I've ever seen a GDC talk that covers the technical details of a Nintendo platform (although admittedly we haven't seen envelope-pushing engines on Nintendo platforms for quite a while).