The Move Engines most definitely help Durango with bandwidth, but they don't do so by giving Durango more bandwidth than it already has. They help by using up to 25GB/s of the existing overall bandwidth to offload work from the rest of the GPU. That might not sound like a big enough number to some, but what's important about the Move Engines is that they can operate simultaneously with GPU computation, transferring data that would otherwise have to be moved by a shader, and a shader would use notably more bandwidth for the same job than the Move Engines need. By having the Move Engines take care of some of this work, you remove the need for the GPU to do it on its own, which would cost the GPU far more performance, essentially giving the GPU more time to worry about other things than it would have without 4 Move Engines helping out.
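To put rough numbers on the "operate simultaneously" point, here is a minimal sketch of a toy timing model. Only the 25GB/s cap comes from the post; the function names, the 10 ms of compute, the 50 MB copy and the 1 ms shader copy are all hypothetical, and the model ignores any contention between the copy and the shaders.

```python
# Toy timing model. Only the 25 GB/s cap is from the post;
# every other number and name here is a hypothetical illustration.

MOVE_ENGINE_BW_GBPS = 25.0


def dma_copy_ms(size_mb: float, bandwidth_gbps: float = MOVE_ENGINE_BW_GBPS) -> float:
    """Time to move size_mb at bandwidth_gbps. MB / (GB/s) works out to ms."""
    return size_mb / bandwidth_gbps


def frame_ms_shader_copy(compute_ms: float, shader_copy_ms: float) -> float:
    """The GPU does the copy itself, so the copy time adds to the frame."""
    return compute_ms + shader_copy_ms


def frame_ms_offloaded(compute_ms: float, copy_ms: float) -> float:
    """The copy runs alongside compute, so the longer of the two sets the frame."""
    return max(compute_ms, copy_ms)


compute_ms = 10.0             # hypothetical shader work per frame
copy_ms = dma_copy_ms(50.0)   # hypothetical 50 MB transfer at 25 GB/s

print(frame_ms_shader_copy(compute_ms, 1.0))    # 11.0 (assumed 1 ms shader copy)
print(frame_ms_offloaded(compute_ms, copy_ms))  # 10.0
```

Even though the shader copy is assumed to be twice as fast here, the offloaded version still wins, because the slower copy is hidden behind work the GPU was doing anyway.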
During compute-heavy situations, Move Engine operations are basically free because there is plenty of bandwidth to spare, so the Move Engines can do their job without taking bandwidth that is specifically needed elsewhere. During bandwidth-heavy situations, Move Engine operations can still end up being effectively free if they use a different memory path from the one a shader is using at that moment. And even though all 4 Move Engines share the same 25GB/s between them when they are operating at the same time, this isn't necessarily a downside, because you're able to transfer 4 separate streams of data in both directions at the same time at rather little bandwidth cost. The point of the Move Engines isn't to do things the fastest way possible, but to do very important things fast enough while using very few resources, saving the GPU from having to do it itself and leaving its available power and resources for other things.
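The shared 25GB/s can be sketched the same way. The post only says the 4 engines share the cap; the even split between active engines, the function names and the 25 MB stream size below are assumptions made for illustration.

```python
# Sketch of 4 engines sharing one 25 GB/s cap.
# The even split between active engines is an assumption, not a documented fact.

TOTAL_BW_GBPS = 25.0
NUM_ENGINES = 4


def per_engine_bandwidth(active: int) -> float:
    """Bandwidth each active engine gets under an assumed even split."""
    if not 1 <= active <= NUM_ENGINES:
        raise ValueError("active must be between 1 and 4")
    return TOTAL_BW_GBPS / active


def stream_time_ms(size_mb: float, active: int) -> float:
    """Time for one stream of size_mb while `active` engines run. MB / (GB/s) = ms."""
    return size_mb / per_engine_bandwidth(active)


for n in range(1, NUM_ENGINES + 1):
    # hypothetical 25 MB per stream
    print(n, per_engine_bandwidth(n), stream_time_ms(25.0, n))
```

Under these assumptions each stream slows down as more engines become active, but 4 streams still finish together in the time a single engine would need to move all 4 back to back, and none of it occupies a shader.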
So, if the purpose of the Move Engines is to get work out of the way so the GPU doesn't have to spend even more bandwidth getting the job done, then they are indeed leading to more efficient use of memory bandwidth. That's the primary purpose of DMAs on GCN architecture, and all 4 Move Engines have a DMA engine. Even the Radeon 7970 only has 2 DMA engines. All 4 Move Engines can also take care of another important task for the GPU, tiling and untiling. They have other important uses beyond that, but you get the point.