Yeah, blu would be a much better person to ask. I don't really get the Flipper architecture. I simply assumed that each pixel pipeline ended in a ROP, and that those ROPs were part of the TEV unit (jumping to conclusions by looking at performance metrics). Now that I look at the die, that's probably not how it worked.
Ok, Flipper time.
Flipper's TEV is part of Flipper's 'classic' pixel pipeline, which, as wsippel noted, used to end with a ROP unit (and thus, 4 pipelines = 4 ROPs). Nowadays this rule is no more, i.e. the logical unit referred to as ROP is no longer matched 1:1 to 'pipelines', mainly because 'pipelines' as such are gone. Some modern architectures don't even have dedicated ROP logic per se, e.g. NV's since G80 (IIRC), where the shader units have read access to the fb (technically, to an fb cache), so the ROP functionality is done via shader ops.
Back to TEV, though. TEV is a very sophisticated 'texture combiner' unit - a unit that takes inputs from multiple textures (and interpolants from the rasterizer) and does blending operations between those, in a cascaded (sometimes looped-back) manner. TEV was the tex combiner's 'swan song' - the 'missing link' between texture combiners and the early pixel shader hw (which pretty much was tex combiners on steroids) - the differences between TEV's blending stages and PS1.1-1.4 op slots are really not that big. Actually, in some respects TEV can do more than those early PS units: TEV has 16 stages, every other one of which can do a dependent texture read (aka EMBM read), versus PS 1.4's 8 + 8 (via loop-back) op stages, but only 6 tex addr registers.
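To make the 'cascaded' part concrete, here's a rough sketch of a combiner chain in Python. It's a toy model, not Flipper's actual register interface: the Stage class, the register names ("tex0", "ras", "prev", etc.) and the simplified per-stage equation d + lerp(a, b, c) are all assumptions made for illustration. The point is only that each stage picks its inputs from textures, rasterizer interpolants, or the result of the previous stage.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Color = Tuple[float, float, float]


@dataclass
class Stage:
    """One combiner stage: names of the four inputs and the output register.

    Hypothetical layout for illustration, not a real hardware register map.
    """
    a: str
    b: str
    c: str
    d: str
    dest: str = "prev"


def clamp01(x: float) -> float:
    return min(1.0, max(0.0, x))


def run_combiner(stages: List[Stage], inputs: Dict[str, Color]) -> Color:
    """Run the stages in order; each one may read what earlier ones wrote."""
    regs: Dict[str, Color] = {
        "zero": (0.0, 0.0, 0.0),
        "one": (1.0, 1.0, 1.0),
        "prev": (0.0, 0.0, 0.0),
    }
    regs.update(inputs)
    for st in stages:
        a, b, c, d = regs[st.a], regs[st.b], regs[st.c], regs[st.d]
        # Assumed stage equation: out = clamp(d + (1 - c) * a + c * b)
        regs[st.dest] = tuple(
            clamp01(dk + (1.0 - ck) * ak + ck * bk)
            for ak, bk, ck, dk in zip(a, b, c, d)
        )
    return regs["prev"]


# Stage 1: modulate tex0 by the rasterized (vertex) color.
# Stage 2: add tex1 on top of the previous stage's result (the cascade).
example_stages = [
    Stage(a="zero", b="tex0", c="ras", d="zero"),
    Stage(a="zero", b="tex1", c="one", d="prev"),
]
example_inputs = {
    "tex0": (0.5, 0.5, 0.5),
    "tex1": (0.25, 0.25, 0.25),
    "ras": (1.0, 0.5, 0.0),
}
print(run_combiner(example_stages, example_inputs))
```

Swap the list of stages and you get a different 'shader', which is pretty much why the gap to PS1.x op slots is so small.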
So, the difference between a TMU and a TEV is that, well, they are orthogonal units. A modern-day TMU does:
(1) tex addressing computations, i.e. strq->uv0..n (not necessarily though - some architectures use shader ops for that)
(2) tex filtering (using the tex caches as raw input)
so the output from a TMU is a filtered texture sample, ready to be fed to the shader code. OTOH, TEV pretty much does the equivalent of shader code - for (1) and (2) Flipper has its own dedicated logic (i.e. the TF and TC blocks). So in essence, TEV is not a TMU - TEV is more akin to a shader unit. As for whether TEV can do 4 pixels at a time, or there are 4x TEVs - it's really semantics.
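For reference, steps (1) and (2) can be sketched in a few lines of Python. This is a toy model under stated assumptions: a single-channel texture stored as a list of rows, wrap addressing, texel centers at +0.5, and 'strq->uv' read as a projective divide by q. None of the function names come from any real API.

```python
import math
from typing import List, Tuple

Texture = List[List[float]]  # rows of single-channel texels (assumed layout)


def project(s: float, t: float, q: float) -> Tuple[float, float]:
    """Step (1), part one: projective divide, one plausible reading of strq->uv."""
    return s / q, t / q


def fetch_wrapped(tex: Texture, x: int, y: int) -> float:
    """Raw texel fetch with wrap (repeat) addressing."""
    h, w = len(tex), len(tex[0])
    return tex[y % h][x % w]


def sample_bilinear(tex: Texture, u: float, v: float) -> float:
    """Steps (1) and (2): normalized coords in, filtered sample out."""
    h, w = len(tex), len(tex[0])
    # Step (1): map normalized coords to texel space (centers at +0.5).
    x = u * w - 0.5
    y = v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    # Step (2): fetch the 2x2 neighborhood and blend by the fractional parts.
    t00 = fetch_wrapped(tex, x0, y0)
    t10 = fetch_wrapped(tex, x0 + 1, y0)
    t01 = fetch_wrapped(tex, x0, y0 + 1)
    t11 = fetch_wrapped(tex, x0 + 1, y0 + 1)
    top = t00 * (1.0 - fx) + t10 * fx
    bottom = t01 * (1.0 - fx) + t11 * fx
    return top * (1.0 - fy) + bottom * fy


ramp = [[0.0, 1.0],
        [0.0, 1.0]]
u, v = project(1.0, 1.0, 2.0)
print(sample_bilinear(ramp, u, v))
```

Everything after the return value, i.e. what you do with the filtered sample, is the combiner's (or shader's) job.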
So I guess you see now why adding a TEV to a modern shader architecture does not make much sense. For a current (read: unified shader architecture) GPU to successfully emulate Flipper, fat dependent texture read limits and fat low-latency texture caches are needed, not TEVs. IOW, adding Flipper's 1MB of tex cache and Flipper's TF and/or TC logic would contribute more to Flipper emulation than TEV.
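In case 'dependent texture read' is unclear: it's a fetch whose coordinates come from the result of an earlier fetch, which is why it leans so hard on low-latency texture caches. A minimal Python sketch, assuming nearest-neighbor lookups, wrap addressing, and an offset map that stores (du, dv) pairs directly; the function names and the scale parameter are made up for illustration.

```python
import math


def fetch_nearest(tex, u, v):
    """Nearest-neighbor fetch with wrap addressing; tex is a list of rows."""
    h, w = len(tex), len(tex[0])
    x = math.floor(u * w) % w
    y = math.floor(v * h) % h
    return tex[y][x]


def embm_sample(offset_map, env_map, u, v, scale=1.0):
    """EMBM-style lookup: the first read perturbs the coords of the second."""
    du, dv = fetch_nearest(offset_map, u, v)  # first (independent) read
    # Dependent read: cannot be issued until the first fetch has returned.
    return fetch_nearest(env_map, u + scale * du, v + scale * dv)


offsets = [[(0.5, 0.0), (0.0, 0.0)]]  # 2x1 map of (du, dv) pairs
env = [[10, 20]]                      # 2x1 'environment' texture
print(embm_sample(offsets, env, 0.0, 0.0))
```

Chain a few of these per pixel and the time between issuing a fetch and getting the texel back starts to dominate, hence the emphasis on the tex cache rather than on the combiner itself.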