I know I'm responding to these posts a good half a year late, but as SimonChiu has already bumped the post, I may as well give my two cents if Blu and Durante are still paying attention.
I'm well aware. I was programming the Intel SCC back when that was new. My point is that you have all those neat interconnects, and then when you need actual usable performance you buy a cluster of Xeons.
I haven't touched the SCC, but I've had the, erm, pleasure of working with KNC, which has a ring bus, and it's material proof of why that tech is a bad choice for large-scale SMP. I'm eager to see what mesh they've come up with for KNL.
As regards the cluster of Xeons - I beg to disagree with that generalisation : )
I would be willing to wager good money that this is all about yields. Basically, if your interconnect topology is of dimension greater than one, it's impossible to disable a single core in a way that leaves the chip topologically consistent (i.e. in a way that is invisible to software). A ring is a 1-dimensional torus, which makes it the lowest-latency topology in which cores can still be disabled individually to improve yields. If you go up to a 2-dimensional topology, say a simple 2D mesh (e.g. Epiphany) or a 2D torus (e.g. KNL), then you have to disable a full row or column of cores when a single core fails, and likely both a row and a column if two cores fail.
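To make the renumbering argument concrete, here's a minimal sketch in Python (the addressing scheme is hypothetical, not Intel's actual routing) showing why fusing off one core of a ring is invisible to software, while there's no equivalent trick for a single node of a 2D torus:

    def ring_neighbors(i, n):
        # logical neighbours of core i in an n-core ring
        return ((i - 1) % n, (i + 1) % n)

    # 16 physical cores, core 5 fused off at binning time
    survivors = [c for c in range(16) if c != 5]

    # Renumber the survivors to contiguous logical IDs 0..14: the result
    # is just a 15-core ring, indistinguishable from a die built that way.
    n = len(survivors)
    assert all(len(set(ring_neighbors(i, n))) == 2 for i in range(n))

    # In a 2D torus, a core at (x, y) expects neighbours at
    # (x +/- 1 mod W, y) and (x, y +/- 1 mod H); a single missing node
    # leaves a hole no renumbering can hide, so only dropping a whole row
    # or column yields a smaller, still-regular torus.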
When you're producing large dies on immature low-yield processes, like Intel do with their big Xeons, maximising the number of usable dies per wafer, and maximising the value of those dies, is probably going to be more important than implementing the most efficient interconnect. To take a simple example, consider a 16-core Xeon die implemented either with a ring bus or a 2D torus (a 4x4 grid). With a single faulty core, they can sell the ring-bus version as a 15-core chip, but the 2D torus version only as a 12-core chip. With two faults, that drops to 14 cores and 9 cores respectively; with three faults, 13 and 6, etc. Intel can make more money selling higher core-count chips with less efficient interconnects than vice versa, so that's what they'll do.
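If you want to play with those numbers, here's a rough sketch, assuming the pessimistic case where the faults land in distinct rows and columns, so each extra fault costs an alternating whole row or column:

    import math

    def ring_cores(total, faults):
        # ring bus: each faulty core is fused off individually
        return total - faults

    def torus_cores(width, height, faults):
        # 2D torus, faults in general position: cover them with
        # ceil(f/2) rows plus floor(f/2) columns of disabled cores
        rows_off = math.ceil(faults / 2)
        cols_off = faults // 2
        return max(0, width - cols_off) * max(0, height - rows_off)

    # 16-core die, ring vs 4x4 torus -- reproduces the counts above
    for f in range(4):
        print(f, ring_cores(16, f), torus_cores(4, 4, f))
    # 0 16 16
    # 1 15 12
    # 2 14 9
    # 3 13 6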
Xeon Phi is different, both because the large number of cores makes the interconnect bottleneck bigger, and because it makes the cost of disabling a full row or column proportionally lower (a row or column is only the square root of the number of nodes). With Knights Landing they're going with a 6x6 torus of paired cores for 72 cores total on the full chip. I don't think they've talked about a binned version, but when they do I'm guessing it's going to have one row and one column disabled for 50 usable cores, allowing them to sell dies with two faulty nodes (or more if they're lucky with their relative positions).
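For what it's worth, the arithmetic behind that guess (the binned SKU itself is pure speculation on my part):

    tiles_w, tiles_h, cores_per_tile = 6, 6, 2

    full   = tiles_w * tiles_h * cores_per_tile              # 36 tiles -> 72 cores
    binned = (tiles_w - 1) * (tiles_h - 1) * cores_per_tile  # 25 tiles -> 50 cores

    # Two faulty tiles in general position need one row plus one column
    # disabled, which is exactly what the 50-core figure assumes.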
The move to a more yield-sensitive topology may well partly explain why we're seeing KNL arrive so late. The 22nm KNC was fully available only around 6 months after the first 22nm Ivy Bridge CPUs, yet we're now around a year and a half after the first 14nm Broadwell chips, and we've only just seen KNL-based development systems start to trickle out. Of course, Intel's general yield issues on 14nm certainly won't be helping.