I think most people don't even consider the possibility of reaching peak-write performance when they first approach that problem (it's a pretty big departure from earlier problems), which limits the things you tend to think about.
The final cycle trick is neat, but like you said, you inevitably reach it by elimination, as there are only so many things you can write on the first clock,
and when your solution is <20 instructions you'd already be in the hot/warm area of what you use in the first place.