There's no one reason for this (and no, it's not primarily the hardware either - not in the literal sense people usually mean). But if we had to summarize the primary reasons, these would be the most likely top three:
Online proliferation. There are no mature/feasible solutions for doing large-scale interactive physics online at a sane cost (no, Crackdown wasn't it either). Possibly this will never happen until cloud costs come crashing down, or some massive company just swallows the costs long enough to make it happen.
Also, I should point out that people tend to focus on destruction, which is like... 90% cosmetic anyway (and actually relatively easy to do, even online). The hard stuff is proper physical interactivity, which is already pretty limited in single-player games (yes, even in HL2), and which becomes another order of magnitude harder once you put it online, and a lot more expensive to operate (and thus a question of how to extract those costs from your users - the answer is usually: you don't).
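To make the "more expensive to operate" point concrete, here's a rough back-of-envelope sketch. Every number in it (tick rate, body count, player count, bytes per body) is an illustrative assumption, not a figure from any real game, and real engines cut the totals down with delta compression, interest management and sleeping bodies - but the baseline shows why server-authoritative interactive physics is a different beast from fire-and-forget cosmetic debris:

```cpp
// Back-of-envelope cost of server-authoritative physics replication.
// Every number below is an illustrative assumption, not a measurement.
#include <cstdio>

int main() {
    // Per-body state the server must send so clients agree on "real" physics:
    // position (3 floats) + rotation quaternion (4) + linear/angular velocity (6).
    const double bytes_per_body     = 13 * 4;   // ~52 bytes, uncompressed
    const double tick_rate_hz       = 30;       // snapshot rate (assumed)
    const double interactive_bodies = 2000;     // awake, player-disturbable objects (assumed)
    const double players            = 64;       // clients receiving snapshots (assumed)

    const double per_client_Bps = bytes_per_body * interactive_bodies * tick_rate_hz;
    const double server_out_Bps = per_client_Bps * players;

    std::printf("per client : ~%.1f KB/s\n", per_client_Bps / 1024.0);
    std::printf("server out : ~%.1f MB/s\n", server_out_Bps / (1024.0 * 1024.0));

    // Cosmetic destruction sidesteps all of this: the server sends one event
    // ("wall X broke, seed 1234") and every client simulates the debris locally.
    return 0;
}
```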
Which brings us to the second point - physics compute costs, unlike most other processing in games, scale super-linearly, not linearly, with the amount of 'stuff'. E.g. doubling processing power might give you 2x the pixels on screen - but only ~10% more physical entities to work with. Combined with the fact that 'non-cosmetic' physics doesn't really lend itself to different 'LODs' for the variety of specs you run on, you are mostly stuck basing your interactivity on the lowest common denominator - and the whole thing is pretty stuck / on a very slow progression curve.
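A small illustrative sketch of that scaling (the cost exponent and base body count are assumptions picked for the example, not measurements): if the cost of a physics step grows roughly with the square of the interacting bodies - think potential contact pairs in a dense pile - then doubling the compute budget only buys ~40% more bodies, and steeper effective exponents (solver iterations for stability, memory bandwidth) push the gain down towards the single-digit figures above:

```cpp
// Illustrative only: how many bodies fit in a fixed per-frame physics budget
// if the step cost grows super-linearly with body count (cost ~ n^k).
#include <cmath>
#include <cstdio>

int main() {
    const double k           = 2.0;    // assumed cost exponent (dense contact pairs)
    const double base_bodies = 1000;   // bodies affordable at 1x compute (assumed)

    for (double compute = 1.0; compute <= 8.0; compute *= 2.0) {
        // Invert cost = c * n^k  ->  n = base * compute^(1/k)
        const double bodies = base_bodies * std::pow(compute, 1.0 / k);
        std::printf("%gx compute -> ~%.0f bodies (+%.0f%%)\n",
                    compute, bodies, (bodies / base_bodies - 1.0) * 100.0);
    }
    // Compare with pixels, which scale roughly linearly: 2x compute really is ~2x pixels.
    return 0;
}
```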
Finally - there's the problem that increases in graphics fidelity greatly outpace what is possible to dynamically simulate, and introducing low-fidelity simulation breaks immersion far worse than keeping things mostly static. Combine that with the fact that high-fidelity graphics often bring more static elements in their early stages (e.g. Nanite), and that people mostly just look at pretty pictures over anything else, and the justifications for investing in the space are few and far between.
This list is far from exhaustive - there's tons more that could be said, but the point is that the forces at work are systemic, meaning the market at large isn't likely to change anytime soon, outside of the occasional title here and there that tries different things at the expense of all other expectations.
You’re mostly there but not quite on some of your points.
Most of the above can be, and have been, worked around in the past to provide layers of physics and interactivity that add enough value to the gameplay experience without costing the earth in terms of frame time.
The major reasons why we haven’t seen significant advancements in these areas over the last decade in line with advances in processing power are as follows:
1. Advancements in graphics have moved at pace with consumer expectations. Achieving ever-increasing fidelity has required lots of trade-offs, which have pushed GFX techniques towards static methods instead of dynamic ones to keep the FPS up. E.g. static global illumination, static scene shadowing etc. all put hard constraints on the ability to change environment meshes, props or other objects in the world. This basically kills options like environmental destruction, because you can't re-bake in real time (a rough sketch of this constraint follows at the end of this point).
It becomes a commercial decision too: either try to get your IP noticed by making it shinier to please the fans at E3/Gamescom, or add more interactivity & dynamism and risk inflating your production budget/time (more bugs due to more edge cases to playtest and balance), delays, as well as failing to hit the bang-for-buck needed to get the title noticed and drive sales.
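Here's a minimal sketch of that static-bake constraint, assuming a conventional lightmap pipeline (the names and numbers are illustrative, not any particular engine's API): the lighting is computed offline against the geometry as it existed at bake time, and the runtime only ever reads the result, so once gameplay knocks a wall down there is nothing that can update the stored lighting short of a full re-bake, which takes minutes to hours rather than milliseconds:

```cpp
// Minimal sketch of the static-bake constraint (names are illustrative).
#include <cstdio>
#include <vector>

struct Lightmap {
    // Baked offline against the scene geometry as it existed at bake time.
    std::vector<float> texels;
};

// Offline step: minutes to hours on a build machine - far too slow for runtime.
Lightmap BakeGlobalIllumination(/* full static scene */) {
    return Lightmap{std::vector<float>(1024 * 1024, 1.0f)};
}

// Runtime step: only *reads* the bake; cheap, but assumes the geometry never changes.
float SampleLighting(const Lightmap& lm, int texel) {
    return lm.texels[texel];
}

int main() {
    const Lightmap lm = BakeGlobalIllumination();
    std::printf("lighting at texel 0: %f\n", SampleLighting(lm, 0));
    // If gameplay now destroys the wall this texel belonged to, the stored
    // lighting (and the shadows it cast on its neighbours) is simply wrong
    // until the offline bake is re-run - which is why destruction gets cut.
    return 0;
}
```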
2. The proliferation of commodity game engines (Unreal, Unity). This is a big one. Dev times & budgets have increased from one generation to the next, along with the scope of customer expectations: they want more, bigger, shinier, and with more spectacle. In order to reduce execution risk as well as time to market, studios move over to market-leading game tech with comprehensive feature sets, great tools and a large community of support (not to mention a well-established talent pool who can hit the ground running).
Problem is, this becomes a blessing and a curse. Suddenly studios are locked into said feature sets, and it makes it much harder to innovate around adding further interactivity or dynamism beyond what the constraints of the engine framework support. Studios can always go off the beaten path and customise it, but it's a huge risk, as it can block or add unnecessary complexity to taking on follow-on support, bug fixes, upgrades and new features from upstream. So in order to mitigate risk, most studios will work with what the engines provide and rely on the engine provider to (hopefully) respond to requests for new functionality through their own product roadmap, which has its own independent set of priorities.
3. Finally there’s the pure economic cost of trying to implement more dynamism in big-budget games. This has always been costly in terms of the time to design, implement, create art for, refine, playtest & balance, and it’s a problem set where the time for each of those things grows with every new piece of functionality layered in (for heavily coupled dynamic systems, you create scope for emergent behaviours and thus the permutations of edge cases increase non-linearly). This work gets ever more costly as fidelity increases: the same methods for e.g. destruction that may have worked before may now require even more simulation granularity/fidelity to avoid breaking the immersion of the overall visuals of the game-in-motion (more particles, more detailed models, more detailed effects, more granular physics simulations). This has a compounding effect on the total cost per frame, whereby increasing graphics fidelity means you “have to” increase physics and dynamism fidelity, which might, say, 8x the overall cost increment vis-a-vis the previous gen (rough illustration below).
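To put a hedged number on that rough illustration (the 2x step is an assumption made purely for the arithmetic): if the visual bar roughly doubles in linear detail between generations, debris that has to hold up next to those visuals scales volumetrically, so chunk counts grow with the cube of that step - 2^3 = 8x - and every chunk still needs collision, contacts, effects and authoring/playtest attention:

```cpp
// Rough illustration (assumed numbers): why physics cost compounds with fidelity.
#include <cmath>
#include <cstdio>

int main() {
    // Suppose the visual bar doubles in linear detail between generations
    // (meshes, textures and particles all look roughly twice as fine).
    const double linear_fidelity_step = 2.0;                              // assumed

    // Debris that has to match those visuals is volumetric: chunk counts
    // scale roughly with the cube of the linear detail step.
    const double chunk_multiplier = std::pow(linear_fidelity_step, 3.0);  // 8x

    // Per-chunk cost doesn't stay flat either (finer collision shapes, more
    // contacts, more effects), so the real increment can land above 8x.
    std::printf("debris chunks per destruction event: ~%.0fx previous gen\n",
                chunk_multiplier);
    return 0;
}
```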
On the flip side, the good news is that with systems like Nanite and Lumen in Unreal Engine 5, advances in compute have started to re-orient systems back towards dynamic models for GFX (Lumen especially). Nanite is streaming-based, and whilst it doesn’t support dynamic mesh modification yet, it’s theoretically possible (with the right framework for managing mesh-modification writebacks to disk, streaming block re-organisation and stream cache invalidation/synchronisation).
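Purely as a thought experiment on what that framework might have to track (everything here is hypothetical - none of it is a real Nanite or Unreal API), the bookkeeping for a runtime mesh edit in a streamed-geometry system would look roughly like: mark the edited block dirty, invalidate any derived data built from it, and queue a writeback so the persisted stream matches what the player now sees:

```cpp
// Purely hypothetical sketch of runtime mesh-edit bookkeeping in a streamed
// geometry system (all names are illustrative, not any real engine's API).
#include <cstdio>
#include <set>

struct StreamedMeshBlock {
    int  id;
    bool dirty = false;
};

struct GeometryStreamer {
    std::set<int> resident;       // blocks currently loaded
    std::set<int> pending_write;  // edited blocks awaiting writeback to disk

    void ApplyRuntimeEdit(StreamedMeshBlock& block) {
        block.dirty = true;
        pending_write.insert(block.id);   // must eventually be persisted
        InvalidateDerivedData(block.id);  // LODs/BVHs built from it are now stale
    }
    void InvalidateDerivedData(int id) {
        std::printf("invalidate derived data for block %d\n", id);
    }
    void FlushWritebacks() {
        for (int id : pending_write)
            std::printf("write block %d back to the stream store\n", id);
        pending_write.clear();
    }
};

int main() {
    GeometryStreamer streamer;
    StreamedMeshBlock wall{42};
    streamer.ApplyRuntimeEdit(wall);  // e.g. a chunk blown out of a wall
    streamer.FlushWritebacks();
    return 0;
}
```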
The major push for this is largely around mitigating the cost of iteration during production for high-fidelity content creation (over the past two gens, the increased use of static baking and static content has led to heavy increases in iteration times, as getting a new asset from a DCC tool into the game required longer and longer offline baking cycles, and every change required re-submitting the asset through this pipeline, dramatically expanding dev times).
Hopefully this opens the door to much more dynamism & interactivity moving forward, as the constraints on supporting more dynamic worlds at bleeding-edge levels of graphics fidelity start to get rolled away. That in turn opens the door for engine providers to invest more in supporting more advanced physics (e.g. Unreal Engine 5’s new “Chaos Physics” system) and more dynamic and interactive systems to further differentiate their own offerings, whilst at the same time allowing game devs to benefit hugely from those economies of scale and incorporate them more into their games, minimising the cost and effort of doing so.
Further investment by the engine providers in tools and technologies to better assist in balancing these dynamic and interactive systems efficiently (cutting production costs) could also lead to a much wider explosion of dynamism in games, as developers could cut playtest and balancing cycles and give themselves room to plan and design more gameplay systems to incorporate.
Anyway that’s my 2 cents