Compared with -O2, -O3 adds:
-fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides
I don’t think any of these optimizations require more modern hardware?
I was reasonably certain, but left it open in case OP knew of some edge case where flags that are intended to be machine-independent caused bugs on different architectures.