Out-of-order execution (OOE) enables a CPU to execute instructions in a different order than the one specified in the program, as long as their dependencies allow it. It uses a number of additional "hidden" registers (not visible in the ISA) and an instruction queue to store operations and intermediate results. It is one of many techniques in a modern CPU that increase throughput by hiding instruction latencies and increasing the amount of instruction-level parallelism (ILP), thus improving instructions per clock (IPC).
Basically, the worse the code generated by a compiler (or written by a human) is in terms of instruction scheduling, the larger the benefit of OOE. However, there are also situations where a CPU capable of OOE performs better (clock-for-clock) than one without it, even if the compiler had access to an oracle (perfect) instruction scheduler.
Conversely, with in-order processing, the quality of the binary code matters far more, since the CPU hardware cannot mitigate bad instruction schedules. All of this, of course, makes the idea that developers are too used to in-order architectures and thus incapable of "optimizing for OOE" hilarious.
Ba-dum-ching.