Yes, the biggest boost to x86 performance in the move to 64-bit came mostly because they doubled the number of registers available on what was previously a very register-poor ISA.
The ARM ISA has different legacy issues: the design makes it much more expensive than it otherwise would be to pipeline the CPU in order to get decent performance. The ISA was designed for an in-order CPU: at the time, pipelining was something that mainframe / workstation class CPUs did. (ARM was designed in the very early 80s; Intel released a pipelined CPU in 1989 - the 486 - and then only for simple instructions.)
For instance, almost every ARM32 instruction carries a 4-bit condition field which determines whether the instruction executes, depending on the state of the status bits (kept alongside the program counter in R15 on the original 26-bit ARMs, and in the CPSR on later ones). You can also choose whether a given instruction sets those status bits. This means you can do nice things like encode an if (R2 < 0) then (Add 1 to R3) else (Add 1 to R4) in just three instructions: one for the test, one that runs if the relevant condition holds, and a second that runs if it doesn't. No branches! You can also branch on any of the conditions. This makes for very compact code. The trouble is that it's hell to pipeline, because you have to keep track of all the possible states of the status bits and follow all the possible execution paths that result, whilst keeping track of all the dependencies.
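To make the three-instruction if/else concrete, here is a rough sketch. The ARM assembly in the comments is my own illustration of the example above, and the Python around it is only a toy model of the flag logic (the names cmp_flags, CONDITIONS and run are made up for this sketch), not a real emulator.

```python
# Illustrative ARM32 assembly for: if (R2 < 0) R3 += 1; else R4 += 1;
#   CMP   R2, #0        ; set N/Z/C/V from R2 - 0
#   ADDLT R3, R3, #1    ; executes only if "less than"        (N != V)
#   ADDGE R4, R4, #1    ; executes only if "greater or equal" (N == V)

MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return the N, Z, C, V flags a 32-bit CMP a, b would produce."""
    ua, ub = a & MASK32, b & MASK32
    result = (ua - ub) & MASK32
    n = (result >> 31) == 1          # result is negative
    z = result == 0                  # result is zero
    c = ua >= ub                     # carry set means "no borrow"
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the first operand's sign.
    v = (((ua ^ ub) & (ua ^ result)) >> 31) == 1
    return {"N": n, "Z": z, "C": c, "V": v}


# Only the condition codes this sketch needs.
CONDITIONS = {
    "AL": lambda f: True,                 # always
    "LT": lambda f: f["N"] != f["V"],     # signed less than
    "GE": lambda f: f["N"] == f["V"],     # signed greater or equal
}


def run(r2, r3, r4):
    """Model the CMP / ADDLT / ADDGE sequence; returns the final registers."""
    regs = {"R2": r2, "R3": r3, "R4": r4}
    flags = cmp_flags(regs["R2"], 0)           # CMP R2, #0
    program = [("LT", "R3"), ("GE", "R4")]     # ADDLT R3 / ADDGE R4
    for cond, reg in program:
        # Both instructions are fetched; the condition field decides
        # whether each one actually takes effect.
        if CONDITIONS[cond](flags):
            regs[reg] += 1
    return regs
```

Calling run(-5, 0, 0) bumps R3 and leaves R4 alone, while run(7, 0, 0) does the opposite. Note that both ADDs always flow through the "pipeline" here and each depends on the flags from the CMP, which is exactly the dependency tracking that makes this awkward for hardware.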
They've also done things like simplify the exception handling so that the CPU needs fewer shadow registers, which again reduces power requirements.