There are probably very few ways of getting to know an architecture better than trying to optimize the hell out of a piece of assembly code written for it, so I think I know PIC12 quite well by now. It's a cute little architecture - a very small number of instructions (though it could be smaller - why are TRIS and OPTION instructions rather than memory locations?). Most operations work on the W register and/or a memory location, so it's either like having a single register or 32 registers (depending on whether you consider the memory locations registers or not).
As you might expect, the W register does become a bit of a bottleneck on occasion (though not completely - you can set, clear and test individual bits of any memory location without affecting W, amongst a few other things). The upside is that there are so few instructions that it's easy to commit the architecture to memory.
It's a bit unfortunate that returning from a function always sets W to a constant (the only return instruction is RETLW, "return with literal in W"). I very rarely found a use for this feature and it seemed to get in the way fairly often. At the very least a RET variant which doesn't set W would have been helpful.
Most instructions take 1 cycle, or 2 for jumps, just as with the AVR8 architecture. However, each of these "cycles" is actually 4 clock cycles. This makes me suspect that "1 cycle operation" is mostly a marketing feature - they might have been able to make the instructions take 2-5 clock cycles, but making them all 4 or 8 (while slower) makes cycle counting easier. It may also simplify the internal circuitry, and it may improve code density too (fewer NOPs are needed to make different code paths take the same time).
It may be that the 8-bit AVR devices do something similar but also have an internal frequency multiplier so that they really do 1 cycle operation. I suspect that "multiply frequency by 4" circuitry is much more complicated and finicky than "divide frequency by 4" circuitry.
The architecture has call and return instructions, but it's prudent to avoid them in the most time-critical code, since together they cost 4 cycles (2 for the call, 2 for the return). Also, the stack is very shallow (just two return addresses), so a continuation-stashing style is often useful, especially since an indirect jump through W is a single instruction that costs no more than a GOTO.