A shift in the computer industry has occurred. It wasn't a shift that happened yesterday -- the year was 2005 and Moore's Law took a deviation from the path that it had been traveling on for over 35 years. Up until this point, improved processor performance was mainly due to frequency scaling, but when the core speed reached ~3.8GHz, the situation quickly became cost prohibitive due to the physics involved with pushing beyond this barrier (factors such as core current, voltage, heat dissipation, structural integrity of the transistors, etc.). Thus, processor manufacturers (and Moore's Law) were forced to take a different path. This was the dawning of the massive symmetrical multiprocessing era (or what we refer to today as 'multicore').
The shift to superscalar symmetrical multiprocessing (SMP) architectures required a specialized skill set in parallel programming to fully realize the performance increase across the numerous processor resources. It was no longer enough to rely on frequency scaling for better application response times and throughput. More than a decade later, a severe gap persists in our ability to harness the power of multicore, mainly due to either a lack of understanding of parallel programming or the inherent difficulty in porting a well-established application framework to a parallel programming construct.
Perhaps virtualization is also responsible for some of the gap since the entire concept of virtualization (specifically compute virtualization) is to create many independent virtual machines whereby each one can run the same application simultaneously and independently. Within this framework, the demand for parallelism at the application level may have diminished since the parallelism is handled by the abstraction layer and scheduler within the compute hypervisor (and no longer as necessary for the application developer -- I'm just speculating here). So, while databases and hypervisors are largely rooted in parallelism, there is one massive area that still suffers from a lack of parallelism - storage.
THE PARALLEL STORAGE REVOLUTION BEGINS
In 1998, DataCore Software began work on a framework specifically intended for driving storage I/O. This framework would become known as a storage hypervisor. At the time, the best multiprocessor systems that were commercially available were multi-socket single-core systems (2 or 4 sockets per server). From 1998 to 2005, DataCore perfected the method of harnessing the full potential of common x86 SMP architectures with the sole purpose of driving high-performance storage I/O. For the first time, the storage industry had a portable software-based storage controller technology that was not coupled to a proprietary hardware frame.
In 2005, when multicore processors arrived in the x86 market, an intersection formed between multicore processing and increasingly parallel applications such as VMware's hypervisor and parallel database engines such as Microsoft SQL and Oracle. Enterprise applications started to slowly become more and more parallel, while surprisingly, the storage subsystems that supported these applications remained serial.
The serial nature of storage subsystems did not go unnoticed, at least by storage manufacturers. It was well understood that at the current rate of increase in processor density coupled with wider adoption of virtualization technologies (which drove much higher I/O demand density per system), a change was needed at the storage layer to keep up with increased workloads.
In order to overcome the serial limitation in storage I/O processing, the industry had to make a decision to go parallel. At the time, the path of least resistance was to simply make disks faster, or taken from another perspective, make solid state disks, which by 2005 had been around in some form for over 30 years, more affordable and with higher densities.
As it turns out, the path of least resistance was chosen, either because alternative methods of storage I/O parallelization were unrealized or perhaps there was an unwillingness by the storage industry to completely recode their already highly complex storage subsystem programming. The chosen technique, referred to as [Hardware] Device Parallelization, is now used by every major storage vendor in the industry. The only problem is that it doesn't drastically address the fundamental problem of storage performance which is latency.
Chris Mellor from The Register wrote recently in an article, "The entire recent investment in developing all-flash arrays could have been avoided simply by parallelizing server IO and populating the servers with SSDs."