Cosmin Lehene wrote a great 2-part post about his team's experience with HBase. Here are links to part 1 and part 2. I hear one of his partners-in-crime, Andrei, is working on another interesting post on performance testing related to this work.
I’ve been looking into Intel’s Threading Building Blocks during the early morning hours here in Bucharest (jet-lag) and ran across an interesting library that provides non-blocking, lock-free, wait-free synchronization mechanisms.
Check out this tutorial page with small code snippets and some sample pipelines/farms:
“FastFlow is a parallel programming framework for multi-core platforms based upon non-blocking lock-free/fence-free synchronization mechanisms. The framework is composed of a stack of layers that progressively abstracts out the programming of shared-memory parallel applications. The goal of the stack is twofold: to ease the development of applications and make them very fast and scalable. FastFlow is particularly targeted to the development of streaming applications.”
“Fastflow is implemented as a template library that offers a set of low-level mechanisms to support low-latency and high-bandwidth data flows in a network of threads running on a cache-coherent multi-core. On these architectures, the key performance issues concern memory fences, which are required to keep the various caches coherent. Fastflow provides the programmer with two basic mechanisms: efficient communication channels and a memory allocator. Communication channels, as typical is in streaming applications, are unidirectional and asynchronous. They are implemented via lock-free (and memory fence-free) Multiple-Producer-Multiple-Consumer (MPMC) queues. The memory allocator is built on top of these queues, thus taking advantage of their efficiency.”
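The fence-free queues described above are built from single-producer/single-consumer ring buffers composed into MPMC channels. Here's a minimal, hypothetical sketch (not FastFlow's actual code) of a Lamport-style wait-free SPSC queue using C++11 atomics, to give a flavor of the technique:

```cpp
#include <atomic>
#include <cstddef>

// Simplified sketch of a wait-free single-producer/single-consumer
// ring buffer -- the kind of building block a framework like FastFlow
// composes into lock-free MPMC channels. Illustrative only.
template <typename T, size_t Capacity>
class SpscQueue {
    T buf_[Capacity];
    std::atomic<size_t> head_{0};  // next slot to read (consumer only)
    std::atomic<size_t> tail_{0};  // next slot to write (producer only)
public:
    // Called only by the producer thread; returns false when full.
    bool push(const T& item) {
        size_t t = tail_.load(std::memory_order_relaxed);
        size_t next = (t + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;  // queue full (one slot kept empty)
        buf_[t] = item;
        tail_.store(next, std::memory_order_release);  // publish the item
        return true;
    }
    // Called only by the consumer thread; returns false when empty.
    bool pop(T& out) {
        size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;  // queue empty
        out = buf_[h];
        head_.store((h + 1) % Capacity, std::memory_order_release);
        return true;
    }
};
```

Because only one thread ever writes each index, the acquire/release pair is enough to keep the producer and consumer consistent without locks or full memory fences on x86, which is exactly the performance issue the FastFlow authors call out.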
I'd like to have seen a larger set of tests thrown at this one, but you have to love all the auto-enthusiast references in this anandtech.com review of the new 12-core Opteron versus the newer 6-core Xeon.
That's two 6-core Istanbul chips bolted together. Reminds me a bit of the Pentium D, but with a much larger cache coherency problem (imagine how much worse this gets as we keep adding cores to chips).
WD has released their next generation VelociRaptor (10K RPM, 2.5" disk). It has a new 6Gbps interface and 600 GB of space. There's an interesting review comparing this disk versus a couple of non-enterprise SSDs here.
Speaking of 24-core motherboards with loads of RAM, I ran across this new SuperMicro motherboard the other day while doing some research. It's truly terrifying how many cores and how much RAM you can toss onto one box now.
Assuming one core is dedicated to a Dom0, you could have 23 VMs, each with a dedicated core and over 8GB of RAM if you add all 192GB of RAM.
Here are some specs from the link above:
Quad Intel® 64-bit Xeon® MP Support 1066 MHz FSB
Intel® 7300 (Clarksboro) Chipset
Up to 192GB DDR2 ECC FB-DIMM (Fully Buffered DIMM)
OCZ is ready to mass produce its PCI-e SSDs with field replaceable MLC NAND flash modules.
This makes the MLC versus SLC debate a bit moot if you can just replace the NAND when it wears out, like a bad disk. Did I mention that it has 8 separate Indilinx controllers, up to 2TB of space, and peak transfer rates of 1.4GB/s for reads and writes (that’s gigabytes, not gigabits)? I can’t imagine what will happen with a SandForce controller version of one of these monsters.
This is some seriously interesting temporary storage for a virtualization cluster that needs some fast DAS. With 2TB, you could carve out 87 gigabytes for each of 23 VMs on a 24-core virtualization box. That’s mighty interesting.
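A quick back-of-the-envelope check of the carve-up numbers above, as a hypothetical helper (assumes one core of the 24-core box is reserved for Dom0, leaving 23 guests, and decimal gigabytes):

```cpp
// Hypothetical helper: split a resource evenly across guest VMs.
// One core of the 24-core box is assumed reserved for Dom0,
// leaving 23 guests; sizes are decimal gigabytes.
double per_vm(double total_gb, int vms = 23) {
    return total_gb / vms;
}

// per_vm(192.0)  -> ~8.3 GB of RAM per VM (the "over 8GB" figure)
// per_vm(2000.0) -> ~87 GB of SSD per VM  (the 2TB PCI-e card)
```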