
“Cloudscale. [McColl is the CEO of Cloudscale. See his bio below.] For real-time analytics on big data, it’s essential to break free from the constraints of batch processing. For example, if you’re looking to continuously analyse a stream of events at a rate of one million events per second per server, and deliver results with a maximum latency of five seconds between data in and analytics out, then you need a real-time data flow architecture. The Cloudscale architecture provides this kind of real-time big data analytics, with latency up to 10,000 times lower than that of batch processing systems such as Hadoop.
Applications: Algorithmic trading, fraud detection, mobile advertising, location services, marketing intelligence.

MPI and BSP. Many supercomputing applications require complex algorithms on big data, in which processors communicate directly at very high speed in order to deliver performance at scale. Parallel programming tools such as MPI and BSP are necessary for this kind of high-performance supercomputing.
Applications: Modelling and simulation, fluid dynamics.

Pregel. Need to analyse a complex social graph? Need to analyse the web? It’s not just big data, it’s big graphs! We’re rapidly moving to a world where the ability to analyse very-large-scale dynamic graphs (billions of nodes, trillions of edges) is becoming critical for some important applications. Google’s Pregel architecture uses a BSP model to enable highly efficient graph computing at enormous scale.
Applications: Web algorithms, social graph algorithms, location graphs, learning and discovery, network optimisation, internet of things.

Dremel. Need to interact with web-scale data sets? Google’s Dremel architecture is designed to support interactive, ad hoc queries over trillion-row tables in seconds! It executes queries natively without translating them into MapReduce jobs. Dremel has been in production since 2006 and has thousands of users within Google.
Applications: Data exploration, customer support, data center monitoring.

Percolator (Caffeine). If you need to incrementally update the analytics on a massive data set continuously, as Google now has to do on its index of the web, then an architecture like Percolator (Caffeine) beats Hadoop easily; Google Instant just wouldn’t be possible without it. ‘Because the index can be updated incrementally, the median document moves through Caffeine over 100 times faster than it moved through the company’s old MapReduce setup.’
Applications: Real-time search.”
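The Cloudscale paragraph describes analysing an event stream continuously, with a bounded delay between data in and analytics out. As a minimal sketch of that general idea only, the Python below aggregates a stream into fixed (tumbling) five-second windows and emits per-key counts as each window closes. It assumes events arrive in timestamp order; it is not Cloudscale's architecture or API, and `Event` and `tumbling_window_counts` are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    timestamp: float  # event time in seconds (hypothetical schema)
    key: str          # e.g. an instrument symbol or an ad campaign id


def tumbling_window_counts(
    events: Iterable[Event], window_s: float = 5.0
) -> Iterator[tuple[float, dict[str, int]]]:
    """Yield (window_start, per-key counts) as each fixed-size window closes.

    Assumes events arrive in timestamp order. A window is closed when the
    first event of a later window arrives, or when the stream ends.
    """
    current_start = None
    counts = Counter()
    for ev in events:
        # Align the event to the start of its window, e.g. 6.0 -> 5.0 for 5 s windows.
        start = math.floor(ev.timestamp / window_s) * window_s
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            counts = Counter()
            current_start = start
        counts[ev.key] += 1
    if current_start is not None:
        # Flush the last, possibly partial, window at end of stream.
        yield current_start, dict(counts)


if __name__ == "__main__":
    events = [Event(0.5, "buy"), Event(1.0, "sell"), Event(4.9, "buy"), Event(6.0, "buy")]
    print(list(tumbling_window_counts(events)))
    # [(0.0, {'buy': 2, 'sell': 1}), (5.0, {'buy': 1})]
```

One caveat on the design: this version closes a window only when a later event arrives or the stream ends, so it honours a latency bound only while events keep flowing. Streaming engines typically also close windows on a timer or watermark so that results are delivered on time even when traffic pauses.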
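The MPI and BSP paragraph and the Pregel paragraph both rest on the bulk synchronous parallel model: work proceeds in supersteps, each consisting of local computation, message exchange and a global barrier, and Pregel applies this per vertex of a graph. The single-process Python simulation below sketches that vertex-centric style for single-source shortest paths, a common example for this model. `bsp_shortest_paths` is a hypothetical name, edge weights are assumed non-negative, and the sketch leaves out everything that makes Pregel a distributed system (partitioning the graph across workers, fault tolerance, combiners, aggregators).

```python
from collections import defaultdict

INF = float("inf")


def bsp_shortest_paths(
    edges: dict[str, list[tuple[str, float]]], source: str, max_supersteps: int = 100
) -> dict[str, float]:
    """Single-source shortest paths, simulated superstep by superstep.

    Each superstep has a compute phase, in which every vertex with incoming
    messages runs the same logic, then a barrier after which the messages it
    sent become visible to their targets.
    """
    # Collect every vertex, including ones that only appear as edge targets.
    vertices = set(edges) | {source}
    for outs in edges.values():
        vertices.update(dst for dst, _ in outs)
    dist = {v: INF for v in vertices}

    # Before the first superstep, only the source has a message (distance 0).
    inbox: dict[str, list[float]] = {source: [0.0]}
    for _ in range(max_supersteps):
        if not inbox:
            # No messages in flight: no vertex is active, so the job halts.
            break
        outbox: dict[str, list[float]] = defaultdict(list)
        # Compute phase: a vertex runs only when it has incoming messages
        # (as if it voted to halt after every superstep).
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:
                dist[v] = best
                # Tell each neighbour about a possibly shorter path through v.
                for dst, weight in edges.get(v, []):
                    outbox[dst].append(best + weight)
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = dict(outbox)
    return dist


if __name__ == "__main__":
    graph = {
        "a": [("b", 1.0), ("c", 4.0)],
        "b": [("c", 2.0), ("d", 5.0)],
        "c": [("d", 1.0)],
        "e": [("a", 1.0)],  # e is unreachable from a
    }
    print(sorted(bsp_shortest_paths(graph, "a").items()))
    # [('a', 0.0), ('b', 1.0), ('c', 3.0), ('d', 4.0), ('e', inf)]
```

Because a vertex reads only its own state and the messages delivered at the barrier, every vertex's compute step within a superstep is independent of the others, which is what allows a real BSP system to run them in parallel across many machines.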
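The Percolator (Caffeine) paragraph contrasts incremental updates with batch recomputation. As a minimal sketch of that contrast only, the Python below maintains a toy inverted index two ways: `build_index` rereads every document, as a batch job would, while `IncrementalIndex.upsert` touches just the postings of the one document that changed. Both names are hypothetical, and the sketch leaves out what Percolator itself adds on top of Bigtable, such as cross-row transactions and observers that are triggered when data changes.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Batch style: rebuild the whole inverted index by rereading every document."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index[term].add(doc_id)
    return dict(index)


class IncrementalIndex:
    """Incremental style: each update touches only the changed document's postings."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)

    def upsert(self, doc_id: str, text: str) -> None:
        old_terms = set(self.docs.get(doc_id, "").lower().split())
        new_terms = set(text.lower().split())
        # Remove postings for terms the document no longer contains.
        for term in old_terms - new_terms:
            self.index[term].discard(doc_id)
            if not self.index[term]:
                del self.index[term]
        # Add postings for terms that are new to this document.
        for term in new_terms - old_terms:
            self.index[term].add(doc_id)
        self.docs[doc_id] = text


if __name__ == "__main__":
    idx = IncrementalIndex()
    idx.upsert("d1", "big data big graphs")
    idx.upsert("d2", "real time search")
    idx.upsert("d1", "big graphs")  # a re-crawled page changes
    # The incremental index matches a full rebuild over the current documents.
    print(dict(idx.index) == build_index(idx.docs))  # True
```

The cost of each `upsert` grows with the size of the changed document rather than with the whole collection, which is the property that lets a fresh document reach the index without waiting for a full rebuild.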