Executive Series: Hadoop Technical Overview

October 1, 2016 Hadoop, Peaxy Aureum, Executive Series

What is Hadoop?

Hadoop is a distributed architecture and infrastructure for storing and processing Big Data. See the previous post, "What is Hadoop and Why Should I Care?"

What are the challenges of building a system like Hadoop?

Mainly, reliability and scalability. Suppose the data is distributed across 1,000 commodity computers. With that many nodes, the […]

Executive Series: What is Hadoop and Why Should I Care?

September 1, 2016 Hadoop, Peaxy Aureum, Executive Series

What is Hadoop? Why do I need to know about it?

Suppose your company collects a lot of data: not just gigabytes but terabytes or petabytes. To make that data useful, you need a system that stores it reliably and retrieves and manipulates it quickly. Hadoop is a distributed architecture and infrastructure for storing and […]

Better Storage for Hadoop

July 15, 2016 Hadoop

Hadoop makes many real-world problems much easier to solve, but its system design can sometimes lead to unnecessary duplication of data and limited access. One example is the digitization of paper-based content.

Rasterizing PDF in commercial printing

One of the key features of a PDF is that each page is independent of all other […]

Hadoop and Paper-Based Digitization

March 15, 2016 Hadoop, Digitization, Aureum

Hadoop makes many real-world problems much easier to solve, but its system design can sometimes lead to unnecessary duplication of data and limited access. One example is the digitization of paper-based content.

Rasterizing PDF in commercial printing

One of the key features of the PDF print description language is that each page is […]
