America is a nation of pioneers: if you own a house, you do not hire a network of handymen… you collect your tools and do your own maintenance. Some people can do everything with a Swiss army knife, but most quickly find out that life is better when you use the proper tool for each job. Tool organization varies widely: some people keep a tool crate in a corner of the garage or basement, while others organize their tools in drawers or hang them on a pegboard. It’s a lifestyle choice.
But other clutter has moved from physical to virtual: records, invoices, receipts, photos, movies, letters, etc. are now kept only as digital items. This is the metaphor: these digital items are the knowledge worker’s tools, whether a presentation slide, a financial-model spreadsheet, a video simulation, or a data analysis report.
Weeks ago: The Global File System
In the old days, each of us had a computer and we organized our digital items in a hierarchical file system on our local drive. Sometimes we had to share digital items, in which case we would use a file server. For example, in the early ’80s Russ Atkinson implemented the Cedar Global File System, which made the file server (including the trickle-charged remote backup servers) transparent to us users — it just looked like one big local disk.
The Global File System was like a big toolbox with drawers: the drawers had separators, which could in turn be subdivided. Because you knew how your toolbox was organized, you knew where to look for each tool. In the same way, you organized your digital items in a hierarchical file system with meaningful file names.
A paradigm shift came in 1988 when Mark Weiser came up with the concept of ubiquitous computing, where each of us would not have a computer, but many computers, ranging from a powerful workstation to a tiny pad computer, with the ability to seamlessly switch work from one of our computers to another.
Initially, ubiquitous computing took place in a walled garden, but when the Internet became public in the early ’90s, ubiquitous came to mean intergalactic. Unfortunately, Atkinson’s work had not been widely published and was largely unknown. Instead of an Internet-scale global file system, we were given the cloud.
The global file system required some complex machinery. For transparency, remote replication happened asynchronously via trickle-charging: when a server went down, we did not notice, because our data was also sitting on a remote server, but the system had to ensure that file versions did not get mixed up. There was also the issue of managing all the file locks in a multi-user, multi-location context.
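As a rough illustration of that version bookkeeping (the names below are hypothetical, not Cedar’s actual interfaces): each file carries a monotonically increasing version stamp, and a replica only accepts an update that moves its version forward, so a delayed trickle can never mix up file versions.

```python
# Hypothetical sketch of version-checked asynchronous replication;
# these are not the Cedar Global File System's actual interfaces.
import threading

class ReplicatedFile:
    def __init__(self, name: str):
        self.name = name
        self.version = 0                  # monotonically increasing version stamp
        self.data = b""
        self.lock = threading.Lock()      # stands in for the distributed lock manager

    def local_write(self, data: bytes) -> int:
        """A user edit on the primary copy bumps the version; the replicator
        trickles (version, data) out to the backup servers asynchronously."""
        with self.lock:
            self.version += 1
            self.data = data
            return self.version

    def apply_replica_update(self, version: int, data: bytes) -> bool:
        """A backup server accepts an update only if it moves the version
        forward, so late or out-of-order trickles cannot mix up versions."""
        with self.lock:
            if version <= self.version:
                return False              # stale update: ignore it
            self.version, self.data = version, data
            return True
```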
Yesterday: Shifting to the Cloud
The early history of cloud storage is somewhat nebulous, but the implementations were not file systems. They were object storage: you could deliver a digital item with its metadata for storage and would get back an identifier like 0Bz-MX6JiBuh1QmhkZXlyQ3U3WG8. With this identifier you could later retrieve or delete the digital item, but not modify it; you could, however, edit some of its metadata. In our metaphor, cloud storage is the tool crate in the corner of the garage instead of the toolbox with drawers: it becomes arduous to find tools.
To reiterate, in the cloud each digital item is stored in an object. When an item is stored, its digital representation is transferred to the storage system, which finds free space on a drive and after storing the bits, returns an identifier like 0Bz-MX6JiBuh1QmhkZXlyQ3U3WG8. Whenever I need my course slides, I just connect to the cloud and the storage system retrieves object 0Bz-MX6JiBuh1QmhkZXlyQ3U3WG8 using a REST protocol.
One drawback is that I cannot update my presentation. When I need to edit a slide, I have to upload a new digital item and get a new object ID. My wet memory is not good at remembering identifiers like 0Bz-MX6JiBuh1QmhkZXlyQ3U3WG8. When I look at my drive in the cloud, I have many versions of the presentation. I can add metadata like a label with a name and an upload date. However, I have various versions of the presentation for different audiences and events, and it becomes hard to tell them apart.
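As a rough sketch of these semantics (the base URL, routes, and field names below are made up, not any particular vendor’s API): a POST creates a new immutable object and returns its identifier, a GET retrieves it by that identifier, and “editing” really means uploading a fresh object that gets a fresh identifier.

```python
# Rough sketch of object-storage semantics over REST.
# The base URL, routes, and field names are hypothetical, not a real vendor API.
import requests

BASE = "https://objects.example.com/v1"

def store(data: bytes, metadata: dict) -> str:
    """Upload a digital item; the store finds free space and returns an opaque ID."""
    r = requests.post(f"{BASE}/objects", data=data, params=metadata)
    r.raise_for_status()
    return r.json()["id"]     # something like "0Bz-MX6JiBuh1QmhkZXlyQ3U3WG8"

def retrieve(object_id: str) -> bytes:
    """Fetch the digital item back by its identifier."""
    r = requests.get(f"{BASE}/objects/{object_id}")
    r.raise_for_status()
    return r.content

# Objects are immutable: "editing" my slides means uploading a new object,
# which comes back with a brand-new identifier I now also have to track.
old_id = store(open("slides.pdf", "rb").read(), {"label": "course slides"})
new_id = store(open("slides-v2.pdf", "rb").read(), {"label": "course slides, edited"})
assert old_id != new_id
```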
I could do the same thing as the people with the tool crate in the garage, but search does not work well, because these presentations are all very similar. Worse, my personal storage holds 101,014 digital items, far too many to manage in a flat object store. I need to organize them. The best data structure for this is a tree: it has a root I can always go back to, and between any two nodes there is exactly one path.
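That unique-path property is easy to see in code. In this toy sketch (node names made up), every node has exactly one parent, so each node has exactly one chain of ancestors up to the root, which forces a single path between any two nodes:

```python
# Toy illustration: in a tree, the path between any two nodes is unique.
# 'parent' maps each node to its single parent; the root maps to None.
parent = {"/": None, "home": "/", "docs": "home", "slides": "docs", "photos": "home"}

def ancestors(node):
    """Each node has exactly one chain of ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain                          # e.g. ["slides", "docs", "home", "/"]

def path_between(a, b):
    """Climb from a to the first common ancestor, then descend to b."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(n for n in up_a if n in up_b)
    down = up_b[:up_b.index(common)]      # segment from b up to the common ancestor
    return up_a[:up_a.index(common) + 1] + list(reversed(down))

print(path_between("slides", "photos"))  # ['slides', 'docs', 'home', 'photos']
```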
More recent versions of cloud storage attempt to imitate the structure of a hierarchical file system: within limits, I can choose object names that look like hierarchical pathnames. Still, this is not a good surrogate for a real hierarchy. There is no concept of directories, it is hard to manage all the items sharing a common prefix, renaming the “directory” part of an object name is not supported, and so on.
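A sketch of why pseudo-hierarchical names are a poor surrogate, using a hypothetical flat key-value store: “directories” are just naming conventions over flat keys, listing one means scanning every key for a prefix, and renaming one means rewriting every object under it.

```python
# Hypothetical flat object store: keys merely *look* like pathnames.
store = {
    "talks/2014/slides.pdf": "0Bz-...",
    "talks/2014/notes.txt":  "0Bc-...",
    "talks/2015/slides.pdf": "0Bd-...",
}

def list_dir(prefix: str) -> list[str]:
    """There are no real directories; 'listing' one is a scan over every key."""
    return [k for k in store if k.startswith(prefix + "/")]

def rename_dir(old: str, new: str) -> None:
    """No rename primitive either: every object under the old prefix must be
    rewritten under the new one, one copy-and-delete at a time."""
    for key in list_dir(old):
        store[new + key[len(old):]] = store.pop(key)

rename_dir("talks", "presentations")
print(sorted(store))  # every key rewritten: O(number of objects), not O(1)
```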
For your digital life, you do not want an object storage system but a (tree) file system, i.e., you want to be at a higher abstraction layer.
In cloud storage this is accomplished with a client. When you install and run this client on your computer, instead of managing your digital items in the cloud through the Web browser, the client mirrors them into a special subtree of your home directory on the local file system. Now you can edit a digital item like any local file. In the background, when you save the file, the cloud client uploads it as a new version of the existing digital item, while the clients on your other computers download the new version and replace the previous one on their local drives.
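A bare-bones sketch of what such a client does (all helper names here are hypothetical; a real client also handles deletions, conflicts, renames, and partial transfers):

```python
# Bare-bones sync-client loop: push files saved since the last pass up to the
# cloud, and pull newer remote versions down to replace the local copies.
import os
import time

SYNC_ROOT = os.path.expanduser("~/CloudDrive")   # the special subtree in $HOME
last_seen: dict[str, float] = {}                 # path -> mtime at last sync

def upload_new_version(path: str) -> None:       # hypothetical cloud call
    print("uploading new version of", path)

def fetch_remote_changes() -> list[tuple[str, bytes]]:  # hypothetical cloud call
    return []                                    # (path, new contents) pairs

def scan_and_upload() -> None:
    """Any file whose mtime changed since the last pass was saved; upload it."""
    for dirpath, _, filenames in os.walk(SYNC_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if last_seen.get(path) != mtime:
                upload_new_version(path)
                last_seen[path] = mtime

def download_remote_changes() -> None:
    """Replace local copies with newer versions saved on other computers."""
    for path, contents in fetch_remote_changes():
        with open(path, "wb") as f:
            f.write(contents)
        last_seen[path] = os.path.getmtime(path)

while True:                                      # the "in the background" part
    scan_and_upload()
    download_remote_changes()
    time.sleep(5)
```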
Today: Distributed File Systems
This works reasonably well in the old context of ubiquitous computing. However, it no longer reflects the reality of our working habits in this millennium. We no longer have a 1:n relationship from one digital item in the cloud to the n computers on which we process it. In a community, x people all have digital items and need to share them to accomplish their work as a team, so we really have a 1:(n·x) relationship: for example, a team of 20 people with 3 computers each gives 60 endpoints for every shared item.
In Atkinson’s old global file system, this made sense because when you worked with shared files, you explicitly “brought collections of digital items over” and “s-modeled” them when you were done. With object storage in the cloud, you quickly get a local storage problem. At enterprise scale you cannot fit all of a company’s shared files on a 500 GB laptop hard drive. Some enterprise cloud storage systems solve this problem, but not the basic cloud clients.
The second big problem with object storage in the cloud is that today we no longer work with one cloud, but with many. In my case, I regularly use five clouds:
- My home NAS, which I use like in the old days of ubiquitous computing
- A Google Drive at Peaxy for work (paid subscription)
- Another Google Drive I use for volunteer work
- A Dropbox at Peaxy for work
- Another Dropbox I use for volunteer work
I believe this is a pretty typical setup; call it “bring your own device” (BYOD) IT. Some people may even have more cloud services from Apple and Microsoft. The problem is that, in the basic versions, each client allows you to mount only one of its clouds. This means that since I already have a Dropbox mounted for collaborating on papers in color science, I cannot also mount the work Dropbox; I am forced to use the Web interface. The same holds for Google Drive, because my local disk is too small to mirror the whole work drive.
In either case, I have to use the clunky Web interface with ugly object identifiers like MX6JiBuh1QmhkZXlyQ3U3WG8 instead of ergonomic file names, and even when pseudo-hierarchical file names are available, the organizational advantages of directory hierarchies are not really there. What I really want is not object storage but a distributed file system like Peaxy Aureum, which brings me all the files across all the servers in one common interface.