All of us are now, of course, aware of the "cloud." In fact, a lot of folks are in a semi-panic because they are not "on the cloud," not sufficiently "using the cloud," and so forth. In this month's column, I want to explore cloud deployment a bit and see where it makes sense for data production and exploitation.
My own company (GeoCue Group) has experience both as a user of cloud services and as a developer of technology that is hosted "in the cloud." Our experience is primarily with Amazon Web Services (AWS) and, to a lesser extent, Microsoft Azure. We have been at this for about three years now and hence have gained some experience with deployment models.
In this discussion I want to focus on the "cloud" as an available infrastructure platform rather than as Software as a Service (SaaS, such as Office 365). Considered from an infrastructure point of view, the cloud is basically a set of virtual machines, hosted off-premises, that you can rent. An AWS machine is fundamentally an operating system (Linux, Unix or Windows) running on Xen (an open source hypervisor) hosted on Intel-based servers (NVIDIA GPUs are available as a configurable option) with direct-attached storage. Amazon calls this service Elastic Compute Cloud (EC2); the basic rental unit is an EC2 instance. Of course, a bevy of optional services can be rented in conjunction with EC2, ranging from workflow software to effectively infinite long-term storage. Rather than list the virtues and sins of cloud computing, let me provide a few examples of places where I think this works and does not work. I do caution that you will get widely varying opinions on this!
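To give a feel for how literal the "rent a machine" model is, here is a minimal sketch using Amazon's boto3 Python SDK to start, and later hand back, one such instance. The machine image ID and instance type are placeholders, not a recommendation:

```python
# Minimal sketch: rent one virtual machine from EC2, then stop paying for it.
# The AMI ID is a placeholder; the instance type is an assumed small size.
import boto3

ec2 = boto3.client("ec2")

result = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image
    InstanceType="t2.micro",          # assumed small instance size
    MinCount=1,
    MaxCount=1,
)
instance_id = result["Instances"][0]["InstanceId"]
print("rented:", instance_id)

# ...do some work, then hand the machine back:
ec2.terminate_instances(InstanceIds=[instance_id])
```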
To make wise decisions in determining if a particular use case is a candidate for cloud deployment, you need to understand or appreciate a few basics (this discussion is primarily based on AWS, but most cloud providers have similar structural and pricing models). The basic cloud operations for a user-deployed solution are listed below, followed by a back-of-envelope cost sketch:
Push data via the net up to the cloud provider (free, slow)
Store the data (very inexpensive for long-term, seldom-accessed storage; pretty pricey for solid state disk attached to an EC2 instance)
Compute (very economical for surge response but fairly pricey for an always-on compute node)
Pull results from the cloud back to your facility, again via the net (very expensive, slow)
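To make this concrete, here is a back-of-envelope cost and transfer-time model in Python. Every rate in it is an illustrative placeholder rather than a current AWS price; substitute your provider's actual figures before drawing conclusions:

```python
# Back-of-envelope model of one push -> store -> compute -> pull cycle.
# Every rate here is an ILLUSTRATIVE PLACEHOLDER, not a quoted AWS price.

UPLOAD_PER_GB = 0.00          # inbound transfer is typically free
ARCHIVE_PER_GB_MONTH = 0.004  # assumed long-term archive storage rate
SSD_PER_GB_MONTH = 0.10       # assumed EC2-attached SSD storage rate
COMPUTE_PER_HOUR = 0.50       # assumed on-demand instance rate
DOWNLOAD_PER_GB = 0.09        # assumed outbound transfer rate

def cycle_cost(up_gb, archive_gb, ssd_gb, compute_hr, down_gb, months=1):
    """Rough dollar cost of moving, holding and processing one dataset."""
    return (up_gb * UPLOAD_PER_GB
            + archive_gb * ARCHIVE_PER_GB_MONTH * months
            + ssd_gb * SSD_PER_GB_MONTH * months
            + compute_hr * COMPUTE_PER_HOUR
            + down_gb * DOWNLOAD_PER_GB)

def transfer_hours(gb, link_mbps=100):
    """Hours to move a payload over an assumed 100 Mbps connection."""
    return gb * 8 * 1000 / link_mbps / 3600

# A shop pushing 1 TB of sensor data up and pulling 1 TB of products back:
print(f"cycle cost: ${cycle_cost(1000, 0, 1000, 24, 1000):,.2f}")
print(f"one-way transfer: {transfer_hours(1000):.1f} hours")
```

Note where the money and the hours go: outbound transfer and attached storage dominate the bill, and each terabyte is the better part of a day on the wire in each direction.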
You can see from the above that any idea of processing sensor data in the cloud for delivery to a customer just does not make financial or time sense. You are simply moving too much data too quickly for this to be either price- or time-competitive. Yes, of course it can be done, but for a production shop that can predict its workload, it makes no sense.
Long-term archive does make sense (at least with AWS pricing models). AWS offers a service called "S3 Glacier" with per-terabyte pricing lower than the cost of storing and protecting the data locally. We have recently switched our entire GeoCue backup to Glacier.
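For the curious, the push side really is that simple. Here is a minimal sketch, using the boto3 Python SDK, of archiving one backup file to a Glacier vault; the vault name and file name are placeholders, not our actual setup:

```python
# Minimal sketch: push one backup archive to a Glacier vault with boto3.
# Vault name and file name are placeholders for illustration only.
import boto3

glacier = boto3.client("glacier")

with open("geocue_backup_monthly.tar.gz", "rb") as archive:
    response = glacier.upload_archive(
        vaultName="geocue-backups",  # assumed vault, created beforehand
        archiveDescription="Monthly full backup",
        body=archive,
    )

# Keep the archive ID -- Glacier retrievals are by ID, not by name.
print("archived as:", response["archiveId"])
```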
Multiple users viewing a lot of data but downloading modest amounts is another good application. We have developed a mine site collaboration service called Reckon that is hosted on AWS (see Figure 1). We push imagery for viewing (rather large) as well as analytic results to a mine owner's partition in Reckon. The mine personnel view their sites, collaborate via redlining and download analytic data (such as volumetrics) but seldom download large data sets (such as imagery). This is an ideal application for cloud deployment: it grows as the customer base grows, and the data remain static once pushed up to AWS.
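One common pattern for this kind of hosted collaboration (a sketch of the general technique, not a description of how Reckon is actually built) is to keep everything private in S3 and hand each user a short-lived, pre-signed link for just the product they request. The bucket and key names here are invented for illustration:

```python
# Minimal sketch: grant a user temporary access to one analytic product
# without exposing the bucket. Bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")

url = s3.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "reckon-site-data",              # assumed bucket name
        "Key": "mine-42/volumetrics/latest.csv",   # assumed object key
    },
    ExpiresIn=3600,  # link is valid for one hour
)

print(url)  # hand this to the browser client for download
```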
Sometimes the cloud is a good transitional environment when you need to deploy a solution rapidly and cannot predict the steady-state demand. We are developing a large hyperspectral processing and cataloging system for one of our customers. We know the daily take for this system (about 225 GB) but we have no idea of the demand ramp for data processing. Obviously this will be low at first and then ramp up at a currently unknown rate. The system does "just in time" (JIT) data processing, so we also do not know the processing load. Finally, we think that data downloads will be a small fraction of data uploads for quite some time. This, too, is an ideal application for AWS deployment. We are just starting this project, so look for a few progress reports in this column.
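One natural way to build JIT processing on AWS (offered here as a sketch of the pattern, not a description of our actual design) is to queue processing requests and let worker instances pull jobs only as demand materializes, so compute spending follows the unknown ramp. The queue URL and processing function are placeholders:

```python
# Minimal sketch of a JIT worker: process data only when someone asks.
# Queue URL and process_scene() are placeholders for illustration.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/hsi-jobs"

def process_scene(scene_id):
    """Placeholder for the actual hyperspectral processing step."""
    print("processing", scene_id)

while True:
    # Long-poll so an idle worker costs almost nothing in API calls.
    msgs = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for msg in msgs.get("Messages", []):
        process_scene(msg["Body"])
        # Delete only after successful processing, so failures are retried.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```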
Finally, there is LIDAR (after all, LIDAR is at the core of this magazine!). We have an on-premises system for storing, browsing and downloading LIDAR data called "LIDAR Server" (www.lidarserver.com). This system has characteristics that are amenable to AWS deployment. A lot of data are pushed up to the cloud at the front end but then remain static for some period of time. A large array of stakeholders browse the holdings via a JavaScript client in a web browser. Small areas of data are downloaded in an ad hoc manner by persons providing analytic services. I think you see the key here: if there is a massive, frequent download requirement, then AWS deployment probably does not make sense; it is just too expensive. In fact, this is the primary reason that Dropbox is moving from AWS to its own cloud system.
Do not expect to see a big reduction in IT staff expenses when you offload applications to the cloud. Everyone still needs a desktop computer that must be managed (recall that the notion of the "web appliance" was a total failure). Running an AWS-deployed processing system is, in many ways, more complex than running a locally hosted system. Therefore, when you are doing a return on investment analysis, do not fool yourself into thinking you will save on staffing; that probably will not pan out.
In summary, do not panic if you do not have your production operations for LIDAR and/or imagery "in the cloud." The cloud is probably not a good solution to this particular use case. The cloud is, however, an excellent hosting environment for a very wide range of applications where you are dealing with some combination of unknowns, need to reach a large set of users, have security concerns, or need to maintain large archives of data in a very reliable environment. I will keep you posted on our trials and tribulations from time to time, but right now it looks like partly cloudy is pretty good weather!
Lewis Graham is the President and CTO of GeoCue Corporation. GeoCue is North America’s largest supplier of LIDAR production and workflow tools and consulting services for airborne and mobile laser scanning.