This is a writeup about a project I developed for extracting files directly from a zip archive in cloud storage. Read on to find out how it came to be.

So... story time! Some years ago, I worked at a startup where we took aerial photographs of crops using drones and multispectral cameras. These photographs were the input for several algorithms that stitched them together and extracted information such as plant health, humidity, and nitrogen concentration, among others. The results were delivered as georeferenced images, including an aerial map of the whole area analyzed.

All those images were delivered to our clients as raster images (layers from here on) through a web map visualizer, and that's where the problems began. Layers were composed of huge numbers of files representing different quadrants of a map section at different zoom levels, also known as tile layers or tile maps in GIS software. The more zoom levels available and/or the bigger the area represented, the more data had to be stored.

And storage really was a huge issue. I can't quite remember how much area a typical map encompassed, but each one used a lot of space, ranging from 900 MB to 2 GB and sometimes much more, and that's for a single layer. Analysis results as shown to customers used at least two layers, and sometimes up to five.

For customers to view their maps on our website, we had to serve those raster layers as static files in a specific file structure, with folders many levels deep and a huge number of image tiles. Then again, a huge amount of disk space.
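To give a rough feel for why tile layers grow so quickly, here is a minimal sketch. It assumes the common `z/x/y` tile layout, where each extra zoom level splits every tile into four; the post does not say which layout we actually used, and the layer name, `.png` extension, and function names are made up for illustration.

```python
def tile_path(layer: str, z: int, x: int, y: int) -> str:
    """Build the static path of one tile (assumed z/x/y folder layout)."""
    return f"{layer}/{z}/{x}/{y}.png"


def tile_count(base_tiles: int, extra_levels: int) -> int:
    """Tiles needed at `extra_levels` zoom levels below a base level.

    Each deeper level covers the same area with four times as many tiles.
    """
    return base_tiles * 4 ** extra_levels


def total_tiles(base_tiles: int, levels: int) -> int:
    """Total tiles across the base level plus `levels` deeper levels."""
    return sum(tile_count(base_tiles, k) for k in range(levels + 1))


if __name__ == "__main__":
    print(tile_path("ndvi", 18, 70123, 112045))
    # Hypothetical field covering 4 tiles at its coarsest zoom level.
    for levels in range(0, 7):
        print(levels, total_tiles(4, levels))
```

Under that assumption, every additional zoom level roughly quadruples the number of files for the deepest level alone, which matches how fast the disk filled up.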

The interesting part is that the layers in zipped form used just a tiny fraction of the space of the uncompressed data, even though they were all images that were already compressed. At this point I had the idea of serving the files directly from the archive, extracting each image on the fly. I was able to implement a simple proof of concept that worked with local files, but the local storage for the virtual machine was expensive and filling up fast. That pushed me to the next idea: storing the zip archives not locally but in cloud blob storage. Sadly, I didn't have time to work on that at the moment.

Some time passed and the problem only got bigger, but in the meantime I was also offered a new job that I was not going to reject... and didn't, so I had to let the idea rest. Fast forward some months and I got some free time to work on personal projects, an opportunity I took to revive the idea that gave life to this project.

What this does: given a publicly accessible zip archive stored in Azure Blob Storage and the full path to a file inside it, the service sends you that file in deflate encoding. In other words, the file's compressed bytes are simply pulled out of the zip archive and sent back without being decompressed, saving egress bandwidth on the cloud provider side and ingress bandwidth for the device receiving it. The deflate-encoded file is then decompressed by the receiving browser and presented to the user.
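The core trick can be sketched in a few lines of Python. This is not the service's actual code, just a minimal illustration under some assumptions: the archive is available as a seekable binary file object (a local file or an in-memory buffer here, rather than Azure Blob Storage), the member was stored with the deflate method, and the function name `read_raw_deflate` is made up. It uses the standard `zipfile` module only to look up where the member lives, then reads the still-compressed bytes straight from the archive.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER_SIZE = 30                   # fixed part of a zip local file header
LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"


def read_raw_deflate(archive, member_name: str) -> bytes:
    """Return one member's compressed bytes without inflating them.

    `archive` must be a seekable binary file object.
    """
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo(member_name)   # raises KeyError if the path is missing
    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("member is not deflate-compressed")

    # The local header is followed by a variable-length name and extra field,
    # and only then by the compressed data.
    archive.seek(info.header_offset)
    header = archive.read(LOCAL_HEADER_SIZE)
    if header[:4] != LOCAL_HEADER_SIGNATURE:
        raise ValueError("local file header not found")
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    archive.seek(info.header_offset + LOCAL_HEADER_SIZE + name_len + extra_len)
    return archive.read(info.compress_size)


if __name__ == "__main__":
    # Build a tiny archive in memory to stand in for the one in blob storage.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("layer/3/1/2.png", b"pretend this is a tile" * 50)

    raw = read_raw_deflate(buf, "layer/3/1/2.png")
    # Zip members are raw deflate streams, so inflate with a negative wbits.
    restored = zlib.decompress(raw, -15)
    print(len(raw), len(restored))
```

A server built on this idea would presumably send `raw` as the response body along with a `Content-Encoding: deflate` header, leaving the inflate step to the browser. Against remote storage, the same offsets could be fetched with ranged reads instead of `seek` and `read`, so the whole archive never has to be downloaded.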

You can see it in action in the map below, which is currently being served from Azure out of a zip file.

Needless to say, this is not the only use for this kind of functionality, but it is the problem that gave life to this idea.
Enrique CR