Data Safety With Nanobox: Security and Control

Any time you run an application that stores data provided by other people, you want to ensure the safety of that data. This is a concern with multiple aspects. For one, you need to ensure that the data is secure, and can't be easily obtained by unauthorized parties - but is still accessible to authorized users. For another, you need to ensure you can recover the data in case of loss. You need to ensure you have the space to store all the data you will be entrusted with for the entirety of its expected lifetime, which is generally indefinitely. And so forth.

In this series of articles, I'm going to cover how Nanobox can help you with each of these concerns, one at a time, and how to go about handling them. This article in particular will focus on data security, and can be used with or without any of the others. The focus will be on best practices, but I will try to include some alternative approaches as well in case the best practice doesn't fit your specific use case.

First, though, for those who aren't already aware, Nanobox is a tool that handles devops tasks so you don't have to. It sets up a development environment unique to your app and completely isolated from the rest of your system. You specify what that environment should look like (which packages to install, how to configure them, etc.), how to set it up, and how to assemble your code. Any other developer can then pull down your code, fire up Nanobox on their own system, and get the exact same setup you have on yours. It also lets you deploy your code to one of several cloud hosting platforms (choose from AWS, DigitalOcean, Linode, and more coming soon, and even switch to a different provider at any time) using the same environment you have in development. You can reconfigure things on their way from development to production, and you usually will, but otherwise everything is exactly the same. In addition, it helps enforce (or at least automatically implement) best practices in every aspect of your app's infrastructure. It's a complete devops tool, which lets you focus on your app rather than the environments it will run in.

Why Worry About Security?

Some projects justify the need for data security by their very nature: anything that handles data about money, health, or anything else explicitly protected by law, as well as projects designed to handle sensitive information of any kind, legally protected and legally subversive alike.

For others, it's not so obvious. If every piece of data you collect is published publicly on your site, or is otherwise freely accessible, especially without needing to log in, it may be hard to see why data security is even a concern. Other aspects of data safety, sure, but security? Why encrypt anything? Why put up access controls to prevent outsiders from getting in?

For one, access controls do more than prevent unauthorized use of systems. They also allow site owners, operators, and other administrators to keep track of who is doing what, and how. The information can be instrumental in tracking down problem users and undoing their actions (since you know what they did), as well as limiting their access to cause further harm. It can also be useful when assisting users in recovering information about activities performed using their accounts. It doesn't just provide authorization, but also identification and authentication.

There are a number of articles, books, and other documents available to explain the importance of security, and many of them go into far more detail than we have room for here. Check the end of this article for some of our favorites, and feel free to comment with any good ones we've missed!

How Does Security Work?

Data security is all about protecting data from misuse, accidental or otherwise. It begins with identification, authentication, and authorization. Identification consists of a username of some kind, frequently an email address. Authentication is the process of having the user prove that their identification is legitimate — that it is actually theirs. Authorization is the process of determining what the authenticated user can do within the system. All three of these are generally referred to, collectively, as access control.

Once you've determined what a given user is and is not allowed to do within the system, other aspects of data security kick in. Now you need to enforce the limitations the user is under. Administrative sections should only be accessible to those who can affect their contents. Data should only be editable by those with permissions to change it. And so on. Any attempts to access features not available to the current user should be rejected, and possibly logged for later review.
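
To make that enforcement step a little more concrete, here is a minimal sketch of a permission check that rejects and logs anything the current user isn't allowed to do. The role names, permission names, and the require_permission helper are all hypothetical; your own app or framework will have its own way of expressing them.

```python
import logging

logger = logging.getLogger("access")

# Hypothetical role-to-permission mapping; a real app would load this
# from its own data store or framework configuration.
ROLE_PERMISSIONS = {
    "admin": {"view_post", "edit_post", "manage_users"},
    "editor": {"view_post", "edit_post"},
    "reader": {"view_post"},
}


class PermissionDenied(Exception):
    """Raised when an authenticated user attempts something they may not do."""


def require_permission(username, role, permission):
    """Allow the action only if the role grants it; otherwise log and reject."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    if permission not in allowed:
        # Rejected attempts are logged so they can be reviewed later.
        logger.warning(
            "denied: user=%s role=%s permission=%s", username, role, permission
        )
        raise PermissionDenied(f"{username} may not {permission}")
    return True
```

Unknown roles get an empty permission set, so the check fails closed rather than open.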

Which brings up another aspect of data security — activity logging. While it is generally overkill to track who visited which pages when, it's often a good idea to keep an eye on who made which changes when. It may make sense to track not only what was changed, but also how it was changed, so that changes can be reversed if needed. These kinds of logs provide, at the least, proof of whether a given user did or did not perform a given action, which can be of great use in dispute resolution. They take up extra space, but that space is often worth it.
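
As a rough illustration of logging how something changed, and not just that it changed, the sketch below records the old and new values of a field so the change can be reversed later. The field names and the audit_entry, revert, and to_log_line helpers are assumptions made up for this example, not part of Nanobox or any particular framework.

```python
import json
from datetime import datetime, timezone


def audit_entry(username, record_id, field, old_value, new_value):
    """Build one audit record describing who changed what, when, and how."""
    return {
        "at": datetime.now(timezone.utc).isoformat(),
        "user": username,
        "record": record_id,
        "field": field,
        "old": old_value,  # kept so the change can be reversed
        "new": new_value,
    }


def revert(record, entry):
    """Undo a logged change, but only if the record still holds the logged value."""
    if record.get(entry["field"]) != entry["new"]:
        raise ValueError("record changed since this entry was logged")
    restored = dict(record)  # copy, so the caller's record is left untouched
    restored[entry["field"]] = entry["old"]
    return restored


def to_log_line(entry):
    """Serialize an entry as one JSON line, suitable for an append-only log."""
    return json.dumps(entry, sort_keys=True)
```

Storing the old value is what costs the extra space mentioned above, and it is also what makes reversal possible.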

Another aspect of data security involves revoking access. This takes two basic forms: revoking a user's rights, and revoking a client's access. The former involves changing which portions of the system the user has access to when authenticated. The latter involves logging them out of their browser (or whatever they're connected with) after a certain amount of inactivity in order to prevent misuse by others using the same device later. The longer an authentication is valid for, the less it can be trusted.
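
The inactivity side of this can be sketched in a few lines. The 15-minute limit and the shape of the session dictionary are assumptions for illustration only; most frameworks ship their own session expiry settings, and you should prefer those where they exist.

```python
import time

IDLE_LIMIT_SECONDS = 15 * 60  # assumed value; pick one that suits your app


def session_is_valid(last_activity, now=None, idle_limit=IDLE_LIMIT_SECONDS):
    """Return True while the session has been idle for no longer than the limit."""
    if now is None:
        now = time.time()
    return (now - last_activity) <= idle_limit


def touch_or_expire(session, now=None):
    """Refresh a live session's timestamp, or return None to force a new login."""
    if now is None:
        now = time.time()
    if not session_is_valid(session["last_activity"], now):
        return None
    return {**session, "last_activity": now}
```

Calling touch_or_expire on every request keeps active users logged in while letting abandoned sessions lapse.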

The biggest piece of data security, though, and the one most people likely think of first when it's brought up, is encryption and hashing. Anything that needs to be retrieved in its original form should be encrypted. Anything that only needs to be checked against an input to ensure they match should be hashed. A secure hash cannot practically be reversed, and it is very difficult to find a different plaintext that produces the same hash (assuming the hash function itself is still considered secure, of course). In short, passwords should be hashed, ideally with a salted, deliberately slow algorithm designed for passwords, and most other secure data should be encrypted, though of course actual use cases may vary somewhat.
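
Here is a small sketch of the "hash, then compare" approach for passwords, using PBKDF2 from Python's standard hashlib module. PBKDF2 is just a stand-in here; your language or framework may recommend something else (bcrypt, scrypt, Argon2), and the iteration count below is an assumed work factor you should tune against current guidance. The hash_password and verify_password names are made up for this example.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune to current guidance and your hardware


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) for a password, using a fresh random salt by default."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected, iterations=ITERATIONS):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

You store the salt and digest, never the password itself. At login, you hash what the user typed with the stored salt and check that the results match.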

What About Nanobox?

All of the above is general security guidance. It may be good, but what in there is specifically about doing these things with Nanobox? Because that's the point of this article, right? How to exercise data security with Nanobox?

OK, you're right, none of that deals with Nanobox in particular. Truth is, most of the security concerns you'll have, and need to address, will be in your app itself. Nanobox does make some aspects of doing web applications securely a bit simpler, though. So let me jump right in on those.

Cryptographic Libraries

First, you should never "roll your own" crypto. There are several cryptographic libraries already out there for every platform, programming language, and even framework you can think of. Use them! They've gone through a whole lot more testing, bug fixing, and sheer eyeballs-on-code than your custom crypto implementation has, or probably ever will. It's always a good plan to take advantage of all that work.

Nanobox makes this simpler by not only actively supporting automated dependency management, but also by ensuring it's quick and simple to install the binary packages those libraries rely on to do the actual heavy lifting. Nearly all of them rely on one of two or three crypto packages at the system level; generally speaking, that's OpenSSL or GnuTLS, though others do exist.

Development / Production Parity

Because your development environment and production environment are always using the same software, at the same versions, and with the same configurations, you know that your code will behave the same in both locations. This helps improve security because you can check for security vulnerabilities, and fix them, long before your code ever leaves your local system. Often, that even means before it reaches your version control system. (You are using version control, right?) And for those asking, yes, by "same versions" we actually mean "same exact binaries": your entire app's ecosystem is transferred to your production server in the deploy process, so every single file, from configuration to binary, is exactly identical in both locations.

Database Configuration

With databases, it's considered best practice (especially for security purposes) to use the bare minimum set of permissions your app needs to do the database operations it does, and never use the root account. Nanobox enforces this out of the box by automatically setting up the database and user with as few permissions as possible, while still allowing apps to perform their standard functions. If desired, you can create additional users with even fewer permissions — for example, you could use one for database migrations, and a separate one for normal operation. In all of these cases, Nanobox keeps things secure by presenting the credentials for each account using environment variables, whose values may change from one deploy to the next, forcing the actual values to be kept out of your code — another security best practice.
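
In practice, that means your app reads its database credentials from the environment at startup rather than from a config file checked into your repo. The sketch below shows one way to do that. The variable names (DATA_DB_HOST, DATA_DB_USER, DATA_DB_PASS) are assumptions based on a component named data.db, so check your own app's environment for the names actually in use; the database_settings helper is likewise hypothetical.

```python
import os


def database_settings(environ=os.environ, prefix="DATA_DB"):
    """Collect database credentials from environment variables.

    The variable names are assumed for illustration; adjust the prefix
    (or the names themselves) to match what your environment provides.
    """
    names = {
        "host": f"{prefix}_HOST",
        "user": f"{prefix}_USER",
        "password": f"{prefix}_PASS",
    }
    # Fail loudly at startup instead of falling back to hardcoded values.
    missing = sorted(var for var in names.values() if not environ.get(var))
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in names.items()}
```

Because nothing is hardcoded, the same code works unchanged when the values differ between development and production, or from one deploy to the next.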

HTTPS

Securing all the data-at-rest in your app is important, but so is securing all the data in transit. Security best practice is to encrypt every single communication between your app and the outside world. That means enabling HTTPS.

Nanobox implements this in a different manner than you may be used to, but it's actually really smooth once you get to know it. All inbound connections to your app are received by a load balancer component, called Portal, which handles forwarding requests on to your web component(s) as well as "SSL Termination". That means your connection is encrypted until it reaches the load balancer, which contains your certificate and private key, and then forwarded on to the rest of your app on the private internal-only network, where it can safely be unencrypted. So you won't configure your app or any web servers you may have in front of it to handle anything related to SSL.

But you can still check whether users are attempting to connect over unsecured HTTP, and redirect them to your HTTPS site instead. The headers X-Forwarded-For and X-Forwarded-Proto are automatically set by the load balancer to indicate where the request came from, and which protocol it was using, respectively. So simply check whether X-Forwarded-Proto is https, and redirect if not. Pretty simple stuff!
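
For a Python app, that check might look something like the WSGI middleware below. This is only a sketch: the force_https name is made up, it assumes the Host header is present, and it trusts X-Forwarded-Proto only because the load balancer is the one setting it. Most frameworks offer an equivalent setting or plugin, which is usually the better choice.

```python
def force_https(app):
    """Wrap a WSGI app so plain-HTTP requests are redirected to HTTPS."""

    def middleware(environ, start_response):
        # WSGI exposes the X-Forwarded-Proto header as HTTP_X_FORWARDED_PROTO.
        proto = environ.get("HTTP_X_FORWARDED_PROTO", "").lower()
        if proto == "https":
            return app(environ, start_response)

        # Rebuild the requested URL with the https scheme.
        host = environ.get("HTTP_HOST", "")
        path = environ.get("PATH_INFO", "/")
        query = environ.get("QUERY_STRING", "")
        location = f"https://{host}{path}" + (f"?{query}" if query else "")

        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Length", "0")],
        )
        return [b""]

    return middleware
```

Wrap your existing WSGI application once at startup, for example application = force_https(application), and every request that arrived over plain HTTP gets bounced to its HTTPS equivalent.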

As to actually enabling HTTPS, you can do that in your app's dashboard. It's a multi-step process, including providing the information to include in the certificate (domain name and contact details), creating the signing request, generating the certificate (you can create a self-signed cert, import a third-party cert, or use the built-in support for Let's Encrypt, which renews automatically), and then activating it for your app. You can have multiple certificates active at any one time, and the load balancer will respond with whichever is correct for the domain being requested.

Note: As of this writing, HTTPS is not supported in development. A solution is being explored, but there is no ETA yet.

Cloud Considerations

Many security experts worry about the use of cloud providers, given the default settings for cloud systems are somewhat less than optimized for security. Nanobox addresses this as well, by automatically configuring your cloud servers with the best security practices currently known:

  • Ports are locked down, with the exception of the bare minimum needed to operate, and nonstandard ports are used wherever appropriate to help obfuscate what's in use and/or available. You can open additional ports in your boxfile.yml, but these are connected directly to your load balancer.

  • Only the Nanobox management service and your load balancer are accessible from outside your app — even your own access to internal components is done through connections opened by the management service. This means your databases and other internal services are inaccessible to the outside world.

  • Each internal component is separated into its own, individual container. This improves security by ensuring that a compromise of one piece of your app doesn't compromise your entire app. It also improves ease of deployment, as you can simply replace entire components as needed.

  • All app elements run with minimal permissions. Even the management service runs unprivileged, though it does have access to sudo in the rare cases it actually needs more.

Infrastructure

Because it relies on the boxfile.yml to define components of your app, Nanobox also enforces security best practices for your infrastructure. Your app's components are all clearly defined alongside your code, so your infrastructure can be easily recreated in a short period for any auditing tasks that need to be performed, and there is no reliance on sysadmins taking adequate notes about how things are configured, and where.

Additionally, logging is centralized to your dashboard, from all components. You can easily trace the goings-on within your app without having to first log in to each component and find its logs. Indeed, you shouldn't need to connect to any of your components remotely at all, which improves security by minimizing manual tasks: the kind that must not only be remembered and performed by hand, but also documented so you can track whether they've been done.

Since your various components' containers are designed to be disposable, it also means your entire infrastructure can be updated to the latest, most secure code simply by rebuilding all of your components from the dashboard. The new containers will be created with the latest software installed as a base, rather than the result of an upgrade path, which keeps the containers clean and secure by avoiding accidental inclusion of outdated configurations and/or binaries.

Additional References

Earlier we promised to include a list of our favorite security guidance articles, books, and so forth. Well, time to deliver:

  • The Open Web Application Security Project (OWASP) - Pretty much the definitive source for web application security guidelines. Anything you need to know is almost certainly on their site somewhere. As a non-profit (fully tax-exempt under US laws), they aren't motivated to tout any particular tool (unlike myself), so their advice is much more sound than what you're likely to find in many other places. Though I'll note that most companies that tout security features of their products do follow OWASP's guidelines and recommendations, often even going beyond the minimum requirements.

  • The Basics of Web Application Security - An evolving publication that covers the most common issues and mistakes faced by web developers when dealing with security.

  • Mozilla Developer Network - Learn to Develop: Website Security - A solid overview of security vulnerabilities common on the web, how they work, and how to protect against them.

  • Web Developer Security Checklist - This list doesn't go into much detail on why to do each of the things it recommends, but it's still a good reference to ensure you've covered all the major issues. Also check out the list of similar checklists found in that repo (and the ones on the OWASP site, above!) for additional guidance.

And remember, please include any additional suggestions you may have in the comments, below!

The Big Picture

So now you know how Nanobox handles data security. You've also gotten a quick reminder of the things you still need to do in your own app to make sure everything stays secure. We even included a list of external sources to consult for more information on how to do those things properly. This is all great, but how does it fit in with the other aspects of data safety?

In this case, data security applies to the other major aspects about equally. It's always best to ensure any data you're working with is stored in its most secure form: encrypted data in your live system should still be encrypted in your backups, and the same goes for hashed data. You may even wish to run an additional encryption pass on the data, just to be safe. Also, you'll want to ensure it can't be readily accessed by unauthorized users (which may seem obvious, but is still often overlooked). How to actually implement these other aspects of data safety is the topic of other articles in this series, but those are the biggest points to keep in mind when dealing with security.

That's it for this article. Hopefully it was helpful to your Nanobox experience!

Posted in Nanobox, security, data safety

Daniel Hunsaker

Author, Father, Programmer, Nut. Dan contributes to so many projects he sometimes gets them mixed up. He'll happily help you out on the Nanobox Slack server when the staff are offline.

@sendoshin Idaho, USA