Mongo DB in Cryptogram

OrmontUS · July 16, 2019, 2:22pm

I have religiously read a free, data-security-oriented monthly email newsletter called “Crypto-Gram” by Bruce Schneier, a renowned security technologist. (To subscribe, or to read back issues: https://www.schneier.com/crypto-gram/).

The following may be well known here (at least to the technical propeller heads), but I think its appearance in this very broadly read newsletter is a first:

Jeff

MongoDB Offers Field Level Encryption
[2019.06.26] MongoDB now has the ability to encrypt data by field:
MongoDB calls the new feature Field Level Encryption. It works kind of like end-to-end encrypted messaging, which scrambles data as it moves across the internet, revealing it only to the sender and the recipient. In such a “client-side” encryption scheme, databases utilizing Field Level Encryption will not only require a system login, but will additionally require specific keys to process and decrypt specific chunks of data locally on a user’s device as needed. That means MongoDB itself and cloud providers won’t be able to access customer data, and a database’s administrators or remote managers don’t need to have access to everything either.
For regular users, not much will be visibly different. If their credentials are stolen and they aren’t using multifactor authentication, an attacker will still be able to access everything the victim could. But the new feature is meant to eliminate single points of failure. With Field Level Encryption in place, a hacker who steals an administrative username and password, or finds a software vulnerability that gives them system access, still won’t be able to use these holes to access readable data.

nilvest · July 17, 2019, 10:08am

Hi Jeff,

This looks interesting… I will follow up on the link…

in the meanwhile, any idea if this applies to Atlas or not?
Also how does it compare to other db providers - both relational and non-relational?

Just trying to understand real significance of this capability…

(e.g. sometimes back when MDB was new on the board, there was a lot of dialogue about how their ability to support ACID was a key differentiator… I can see encryption at field level could be in that category, both for financial as well as transaction oriented data… therefore I am more curious)

thanks
nilvest

shikotus · July 17, 2019, 1:08pm

Read up on this briefly. I think this is a pretty significant new security feature.

To clarify, MongoDB has supported general-purpose encryption at rest since version 3.2. This is what most modern databases support - your data is stored encrypted on the server, using encryption keys that are also stored on the server (separately from the data and with additional protection). Still, because the keys are ultimately also available on the server, they are subject to theft just like the data is. So with this kind of general-purpose encryption, life is made more difficult for a would-be hacker because they have to steal not only the data, but the encryption keys as well.

What’s important with this new Field Level Encryption feature is not that it’s field-level (i.e. you can choose to encrypt some fields and not others), but that it uses client-side encryption keys. What this means is that the keys necessary to decrypt the data don’t exist on the server at all, but only on the client. This means that the server NEVER has decrypted data - not even temporarily, not in memory, not during transmission. This makes things far more difficult for hackers, because in addition to compromising the server and stealing the data, they’d also have to compromise each client computer to steal the decryption keys. Given that there are likely many clients, each with their own key, and each client could be in a completely different physical environment than the server, the hacking job becomes that much harder (potentially completely impractical).
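To make the "server never has the keys" idea concrete, here is a minimal sketch in Python. It is not MongoDB's implementation or its driver API: the `Server` class and the function names are hypothetical, and the cipher is a toy keystream built from SHAKE-256 so the example runs on the standard library alone. A real system would use a vetted cipher such as AES.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream built from SHAKE-256; a stand-in for a real cipher such as AES.
    return hashlib.shake_256(key + nonce).digest(length)


def client_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Runs on the client: returns nonce + ciphertext + integrity tag."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    # For brevity one key is used for both steps; real designs use separate keys.
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def client_decrypt(key: bytes, blob: bytes) -> bytes:
    """Runs on the client: checks the tag, then recovers the plaintext."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class Server:
    """Hypothetical server: it stores and returns opaque bytes, nothing more."""

    def __init__(self):
        self.records = {}

    def put(self, record_id: str, blob: bytes) -> None:
        self.records[record_id] = blob

    def get(self, record_id: str) -> bytes:
        return self.records[record_id]


client_key = os.urandom(32)  # generated and kept on the client only
server = Server()
server.put("user-1", client_encrypt(client_key, b"123-45-6789"))
recovered = client_decrypt(client_key, server.get("user-1"))
```

Someone who copies `server.records` gets only the nonce, ciphertext and tag. Without `client_key`, which the server never received, the decrypt call fails.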

In short, this is an advanced, high-end security technique, currently not available in most general-purpose databases, and typically seen more in the space of specialized password management services like LastPass. It is a technique developed specifically for the modern set of cyber challenges and perfectly tailored for use in the cloud/multi-user environment where one central database may service sensitive data of many users.

I think this is a clear advantage for the MDB product.

tamhas · July 17, 2019, 4:46pm

I see the pluses and minuses of this Mongo approach a bit differently. While it is true that having the encryption keys on the server means they are on the same system as the DB, this is also the system which one is going to work hardest to make bulletproof, especially compared with the client machines. Also, while there are some specialized cases where there might be genuine client specific data in the database where stealing the client side keys would only expose that database, in the more typical case all users accessing the same document type will have to have the same keys since otherwise they could not report on and process the documents generally. In that case, stealing the key from any one client will open up the whole document type, unless there are authorization rules limiting what one user can see.

shikotus · July 17, 2019, 7:04pm

First off, let me just say that there is no such thing as bulletproof systems - there are only more valuable targets and less valuable targets.

But how this particular capability is used depends on the use case, and on how a particular software solution is engineered to make use of it. For instance, you can imagine a use case where a centralized MongoDB database is used to store some kind of personally sensitive information, such as people’s social security numbers or credit card information. With this new capability, a solution can be designed where each individual client (as in each specific individual making use of this system) can in fact have their own personal encryption key that is used to protect their information. So while your SSN and my SSN might be stored next to each other in this centralized database, each is encrypted with a different key, one residing on your client computer, and one on mine. And so if yours is stolen, it can only be used to decrypt your SSN, but is useless for mine, even though those records are of the same document type and sit “next to each other” in the database. This is exactly what LastPass does, which is a service that lets you store all of your various passwords in the cloud and retrieve them whenever you need. Although the LastPass database sits somewhere in a data center and stores both my passwords and your passwords, they are in fact encrypted with different keys, each derived by hashing a unique password which only the individual user knows, and which is never passed to the server - the hash is created on the client computer and used to encrypt data, which is then passed to the server encrypted, where it’s stored encrypted. Retrieval is the reverse - encrypted data is returned by the server, and decrypted only on the client machine.
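As a rough illustration of a password-derived, per-user key, here is a generic sketch using PBKDF2 from Python's standard library. It is not a description of LastPass's actual scheme; the function name, the iteration count and the example passwords are assumptions made for illustration.

```python
import hashlib
import os


def derive_user_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key on the client; the password itself is never sent anywhere."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is not secret, so it can be stored on the server next to the ciphertext.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)

key_alice = derive_user_key("alice's long passphrase", salt_alice)
key_bob = derive_user_key("bob's long passphrase", salt_bob)

# Same password and salt always give the same key, so it can be re-derived at login.
assert derive_user_key("alice's long passphrase", salt_alice) == key_alice
# Different users end up with different keys.
assert key_alice != key_bob
```

Each user's data would then be encrypted with their own derived key before upload, so stealing one user's password exposes only that user's records.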

Of course, not every situation lends itself to this kind of use. As you hint, there are use cases where data must exist unencrypted on the server, at least temporarily, because the server needs to not merely return this data, but actually interpret it and use it in some kind of computation or business process. In that case, this kind of encryption capability may not be useful, and other security mechanisms are more appropriate (like traditional encryption). But having this capability available means that developers now have more options for secure solution design, and all things being equal, that is always a good thing.

tamhas · July 17, 2019, 8:50pm

Your description makes it sound like there is this use case and that use case and that these are equally likely. My background suggests that the person-specific data in a common database is the oddball … not non-existent, certainly, but not common in a business context. For the bulk of business applications there is a need to report and search on data across the accident of who may have created the data. To be sure, there may be an authorization system in place which limits what data any given user has access to or can modify, but reinforcing that by unique encryption will be a rare use case in a business environment. In particular, note that an authorization system can more or less instantly transfer access or modification privileges from one user to another, e.g., in the event of termination of employment, but if the relevant data is encrypted with a user-specific key, then the data all needs to be decrypted and re-encrypted with a new key unless the new user has no encrypted data of his/her own. Similar complexities exist if a single role is later divided between two or more persons. Authorization works really well for this. Encryption is clumsy.

ajm101 · July 17, 2019, 11:54pm

“In short, this is an advanced, high end security technique, currently not available in most general purpose databases, and typically seen more in the space of specialized password management services like LastPass. It is a technique developed specifically for the modern set of cyber challenges and perfectly tailored for use in the cloud/multi-user environment where one central database may service sensitive data of many users.”

It is not particularly advanced. It is almost certainly to get HIPAA and PCI compliance for hosted databases and not push complexity to users.

None of the security algorithms are particularly advanced. I know this because I read their source code: https://github.com/mongodb/mongo/tree/v4.2/src/mongo/crypto. If you’d like to read the standard doc for the one they rolled their own version of, it is here: https://tools.ietf.org/html/draft-mcgrew-aead-aes-cbc-hmac-s… (specifically, AEAD_AES_256_CBC_HMAC_SHA_512, or “authenticated encryption with associated data” done via the “AES-256 block cipher” encryption algorithm in “cipher block chaining” mode with “hash-based message authentication code” done with SHA-512 hash algorithm).
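For readers who want to see what the "HMAC" half of that construction looks like, here is a small sketch of the authentication step as I read the McGrew draft for AEAD_AES_256_CBC_HMAC_SHA_512: a 64-byte key is split into a MAC key (first 32 bytes) and an encryption key (last 32 bytes), and the tag is the first 32 bytes of HMAC-SHA-512 over the associated data, the IV plus ciphertext, and the bit length of the associated data. The AES-CBC step is left out because Python's standard library has no AES, so the ciphertext is an opaque input here. The function names are mine, and MongoDB's own code may differ in details.

```python
import hashlib
import hmac
import struct

MAC_KEY_LEN = 32  # bytes of the combined key used for HMAC-SHA-512
ENC_KEY_LEN = 32  # bytes of the combined key used for AES-256-CBC
TAG_LEN = 32      # the 64-byte HMAC output is truncated to this length


def split_key(combined_key: bytes) -> tuple:
    """Split the 64-byte key into (mac_key, enc_key)."""
    if len(combined_key) != MAC_KEY_LEN + ENC_KEY_LEN:
        raise ValueError("expected a 64-byte key")
    return combined_key[:MAC_KEY_LEN], combined_key[MAC_KEY_LEN:]


def auth_tag(mac_key: bytes, associated_data: bytes, iv_and_ciphertext: bytes) -> bytes:
    """Tag over A || S || AL, where AL is the bit length of A as a 64-bit big-endian integer."""
    al = struct.pack(">Q", len(associated_data) * 8)
    mac = hmac.new(mac_key, associated_data + iv_and_ciphertext + al, hashlib.sha512)
    return mac.digest()[:TAG_LEN]


def verify(mac_key: bytes, associated_data: bytes, iv_and_ciphertext: bytes, tag: bytes) -> bool:
    """Constant-time tag check; decryption should only proceed if this returns True."""
    expected = auth_tag(mac_key, associated_data, iv_and_ciphertext)
    return hmac.compare_digest(expected, tag)
```

The point of the tag is that a client can detect a modified or swapped ciphertext before attempting to decrypt it.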

Anyway, it doesn’t differentiate Mongo, it enables regulated and less security savvy customers (read: healthcare) to use Atlas.

CMF_muji · July 18, 2019, 2:28am

I think this is a clear advantage for the MDB product.

Shikotus, you are spot on in both your posts upthread. This is an important new feature in MongoDB 4.2 (the core “open-source” database - it’s not an add-on). Atlas will of course also have it because of that, plus [spoiler] all other MongoDB client applications will support it.

Let me chime in with some developer/DBA-focused insights into what this does. A few others and I already broke down what this feature was all about on the $:MDB premium board - I’ll recap it here instead of linking to it since that whole thread became completely derailed.

Here was a good primer and background that Starrob originally posted on MDB: https://duo.com/decipher/mongodb-moves-encryption-out-of-the… One tidbit from it is that it took MDB 2 years to get it implemented properly.

Field level encryption allows a developer to encrypt individual fields in a document as desired. Within a document DB, a single field may be an entire nested object with multiple key/values, like Credit Card details (number, code, expiry date) on a purchase record, or Payroll details and SSN on an employee record. Those might be fields you only want certain roles or people in your organization to see, so you want to protect them and have them hidden from other users that can see that document.

Developers could ALWAYS encrypt a field’s value, if they coded it manually within their application to encrypt the value before storing it, and decrypt it when they retrieved it. However, only their application could utilize the field. No other MongoDB tools could be utilized. So it was possible to do, but not very maintainable.

MDB now provides this as a feature, and this is entirely handled by the client-side driver as mentioned. That gives many benefits.

You can choose WHO can see a protected field, and give them the encryption key for that field. (NOT a key per user… just ONE KEY for that field). A user must possess that key in order to see that field’s value. So HR managers would have the key to see those protected details of an employee record (eg payroll and SSN), while other staff could not.
Being in the client-side driver assures that you have end-to-end encryption. The writing client encrypts it, and the reading client decrypts it as needed. The data cannot be understood by the server at all, as it is encrypted in transmission and in the MongoDB engine and on the hard drive (at rest) and in the logs.
What this means, in the new world of MDB Atlas managed hosting, is that your cloud provider cannot see the value, nor your company’s sysadmin. ONLY those with the encryption key can write or read that field. This goes a LONG way towards making companies more confident in using Atlas for secure data, knowing that the database provider (MDB) cannot read those fields, nor can the underlying cloud provider (AWS, Azure, Google).
Unlike doing encryption manually in app code, if you provide the key to the client driver of the ecosystem of MongoDB client tools, those tools can also utilize the encrypted value. So you can access that field within MongoDB Charts, MongoDB Stitch, and any other MongoDB tool if you provide the client driver the key.
Let’s also be clear that this is a battle front - MDB added this due to its ongoing battle with Amazon, as AWS DynamoDB had already offered this capability since Feb 2018. See: https://aws.amazon.com/about-aws/whats-new/2018/02/amazon-dy… and https://docs.aws.amazon.com/dynamodb-encryption-client/lates… for the client-side encrypt. So MDB just eliminated a possible reason that a user might go with AWS DynamoDB over MDB.
There is no server-side penalty to using it because it is the client handling it. From client perspective, if the field is enormous there may be an impact to performance. But that isn’t likely the case, it’ll more be like CC and Social Security # fields, so encrypting fields on a document by document basis should be very minor impact (in the milliseconds).
A user still needs their authorization credentials in the MDB client. Additionally, the encryption key is needed for them to be able to view those protected fields. This mechanism is akin to 2FA (two factor auth) security in that a user needs 2 factors (credentials plus field’s key) to use that field.
A company needs an external key manager that the MongoDB client will retrieve the key from. So there is a little bit of complexity in getting it set up, and hooking the client drivers up to the key manager. An example would be - a key is generated and used to encrypt a protected field. 1000 users need access to that field from a desktop application on their system, so each client driver would need to be set up to point to the key manager, so as to retrieve that particular key as needed.
Any number of fields can be protected and each can have different keys or share a common key… but I assume the overall number of keys any company will use will likely be low just for key mgmt reasons. However, while this is field-based, there is nothing to stop developers from just encrypting EVERY field with a single key to assure their entire database is protected in a cloud environment (provided the documents are small enough so performance isn’t impacted).
Contrary to some worries, this is NOT less secure (see note about 2FA-like quality to this), there is NO such thing as a user-specific key in this feature (that is not at all how encryption keys work), and, as mentioned above, the performance hit to the client should be minimal (size of encrypted field likely to be small). For the sake of this not becoming a 60 post thread again like on $:MDB, please note tamhas’s concerns were already rebuffed repeatedly by myself, Tinker, rtichy, BruceWayneJr and Starrob. It’s not worth debating his concerns further.
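The "one key per protected field, handed only to the roles that need it" idea above can be sketched as follows. This is not the MongoDB driver API: the keyring dictionaries stand in for whatever an external key manager would return, the field names are made up, and the cipher is a toy SHAKE-256 keystream so the example runs on Python's standard library (a real deployment would use the driver's built-in AES-based encryption).

```python
import hashlib
import hmac
import os

PROTECTED_FIELDS = ("ssn", "payroll")  # hypothetical fields chosen for protection


def _seal(key: bytes, value: str) -> bytes:
    # Toy cipher: XOR with a SHAKE-256 keystream, plus an HMAC tag for integrity.
    data = value.encode("utf-8")
    nonce = os.urandom(16)
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    body = bytes(d ^ s for d, s in zip(data, stream))
    return nonce + body + hmac.new(key, nonce + body, hashlib.sha256).digest()


def _open(key: bytes, blob: bytes) -> str:
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream)).decode("utf-8")


def write_document(doc: dict, keyring: dict) -> dict:
    """Client-side write: protected fields are encrypted before leaving the client."""
    stored = dict(doc)
    for field in PROTECTED_FIELDS:
        if field in stored:
            stored[field] = _seal(keyring[field], stored[field])
    return stored


def read_document(stored: dict, keyring: dict) -> dict:
    """Client-side read: decrypt only the fields this client holds a key for."""
    visible = {}
    for field, value in stored.items():
        if isinstance(value, bytes):
            key = keyring.get(field)
            visible[field] = _open(key, value) if key else "<encrypted>"
        else:
            visible[field] = value
    return visible


# One key per protected field, not one key per user.
hr_keyring = {"ssn": os.urandom(32), "payroll": os.urandom(32)}
staff_keyring = {}  # regular staff hold no field keys

employee = {"name": "Pat", "ssn": "123-45-6789", "payroll": "85000"}
stored = write_document(employee, hr_keyring)   # this is what the server holds
hr_view = read_document(stored, hr_keyring)
staff_view = read_document(stored, staff_keyring)
```

Here `hr_view` matches the original record, while `staff_view` shows the name but only a placeholder for the two protected fields, even though both clients read the same stored document.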

-muji
long MDB

SaulR80683 · July 18, 2019, 8:33am

This is an important new feature in MongoDB…

Thanks Muji, That was a very impressive and comprehensive post, which helped me, as a non-techie, to slightly begin to understand the significance of their new product. I do appreciate it.
Best,
Saul

tamhas · July 18, 2019, 10:11am

For the sake of this not becoming a 60 post thread again like on $:MDB, please note tamhas’s concerns were already rebuffed repeatedly by myself, Tinker, rtichy, BruceWayneJr and Starrob. It’s not worth debating his concerns further.

I object to the attempt to discredit my position by reference to a thread on a paid board which many readers may not have access to. Not to mention that I don’t feel my position was discredited at all. Rather, most of the counter arguments were about something other than what I was saying.

While it is true that having the key at the client means that no security breach at the server can access the encrypted data … although it can access everything else … it is also true that the server is more likely to be heavily armored while the client is just some PC on someone’s desk somewhere and thus less likely to be heavily secured.

Some of the advantages being touted for this system are really issues of authorization, which one does not generally manage via encryption.

It is a very good feature for them to have added and the way they have added it has some pluses … along with some cautions … few things are an unalloyed good. Among the good is distributing the computing load of doing the encryption and decryption so that there is no overhead on the server. But, as has been noted, there is nothing breakthrough about the encryption itself.

shikotus · July 18, 2019, 11:01am

Tamhas,

I agree that authorization and encryption are not replacements for one another. I don’t think anyone suggested that. However, to the degree that both are mechanisms for protecting private information, there is an area of overlap. It can be summarized in simplest terms as follows:

Problem: I want to store private data on a server, and I want no one but me to be able to read it.
Solution 1: Use authorization. Store data unencrypted, but ask for a personal username and password, and only give the data to clients in possession of the correct credentials.
Solution 2: Use encryption. Store data on the server encrypted. Give it to anyone who asks (no authorization). But only someone who has the right keys on the client system can actually decrypt it and read it.
Solution 3: Use both mechanisms, just to be on the safe side. This would be the most likely, and the most secure design. You talk as if people must choose between one or the other, but they do not.

You can debate the virtues of these 3 solutions endlessly; that’s not the point. The point is that without the feature Mongo just introduced, you’d only have Solution 1 available (unless you developed your own equivalent of it, obviously).

Your point about “heavily armored” servers is really quite weak. Any hacker worth their salt will tell you this. If it’s an attractive enough target, no amount of “armor” will be enough. This is particularly true because most hackers don’t even bother breaking through technical defenses, but use social engineering techniques, against which technology does not protect.
Regarding your point about use cases of personal data protection being rare in a corporate enterprise environment. I totally know what you are saying - as long as you are thinking about traditional corporations - banks, hospitals, insurance companies, etc. I work in one myself. But think about Facebook. Think about Google. Think about Twitter. LinkedIn. Those are giant corporations, and ALL THEY DO is manage personal data. That is a very substantial set of use cases. Would you sleep better knowing that all your personal search history on Google servers was stored encrypted using a personal key that was not stored on the servers and that only you had? I would.

svmorningstar · July 18, 2019, 11:49am

Holding my Mongoose. THX for the informative synopsis.

tamhas · July 18, 2019, 11:54am

My general skepticism about storing personal information, i.e., information specifically about me that no one else has access to, is that it is just not a use case which occurs very often in business. If it is personal information, why is it in the central database? Given that no one can make use of it but me, why is it in the central database? One can think of possible exceptions, like a centralized password manager service, but they are unusual. Any data which is encrypted to a key specific to me is data that cannot be accessed by anyone else, i.e., it cannot be used by the business. A unique key per document type I get.

If it were not possible to heavily armor servers, we would be hearing every day about breaches at brokerages and banks. The reason hackers are as successful as they are is not because they are all-powerful, but rather that lots of companies are lazy or uninformed about security.

What point would there be for FB to accumulate all that information if I was the only one who could read it?

LifeOfDreamer · July 18, 2019, 12:29pm

Hi all,
Exhilarating stuff for sure, but maybe we move this to the “nerds talk database nuances” board please?

if for some strange reason that board doesn’t exist, may I suggest the “Start a New Board” link?

hope this helps,
Dreamer

captainccs · July 18, 2019, 2:48pm

This thread reminds me of Thomas Watson Sr. (IBM’s founder) who said that there might be a need for five computers. His son bet $5 billion on System 360.* And it also reminds me of Ma Bell that didn’t buy into cellphones because there might be a market for 50 thousand of them.

In what some consider the biggest business gamble of all time, International Business Machines invested $5 billion ($35 billion in today’s dollars) in a family of six mutually compatible computers and 40 peripherals that could work together and be expanded in multiple combinations.

https://www.wired.com/2011/04/0407ibm-launches-system-360-co…

The importance of the story for investors is not the geek’s debate but how it fends off AWS and other wannabes in the document database business.

Denny Schlesinger

I signed the first commercial S/360 in Venezuela. The sales rep who had the account before me did the selling!