This article was written by Tristan Dibbens, CTO at Data³.
Cyber security is an area of computing that has seen huge growth in recent years, and for good reason. As systems become more complex and the information they hold becomes more important, so does the desire of bad actors, those who choose to operate outside the ethical boundaries most of us take for granted, to exploit it. Cloud computing has also made our systems far more accessible to those same bad actors.
At Data³ we are constantly talking about getting value from, monetising, or creating information to improve your business. What we also talk about in our deliveries is the importance of preserving the security of client data.
Data security is a broad subject. Follow The Cyber Security Hub on LinkedIn and you will regularly see complex charts describing the different ways that a bad actor can infiltrate your organisation and take what they want without you even knowing about it.
Let’s face it, if your information is important enough, a serious bad actor will likely be able to get to it if truly motivated (or challenged).
Another meme I have seen is “it’s 100% safe... unplugged!”, where the only truly safe place for your information is behind closed doors with no access to the internet. That situation has its own limitations.
Still, most organisations nowadays are considering cloud transformation to allow them to keep up and leverage the wealth of technical services available to the cloud practitioner.
This blog will discuss basic security around database or data delivery within the cloud.
There are several levels at which you can apply layers of security to support a secure data strategy.
This is a big one. For example, most retailers have no need to store credit card information and take on the burden of PCI compliance. Why would they, when there are third-party services like Stripe, Adyen and Worldpay, all of which provide decent systems that let you apply fraud measures or drill into customer service from their operational systems? Let the supplier earn their stripes by taking much of the responsibility away from you. You pay for the service, after all.
Customer data falls into this category. Why take PII (Personally Identifiable Information) into your reporting system when all you need to do is pass around an anonymous identifier that only identifies an individual inside the source operational system? Obviously, there are many use cases where some kind of matching process is needed so that an operational action can happen based on data-driven insight. But this can be created on an as-needed basis, in one or other of the operational systems. That’s not to say that you cannot define a key component within a data system as a core operational system. You can, but the risk and consequences must be considered carefully in the context of the broader business strategy.
Other types of data that this applies to could be:
1. Health data
2. Banking Data
3. PII
4. Intellectual property
Using useful data segments is still really important, but if you are stripping out PII, take care that the remaining data is not itself still PII. For example, geolocation coordinates can be household specific.
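As a rough illustration of the point above, the sketch below drops direct PII fields from a record before it reaches a reporting system and coarsens coordinates so they no longer point at a single household. The field names (`name`, `email`, `latitude` and so on) and the one-decimal rounding are assumptions made for the example, not a description of any particular system.

```python
from typing import Any

# Hypothetical field names; adjust these to the source system's schema.
PII_FIELDS = {"name", "email", "phone", "address"}


def strip_pii(record: dict[str, Any], geo_decimals: int = 1) -> dict[str, Any]:
    """Return a copy of the record without direct PII and with coarse coordinates."""
    cleaned = {key: value for key, value in record.items() if key not in PII_FIELDS}
    # One decimal place of a degree is on the order of 10 km, which is far
    # coarser than a household. The right precision is a judgment call.
    for key in ("latitude", "longitude"):
        if cleaned.get(key) is not None:
            cleaned[key] = round(float(cleaned[key]), geo_decimals)
    return cleaned


record = {
    "customer_id": "c-123",  # anonymous identifier from the source system
    "name": "A. Customer",
    "email": "a.customer@example.com",
    "latitude": 51.50735,
    "longitude": -0.12776,
    "basket_value": 42.5,
}
print(strip_pii(record))
```

Whether the remaining fields can still identify someone in combination is a question for the project's risk assessment, not something a filter like this can decide.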
Virtual Private Clouds / Networks are key armaments as part of your perimeter defences. Used properly, these can potentially make your data repository a fortress.
All too often, however, implementations are delivered with holes in them or accumulate holes over time, as developers add temporary access and do not remove it, or as new features are introduced and the network access points are adapted.
Remember that by allowing users to access any feature, you incur a new risk per user.
A VPN that gives users common access is only as strong as the password policy and culture surrounding the delivery, or the implementation itself. Unique credentials per user are a must, along with a strong culture within your business around how passwords are stored, used and shared.
Managed database services allow for firewall rules to be configured. These often used to come out of the box with an all access rule enabled. Ideally, this is turned off immediately. A good security policy is ‘allow only by exception’ and not by default. It is very common that developers leave these doors open to allow the freedom of development and support, but this can come at a serious cost to any business. Databases can be served publicly or privately. Serving a public database is a risk. A public database allows the opportunity for a bad actor to explore potential vulnerabilities in passwords directly, without having to explore behind the scenes.
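The ‘allow only by exception’ idea can be sketched in a few lines. This is not how any particular cloud provider implements its firewall. It only illustrates the logic: nothing is allowed unless it matches an explicitly listed network, and an "allow all" rule is something to detect and remove. The address ranges below are reserved documentation ranges used as placeholders.

```python
import ipaddress

# Hypothetical allowlist: an office network and a single build server.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Deny by default; allow only addresses inside an explicitly listed network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed input is never allowed
    return any(address in network for network in ALLOWED_NETWORKS)


def has_open_rule(networks) -> bool:
    """Flag an 'allow all' rule such as 0.0.0.0/0."""
    return any(network.prefixlen == 0 for network in networks)


print(is_allowed("203.0.113.10"))  # inside the office range
print(is_allowed("192.0.2.1"))     # not listed, so denied
print(has_open_rule([ipaddress.ip_network("0.0.0.0/0")]))
```

A check like `has_open_rule` could run as part of a periodic review, which helps catch the temporary access that developers add and forget to remove.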
One of the biggest risks around passwords is your users! Imagine you have set your standard password policy for all your systems to be 30 mixed characters long, virtually uncrackable! … but your users store them in a document on their desktops, or in their email, or they send this very valuable document to themselves so they have it at home. Any hacker monitoring your email accounts (and this does happen!!!) now has access to your closest and most protected information. Why would a hacker spend an inordinate amount of time trying to crack passwords when they can break into a weak point in communications and have them delivered directly?
• Never share passwords in Slack, over email in raw form, or in unprotected documents or wikis.
• Never store secret keys or passwords in code directly, or in the code repositories themselves (Git etc)!
• Don’t store passwords or secret keys in databases unless they are encrypted or hashed in place. No one ought to be able to take a password from a database and use it directly!
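The last two points can be sketched with the Python standard library. This is a minimal example under stated assumptions: PBKDF2 is used here only because it ships with Python (dedicated libraries such as bcrypt or argon2 are common alternatives), the iteration count is a placeholder to tune, and `DB_PASSWORD` is a hypothetical environment variable name.

```python
import hashlib
import hmac
import os
import secrets

ITERATIONS = 600_000  # assumption: tune to your hardware and current guidance


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Store these instead of the password itself."""
    salt = secrets.token_bytes(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


def database_password() -> str:
    """Read a secret from the environment rather than from code or the repository."""
    value = os.environ.get("DB_PASSWORD")  # hypothetical variable name
    if not value:
        raise RuntimeError("DB_PASSWORD is not set")
    return value
```

With this approach the database holds only the salt and digest, so a stolen table cannot be used to log in directly. In a cloud build, a managed secrets service would usually take the place of the environment variable.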
I briefly mentioned using an anonymous identifier in the data system. One way to do this is to generate a hash id. There are caveats to this, and care should be taken in deciding to use this approach.
Goals are as follows:
• Prevent anyone from reaching other records by experimenting with sequential numbers in public URLs
• Link back to the source system uniquely
By passing a hashed identifier around in the public side of the system, a hacker is highly unlikely to be able to guess the next record in any sequence, as they could if you were using sequential identifiers.
Caveats.
1. If using this in a database/data warehouse context, consideration must be given to database optimisation. Hashed columns used as keys or indexes do not work as well for table joins; databases still prefer a sequential numeric identifier. There is no reason you cannot use both approaches, given how cheap storage is nowadays.
2. Some hash functions carry a small likelihood of duplicate values (collisions) over high volumes of data.
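One possible way to generate such an identifier is sketched below. Note the swapped-in technique: a plain hash of a small sequential number can be reproduced by anyone who guesses the scheme, so this example uses a keyed hash (HMAC-SHA256) with a secret key instead. The function name, the key handling and the row layout are assumptions made for illustration.

```python
import hashlib
import hmac


def public_id(sequential_id: int, secret_key: bytes) -> str:
    """Derive a non-guessable public identifier from an internal sequential id."""
    message = str(sequential_id).encode("ascii")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would come from a secrets store, not from code.
key = b"example-only-secret-key"

# Keep both identifiers: the sequential id for joins and indexes inside the
# database, the hashed id for anything exposed on the public side.
rows = [{"id": n, "public_id": public_id(n, key)} for n in (1001, 1002, 1003)]
for row in rows:
    print(row["id"], row["public_id"])
```

The same input and key always produce the same value, so the hashed id still links back to the source record, while consecutive internal ids give unrelated-looking public ids. A unique constraint on the hashed column is a cheap guard against the duplication mentioned in the second caveat.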
We at Data³ have unfortunately seen a few scenarios where a hacker has either succeeded or failed to access data in builds.
1 – A recent scenario where seemingly a set of client credentials was picked up by a hacker (hosted in South Africa), then used to explore the AWS S3 / Athena build for weaknesses.
This build had no PII or sensitive client data in it and so was deemed to be quite a low risk.
1. What happened:
a. Passwords were sent out to end users via email on the client side as part of the implementation
b. A hacker appears to have hijacked either an end user’s PC or their email
c. We had configured a logging system linked to AWS CIS practice. Alerts were sent highlighting failed attempts to access core services
d. Upon receiving the alerts (Client was self-hosting), the client notified us
2. What did we do:
a. We changed all the user keys/passwords on the account and assessed the access policies
b. Checked the logs for identifiable information and to understand the scope of the breach
c. We raised the issue with the domain provider where the IP address of the hacker originated
d. The client had been informed specifically of the risks, so these were highlighted again. There was no VPN or common network to tie credentials into
e. We recommended that the client re-evaluate their password strategy
f. We recommended that a VPN or common network was introduced so that we could lock down the user access more explicitly
2 – A previous example where we experienced a brute force attack on a GCP MySQL database set to public access
This client did not have any sensitive PII data or anything noted to be a high risk.
1. What happened:
a. The client build slowed down considerably on a Virtual Machine hosting MySQL
b. The logs had built up to the point that the system resources had become maxed out
c. An attacker had been attempting to brute force the “admin” password on the MySQL server on that machine, so many times that they filled the logs
2. What did we do:
a. Locked down the firewall. At this point in our journey, we defined an internal policy of ‘allow by exception’, meaning that all of our database firewalls would be set to deny all by default
b. Cleared out the logs allowing the machine to recover
c. Checked password policy and reset for security reasons
d. Luckily, we had changed the “admin” root credentials to something else as part of the initial implementation, mitigating the brute force risk
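Both scenarios were noticed because of failed access attempts, in one case through alerts and in the other through logs filling up. The sketch below shows the basic idea of counting failures per source and flagging repeat offenders. The event format, a pair of source address and success flag, and the threshold are hypothetical. Real log formats differ per service, and this is not the alerting setup described above.

```python
from collections import Counter
from typing import Iterable


def suspicious_sources(
    events: Iterable[tuple[str, bool]], threshold: int = 5
) -> dict[str, int]:
    """Return sources whose failed login count meets or exceeds the threshold."""
    failures = Counter(source for source, succeeded in events if not succeeded)
    return {source: count for source, count in failures.items() if count >= threshold}


# Hypothetical events: (source address, whether the login succeeded).
events = [("198.51.100.23", False)] * 7 + [("203.0.113.10", True), ("203.0.113.10", False)]
print(suspicious_sources(events))
```

Flagging the source early, and capping or rotating the logs, would help avoid the situation where the first visible symptom is a machine running out of resources.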
As mentioned at the start of this article, Cyber Security is a broad topic.
We have summarised some of our knowledge and experience in this article, but there is much more to consider.
• Each project ought to be considered on a project by project basis.
• Specific risk assessments need to be carried out on data within each project assessing risk for data on multiple levels (PII / IP / Healthcare / PCI etc)
• Your business Data Strategy must consider your requirements and how risk is perceived and therefore how projects are implemented
• Do not get blasé about your data. There is always a risk of bad actors getting access to it, whatever the size of your business!!
• Look after your passwords, make a password culture work across EVERYONE in your business. Do not allow the storing of passwords in open document format or in accessible places
• When configuring a database, make sure you do not stick with the default credentials, or allow an open firewall policy!
• Use a VPN if possible, for added security. Weigh up the cost vs the cost of not using one