When an executive uploads a dataset to a cloud analytics platform, that data travels — over the internet, to a server owned and operated by the vendor, processed by systems the vendor controls, and stored in logs the vendor maintains. The privacy policy says the vendor will not misuse it. The security documentation says the data is encrypted in transit. Neither of these facts changes where the data went.
For most data, this is an acceptable trade-off. For financial records, patient information, government data, proprietary research, and confidential business intelligence, it is not. And the line between the two is often less clear than organisations assume, which tends to become apparent only when something goes wrong.
The assumption built into every cloud analytics tool
Cloud analytics platforms are built on a foundational assumption: that your data will travel to their infrastructure to be processed. This is not a flaw in their design. It is a deliberate architectural choice that allows them to offer powerful shared infrastructure at relatively low cost.
The problem is that this assumption is built so deeply into the architecture that you cannot opt out of it. You cannot use a cloud analytics tool without your data leaving your control. The two things are inseparable.
The most secure data pathway is the one that never exists. Data that does not travel cannot be intercepted in transit.
What data residency actually means
Data residency refers to the physical location, and therefore the legal jurisdiction, in which data is stored and processed. It matters because data processed on servers in a given country may be subject to the laws of that country, including laws that require disclosure to government authorities without your knowledge or consent.
Most cloud analytics vendors operate servers in multiple regions and offer data residency options — the ability to specify that your data is stored and processed within a particular geography. This addresses one dimension of the problem. It does not address the fact that the data still left your infrastructure and is still processed on systems you do not control.
What cloud data residency does not guarantee
- That your data is processed only on servers you can inspect
- That subprocessors in the vendor supply chain are equally restricted
- That vendor employees with administrative access are subject to your vetting procedures
- That your data is not used to train or improve vendor models
- That a breach of the vendor's infrastructure does not expose your data
The regulatory dimension
For organisations operating under HIPAA, GDPR, financial services regulations, or government security frameworks, the question of where data is processed is not philosophical. It is legal.
HIPAA requires covered entities and their business associates to implement specific safeguards for protected health information (PHI). A cloud analytics vendor that processes PHI on behalf of a covered entity is acting as a business associate and must sign a Business Associate Agreement, and that agreement must be carefully reviewed to understand what it actually permits.
GDPR requires that personal data be processed lawfully, with appropriate technical and organisational measures. Transfers outside the European Economic Area (EEA) require a specific legal mechanism, such as an adequacy decision or standard contractual clauses. The analysis of what constitutes a "transfer" when data is uploaded to a cloud analytics platform has become an active area of regulatory attention.
The architecture that eliminates the problem
The alternative to sending data to an analytics platform is deploying the analytics platform to the data. Instead of the data travelling to the intelligence, the intelligence travels to the data.
This is not a new idea. It is the architecture of on-premises software — tools that run on your own hardware, inside your own network, under your own governance. What is new is that this architecture can now deliver the same capability as cloud analytics platforms, including natural language querying, automated data preparation, and certified analytical results.
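To make the pattern concrete, here is a minimal sketch of analysis that runs where the data already lives. It is an illustration only, not a description of any particular product: it uses Python's standard-library sqlite3 module with an in-memory database, and the table name, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample records; in practice these would be read from a
# file or database that already sits inside your own network.
ROWS = [
    ("north", 120.0),
    ("north", 80.0),
    ("south", 50.0),
]


def revenue_by_region(rows):
    """Aggregate revenue per region entirely inside the current process."""
    # ":memory:" creates a database held only in this process's RAM.
    # Nothing in this function opens a network connection.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(revenue) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(revenue_by_region(ROWS))  # [('north', 200.0), ('south', 50.0)]
```

The query engine is loaded into the process that holds the data, so the records are never handed to an external service. A production on-premises platform is far more capable than this, but the direction of travel is the same.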
The analytical capability no longer has to be traded for security. Organisations can have both — if they choose a platform built on the right architectural assumption.
What to demand from any analytics platform
Before deploying any analytics platform on sensitive data, every organisation should be able to answer four questions. Where does the data go when a query is run? Who has administrative access to the systems that process it? What happens to the data after the query completes? And what is the complete chain of subprocessors that may have access to it?
If any of those answers is unclear, or if the answer to the first question is "it travels to external servers," that is worth examining carefully before the data travels, not after.
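One way to keep this review honest is to record the answers in a structured form and refuse to proceed while any of them is missing. The sketch below is a hypothetical illustration of that idea: the class, field names, and concern messages are assumptions made for the example, not part of any standard or vendor API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlatformAnswers:
    """Answers to the four questions; None means "not yet answered"."""
    data_destination: Optional[str] = None     # where data goes when a query runs
    admin_access: Optional[str] = None         # who administers the processing systems
    post_query_handling: Optional[str] = None  # what happens to data after the query
    subprocessors: Optional[List[str]] = None  # full chain; [] means "none"
    leaves_own_network: Optional[bool] = None  # hypothetical flag derived from question one


def open_concerns(answers: PlatformAnswers) -> List[str]:
    """Return the issues that should be examined before any data is sent."""
    concerns = []
    if not answers.data_destination:
        concerns.append("data destination is unclear")
    if not answers.admin_access:
        concerns.append("administrative access is unclear")
    if not answers.post_query_handling:
        concerns.append("post-query data handling is unclear")
    if answers.subprocessors is None:
        concerns.append("subprocessor chain is unclear")
    if answers.leaves_own_network is None:
        concerns.append("unknown whether data leaves your network")
    elif answers.leaves_own_network:
        concerns.append("data travels to external servers")
    return concerns


if __name__ == "__main__":
    cloud = PlatformAnswers(
        data_destination="vendor servers",
        admin_access="vendor operations staff",
        post_query_handling="retained in vendor logs",
        subprocessors=["hosting provider"],
        leaves_own_network=True,
    )
    for concern in open_concerns(cloud):
        print(concern)
```

An empty list of concerns does not prove a platform is safe. It only shows that each question has an answer on record and that the answer to the first one keeps the data inside your own network.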