Backup and Recovery
Data Management
Security

How to Capture Forensic-Quality Copies of SaaS Data

Ariel Berkman | Chief Product Security Officer, Own Company

Data acquisition for forensic purposes encompasses all the procedures involved in gathering digital evidence, including cloning and extracting evidence from any electronic source. It involves producing a forensic copy from storage media, mobile devices, and other technologies including the cloud.

As more data, including mission-critical business data, is stored in SaaS environments, it’s important to examine the acquisition process and determine whether it is forensically sound.

Several years ago, I emphasized the importance of distinguishing between sound forensic procedures and the unrealistic and impractical paradigm of “preserve everything but change nothing.” Then, in 2013, Own founder Ariel Berkman demonstrated that having access to physical storage media does not guarantee a complete and accurate copy, since certain service sectors are typically inaccessible and therefore not acquired.

These distinctions are particularly important for SaaS data sources for two reasons. First, physical cloud resources are pooled and allocated as needed, which renders physical duplication ineffective and makes it necessary to acquire what is exposed via APIs. Second, SaaS data sources are dynamic and are generally changing during the acquisition process. This situation is comparable to forensic acquisition of Apple iOS devices: when access is protected, data must be copied via services (e.g., Backup, Lockdown), and because a powered-on device is in flux, the copy is not of a static data source. These real-world practicalities must be accounted for when acquiring a forensic-quality copy.

To show how principles of forensic preservation apply to SaaS data, this blog post presents solutions developed by Own to acquire data from Salesforce, ServiceNow, Microsoft Dynamics 365 and Power Platform, and soon Workday.

Comprehensive copies

Despite SaaS data’s dynamic nature and lack of physical storage, you can still capture a complete and accurate copy of the data and metadata accessible via an application programming interface (API). An API is the lowest level of abstraction available for a SaaS platform, making it best suited for forensic purposes. The APIs provided by SaaS platform developers generally deliver the most complete and accurate information available, are updated as new features are added, and expose all relevant and useful data, including content, metadata, and logs. Using APIs also allows you to reproduce results and conduct rigorous tool testing based on well-defined API semantics.

When creating a forensic-quality copy of SaaS data, the objective is simple: acquire everything that is available via APIs. Because you generally don’t know in advance which data will be needed in the future, acquiring only certain commonly useful data is usually insufficient. The copy should include everything except the parts a user explicitly chooses to exclude.

Following this approach, Own executes a ‘describe all’ API call to see the entire SaaS schema, including objects and metadata. For each queryable object in the schema, Own queries all elements, unless the user exercises the option to exclude them. It’s important to know that users can adapt SaaS environments to their needs, for example with custom code and nested data. Unless these user customizations are copied along with the SaaS data, important context and content will be missing from the preserved data. So, to keep the acquired data comprehensive and cohesive, Own also copies user customizations of SaaS environments.
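The describe-then-query approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Own’s implementation: the schema layout (a list of dicts with `name` and `queryable` keys), the `fetch_all` callable, and the object names are all hypothetical stand-ins for whatever a given SaaS API actually returns.

```python
def plan_acquisition(schema, excluded):
    """Split schema objects into those to query and those to skip.

    `schema` is a hypothetical 'describe all' result: a list of dicts,
    each with a 'name' and a 'queryable' flag (assumed layout).
    """
    to_query, skipped = [], []
    for obj in schema:
        name = obj["name"]
        if not obj.get("queryable", False) or name in excluded:
            skipped.append(name)  # not queryable, or excluded by the user
        else:
            to_query.append(name)
    return to_query, skipped


def acquire(schema, excluded, fetch_all):
    """Query every element of each planned object.

    `fetch_all` is a placeholder for a real API call that returns all
    records of the named object.
    """
    to_query, skipped = plan_acquisition(schema, excluded)
    copy = {name: fetch_all(name) for name in to_query}
    return copy, skipped


# Hypothetical schema used only for illustration.
EXAMPLE_SCHEMA = [
    {"name": "Account", "queryable": True},
    {"name": "Contact", "queryable": True},
    {"name": "InternalCache", "queryable": False},
]
```

Returning the skipped names alongside the copy keeps the exclusions visible, so they can be documented rather than silently dropped.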

Acquiring SaaS data via APIs isn’t foolproof and comes with several challenges. SaaS environments evolve over time, with integration of different products and the addition of new features. The accretion of elements in SaaS environments is not always orderly or fully documented, making it difficult to find all of the nooks and crannies. Developers must therefore keep track of API updates that expose new features and data sets in SaaS platforms. Implementing a new version of an API requires testing and evaluation to ensure that the acquisition process remains reliable, so it can take time for all of the new data sets to be added to a forensic-quality copy. If a certain object is not acquired using the standard approach, we engage with SaaS providers and do research to overcome the limitations (e.g., by crafting a special query to address certain limits with that particular object).

In cases where a certain object is not easily queryable and doesn’t seem to contain relevant or useful data, we confirm that the customer requires this data before trying to find a workaround.

Adding to the complexity, different APIs for the same SaaS environment can expose overlapping data in different ways. For instance, Salesforce provides a Data API and a Metadata API, and certain information is available through both but in different formats. Some permissions, for example, appear in the Data API even though they are a type of metadata. The Metadata API natively exports XML, whereas the Data API provides tabular data directly as JSON or CSV.

Keep in mind that certain features of SaaS platforms are only accessible if an additional license is purchased, which can impact what we can capture from a customer's environment. For example, if a customer buys certain Salesforce Shield modules, additional log data will be accessible and we will acquire it over the API.

Minimize changes

Given the dynamic nature of SaaS environments, changes can occur while data is being copied. If a part of the data that has already been copied changes, resulting in a related change in a part of the data source that has not yet been copied, it can cause a transactional inconsistency within the copied data. Put simply, one element within the copy contains the old data while another related element contains the new data.
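One way to spot records at risk of this kind of inconsistency is to list, once the copy finishes, everything modified since the copy started. The sketch below assumes the platform exposes a last-modified timestamp per record; the `listing` mapping and the record ids are hypothetical, and this is an illustration rather than a description of how Own handles the problem.

```python
from datetime import datetime, timezone


def modified_since(listing, started_at):
    """Return ids of records modified at or after `started_at`.

    `listing` maps record id -> last-modified datetime, as reported by
    the source after the copy has finished (assumed to be available).
    Records returned here changed during the copy window, so they are
    candidates for re-fetching or for being flagged in the audit log.
    """
    return sorted(
        rec_id for rec_id, modified in listing.items() if modified >= started_at
    )


# Illustrative values only.
copy_started = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
post_copy_listing = {
    "rec-001": datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc),  # before copy
    "rec-002": datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),   # during copy
}
at_risk = modified_since(post_copy_listing, copy_started)
```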

So how can you minimize inconsistencies? One way is to speed up the acquisition process. The faster data can be acquired, the lower the chances of major changes occurring during the process. There are a few ways to accelerate the copy process, including multiple parallel API connections and copying data to a cloud-based solution. Copying cloud-to-cloud is more efficient and reliable than attempting to transfer cloud data to a client, which is subject to latency, throttling, and timeouts. Additionally, copying within the same cloud environment (e.g., Azure-Azure, AWS-AWS) eliminates potential constraints stemming from LAN or ISP infrastructure. Own Recover uses all available options to accelerate the acquisition process.
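As a rough illustration of parallel API connections, the sketch below fans object queries out over a thread pool using Python’s standard library. The `fetch` callable is a hypothetical placeholder for a real API request, and the worker count is an arbitrary assumption; a real acquisition would also need to respect the provider’s rate limits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def acquire_in_parallel(object_names, fetch, max_workers=4):
    """Fetch several objects concurrently.

    Returns (results, failures): results maps object name -> data,
    failures maps object name -> error message, so a failed object is
    recorded instead of being silently lost.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, name): name for name in object_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # keep going; report the failure
                failures[name] = str(exc)
    return results, failures


def demo_fetch(name):
    """Stand-in for an API call; fails for one object on purpose."""
    if name == "Broken":
        raise RuntimeError("query timed out")
    return [f"{name}-row-1", f"{name}-row-2"]
```

Threads suit this workload because each request spends most of its time waiting on the network rather than on the CPU.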

Maintain transparency

From a forensic perspective, documentation is essential to demonstrate authenticity, verify integrity, and avoid silent errors.

To illustrate the authenticity and comprehensiveness of acquired SaaS data, it’s important to maintain an audit log documenting where the evidence originated and how it was handled. At Own, we store a detailed audit log alongside the acquired data and present certain entries in the user interface.
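An audit log of this kind can be as simple as one structured entry per action. The sketch below writes JSON entries with a UTC timestamp; the field names (`action`, `source`, `detail`) and the example values are assumptions chosen for illustration, not the format Own uses.

```python
import json
from datetime import datetime, timezone


def audit_entry(action, source, detail, now=None):
    """Build one audit-log line as a JSON string.

    `now` can be injected for reproducible output; by default the
    current UTC time is used.
    """
    now = now or datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),
        "action": action,   # what was done, e.g. "query"
        "source": source,   # where the evidence originated
        "detail": detail,   # how it was handled
    }
    return json.dumps(record, sort_keys=True)


# Illustrative entry with a fixed timestamp.
example_line = audit_entry(
    "query",
    "example-org/Account",
    "copied 2 records",
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

Appending such lines to a file stored alongside the acquired data gives a chronological record that can be reviewed later.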

For a clear, well-rounded perspective on SaaS data, it’s also important to document characteristics of the data source itself, including its structure. For instance, Own captures the schema in a JSON file to document the comprehensiveness of the copy. In addition, it’s essential to preserve the configuration of the copy process, such as any items excluded by the user. Together, documentation of the full schema and the excluded items helps verify that the copy contains all of the SaaS data that it should.

Documentation must also record and report when a specific object cannot be copied as part of the process, which Own does by alerting users with a warning when a given item was skipped or failed.
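The completeness check described above amounts to simple set arithmetic: everything in the schema, minus user exclusions, should appear in the copy or be reported as skipped or failed. The following is a minimal sketch with hypothetical object names and a made-up warning format; it is not a description of Own’s internal logic.

```python
def completeness_report(schema_names, excluded, copied, failed):
    """Compare what should have been copied with what actually was.

    expected    = schema objects minus user exclusions
    missing     = expected objects absent from the copy
    unexplained = missing objects with no recorded failure
    """
    expected = set(schema_names) - set(excluded)
    missing = expected - set(copied)
    unexplained = missing - set(failed)
    warnings = [
        f"WARNING: {name} was skipped or failed" for name in sorted(missing)
    ]
    return {
        "expected": sorted(expected),
        "missing": sorted(missing),
        "unexplained": sorted(unexplained),
        "warnings": warnings,
    }


# Illustrative values only.
report = completeness_report(
    schema_names=["Account", "Contact", "Case", "Lead"],
    excluded=["Lead"],
    copied=["Account", "Contact"],
    failed=["Case"],
)
```

An empty `unexplained` list means every gap in the copy is accounted for by either a user exclusion or a reported failure.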

Integrity verification

When the copy process is complete, forensic best practice is to compute cryptographic hashes of all acquired data. Hash values are used to ensure that the stored data is the same as the data obtained from the original data source, and to confirm that the copy has not changed since it was created.

Own computes hash values of copied data segments to verify their integrity. In addition, an overall SHA-256 hash of the combined segment hashes can be computed and stored in a public blockchain using the Blockchain Verify feature.
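The idea of per-segment hashes plus one overall hash can be sketched with Python’s `hashlib`. How the segment hashes are combined is an assumption here (hex digests concatenated in order, then hashed again); the source does not specify the exact scheme, and the blockchain step is not modeled.

```python
import hashlib


def segment_hashes(segments):
    """Return the SHA-256 hex digest of each data segment (bytes)."""
    return [hashlib.sha256(segment).hexdigest() for segment in segments]


def overall_hash(hashes):
    """Hash the ordered concatenation of segment digests (assumed scheme)."""
    combined = "".join(hashes).encode("ascii")
    return hashlib.sha256(combined).hexdigest()


def verify(segments, expected_hashes, expected_overall):
    """Recompute all hashes and compare them with the stored values."""
    actual = segment_hashes(segments)
    return actual == expected_hashes and overall_hash(actual) == expected_overall


# Illustrative segments only.
stored_segments = [b"segment one", b"segment two"]
stored_hashes = segment_hashes(stored_segments)
stored_overall = overall_hash(stored_hashes)
```

Because the overall hash depends on every segment hash and on their order, a change to any single segment changes the final value.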

Final thoughts

In order to capture a forensic-quality copy of SaaS data that provides an accurate reflection of the original source, it’s necessary to preserve as much as feasible, minimize changes, maintain transparency, compute hashes, and continuously improve.

With our Recover solution, Own applies these best practices to capture forensic-quality copies of our customers’ valuable SaaS data, helping ensure the integrity and completeness of our customers’ backup files. The results align with recent updates to the SEC Rule 17a-4 record-keeping requirements, which add an audit-trail alternative to the Write Once Read Many (WORM) requirement and emphasize the importance of verification and backup/redundancy.

To learn more about Own Recover, check out our website or request a demo below.

Get started

Submit your details and we will contact you shortly to schedule a custom 25-minute demo.

Book a demo