MPP is not the whole story. Cost-efficient, high-performance data storage is only a starting point.
The real challenge is designing that storage to minimize data isolation and duplication, maintain data consistency, and meet downstream users' requirements. These concerns apply to every data engineering activity, not just storage, but they are worth raising here.
The basic strategy of data storage is to strike a balance between two contradicting views: SSOT (Single Source of Truth) and MVOT (Multiple Versions of Truth). SSOT aims to maintain consistency: by keeping the original data in one place, it prevents incorrect data from circulating. MVOT takes the opposite approach, storing data in multiple locations to meet each department's data requirements. Simply put, a data warehouse is an SSOT, and a data mart is an MVOT.
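To make the split concrete, here is a minimal Python sketch, assuming a toy in-memory table: `WAREHOUSE` plays the SSOT, and two hypothetical department marts (`sales_mart`, `product_mart`) are MVOTs rebuilt from it. The table, column names, and functions are illustrative only, not part of any particular platform.

```python
from collections import defaultdict

# Hypothetical single source of truth: one canonical table of sales records.
WAREHOUSE = [
    {"order_id": 1, "region": "EU", "product": "A", "amount": 120.0},
    {"order_id": 2, "region": "US", "product": "B", "amount": 80.0},
    {"order_id": 3, "region": "EU", "product": "B", "amount": 50.0},
]

def sales_mart(rows):
    """Hypothetical sales-department view: revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def product_mart(rows):
    """Hypothetical product-department view: order count per product."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["product"]] += 1
    return dict(counts)

def build_marts(warehouse):
    # Every mart is derived from the same warehouse rows, so the versions
    # differ in shape but never disagree on the underlying facts.
    return {"sales": sales_mart(warehouse), "product": product_mart(warehouse)}

marts = build_marts(WAREHOUSE)
print(marts["sales"])    # {'EU': 170.0, 'US': 80.0}
print(marts["product"])  # {'A': 1, 'B': 2}
```

Because both marts are recomputed from the same rows, their totals always reconcile with the warehouse; a mart that kept its own copy of the data would lose that guarantee.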

The image above shows an example where SSOT and MVOT are in good balance: one data warehouse serves as the SSOT, and multiple data marts, each sourcing its data from the warehouse, serve as MVOTs. In reality, however, the boundary between SSOT and MVOT often collapses, for example when data marts draw on sources other than the data warehouse, or when the data warehouse takes data marts as source data. All data must be easily traceable from SSOT to MVOT and vice versa. To make this possible, it is important to keep a history of data transformations, which is called 'data lineage'.
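A lineage record can be as simple as a list of (source, target, transformation) edges. The sketch below uses hypothetical dataset names and assumes a `warehouse.`/`mart.` naming convention purely for illustration; it traces data upstream and downstream and flags the two kinds of boundary collapse described above.

```python
from collections import deque

# Hypothetical lineage records: (source dataset, target dataset, transformation).
LINEAGE = [
    ("crm_export", "warehouse.customers", "clean_customers"),
    ("orders_db", "warehouse.orders", "load_orders"),
    ("warehouse.orders", "mart.sales", "aggregate_by_region"),
    ("warehouse.customers", "mart.marketing", "segment_customers"),
    ("warehouse.orders", "mart.marketing", "join_orders"),
    ("campaign_sheet", "mart.marketing", "manual_upload"),     # mart bypasses the warehouse
    ("mart.sales", "warehouse.forecast", "feedback_load"),     # warehouse fed by a mart
]

def build_graph(edges):
    """Index the edges in both directions for tracing."""
    downstream, upstream = {}, {}
    for src, dst, _ in edges:
        downstream.setdefault(src, set()).add(dst)
        upstream.setdefault(dst, set()).add(src)
    return downstream, upstream

def trace(start, graph):
    """Return every dataset reachable from start (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def boundary_violations(edges):
    """Flag edges that blur SSOT/MVOT: marts fed from outside the warehouse,
    or warehouse tables fed from marts."""
    bad = []
    for src, dst, step in edges:
        if dst.startswith("mart.") and not src.startswith("warehouse."):
            bad.append((src, dst, step))
        elif dst.startswith("warehouse.") and src.startswith("mart."):
            bad.append((src, dst, step))
    return bad

downstream, upstream = build_graph(LINEAGE)
print(sorted(trace("mart.marketing", upstream)))  # SSOT <- MVOT
print(sorted(trace("orders_db", downstream)))     # SSOT -> MVOT
print(boundary_violations(LINEAGE))
```

Tracing upstream from `mart.marketing` surfaces `campaign_sheet`, a source the warehouse never saw, which is exactly the kind of gap lineage is meant to expose.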
This is where the EDW (Enterprise Data Warehouse) and the data lake come in. Put simply, both are ultimate SSOTs. As the boundary between SSOT and MVOT blurs and many existing data warehouses fail to play the SSOT role, the alternative has been to build a new layer that is the first destination of all data. All other data storage is then treated as MVOTs generated from this single SSOT layer. This is the basic concept behind many big data platforms, including those of commercial cloud service providers such as Amazon and Google. Refer to the high-level AWS data architecture.
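The "first destination of all data" rule can be enforced in code. Below is a minimal sketch, assuming a hypothetical in-memory `LandingLayer` class rather than any cloud provider's service: raw data is ingested unchanged, and a derived dataset (an MVOT) is only accepted if every source it reads from is already in the layer or derived from it.

```python
class LandingLayer:
    """Hypothetical sketch of a lake-style SSOT that is the first stop for all data."""

    def __init__(self):
        self.raw = {}      # dataset name -> list of records, stored as received
        self.derived = {}  # dataset name -> (source names, records)

    def ingest(self, name, records):
        # Raw data is appended as-is; nothing is transformed on the way in.
        self.raw.setdefault(name, []).extend(records)

    def derive(self, name, sources, transform):
        # A derived dataset may only read from datasets already held in the
        # landing layer or derived from it; anything else is rejected.
        missing = [s for s in sources if s not in self.raw and s not in self.derived]
        if missing:
            raise ValueError(f"{name} reads from unregistered sources: {missing}")
        inputs = [self.raw[s] if s in self.raw else self.derived[s][1] for s in sources]
        self.derived[name] = (tuple(sources), transform(*inputs))
        return self.derived[name][1]

lake = LandingLayer()
lake.ingest("orders", [{"region": "EU", "amount": 120.0}, {"region": "US", "amount": 80.0}])
eu = lake.derive("eu_orders", ["orders"], lambda rows: [r for r in rows if r["region"] == "EU"])

try:
    # A mart fed from a source that never landed in the layer is refused.
    lake.derive("bad_mart", ["spreadsheet"], lambda rows: rows)
    rejected = False
except ValueError:
    rejected = True

print(eu)        # [{'region': 'EU', 'amount': 120.0}]
print(rejected)  # True
```

Recording the source names alongside each derived dataset also gives a basic lineage trail for free, since every MVOT can be walked back to the landing layer.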