A data warehouse is, at least in theory, a single repository of organization-wide data. It is often composed of multiple data marts, each holding the data of a single business unit or division. Building one means collecting data from different sources and using procedures, processes and jobs to bring it into one repository. Once the data is staged in a temporary location, it is transformed (processed) into a target data model, such as a dimensional (star) schema or a snowflake schema. This processing, mostly done in batch, allows the data to be saved in denormalized structures, which are better suited to reporting and easier to manage.
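As a rough illustration of the staging-then-transform step, here is a minimal sketch in plain Python. The staged rows, the field names (`customer`, `region`, `amount`) and the helper functions `build_star` and `sales_by_region` are all hypothetical and exist only for this example; a real warehouse would run this logic in an ETL tool or in SQL against much larger volumes.

```python
from collections import defaultdict

# Hypothetical staged rows, as they might land from different source systems.
staged_orders = [
    {"order_id": 1, "customer": "Acme", "region": "East", "amount": 120.0},
    {"order_id": 2, "customer": "Acme", "region": "East", "amount": 80.0},
    {"order_id": 3, "customer": "Globex", "region": "West", "amount": 200.0},
]


def build_star(rows):
    """Split staged rows into a customer dimension and a sales fact table."""
    dim_customer = {}  # natural key (customer name) -> dimension row
    fact_sales = []
    for row in rows:
        name = row["customer"]
        if name not in dim_customer:
            # Assign a surrogate key the first time a customer is seen.
            dim_customer[name] = {
                "customer_key": len(dim_customer) + 1,
                "customer": name,
                "region": row["region"],
            }
        # The fact row keeps the measure and a reference to the dimension.
        fact_sales.append({
            "order_id": row["order_id"],
            "customer_key": dim_customer[name]["customer_key"],
            "amount": row["amount"],
        })
    return list(dim_customer.values()), fact_sales


def sales_by_region(dim_customer, fact_sales):
    """A typical reporting query: total sales amount per region."""
    region_of = {d["customer_key"]: d["region"] for d in dim_customer}
    totals = defaultdict(float)
    for fact in fact_sales:
        totals[region_of[fact["customer_key"]]] += fact["amount"]
    return dict(totals)


dim_customer, fact_sales = build_star(staged_orders)
region_totals = sales_by_region(dim_customer, fact_sales)
```

The point of the split is that descriptive attributes such as region live once in the dimension, while the fact table stays narrow and easy to aggregate for reports.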
The process of designing and deploying a data warehouse typically follows a standard software development lifecycle, with some unique aspects. It involves business requirements definition, technical architecture design, the creation of a dimensional model that is translated into a physical table design, and then the development of the jobs, which can be considered a sub-project of its own. Once the data warehouse is designed and tested, it is ready for a production launch cycle.
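To make the step from dimensional model to physical table design more concrete, the sketch below creates a tiny star schema in an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names (`dim_date`, `dim_product`, `fact_sales`) and the sample rows are assumptions chosen for illustration, and SQLite only stands in for whatever database platform the warehouse would actually run on.

```python
import sqlite3

# Hypothetical physical design: two dimension tables and one fact table.
DDL = """
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
"""


def create_warehouse():
    """Create the star schema in an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


conn = create_warehouse()

# Load a few sample rows, as a load job might do after transformation.
conn.execute("INSERT INTO dim_date VALUES (20160719, '2016-07-19')")
conn.execute("INSERT INTO dim_product VALUES (1, 'widget')")
conn.execute("INSERT INTO fact_sales VALUES (20160719, 1, 3, 29.97)")

# A reporting query joins the fact table to a dimension and aggregates.
report = conn.execute(
    "SELECT p.product_name, SUM(f.amount) "
    "FROM fact_sales f "
    "JOIN dim_product p ON p.product_key = f.product_key "
    "GROUP BY p.product_name"
).fetchall()
```

Keeping the fact table limited to keys and measures is one common design choice; the descriptive attributes that reports group by stay in the dimension tables.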
Of late, there has been a paradigm shift in how data is thought about, and therefore managed, driven by the volumes and types of data as well as the advanced data visualization requirements being placed on these systems. Real-time processing and massive number crunching are now expected, and a whole new big data industry has emerged in response to this new set of demands. Please reach out to our experts, who can help you navigate these new waters successfully.
July 19, 2016
Modern Datawarehousing implementation techniques and strategies