menu

Sunday, May 8, 2011

The Purpose of a Data Warehouse

A data warehouse is a repository for storing and analyzing numerical information. A data warehouse stores stable, verified data values. You might find it helpful to compare some of the most important differences between a data warehouse and a transaction database.
  • A transaction database helps people carry out activities, while a data warehouse helps people make plans. For example, a transaction database might show which seats are available on an airline flight so that a travel agent can book a new reservation. A data warehouse, on the other hand, might show the historical pattern of empty seats by flight so that an airline manager can decide whether to adjust flight schedules in the future.
  • A transaction database focuses on the details, while a data warehouse focuses on high level aggregates. For example, a parent purchasing the latest popular children’s book doesn’t care about inventory levels for the Juvenile Fiction product line, but a manager planning the rearranging of store shelving may be very interested in a general decline in sales of computer book titles (for subjects other than SQL Server 2005). The implication of this difference is that the core data in a warehouse are typically numeric values that can be summarized.
  • A transaction database is typically designed for a specific application, while a datawarehouse integrates data from different sources. For example, your order processingapplication—and its database—probably includes detailed discount information for each order, but nothing about manufacturing cost overruns. Conversely, your manufacturing application—and its database—probably includes detailed cost information, but nothing about sales discounts. By combining the two data sources in a data warehouse, you can calculate the actual profitability of product sales, possibly revealing that the fully discounted price is less than the actual cost to manufacture. But no worries: You can make up for it in volume.
  •  A transaction database is concerned with now; a data warehouse is concerned with activity over time. For example, in a simple bank account, each transaction—that is, each deposit or withdrawal—creates an instantaneous change in the account balance. The transaction system rarely maintains historical balances, and even transaction logs are usually archived after a month or two. In a data warehouse, you can store many years of transaction data (perhaps summarized), and you can also store snapshots of historical balances. This allows you to compare what you did today with what you did last month or last year. When making decisions, the ability to see a wide time horizon is critical for distinguishing between trends and random fluctuations.
  • A transaction database is volatile; its information constantly changes as new orders are placed or cancelled, as new products are built or shipped, or as new reservations are made. A data warehouse is stable; its information is updated at standard intervals— perhaps monthly, weekly, or even hourly—and, in an ideal world, an update would add values for the new time period only, without changing  values previously stored in the warehouse.
  • A transaction database must provide rapid retrieval and updating of detailed information;a data warehouse must provide rapid retrieval of highly summarized information. Consequently, the optimal design for a transaction database is opposite to the optimal design for a data warehouse. In addition, querying a live transaction database for management  reporting purposes would slow down the transaction application to an unacceptable degree.


There are other reasons to create a data warehouse, but these are several of the key reasons, and should be sufficient to convince you that creating a data warehouse to support management reporting is a good thing.

No comments:

Post a Comment