"REDWOOD SHORES, Calif., and MINNETONKA, Minn., Nov. 9 /PRNewswire/ -- Oracle Corporation (Nasdaq: ORCL) and Carleton Corporation (Nasdaq: CARL) today announced that the two companies have signed a definitive merger agreement for Oracle to acquire Carleton, an early innovator of data quality and mainframe data extraction software for customer-focused data warehousing applications. The acquisition will be effected through a cash merger pursuant to which holders of Carleton common stock will receive approximately $2.45 per share or $8.7 million in the aggregate. The parties anticipate closing the transaction by the end of February 2000, which is subject to approval by Carleton's stockholders and certain other closing conditions". The offer values Carleton shares at $2.45 a share, which is less than the stock's closing price on Monday of $2.56.

According to Michael Howard, vice president of Oracle's Data Warehouse Program Office, "It increases our ability to provide better knowledge to customers; it's a big win for Oracle. There's a lot of dirty data about customers today; it's stored in a fragmented way. The Carleton software can clean it up. This is a top priority for e-businesses."

Carleton was founded in 1979 and marketed a product called Passport, an early leader in extract/transform/load tools for data warehousing. It was a strong contender with customers who had mainframe legacy data such as sequential files and VSAM. In 1997 Apertus Corporation, a supplier of data cleansing and integration software, acquired Carleton, and the new company became the Apertus Carleton Corporation. The products of this merger became Carleton Pure*Extract (data movement) and Carleton Pure*Integrate (data cleansing and validation). In September, Carleton secured a working capital line of credit from Silicon Valley Bank of Santa Clara, California, in order to continue its growth strategy. Carleton had struggled with a low market capitalization, and its 52-week low stock price at the time of the announcement was $1.00 per share.

Market Impact

TEC predicted Carleton would be acquired in the news analysis "Data Warehouse Vendors Moving Towards Application Suites", September 29, 1999. With this acquisition, Oracle gains access to both the data extraction/transformation and the data cleansing technologies. Oracle has traditionally had its customers use Oracle gateways to access mainframe data, so it is very probable that the data cleansing technology is what it was after for inclusion in Oracle Warehouse Builder. Carleton has been an obvious takeover candidate for some time because of its financial problems. Oracle is the second vendor to acquire data cleansing technology recently, as Ardent Software acquired Prism and its QDB data cleansing technology earlier in the year. More vendors will need to add this feature to their suites, so TEC reiterates the prediction that the remaining data cleansing vendors, such as Vality and Trillium, will be courted vigorously by the remaining extract/transform/load vendors such as Computer Associates and Sagent.


SOURCE:
http://www.technologyevaluation.com/research/articles/oracle-buys-carleton-corporation-to-enhance-warehouse-offering-15494/

"CARY, N.C. (Feb. 24, 2000) - SAS Institute, the market leader in integrated data warehousing and decision support, has announced the production availability of SAS/Warehouse Administrator software, Version 2.0. Demonstrated at the Data Warehousing Institute conference in Anaheim, Calif., this new version provides IT the ability to proactively publish data warehouse information and track its usage, plus aggressively manage the process of change in the data warehouse."

"Data warehouses and data marts have become a vital component of all successful data mining, knowledge management, business portal, e-intelligence, customer relationship management (CRM) and supplier relationship management (SRM) applications today," said Frank Nauta, product manager for SAS/Warehouse Administrator at SAS Institute.

Nauta added, "Successful warehouses are continuously changing to keep pace with the changing business rules that they support. SAS/Warehouse Administrator simplifies change management by providing information delivery through e-enabled viewers like MetaSpace Explorer and integration with business intelligence and reporting tools. It helps make IT professionals more productive by giving them the ability to publish data from the warehouse - putting information in the hands of those who need it and freeing up IT staff for other projects."

Version 2.0 offers proactive information delivery with the addition of publish-and-subscribe tools robust enough for the entire enterprise, and offers enhanced intelligent access to data sources including SAP AG's R/3, Baan, Oracle, DB2, Teradata, SQL Server 7.0, Sybase and many others.

SAS Institute was voted No. 1 in data warehousing and business intelligence in DM Review's 1999 Data Warehouse 100. SAS/Warehouse Administrator is a key component of this award-winning solution. Installed at more than 600 sites worldwide, SAS/Warehouse Administrator is the leading tool to help IT professionals meet the demands of administering a warehouse.

"IT and business users will find significant enhancements when building and designing warehouses and extraction, transformation and loading (ETL) processes," Nauta said. "By tracking the usage of information in the warehouse, IT staff can identify and remove data that is not being used. Removing unnecessary data makes the warehouse more efficient and maximizes hardware investments."

Market Impact

Publish/subscribe metaphors are becoming much more common in the data warehouse arena. The ability for users to subscribe (request information on a regular basis), and for the server to publish ("push") that information automatically to those users, is a powerful feature. Platinum Technology had already begun an effort in this area, known as "Project C", which was delayed by its acquisition by Computer Associates.

The success of products such as PointCast has made it clear that push technology is an important aspect of information distribution. In addition, although SAS Institute has not been well known in the Extract/Transform/Load arena, it has a strong offering.

SAS has also formed a business intelligence alliance with IBM which may be leveraged by customers in the general data warehousing arena.

SOURCE:
http://www.technologyevaluation.com/research/articles/sas-warehouse-2-0-goes-live-15410/

During September, two more data warehousing vendors announced product suites that they claim offer broader integration between business intelligence, data movement, data cleansing and metadata management. BI vendor Cognos (NASDAQ: COGN) announced "Cognos platform", a tool to build complete "BI-ready data infrastructures". Data Movement vendor Ardent Software (NASDAQ: ARDT) announced "DataStage XE", which is designed to "simplify integration of multiple data sources and business intelligence tools".

"Cognos platform"

Cognos claims their new product is an "end-to-end platform, the first solution for building, managing and deploying BI solutions for enterprise and e-business needs" which "includes enterprise infrastructure layers that cover data mart building, integrated meta data modeling, and integrated security" as well as content management and distribution.

The product is a suite that contains:

* Extract/Transform/Load capabilities for populating data marts. This technology was acquired with the purchase of U.K.-based Relational Matters, which created the DecisionStream product, "the first OLAP-aware data integration product". DecisionStream is an ETL tool capable of populating OLAP hypercubes, such as the one used by Cognos PowerPlay.

* Metadata management capabilities to provide centralized metadata and business rules. All applications in the suite share a common meta model.

* Business intelligence tools in the form of Cognos PowerPlay and Impromptu. These products have made Cognos one of the top three business intelligence vendors.

* Additional services such as data mart modeling and common security functions.

Ardent "DataStage XE"

Ardent has acquired a number of other software vendors in the last year, and is attempting to integrate the purchased technologies into their existing DataStage 3.6 product. Like Cognos, they are offering a suite that, in addition to providing data movement capabilities, offers metadata management and data quality management to "simplify integration of multiple data sources and business intelligence tools".

The product is a suite that contains:

* Extract/Transform/Load capabilities in the form of DataStage 3.6. This product has made Ardent one of the top three ETL vendors.

* Metadata management technology acquired with the purchase of Dovetail Software. This is a proprietary meta model; Ardent promises an XML-compliant repository in a future release. (For further details on metadata repository standards, see News Analysis "Is There Finally a Metadata Exchange Standard on the Horizon?" September 28, 1999.)

* Data quality technologies acquired with the purchase of Prism Solutions. Prism had previously purchased QDB Solutions, which created QDB/Analyze, a tool for complex data cleansing.

* Improved mainframe functionality, also acquired with the purchase of Prism Solutions. Prism's mainframe-based ETL tool provides Ardent with mainframe job scheduling and improved access to legacy data sources.

Market Impact

Data warehouse vendors are flocking to the suite concept. Ardent and Cognos have joined the ranks of larger vendors, such as Microsoft (with SQL Server 7.0) and Computer Associates (with DecisionBase 1.9), in attempting to provide end-to-end data warehousing solutions. Customers and prospects have been complaining for years about how hard it is to integrate ETL and BI tools into a functional whole that includes integrated metadata management. We believe that a truly integrated product would be a better solution than a suite, but suites are a step in the right direction. Users want "one stop shopping" with the ability to procure from a single vendor.

The current market direction indicates that more vendors will merge or be acquired as the larger players in the data warehousing space attempt to buy technology and fill in holes in their product offerings.

* Ardent is the only ETL vendor that possesses a data cleansing tool, so we expect that data cleansing vendors such as Vality and Trillium will be targets for other ETL vendors. We believe the company most at risk is Carleton, which merged with data cleansing vendor Apertus in October of 1997. Carleton's (NASDAQ:CARL) stock price was recently at $1.75, and the company recently had to secure a working capital line of credit with Silicon Valley Bank.

* Cognos includes a complete BI solution with its product. Ardent has no business intelligence offering in the DataStage suite. However, Ardent has partnered with Business Objects in the U.K. to provide a more complete solution in Europe. We expect to see this partnership extended to North America.

* Neither vendor's metadata repository adheres to the XML standard. Both Ardent and Cognos will have to address this issue.

* Ardent now has stronger mainframe capabilities because of the Prism acquisition, and already had a market-leading ETL tool. Cognos will have to prove the strength of its ETL solution and ensure that the tool can extract legacy data.


SOURCE:
http://www.technologyevaluation.com/research/articles/datawarehouse-vendors-moving-towards-application-suites-15485/

Recently, TEC featured an article by Olin Thompson titled "The 'Old ERP' Dilemma: Replace or Add-on", which discussed options available to companies that want to add business functionality to their "Old ERP" systems. Certainly, there are many options now available in new business functionality that run the gamut from Supply Chain Planning (SCP) to Customer Relationship Management (CRM). The pros and cons of replacing or adding on to your existing ERP system were set forth in Thompson's article. But before you look to new ERP functionality, you should see whether you are getting the full benefit out of your existing system. If not, are there ways to add new life to your current ERP system without going into an extensive development project?

Whether you have an old or new ERP system, you have probably learned that to maximize its value, you have to work hard at getting information from the ERP system to key users. According to Thompson, "the data checks in, but the information can't check out of many ERP systems". You also may be finding that as e-business strategies emerge in your supply chains, you could need access to more externally generated information than your ERP system, in its current configuration, can handle. For an Information Technology manager, both situations are problematic. Many companies should take another look at data warehousing before deciding what to do with the "old ERP" system.



Does Data Warehousing Really Work?

Bob Cramer, Director of IT for Appleton, WI-based Anchor Food Products, has found that "lots of the pain we have with our old ERP system is based on users not having access to information. We see data warehousing addressing most of the problems our users have with the old ERP system." Today, most reporting from older ERP systems comes directly from the ERP transaction processing (OLTP) system. Typically, users take ERP transactional data and input it into an Access database or a spreadsheet to generate the reports they need to make business decisions. From a user perspective, the extraction and re-inputting of information is both time consuming and potentially error prone. From an IT perspective, there are no opportunities to build in validation checks to ensure that the information is either reliable or the most current available.

Data warehousing provides another way of getting information from legacy systems. Many companies have found it necessary to "build around" their ERP system to some extent. For example, Advanced Planning and Scheduling (APS) systems have often been added after the ERP installation. Companies find that they can report from either their ERP or their APS systems, but have difficulty combining data from both systems without having to create new databases or spreadsheets. Once the data is extracted from the systems, it is very difficult to ensure its integrity. As James F. Dowling pointed out in the TEC article "Business Basics: Unscrubbed Data is Poisonous Data", "data should be managed as a corporate asset that appreciates in value over time. Historical data must be addressed with as much care as current database content."

The data warehousing alternative takes a better approach. It "packages" the information in data cubes that are customized for each group of users. Once the information is packaged in a data cube, users can extract the information using an On-Line Analytical Processing (OLAP) tool. Today, OLAP tools are available as client-server applications or can be operated from a Web browser.

The data warehouse also can include information that is not in your ERP system. By adding information from outside the ERP system, IT can provide users access to ALL the transaction information that the company collects, as well as whatever information they might want to collect from OUTSIDE the company. This is a significant difference and a potentially powerful advantage. Pat Clifford, Director of Business Consulting at Boise, ID-based agribusiness giant the J. R. Simplot Company, found that after installing a data warehouse comprising company information from their ERP and several legacy systems, "it not only gave more information to our employees, but allowed them to move from just reading reports to performing managerial analysis."

What is the Best Way to Integrate Your Old ERP with Data Warehousing?

There are two basic strategies that can be used to start a data warehousing project. For certain ERP systems, third-party providers have developed "off the shelf" data warehousing solutions that are pre-built to fit the features of your ERP system. If you have an old ERP system that is supported by a data warehousing "solution", you should seriously consider this option. Data warehouse solution products are usually based on the ERP modules you have installed. You can roll out the data warehouse one module at a time, making it easier for IT to manage. One major advantage of using a data warehousing solution is that it can be done in a significantly shorter timeframe than if you have to buy an entire data warehousing tool set.

If your old ERP system is not supported by a data warehousing solution product, you will need to "build your own" using a tool set provided by a data warehousing vendor. At Simplot, Clifford found there were advantages in defining the project by functional areas instead of trying to create one big project: "Different functional areas look at information in different ways, so it's important to work with each group as you build the data warehouse". The advantage of a data warehousing tool set is that it gives you total control over what kind of information you want to present to your users. The disadvantage is that it will take more time and internal resources to implement.

Conclusion

IT managers are under increasing pressure to deliver information that can be used to perform managerial analysis. Decision makers in companies are no longer content to read the simple reports that are generated by old ERP systems. They need access to multi-dimensional information based on transactions generated both inside and outside the company. A well thought out data warehousing project can address many of the user issues behind the perceived need for a new ERP system.

SOURCE:
http://www.technologyevaluation.com/research/articles/can-you-add-new-life-to-an-old-erp-system-16444/

In 1993, a vendor of artificial intelligence software named Trinzic acquired Channel Computing of Portsmouth, NH and inherited a product called InfoPump. InfoPump was a script-based data movement tool for the portion of the data warehousing market known as Extract/Transform/Load (ETL) tools, and was a market leader in the early 1990's. In 1995, Platinum technology International, inc. purchased Trinzic. The large influx of research & development capital from Platinum allowed the InfoPump developers to greatly enhance the 3.0 version of the product. Unfortunately, at the same time, market analysts began to predict the demise of scripted data-movement tools. The belief was that graphical tools were necessary to reduce the need for programmers and increase use by the business analysts who actually owned the data. In 1997 Platinum began development of DecisionBase 1.0, a combination of a GUI mapping tool (developed in-house), Platinum's Open Edition Repository version 1.6 (for metadata management, technology acquired by Platinum's purchase of the Reltech and Brownstone companies), and InfoPump 3.2 (for data movement).

DecisionBase 1.0 was released in March of 1998, but GUI mapping functionality was severely limited in the initial release. For instance, the mapper always assumed a row that was being moved to a target database was an insert and the code had to be manually modified to allow an update. By March of 1999, with release 1.9, a significant amount of new functionality had been added, including the ability to bulk-load data to Oracle and IBM UDB, support for Microsoft SQL Server 7.0, the ability to do pre- and post-processing, and the ability to modify the generated SQL from within the mapper.
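For readers unfamiliar with the pattern, the hand coding involved was essentially update-else-insert logic of the kind sketched below. This is a hedged illustration only: the table and column names are hypothetical and the statements are generic SQL, not actual DecisionBase output.

    -- Hypothetical update-else-insert ("upsert") logic of the kind early
    -- DecisionBase users had to add by hand; all names are invented.
    UPDATE target_customer
       SET customer_name = (SELECT s.customer_name
                              FROM staging_customer s
                             WHERE s.customer_id = target_customer.customer_id)
     WHERE EXISTS (SELECT 1
                     FROM staging_customer s
                    WHERE s.customer_id = target_customer.customer_id);

    INSERT INTO target_customer (customer_id, customer_name)
    SELECT s.customer_id, s.customer_name
      FROM staging_customer s
     WHERE NOT EXISTS (SELECT 1
                         FROM target_customer t
                        WHERE t.customer_id = s.customer_id);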

InfoWorld Magazine favorably reviewed the product in March of 1999. The major drawback was the DecisionBase price tag, which was listed at approximately $200,000, depending on the number of metadata scanners and database interfaces necessary. This was the major contributor to the size of the installed base, which at last count was approximately 40 customers.

The market for ETL tools is expected to grow from $327 million in 1996 to $620 million by 2001, an increase of almost 90%. The number of vendors in the ETL market in the mid-1990's was small, consisting of basically four companies (Prism, Carleton, Evolutionary Technologies, Trinzic) plus some modest offerings from IBM. In the past four years, the space has become very crowded, with over fifty vendors competing in various market niches (e.g., specializing in access to VSAM databases). Four vendors (Ardent, Computer Associates, Informatica, and Sagent) still primarily control the general market, with some offerings from IBM and Oracle. Prism has merged with Ardent, Carleton with Apertus, and Platinum with Computer Associates. The major vendors are now working furiously to find a competitive differentiator, with the most popular differentiator being integration with Enterprise Resource Planning packages such as SAP and PeopleSoft.

Product Strategy and Trajectory

Computer Associates' DecisionBase strategy is to create a new product, to be known as "DecisionBase TND" (The Next Dimension). This product will include:

* Integration with CA Unicenter TNG (The Next Generation) to provide integrated workflow services and replace Platinum's "Synergy" product. Unicenter will also provide high-speed data transport services, probably via CA's "TransportIT" product (probability 80%).

* Integration with the CA Jasmine TND object-oriented database. This relates to CA's "Aspen" repository effort, which is the new Microsoft Repository that Platinum co-developed with Microsoft.

* Integration with Platinum's Forest & Trees product for data visualization. We believe portions of this integration are a result of an ongoing internal Platinum effort known as "Project C" (an integrated decision support infrastructure suite which would provide services that unify multi-vendor decision support tools).

Platinum technology's main competitive advantage is the metadata management aspect of the product, as it is one of only two true enterprise metadata repositories on the market. For customers interested in a true fully functional repository capable of functions like impact analysis and data rationalization, this is a strong selling point. However, for customers only interested in data movement, the price tag associated with the inclusion of the repository is a difficult sell. The other advantage is that the underlying data movement engine is the InfoPump product, which is feature-rich and has a very large installed base. Computer Associates appears to be positioning the product as a strategic end-to-end data warehouse solution. As stated in their press release: "A Data Warehousing strategy must be based on the most powerful technology for collecting all the information that might possibly be relevant, regardless of its form and its location - in-house, the Web or other public data - through a powerful infrastructure, and through intelligent data collection tools. And to collect, understand, to correlate and leverage this information, through comprehensive metadata management. And to analyze it, using intelligence technologies. And to present the results effectively, through sophisticated visualization technology. And to deliver it, through modern user interfaces and the Web." It is likely that this strategy will be refined over the next few years as CA acquires additional software companies and their technologies.

Product Strengths

* The product exhibits strength in its metadata management layer, especially for companies with ongoing metadata management projects. Platinum's repository technology was industry-leading, as evidenced both by its large installed base and the fact that it was chosen by Microsoft to co-develop the repository included in Microsoft SQL Server 7.0 (see http://msdn.microsoft.com/repository/). The repository technology is a significant competitive advantage with customers interested in metadata, since none of CA's closest competitors has a competitive offering.

* Data Movement Engine: the InfoPump data movement technology is well known in the industry for its robustness, flexibility, and strong development language. It has the capability to handle virtually any datatype, including exact numerics and Binary Large Objects (BLOBs).

* Database interfaces: another key differentiator is that the interfaces which interact with the source and target databases are written, where possible, to the native C API ("Application Programming Interface" for the C/C++ programming languages) supplied by the database vendor. Many other vendors use generic ODBC (Open DataBase Connectivity, a Microsoft technology that is becoming a de facto standard) for connectivity. Using the C API (for example, Microsoft and Sybase call theirs "DBLib" and "CTLib", and Oracle's is "OCI") allows the interface to take advantage of specific vendor features, some of which are quite powerful. ODBC is the least common denominator, so these specific features become unavailable. The C API also allows for support of non-standard datatypes that the vendor may implement, while the ODBC standard only supports a limited number of datatypes.

* Integration with CA Unicenter will provide CA with access to Unicenter's large installed base. Depending on the pricing model CA chooses, customers may regard it as just another Unicenter "plug-in".

* CA will probably improve the method currently used to access legacy "flat files". A more intuitive approach to mainframe and UNIX file systems would be a big improvement.

Product Challenges

* Computer Associates only recently completed the acquisition of Platinum technology. Since CA previously had virtually no presence in the data warehousing space, and a great number of Platinum employees either were not retained by CA or left on their own, it will likely be some time before the CA sales force can effectively market the DecisionBase product.

* Another consequence of CA's recent arrival in data warehousing will be market perception. Customers about to set out on expensive, long-term data warehouse projects will want strong assurances that CA is committed to data warehousing, both from a research and development perspective and from a product line perspective. Customers who have doubts about CA's commitment to the product will go elsewhere.

* Platinum was tardy in porting DecisionBase to the most popular UNIX platforms (HP/UX, Sun Solaris, and IBM AIX), while DecisionBase's closest competitors are already strong on those platforms. CA's position on UNIX ports has not been publicly announced, and DecisionBase is currently available only on Sun Solaris, and only in beta.

* CA also lags behind the competition in Enterprise Resource Planning (ERP) integration. The product does have the ability to read SAP data, but there is no SAP BW integration (already present in the Ardent product), and no integration with PeopleSoft (present in both Ardent DataStage and Informatica PowerMart). If industry rumors are correct and CA acquires PeopleSoft, integration may occur sooner, but as with all CA acquisitions, integration of the corporate cultures and products would be long and painful.

* Release dates have been pushed back and many customers are frustrated. Since the CA acquisition of Platinum, a number of key employees have left the company or had their positions redefined. This, in conjunction with the fact that the product's future direction has been completely changed, will make it difficult for CA to meet the announced "Q4 1999" beta release date.

* Cost. DecisionBase costs a great deal more than many of the competing technologies (from 50% to as much as 500% more).

* Widgets and Wizards. Competing products have pre-defined "widgets" (code snippets that perform more complex processing) which can be dragged and dropped into the GUI. DecisionBase is currently woefully short in this area, and provides no pre-defined widgets with the shipping product. DecisionBase is also short in the area of wizards, assistants that walk the user through a "question and answer" session to help fill out complex dialogs.

* Mainframe data access. Some competing products have embedded technology to access non-DB2 data residing on IBM MVS mainframes, as well as other systems, often without the need to convert the data into a "flat file". Ardent's acquisition of Prism should give it a leg up on CA in this area.

* InfoPump upgrade path. There is no way to migrate existing InfoPump code into the repository. This means that existing InfoPump customers cannot upgrade to DecisionBase unless they are willing to re-write all of the work they have already done.


SOURCE:
http://www.technologyevaluation.com/research/articles/computer-associates-splashes-into-the-data-warehousing-market-with-platinum-technology-acquisition-15224/

Bill Inmon
Bill Inmon is universally recognized as the "father of the data warehouse." He has over 26 years of database technology management experience and data warehouse design expertise, and has published 36 books and more than 350 articles in major computer journals. His books have been translated into nine languages. He is known globally for his seminars on developing data warehouses and has been a keynote speaker for every major computing association. Before founding Pine Cone Systems, Bill was a co-founder of Prism Solutions, Inc.

Ralph Kimball
Ralph Kimball was co-inventor of the Xerox Star workstation, the first commercial product to use mice, icons, and windows. He was vice president of applications at Metaphor Computer Systems, and founder and CEO of Red Brick Systems. He has a Ph.D. from Stanford in electrical engineering, specializing in man-machine systems. Ralph is a leading proponent of the dimensional approach to designing large data warehouses. He currently teaches data warehousing design skills to IT groups, and helps selected clients with specific data warehouse designs. Ralph is a columnist for Intelligent Enterprise magazine and has a relationship with Sagent Technology, Inc., a data warehouse tool vendor. His book "The Data Warehouse Toolkit" is widely recognized as the seminal work on the subject.



In order to clear up some of the confusion that is rampant in the market, here are some definitions:

Data Warehouse:

The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".

He defined the terms in the sentence as follows:

* Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.

* Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

* Time-variant: All data in the data warehouse is identified with a particular time period.

* Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

(Source: "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995). This definition remains reasonably accurate almost ten years later. However, a single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope. Also, data warehouses can be volatile. Due to the large amount of storage required for a data warehouse, (multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse. For instance, if three years of data are decided on and loaded into the warehouse, every month the oldest month will be "rolled off" the database, and the newest month added.

Ralph Kimball provided a much simpler definition of a data warehouse. As stated in his book, "The Data Warehouse Toolkit", on page 310, a data warehouse is "a copy of transaction data specifically structured for query and analysis". This definition provides less insight and depth than Mr. Inmon's, but is no less accurate.

Data Warehousing:

Components of Data Warehousing

Data warehousing is essentially what you need to do in order to create a data warehouse, and what you do with it. It is the process of creating, populating, and then querying a data warehouse and can involve a number of discrete technologies such as:

* Source System Identification: In order to build the data warehouse, the appropriate data must be located. Typically, this will involve both the current OLTP (On-Line Transaction Processing) system, where the "day-to-day" information about the business resides, and historical data for prior periods, which may be contained in some form of "legacy" system. Often these legacy systems are not relational databases, so much effort is required to extract the appropriate data.

* Data Warehouse Design and Creation: This describes the process of designing the warehouse, with care taken to ensure that the design supports the types of queries the warehouse will be used for. This is an involved effort that requires both an understanding of the database schema to be created and a great deal of interaction with the user community. The design is often an iterative process, and the model must be modified a number of times before it can be stabilized. Great care must be taken at this stage, because once the model is populated with large amounts of data, some of which may be very difficult to recreate, the model cannot easily be changed.

* Data Acquisition: This is the process of moving company data from the source systems into the warehouse. It is often the most time-consuming and costly effort in the data warehousing project, and is performed with software products known as ETL (Extract/Transform/Load) tools. There are currently over 50 ETL tools on the market. The data acquisition phase can cost millions of dollars and take months or even years to complete. Data acquisition is then an ongoing, scheduled process, which is executed to keep the warehouse current to a pre-determined period in time (e.g., the warehouse is refreshed monthly).

* Changed Data Capture: The periodic update of the warehouse from the transactional system(s) is complicated by the difficulty of identifying which records in the source have changed since the last update. This effort is referred to as "changed data capture". Changed data capture is a field of endeavor in itself, and many products are on the market to address it. Some of the technologies that are used in this area are replication servers, publish/subscribe, triggers and stored procedures, and database log analysis.

* Data Cleansing: This is typically performed in conjunction with data acquisition (it can be part of the "T" in "ETL"). A data warehouse that contains incorrect data is not only useless, but also very dangerous. The whole idea behind a data warehouse is to enable decision-making. If a high-level decision is made based on incorrect data in the warehouse, the company could suffer severe consequences, or even complete failure. Data cleansing is a complicated process that validates and, if necessary, corrects the data before it is inserted into the warehouse. For example, the company could have three "Customer Name" entries in its various source systems, one entered as "IBM", one as "I.B.M.", and one as "International Business Machines". Obviously, these are all the same customer. Someone in the organization must make a decision as to which is correct, and then the data cleansing tool will change the others to match the rule. This process is also referred to as "data scrubbing" or "data quality assurance". It can be an extremely complex process, especially if some of the warehouse inputs are from older mainframe file systems (commonly referred to as "flat files" or "sequential files"). (A brief SQL sketch of changed data capture, cleansing, and aggregation follows this list.)

* Data Aggregation: This process is often performed during the "T" phase of ETL, if it is performed at all. Data warehouses can be designed to store data at the detail level (each individual transaction), at some aggregate level (summary data), or a combination of both. The advantage of summarized data is that typical queries against the warehouse run faster. The disadvantage is that information which may be needed to answer a query is lost during aggregation. The tradeoff must be carefully weighed, because the decision cannot be undone without rebuilding and repopulating the warehouse. The safest decision is to build the warehouse with a high level of detail, but the cost in storage can be extreme.
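To make the middle steps above concrete, here is a minimal, hedged sketch of one monthly refresh pass: pick up rows changed since the last load (changed data capture by timestamp), standardize the "IBM" name variants from the cleansing example (cleansing), and build a monthly summary (aggregation). All table and column names are hypothetical, the timestamp approach is only one of the changed-data-capture techniques listed above, and the syntax is generic SQL rather than the output of any particular ETL tool.

    -- 1. Changed data capture: pull only rows touched since the previous refresh.
    INSERT INTO stage_orders (order_id, customer_name, order_date, amount)
    SELECT order_id, customer_name, order_date, amount
      FROM source_orders
     WHERE last_modified > DATE '1999-10-31';   -- date of the previous load

    -- 2. Cleansing: collapse the name variants to the agreed standard value.
    UPDATE stage_orders
       SET customer_name = 'IBM'
     WHERE customer_name IN ('I.B.M.', 'International Business Machines');

    -- 3. Aggregation: load a monthly summary so typical queries run faster.
    INSERT INTO monthly_sales (customer_name, sales_year, sales_month, total_amount)
    SELECT customer_name,
           EXTRACT(YEAR FROM order_date),
           EXTRACT(MONTH FROM order_date),
           SUM(amount)
      FROM stage_orders
     GROUP BY customer_name,
              EXTRACT(YEAR FROM order_date),
              EXTRACT(MONTH FROM order_date);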

Now that the warehouse has been built and populated, it becomes possible to extract meaningful information from it that will provide a competitive advantage and a return on investment. This is done with tools that fall within the general rubric of "Business Intelligence".

Business Intelligence (BI):

A very broad field indeed, it contains technologies such as Decision Support Systems (DSS), Executive Information Systems (EIS), On-Line Analytical Processing (OLAP), Relational OLAP (ROLAP), Multi-Dimensional OLAP (MOLAP), Hybrid OLAP (HOLAP, a combination of MOLAP and ROLAP), and more. BI can be broken down into four broad fields:

* Multi-dimensional Analysis Tools: Tools that allow the user to look at the data from a number of different "angles". These tools often use a multi-dimensional database referred to as a "cube".

* Query Tools: Tools that allow the user to issue SQL (Structured Query Language) queries against the warehouse and get a result set back.

* Data Mining Tools: Tools that automatically search for patterns in data. These tools are usually driven by complex statistical formulas. The easiest way to distinguish data mining from the various forms of OLAP is that OLAP can only answer questions you know to ask, while data mining answers questions you did not necessarily know to ask.

* Data Visualization Tools: Tools that show graphical representations of data, including complex three-dimensional data pictures. The theory is that the user can "see" trends more effectively in this manner than when looking at complex statistical graphs. Some vendors are making progress in this area using the Virtual Reality Modeling Language (VRML).

Metadata Management:

Throughout the entire process of identifying, acquiring, and querying the data, metadata management takes place. Metadata is defined as "data about data". An example is a column in a table. The datatype (for instance a string or integer) of the column is one piece of metadata. The name of the column is another. The actual value in the column for a particular row is not metadata - it is data. Metadata is stored in a Metadata Repository and provides extremely useful information to all of the tools mentioned previously. Metadata management has developed into an exacting science that can provide huge returns to an organization. It can assist companies in analyzing the impact of changes to database tables, tracking owners of individual data elements ("data stewards"), and much more. It is also required to build the warehouse, since the ETL tool needs to know the metadata attributes of the sources and targets in order to "map" the data properly. The BI tools need the metadata for similar reasons.
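As a small illustration, much of this column-level metadata can be read straight from a relational catalog. The query below uses the INFORMATION_SCHEMA views defined in the SQL standard (and available in products such as Microsoft SQL Server 7.0); the table name is hypothetical, and a dedicated metadata repository would of course hold far richer information than this.

    -- Column names and datatypes for one source table: the raw material an
    -- ETL tool uses when "mapping" source columns to warehouse targets.
    SELECT column_name, data_type, character_maximum_length
      FROM INFORMATION_SCHEMA.COLUMNS
     WHERE table_name = 'CUSTOMER_MASTER'      -- hypothetical source table
     ORDER BY ordinal_position;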


SOURCE:
http://www.technologyevaluation.com/research/articles/a-definition-of-data-warehousing-16730/

Data warehousing is an integral part of the "information age". Corporations have long known that some of the keys to their future success could be gleaned from their existing data, both current and historical. Until approximately 1990, many factors made it difficult, if not impossible, to extract this data and turn it into useful information. Some examples:

* Data storage peripherals such as DASD (Direct Access Storage Device) were extremely expensive on a per-megabyte basis. Therefore, much of the needed data was stored offline, typically on magnetic tape.

* Processing power was very expensive as measured in MIPS (Millions of Instructions per Second). Mainframes had to reserve most of their processing power for day-to-day operations, so reports could only be run overnight in batch mode (without interaction from the user).

* Relational database technology was still in its infancy, and server engines were not powerful enough to support the data loads required.

* The type of programming that had to be done with third generation languages (3GL's) was tedious and expensive. Fourth generation languages were needed to abstract some of the required coding, but 4GL's were still in their infancy.

Most operational data is stored in what is referred to as an OLTP (On-Line Transaction Processing) system. These systems are specifically designed for high levels of transaction volume with many concurrent users. If the database is relational, it has probably been "normalized" (the process of organizing data in accordance with the rules of a relational database). If the database is non-relational, custom programs have to be written to store and retrieve data from the database. (This is often accomplished with the COBOL programming language.) Whether relational or non-relational, the very design that makes an OLTP system efficient for transaction processing makes it inefficient for end-user queries. In the 1980's, many business users referred to their mainframes as "the black hole", because all the information went into it, but none ever came back out - all requests for reports had to be programmed by the Information Systems staff. Only "pre-canned" reports could be generated on a scheduled basis; ad hoc, real-time querying was virtually impossible.

To resolve these issues, data warehousing was created. The theory was to create a database infrastructure that was always on-line, contained all the information from the OLTP systems (including historical data), and was structured in such a way that it was fast and efficient for querying. The most common of these schemas (logical and physical database designs) is known as the star schema. A star schema consists of facts (actual business facts) and dimensions (ways of looking at the facts). One simple way to look at a star schema is that it is designed so that the maximum amount of information can be derived from the fewest number of table reads. Another way to reduce the amount of data being read is to pre-define aggregations (summaries of detail data, such as monthly total sales) within the star, since most queries ask questions like "how many were sold last month?"
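As a concrete illustration, a minimal star for sales might look like the sketch below. The table and column names are invented for the example, and real designs carry many more dimensions and attributes, but the shape is representative: one central fact table keyed to small dimension tables, so "how many were sold last month?" needs only one join per dimension used.

    -- A minimal star schema: one fact table and two dimensions (hypothetical names).
    CREATE TABLE date_dim (
        date_key      INTEGER PRIMARY KEY,
        calendar_date DATE,
        month_name    VARCHAR(9),
        year_number   INTEGER);

    CREATE TABLE product_dim (
        product_key   INTEGER PRIMARY KEY,
        product_name  VARCHAR(50),
        category      VARCHAR(30));

    CREATE TABLE sales_fact (
        date_key      INTEGER,
        product_key   INTEGER,
        quantity_sold INTEGER,
        sales_amount  DECIMAL(12,2));

    -- "How many were sold last month?"
    SELECT SUM(f.quantity_sold) AS units_sold
      FROM sales_fact f
      JOIN date_dim d ON d.date_key = f.date_key
     WHERE d.year_number = 1999
       AND d.month_name  = 'October';

A pre-defined aggregate of the same star (for example, a monthly sales summary table) would answer the question without reading any detail rows at all.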

Data warehousing also led to the development of the concept of metadata management. Metadata is data about data, such as table and column names, and datatypes. Managing metadata makes it possible to understand relationships between data elements and assists in the mapping of source to target fields. (For more information on metadata, see "Metadata Standards in the Marketplace".)

Next came the creation of Extract/Transform/Load (ETL) tools, which made use of the metadata to get the information from the source systems into the data warehouse.

Additional tools, which made use of SQL (Structured Query Language), were developed to give end-users direct access to the data in the warehouse. As time went by, the query tools became user-friendly, and many now have a parser that can turn plain English questions into valid SQL. These end-user tools are now loosely referred to as "business intelligence" tools. In addition, there are other database constructs used to assist business intelligence tools in multi-dimensional analysis of data in the warehouse. These databases are referred to as hypercubes (also known as cubes, multi-dimensional cubes, or MDB's).
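Relational engines can also approximate a small hypercube directly through grouping extensions. The sketch below reuses the hypothetical star from the earlier example together with the CUBE grouping introduced in SQL:1999 (SQL Server 7.0 expresses the same idea as GROUP BY ... WITH CUBE); it produces subtotals for every combination of product and month plus grand totals, which is essentially what a two-dimensional cube stores.

    -- Relational analogue of a small two-dimensional hypercube: subtotals for
    -- every combination of product and month, plus the grand total.
    SELECT p.product_name, d.month_name, SUM(f.sales_amount) AS total_sales
      FROM sales_fact f
      JOIN product_dim p ON p.product_key = f.product_key
      JOIN date_dim    d ON d.date_key    = f.date_key
     GROUP BY CUBE (p.product_name, d.month_name);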

Since the early 1990's, data warehouses have become ubiquitous, technology and methodology have been improving, and costs have been decreasing. In 1998, data warehousing was a $28 Billion (USD) industry, and growing at over 10% per year. In addition, a recent survey of top IT executives indicated that data warehousing would be the number one post-Y2K priority. Data warehousing is now recognized as an important way to add business value and improve return on investment, if it is properly planned and implemented.

Selection Issues

Selecting a set of products for a data warehouse effort is complex. The first and most important issue is to ensure that the Extract/Transform/Load tool that is chosen can effectively and efficiently extract the source data from all the required systems.

The selection of the ETL tool requires an understanding of the source data feeds. The following issues should be considered:

* Many warehouses are built from "legacy" systems that may be difficult to access from the computer network. ETL tools often do not reside on the same machine as the source data.

* The data structures of the legacy systems may be hard to decompose into raw data.

* Legacy data is often "dirty" (containing invalid data, or missing data). Care must be taken in the evaluation of the tool to ensure it has an adequate function library for cleansing the data. Depending on the complexity of the cleansing required, a separate tool designed specifically for cleansing and validation may have to be purchased in addition to the ETL tool.

* The ETL tool should have a metadata ("data about data") repository, which allows the data sources, targets, and transformations to be tracked in an effective manner.

* The tool should be able to access legacy data without the need for pre-processing (usually with COBOL programs) to get the data into sequential "flat files". This becomes increasingly complex when working with file systems like VSAM (Virtual Storage Access Method), and files that contain COBOL OCCURS and REDEFINES clauses (repeating groups and conditionally defined fields). It should be noted that a large percentage of the world's data is stored in VSAM files.

* A final issue is whether the ETL tool moves all the data through its own engine on the way to the target, or can be a "proxy" and move the data directly from the source to the target.

Selection of the business intelligence tool(s) requires decisions such as:

* Will multi-dimensional analysis be necessary, or does the organization need only generalized queries? Not all warehouse implementations require sophisticated analysis techniques such as data mining (statistical analysis to discover trends in the data), data visualization (graphical display of query results), or multi-dimensional analysis (the so called "slice and dice").

* Will the architecture be two-tiered or three-tiered? Three-tiered architectures offload some of the processing to an "application server" which sits between the database server and the end-user.

* Will the tool employ a "push" or a "pull" technology? ("Push" technology publishes the queries to subscribed users, much like PointCast works; "pull" requires that the user request the query.)

* Will the information be broadcast over a corporate intranet, extranet, or the Internet?

* How will the organization implement data security, especially if information is being broadcast outside the corporate firewalls?

SOURCE:
http://www.technologyevaluation.com/research/articles/the-necessity-of-data-warehousing-15998/

The Data Warehousing Institute (TDWI) hosts its quarterly World Conference in cities across the US to help organizations involved in data warehousing, business intelligence (BI), and performance management, by giving them access to industry experts, and providing impartial classes related to topics pertinent to the industry. As the industry grows, organizations are faced with questions about how to best access their data to drive profits and meet goals and budgets. The need to understand data warehousing and the best means of leveraging data has become essential to developing a forward-looking approach to a BI solution.

This year, TDWI's summer event was held in San Diego, California (US) from August 20 to 26. Participants were able to take advantage of courses given by worldwide BI experts, as well as network with peers, have access to vendors and product demonstrations, and participate in one-on-one sessions with industry experts and instructors. The six-day event provided one-stop shopping for participants, who were able to take advantage of planned networking events, a two-day trade show highlighting various vendor offerings, and classes ranging from data warehousing testing techniques to best practices in performance management. The advantage of this one-stop shopping approach was that organizations had the opportunity to evaluate software, compare vendor offerings, and gain knowledge from other organizations that have implemented their own data warehousing environments.

The conference focused on five main themes, namely business analytics, leadership and management, data analysis and design, data integration, and administration and technology. These themes identify the main areas within data warehousing and BI, and provide the necessary knowledge related to the whole design and implementation process. A series of classes were offered in each area to allow users to focus on a specific industry aspect, or to gain an overall understanding of the sector and the different driving forces within it. Not only does TDWI focus on technology and the drivers associated with technological advances, but a key advantage to participating in the conference series is the additional focus on the business side of technology, and on managing the business processes associated with BI and performance management.

TDWI Overview

TDWI delivers research, education, and news, which enables individuals, teams, and organizations to leverage BI industry information to improve organizational decision-making, optimize performance, and achieve business objectives. One of TDWI's goals is to provide organizations with the impartial information required to make informed decisions. Although the organization runs events sponsored by various vendors, and provides users with product-related information, TDWI touts itself as being a central impartial resource for information. Business and information technology (IT) evaluators of solutions—whether in the requirements-gathering or enhancement phase of a current platform—can access a wide range of information, including classes, webinars, on-site training, and research.

TDWI has an international membership program, and provides industry publications and news, and a comprehensive web site. A division of 1105 Media, TDWI was created in 1995. It has over 5,000 members from Fortune 1000 companies, and includes both business and technology professionals. It is regarded as one of the central organizations for collecting data and providing insight into the world of data warehousing and BI.

TDWI collects and promotes best practices research to educate technical and business professionals about new BI technologies, concepts, and the approaches that have been applied in other organizations. This research also addresses significant issues and problems that organizations have experienced, and how they were handled. Many companies use TDWI's information to identify how they measure up to industry standards, how to take advantage of new or upcoming technologies, and how to address issues that relate to how they conduct business. The benefit of this information is two-fold. First of all, organizations can keep on top of enhancements within the industry, and can gain a wider knowledge base than that provided to them by their service provider (their selected vendor). Secondly, TDWI can help drive industry trends by leveraging the needs of organizations, as well as the way vendors should develop products to meet those needs.

TDWI's annual BI Benchmark Report identifies best practice metrics and compares TDWI's data warehousing maturity model to the industry. Many organizations consult this report to benchmark their BI use to ensure they are optimizing their implemented solutions, and discover ways to continuously improve their technical platforms and BI environments. This can include comparing their current environment with other organizations, or looking at information about other organizations within their vertical markets. TDWI also distributes other industry-related publications:

* The Business Intelligence Journal, published biannually, provides information and resources for BI and data warehousing professionals. The focus is on actionable advice on how to plan, build, and deploy BI and data warehousing solutions.
* Ten Mistakes to Avoid is distributed quarterly, and advises readers on different topics related to building, deploying, or maintaining a data warehouse, or managing a data warehouse team.
* What Works: Best Practices in Business Intelligence and Data Warehousing, also distributed quarterly, gives readers a comprehensive collection of case studies, questions and answers, and lessons learned from the experts.
* TDWI e-mail newsletters provide up-to-date news and industry commentary.

These publications provide users with continual information on the industry, and can help them identify pitfalls so they can avoid making the same mistakes. Also, organizations that are in the same situations can gain insights on how to solve issues, as well as learn from other organizations and industry experts.

TDWI also develops webinars to discuss pertinent issues in the BI and data warehousing industry, and gives training at customer sites. TDWI seminars deal with the skills and techniques used to ensure successful implementations of BI and data warehousing projects. Overall, TDWI leverages its decade of experience within the data warehousing and BI industries to provide organizations with the information needed to make the best decisions possible. This way, organizations can access information that is industry-specific (without a bias towards one vendor versus another), and benchmark their own BI and data warehousing environments against organizations that have more experience implementing and growing these solutions. Also, organizations can compare and contrast challenges and issues as they arise.

TDWI Conference Tracks

Each of the quarterly conferences focuses on different tracks. These tracks present business and IT users with classes and seminars that highlight main industry trends, and provide a basis for enhancing their current data warehousing and BI environments (or aid in the requirements and selection process to implement such an environment). Not only do classes provide a wealth of information that can be justly described as verging on information overload, but in-class exercises, depending on the class, allow users to internalize the information to which they are being exposed. Aside from diverse and in-depth topics, the instructors are experts (whether within their respective industries, or their consultancy practices). Not only can users learn about the topics being presented, but they can also meet with experts to gain additional insight into topics directed specifically to their organizations.

Over fifty classes were offered during this summer's six-day conference. Topics ranged from data warehousing testing techniques to performance management benchmarking practices, in either full-day or half-day sessions. This allowed participants to learn about the latest trends, best practices, and industry insights on how to improve their current structure or enhance their technical platforms. Five tracks were presented during the event:

* Business Analytics
The business analytics track focused on both business and technical aspects of analysis. Topics included performance management, the definition and delivery of business metrics, data visualization, and the deployment and use of technology solutions. Solutions discussed included online analytical processing (OLAP), dashboards, scorecards, and data mining, as well as analytic applications. This focus allows organizations to gain insight into areas within BI and the different aspects of insight that analytics can provide. Organizations that require a subset of BI can identify how their needs can be met, by identifying requirements based on the topics presented. Additionally, they can take advantage of the trade show to identify those vendors that meet their needs, or those that (while not all-encompassing BI vendors) play in a specific space within the industry, such as data mining or data integration.

* Leadership and Management
The leadership and management track provided users with the insights needed to take a project from inception through to completion. Aside from identifying process and project management methodologies related to data warehousing and BI projects, emphasis was placed on the overall management of these projects. Ideas presented ranged from team building and the high level technical requirements needed to manage such projects, to other business areas such as customer relationship management (CRM) and supply chain management (SCM). This focus allowed users to identify a broad range of topics and considerations needed to implement and manage a data warehousing project through the systems development life cycle. Additionally, outside markets were identified to show the interrelation between BI and other industries. For example, many operational BI efforts are driven by SCM and the need to manage day-to-day decisions from the shop floor.

* Data Analysis and Design
A key focus of the data analysis and design track was the skills needed to identify business needs and transform them into data structures that are adaptable, extensible, and sustainable for the business unit. Course topics included needs analysis, specification of business metrics, and data modeling. These topics and the surrounding concepts form the backbone of developing a data warehousing and BI platform. Identifying business requirements and translating them into the appropriate systems requirements is essential in any project; in data warehousing and BI it becomes even more important, because the business needs analysis has to be carried directly into the design of the platform (a minimal star-schema sketch follows this list). Integration questions center on whether current systems will integrate with the new software and, more importantly, how they will integrate.

* Data Integration
The data integration track covered the topics involved in implementing a data warehouse solution: data profiling; data transformation; data cleansing; source-to-target mapping; and extract, transform, and load (ETL) development (a minimal ETL sketch follows this list). The importance of data integration should not be underestimated, as the way data is brought into a data warehouse or BI solution determines the quality of everything built on top of it. If a scorecard is developed to measure an organization's sales metrics and the source data is not accurate, the key performance indicators (KPIs) set and reported on will be meaningless.

* Administration and Technology
The administration and technology track identified and covered topics related to infrastructure management and the continued successful operation of data warehousing and BI solutions. The focus was on technology architecture, planning and configuration, system and network administration, database administration, and access and security administration. Maintaining the implemented architecture and platform is essential to continued success in the data warehousing and BI environment. This track helped bridge the gap between administration and technology, and highlighted the complexity of managing these two aspects of a data warehouse.
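
To make the scorecard and business-metric discussion from the business analytics track a little more concrete, here is a minimal, purely illustrative rollup of the kind a dashboard or scorecard would present. The record layout, region names, and target figures are hypothetical and are not drawn from the conference material.

```python
# Minimal sketch of a scorecard-style rollup: aggregate raw sales records
# into per-region KPIs (revenue and attainment against a target).
# All data and field names here are hypothetical illustrations.
from collections import defaultdict

sales = [
    {"region": "East", "rep": "A01", "revenue": 12000.0},
    {"region": "East", "rep": "A02", "revenue": 9500.0},
    {"region": "West", "rep": "B01", "revenue": 15250.0},
]
targets = {"East": 20000.0, "West": 18000.0}  # hypothetical quota per region

# Roll revenue up by region (an OLAP-style group-by on one dimension).
revenue_by_region = defaultdict(float)
for row in sales:
    revenue_by_region[row["region"]] += row["revenue"]

# Turn the rollup into scorecard KPIs: actual revenue vs. target attainment.
for region, actual in sorted(revenue_by_region.items()):
    attainment = actual / targets[region]
    print(f"{region}: revenue={actual:,.0f} attainment={attainment:.0%}")
```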
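
For the data analysis and design track, the sketch below shows one common way a business need ("revenue by product category and month") is translated into a warehouse data structure: a small star schema with a fact table joined to its dimensions. The table and column names are hypothetical, and Python's built-in sqlite3 module stands in for whatever database platform an organization actually uses.

```python
# Minimal star-schema sketch: one fact table keyed to two dimension tables.
# Names are hypothetical, chosen only to illustrate the modeling pattern.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (
        date_key    INTEGER PRIMARY KEY,
        calendar_dt TEXT,
        month_name  TEXT,
        year_num    INTEGER
    );
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        sku         TEXT,
        category    TEXT
    );
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date (date_key),
        product_key INTEGER REFERENCES dim_product (product_key),
        units_sold  INTEGER,
        revenue     REAL
    );
    -- a couple of sample rows so the query below returns something
    INSERT INTO dim_date VALUES (20000101, '2000-01-01', 'January', 2000);
    INSERT INTO dim_product VALUES (1, 'SKU-001', 'Widgets');
    INSERT INTO fact_sales VALUES (20000101, 1, 10, 250.0);
""")

# The business metric becomes a straightforward join of fact to dimensions.
query = """
    SELECT p.category, d.month_name, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category, d.month_name
"""
for row in conn.execute(query):
    print(row)
```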
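
For the data integration track, the following sketch walks a few raw source rows through a simple extract, cleanse, and load sequence, routing rows that fail validation to a reject list. The field names and validation rules are hypothetical; real ETL tools handle this with far richer profiling, mapping, and error handling.

```python
# Minimal ETL sketch: extract raw records, cleanse/transform them, and load
# only valid rows into a target list standing in for a warehouse table.
from datetime import datetime

raw_orders = [  # "extract" step: rows as they might arrive from a source system
    {"order_id": "1001", "amount": " 250.00", "order_date": "2000-02-24"},
    {"order_id": "1002", "amount": "n/a",     "order_date": "2000-02-30"},  # dirty row
    {"order_id": "1003", "amount": "99.5",    "order_date": "2000-02-25"},
]

def cleanse(row):
    """Transform one source row into typed values; return None if it fails validation."""
    try:
        return {
            "order_id": int(row["order_id"]),
            "amount": round(float(row["amount"].strip()), 2),
            "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d").date(),
        }
    except ValueError:
        return None  # a real pipeline would route this to a reject/repair queue

warehouse_orders, rejects = [], []
for row in raw_orders:
    clean = cleanse(row)
    if clean is not None:
        warehouse_orders.append(clean)  # "load" step
    else:
        rejects.append(row)             # held out for repair or review

print(f"loaded {len(warehouse_orders)} rows, rejected {len(rejects)}")
```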


SOURCE:
http://www.technologyevaluation.com/research/articles/a-one-stop-event-for-business-intelligence-and-data-warehousing-information-18778/

"REDMOND, Wash., Nov. 30 /PRNewswire/ -- Microsoft Corp. (Nasdaq: MSFT) today announced that 47 applications and tools from 39 top vendors throughout the industry have qualified for Microsoft Data Warehousing Alliance 2000. Alliance members and partners are committed to delivering tools and applications based on the Microsoft Data Warehousing Framework 2000, an open architecture for building business intelligence and analytical applications based on the open standards and services built into the Windows 2000 operating system, Microsoft SQL Server 7.0 and Office 2000. Application vendor membership for the Data Warehousing Alliance has more than doubled since it was originally announced in October 1998."

According to the release, "organizations leveraging the framework and using alliance member products are better able to align local decision-making around key business drivers and harness the full potential of the web to win new customers, retain and extend customer relationships, and work more effectively with partners."

The architecture is based on OLE DB and the Open Information Model (OIM), in "recognition of the value and competitive advantage provided by the data warehousing services built into Microsoft products."

According to Microsoft, this technology is based on the Microsoft Data Warehousing Framework, which "is based on open, published protocols for interoperability and integrated end-to-end data warehousing services. It utilizes technologies provided in Microsoft Office 2000 and Microsoft SQL Server 7.0 products, and a partnership with Data Warehousing Alliance members for complementary tools and applications. The DWF enables data warehousing solutions where the data comes from virtually any source and where any type of information can be delivered to any compliant client interface or application."
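
The framework's data-access layer is OLE DB. As a rough illustration of what OLE DB access from a client looks like, the sketch below uses ADO (the COM automation layer over OLE DB) from Python to query a SQL Server database. This is not part of the Microsoft announcement: it assumes a Windows machine with the pywin32 package installed, a reachable SQL Server instance, and the Northwind sample database; the server name, database, and query are placeholders.

```python
# Rough sketch: querying SQL Server through its OLE DB provider (SQLOLEDB)
# via ADO COM automation. Assumes Windows, the pywin32 package, and a
# reachable SQL Server instance; server and database names are placeholders.
import win32com.client

conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open(
    "Provider=SQLOLEDB;"          # SQL Server OLE DB provider
    "Data Source=localhost;"      # placeholder server name
    "Initial Catalog=Northwind;"  # placeholder sample database
    "Integrated Security=SSPI"    # Windows authentication
)

rs = win32com.client.Dispatch("ADODB.Recordset")
rs.Open("SELECT TOP 5 CompanyName FROM Customers", conn)

# Walk the recordset the way any ADO/OLE DB client would.
while not rs.EOF:
    print(rs.Fields.Item("CompanyName").Value)
    rs.MoveNext()

rs.Close()
conn.Close()
```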

Market Impact

Once again, Microsoft is using proprietary standards (OLE DB and OIM) to achieve its data warehousing goals. The more widely accepted standards are under the stewardship of the Object Management Group (OMG), which has over 800 members. OIM is a standard developed by Microsoft and turned over to the Meta Data Coalition (MDC), which has "close to 50" members. For more information on the dueling standards bodies see "Is There Finally A Metadata Exchange Standard on the Horizon?" (http://technologyevaluation.com/news_analysis/09-99/NA_DW_MFR_9_28_99_1.asp, September 28, 1999). The alliance criteria require compliance with OLE DB for data access and the Open Information Model for sharing metadata. According to Colin White, president of DataBase Associates International Inc., "The Microsoft Data Warehousing Framework 2000 makes it easy to build Digital Dashboard applications integrating business intelligence, collaboration, and Web content right into the environment many knowledge workers live in: Outlook 2000."

This effort should make it easier for customers to integrate and use tools from multiple vendors, as long as their database is Microsoft's and the other vendors are members of the alliance. The web component is to be provided by Microsoft SQL Server 7.0, a component of the Windows DNA platform (Distributed interNet Architecture), introduced in 1997 as Microsoft's umbrella term for its enterprise network architecture based on COM and Windows 2000 (NT 5.0). Windows DNA is advertised as "Microsoft's comprehensive platform for building Web applications."

We believe this will only serve to further fragment the data warehousing market. Obviously, Oracle is not a member of this alliance, and other application categories show spotty representation. For example, in the enterprise resource planning space, Baan NV is represented, but SAP AG and PeopleSoft are not. In the area of supply chain analytics, the only vendor represented is Manugistics Inc.

SOURCE:
http://www.technologyevaluation.com/research/articles/microsoft-goes-their-own-way-with-data-warehousing-alliance-2000-15412/
