A data warehouse turns raw information into a useful analytical tool for business decision making. The fundamental question it addresses for almost any firm is: “Which customers are buying or using which products and services, when, and where?” If you know the answer to that question, you can guide your business strategically.
Transaction processing systems can also play a strategic role in gaining competitive advantages for a business. Many firms use the internet, extranets and other networks that tie them electronically to their customers or suppliers for real-time online transaction processing (OLTP).
Companies often keep raw information in online transaction processing systems, which track day-to-day operations such as each sale, purchase and inventory change. But OLTP systems are not well suited to answering questions about the past, present and future direction of a business, such as: What are the historical trends in unit costs versus growth in sales to customers in Orissa?
To answer those kinds of questions, a company needs an analysis system with the ability to perform ad hoc queries and create specialized reports. The raw material for analysis is a combined view of all the relevant data a company has – a data warehouse. The warehouse stores information from OLTP systems as well as from other sources of raw data, such as external systems.
Metadata blueprints: Because information is coming from many sources, each with its own view of the data, a company must create an enterprise level data model to have a consistent view of its information. This metadata is the blueprint for the pieces of the data warehouse architecture.
The process of transforming raw data into a data warehouse involves several steps: extraction, consolidation, filtering, cleansing, conversion and aggregation. This process is collectively known as data warehouse generation. Generation is at the heart of the warehouse infrastructure, and most of the effort in a data warehouse project is spent on it. Doing it right means the difference between finding answers that are valuable and answers that are useless. Here is what each of the steps involves:
Extraction – This step involves taking the data out of its original database and transferring it into the data warehouse infrastructure. Companies often place restrictions on what is extracted. For example, the extraction process may occur every day, so any changes to the raw data sources older than 24 hours are ignored.
Consolidation – This is the process of combining data from several sources into one database. To get a complete view of a customer, a company may consolidate data from order entry systems, sales contact databases and technical support databases.
Filtering - Not every piece of data is needed. For example, a company may want to know which products customers have ordered but do not need the confirmation number used to process a sale. Filtering picks out the relevant data and removes duplicate entries.
Cleansing - The quality of an answer is only as good as the quality of the data used to derive it, so it’s important to cleanse the data to improve the accuracy of the data in the warehouse. The classic example of poor-quality data is a single customer with multiple entries. It requires some intelligence on the part of the cleansing software to identify and correct such data.
Conversion – Conversion, also called translation, means mapping raw data onto new data fields within the warehouse data model and translating the data into the format used by the warehouse. For example, the original supplier data may count widgets by the gross, while manufacturing data tracks widgets individually. In a data warehouse, the units of measure must be the same in order to get useful analysis results.
Aggregation – Often the value of a data warehouse is in the summarized data and derived data it contains, as opposed to the raw data stored in OLTP sources. The aggregation step sorts and combines data into useful metrics for analysis. For example, while the raw data may track individual orders by individual customers, a more useful measure of sales might be orders of a particular product family. The aggregation process generates these new calculated sales numbers.
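The filtering, conversion and aggregation steps above can be sketched in a few lines of Java. This is a toy illustration only: the record fields, the class names and the gross-to-unit ratio (1 gross = 144) are invented for the example and are not part of any real warehouse product.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WarehouseLoadDemo {

    // A raw OLTP order record (hypothetical fields for illustration).
    record Order(String confirmationNo, String productFamily, int unitsGross) {}

    // Filtering: the confirmation number is simply not carried forward.
    // Conversion: gross counts are translated to individual units (x 144).
    // Aggregation: totals are summed per product family.
    static Map<String, Integer> summarize(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        Order::productFamily,
                        Collectors.summingInt(o -> o.unitsGross() * 144)));
    }

    public static void main(String[] args) {
        List<Order> raw = List.of(
                new Order("C-1001", "widgets", 2),
                new Order("C-1002", "widgets", 1),
                new Order("C-1003", "gadgets", 3));
        // widgets: 3 gross -> 432 units; gadgets: 3 gross -> 432 units
        System.out.println(summarize(raw));
    }
}
```

The point of the sketch is that the warehouse load discards what analysis does not need and stores derived metrics, not raw rows.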
A data warehouse is, at bottom, a large centralized storage area such as a database. When an organization centralizes the storage of data, it is said to be putting the data into a data warehouse. These warehouses contain millions of pieces of information about customer behaviour and demographics, and they are starting to contain information about other personal traits and behaviours. Even though the word ‘warehouse’ implies that the data is shelved, in fact the storage spaces are often nothing more than servers containing large arrays of disks for storing information.
The scalable data warehouse framework is a complete view of data warehousing. To grasp data warehousing fully, it is important to understand the data warehousing process. Data warehouses, in essence, store and access data supplied by operational support systems (OSSs) and provide data for tools and applications.
The source data that will populate the data warehouse solution comes from OSSs. There are two major categories of operations systems: network systems, including planning and engineering, provisioning, network management and trouble/repair; and business operations systems, including customer care, billing and directory services.
A data warehouse will deliver network planning and analytical processes to support network asset optimization, provide cross-geographical views of critical network data and deliver the integration of both customer and network data to improve and enhance profitability. This solution will help communications service providers manage the evolution of their networks so that they are robust enough to support new, diverse services in a cost-effective manner while addressing capacity and planning issues for basic network services.
As one might expect, the heart of a data warehouse is the data itself. The type and quantity of the data stored change almost on a daily basis. Change is inherent to the data warehouse. This data changes as the business grows, as operational data changes, as the business questions change, as the number of users changes and as the applications used to access the data change.
If a warehouse is built around a database engine that cannot handle its dynamic nature, failure is inevitable. This failure could force users to limit the questions they ask, result in long delays for the return of queried data, and ultimately require the construction of separate, independent systems. Finally, a failure could necessitate the use of summarized data instead of detailed data, and providing detailed data is quite possibly the most important role of the data warehouse. The capture of detailed data enables businesses to drill down, perform analysis on subject-oriented data and gain insight into the entire business. Most importantly, businesses will learn from their mistakes and successes.
Data Marts and Operational Data Stores
A data mart is a specialized set of business information focusing on a particular aspect of the enterprise, such as a department or business process. The information in a data mart often comes from several different raw data systems. Many companies choose to feed a data mart from a data warehouse because the data in the warehouse has already been consolidated and processed from the same raw data.
An operational data store is a hybrid of an OLTP system and an analytical system. It contains information that is frequently updated on an ad hoc basis, often in response to changes in the OLTP system, as opposed to the scheduled updates of a data warehouse. The data within an operational store mirrors some of the data within the OLTP system. It has been extracted from the OLTP system and transformed and aggregated to a limited extent. Its purpose is to provide an operational-level query system that won’t affect the performance of the raw OLTP systems. Databases used for transaction processing are designed to update thousands of records per second but are not designed for sophisticated querying. Data warehouse databases are designed to analyse terabytes of data and billions of records. They are organized to better allow analysis using special techniques.
Online transaction processing involves real-time transactions. It has been recognized that this accumulated data, combined with current data, contains an enormous amount of information from which one can discover trends that would never be seen on a day-to-day or month-to-month basis. To make this data more useful, it is now being stored in a separate database called a data warehouse.
The Document Object Model (DOM)
The Document Object Model, unlike SAX, has its origins in the World Wide Web Consortium (W3C). Whereas SAX is public-domain software, developed through long discussions on the XML-dev mailing list, DOM is a standard, just as the actual XML specification itself is. The DOM is also not designed specifically for Java, but to represent the content and model of documents across all programming languages and tools. Bindings exist for JavaScript, Java, CORBA, and other languages, allowing the DOM to be a cross-platform and cross-language specification.
In addition to being different from SAX in regard to standardization and language bindings, DOM is organized into "levels" instead of versions. DOM Level One is an accepted Recommendation; Level One details the functionality and navigation of content within a document. A document in the DOM is not limited to XML, but can be HTML or other content models as well! Level Two, which should be finalized in mid-2000, builds upon Level One by supplying modules and options aimed at specific content models, such as XML, HTML, and Cascading Style Sheets (CSS). These less-generic modules begin to "fill in the blanks" left by the more general tools provided in DOM Level One.
The DOM and Java
- Using the DOM from a specific programming language requires a set of interfaces and classes that define and implement the DOM itself. Because the DOM specification focuses on the model of a document rather than outlining specific methods, language bindings must be developed to represent the conceptual structure of the DOM for use in Java or any other language. These language bindings then serve as APIs for us to manipulate documents in the fashion outlined in the DOM specification.
- We are obviously concerned with the Java language binding. The classes you should be able to add to your IDE or class path are all in the org.w3c.dom package (and its subpackages). However, before downloading these yourself, you should check the XML parser and XSLT processor you purchased or downloaded; like the SAX package, the DOM package is often included with these products. This also ensures a correct match between your parser, processor, and the version of DOM that is supported.
- Most processors do not handle the task of generating a DOM tree themselves, but instead rely on an XML parser that is capable of generating one. For this reason, it is often the XML parser that will have the needed DOM binding classes and not the XSLT processor. In addition, this maintains the loose coupling between parser and processor, letting one or the other be substituted with comparable products. As Apache Xalan, by default, uses Apache Xerces for XML parsing and DOM generation, it is the level of DOM support that Xerces provides that is of interest to us.
Getting a DOM Parser
One thing that the DOM does not specify is how a DOM tree is created. The specification instead focuses on the structure and APIs for manipulating this tree, which leaves a lot of latitude in how DOM parsers are implemented. Unlike with SAX, where an XMLReader implementation can be loaded dynamically, you will need to import and instantiate your vendor's DOM parser class explicitly. To begin, create a new Java file and call it DOMParserDemo.java. We will look at how to build a simple DOM parsing program to read in an XML document and print out its contents. Create the structure and skeleton of your example class first, as shown in Example B.
Example B. DOMParserDemo Class
// Import your vendor's DOM parser
import org.apache.xerces.parsers.DOMParser;

/**
 * DOMParserDemo will take an XML file and display
 * the document using DOM.
 */
public class DOMParserDemo {

    /**
     * This parses the file, and then prints the document out
     * using DOM.
     * @param uri String URI of file to parse.
     */
    public void performDemo(String uri) {
        System.out.println("Parsing XML File: " + uri + "\n\n");
        // Instantiate your vendor's DOM parser implementation
        DOMParser parser = new DOMParser();
        try {
            // parser.parse(uri);
        } catch (Exception e) {
            System.out.println("Error in parsing: " + e.getMessage());
        }
    }

    /**
     * This provides a command-line entry point for this demo.
     */
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java DOMParserDemo [XML URI]");
            System.exit(0);
        }
        String uri = args[0];
        DOMParserDemo parserDemo = new DOMParserDemo();
        parserDemo.performDemo(uri);
    }
}
- This is set up in a fashion similar to our earlier SAXParserDemo class, but imports the Apache Xerces DOMParser class directly and instantiates it. We have commented out the actual invocation of the parse( ) method for the moment; before looking at what is involved in parsing a document into a DOM structure, we need to address issues of vendor neutrality in our choice of parsers. Keep in mind that this approach is simple and works well for many applications, but it is not portable across parser implementations as our SAX example was.
- The initial impulse would be to use Java constructs like Class.forName(parserClass).newInstance( ) to get an instance of the correct vendor parser class. However, different DOM implementations behave in a variety of fashions: sometimes the parse( ) method returns an org.w3c.dom.Document object (which we look at next); sometimes the parser class provides a getDocument( ) method; and sometimes different parameter types (InputSource, InputStream, String, URI, etc.) are required for the parse( ) method to be supplied with the URI. In other words, while the DOM tree created is portable, the method of obtaining that tree is not, without fairly complex reflection and dynamic class and method loading.
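One vendor-neutral route that sidesteps the reflection problem described above is the standard JAXP API (javax.xml.parsers), where DocumentBuilderFactory locates an implementation at runtime. This is a minimal sketch, not part of the original example; the class name NeutralDOMDemo and the sample document are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class NeutralDOMDemo {

    // Parse XML from a stream and return the root element's name,
    // using whatever DOM parser implementation JAXP locates at runtime.
    public static String rootElementName(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book/></catalog>";
        InputStream in = new ByteArrayInputStream(xml.getBytes("UTF-8"));
        // Prints the root element name of the sample document
        System.out.println("Root element: " + rootElementName(in));
    }
}
```

Because only org.w3c.dom and javax.xml.parsers types appear in the source, the code compiles and runs against any conforming parser without changes.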
Well, we are all familiar with Google Maps (both on computers and on mobiles). Recently, Google launched a new feature for the mobile version of Google Maps, "My Location".
The "My Location" feature is mainly for those phones which DO NOT have GPS feature. Still, it is capable of showing your position within 1000 meters of your actual position.And if you have GPS enabled handset, then off course it will show you your exact position.
So how does it work even without GPS? It's quite simple. The My Location feature uses the same technology that is used while calling, texting and so on: the network towers. Each network tower has a unique footprint. When you press the '0' button of your phone with the Google Maps application running, Google estimates your present location based on the unique footprint of the nearby tower. As said above, the accuracy can be within 1000 meters. This comes pretty close to GPS.
So what is the benefit of using the My Location feature? That's simple too. Every time you want to search for anything on the map, say restaurants, you are free from the hassle of finding out and entering your present location. Google will do that for you, and without a GPS-enabled phone. But yes, this feature is still in beta, so sometimes you might get an error.
Frame Relay
- Frame Relay is a packet-switching WAN technology that exists at the Data Link layer of the OSI model
- Typically more cost-effective than leased lines
- Communication occurs over virtual circuits
- Permanent virtual circuits (PVCs) are always available
- Switched virtual circuits (SVCs) are created and then terminated as required
Frame relay is a packet-switching technology that exists at the data link layer of the OSI model, and one that has become increasingly common as a WAN solution since the early 1990s. Unlike with leased lines and circuit-switched networks, the available bandwidth on a provider's frame relay network is shared amongst many subscribers. This sharing of resources leads to significantly lower costs than traditional leased lines.
Many people tend to be confused by packet-switching technologies like frame relay. Mostly this is a result of trying to understand how data actually gets from one location to another. On packet-switched networks (like frame relay), data streams are separated through the use of “virtual” rather than dedicated hardware circuits. In other words, a logical path is defined between endpoints through a provider’s packet-switched network. Many virtual circuits will be defined for different customers, and will be multiplexed over the shared physical links of the network. As an example, consider the figure below. It shows two different companies, each connecting two offices over a provider’s frame relay network. Notice that between frame relay switches X and Y, both of their virtual circuits traverse a common physical link. The data that one company passes between its own offices is completely separate from the data of the other company; all data stays within each company’s dedicated virtual circuit.
Two main types of virtual circuits can be defined on a frame relay network: permanent virtual circuits (PVCs) and switched virtual circuits (SVCs). A PVC functions somewhat like a leased line, in that a service provider defines a path through the packet-switched network to each customer location. In cases where companies wish to have “always-on” connectivity between locations using frame relay, PVCs are usually defined.
A switched virtual circuit (SVC) functions somewhat differently, almost like a circuit-switched connection. SVCs are not permanent, and are instead created as required across a packet-switched network. For example, an SVC could be created between a company’s head office and a remote location. For the duration of each connection, data might travel over a completely different path.
Frame relay networks are referred to as being non-broadcast multi-access (NBMA). What this means is that, by default, broadcast traffic will normally not be passed over a virtual circuit without explicit configuration. This is an important consideration when dealing with the use of broadcast-based routing protocols like RIP or IGRP in a frame relay environment. You will look at how broadcast traffic can be handled on frame relay networks later in this section.
Frame relay equipment
- connections to a frame relay network require both DTE and DCE equipment
- the DTE equipment is typically a router (e.g. Cisco equipment)
- the DCE equipment is a CSU/DSU
In order to connect to a frame relay network, both DTE and DCE equipment need to be located at the customer premises. This DTE equipment is usually a router whose serial interface connects to a DCE device. In the past, customers required a completely separate DTE device known as a frame relay access device (FRAD) to connect to a frame relay network. However, almost all routers sold today (with an appropriate serial interface) are capable of handling frame relay encapsulation and communication. The DCE device is usually a CSU/DSU that provides clocking functions and the connection to the provider’s physical circuit. Ultimately, the physical link from the customer premises connects to the frame relay switching equipment of the service provider. This switching equipment is not the responsibility of the customer. The figure below illustrates the interconnection of equipment on a frame relay network.
Once the planning for a computer program has been done, the next step in its development is to write the specific steps for solving the problem at hand in a language and form acceptable to a computer system. A language that is acceptable to a computer system is called a computer language or programming language, and the process of writing instructions in such a language for an already planned program is called programming or coding. The aim of this article is to introduce some of the common computer languages which are used for writing computer programs.
ANALOGY WITH NATURAL LANGUAGES
A language is a means of communication. We use a natural language such as English to communicate our ideas and emotion to others. Similarly a computer language is used by a programmer to instruct a computer what he/she wants it to do.
All natural languages (English, French, German, etc.) use a standard set of words and symbols for the purpose of communication. These words and symbols are understood by everyone using that language. We normally call the set of words allowed in a language the vocabulary of the language. For example, the words we use in English form the vocabulary of the English language. Each word has a definite meaning and can be looked up in a dictionary. In a similar manner, all computer languages have a vocabulary of their own. Each word of the vocabulary has a definite, unambiguous meaning which can be looked up in the manual for that language. The main difference between a natural language and a computer language is that natural languages have a large vocabulary but most computer languages use a very limited or restricted vocabulary. This is because a programming language by its very nature and purpose does not need to say too much. Every problem to be solved by a computer has to be broken down into discrete logical steps, which basically comprise four fundamental operations: input and output operations, arithmetic operations, movement of information within the CPU and memory, and logical or comparison operations.
Each natural language has a systematic method of using its words and symbols, which is defined by the grammar rules of the language. Similarly, the words and symbols of a computer language must be used according to set rules, known as the syntax rules of the language. With a computer language we must stick to the exact syntax rules if we want to be understood correctly by the computer, for no computer is capable of correcting and deducing meaning from incorrect instructions. Computer languages are smaller and simpler than natural languages, but they have to be used with great precision. Unless a programmer adheres exactly to the syntax rules of a programming language, even down to the punctuation marks, his/her instructions will not be understood by the computer.
Over the years, programming languages have progressed from machine-oriented languages, which use strings of binary 1s and 0s, to problem-oriented languages, which use common mathematical and/or English terms. However, all computer languages can be broadly classified into the following three categories.
1. Machine language
2. Assembly language
3. High-level language
MACHINE LANGUAGE
Although computers can be programmed to understand many different computer languages, there is only one language understood by a computer without using a translation program. This language is called the machine language of the computer. The machine language of a computer is normally written as strings of binary 1s and 0s. The circuitry of a computer is wired in a manner that it immediately recognizes the machine language instructions and converts them into the electrical signals needed to execute them.
A machine language instruction normally has a two-part format. The first part of an instruction is the operation code, which tells the computer what function to perform; the second part is the operand, which tells the computer where to find or store the data or other instructions to be manipulated. Hence each instruction tells the computer what operation to perform and the length and locations of the data fields involved in the operation. Every computer has a set of operation codes called its instruction set. Each operation code in the instruction set is meant for performing a specific basic operation or function. Typical operations included in the instruction set of a computer are as follows:
1. Arithmetic operations
2. Logical operations
3. Branch operations (For transfer of control to the address given in the operand field)
4. Data movement operations (For moving data between memory locations and registers)
5. Input/output operations (For moving data from/to one of the computer's input/output devices)
Although some computers are designed to use only single-address instructions, many computers use multiple-address instructions, which include the addresses of two or more operands. For example, the augend and addend may be the two operands of an addition operation.
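The two-part opcode/operand format can be made concrete with a toy single-address machine written in Java. The encoding (high byte = operation code, low byte = operand address) and the tiny instruction set here are invented purely for illustration; no real processor uses exactly this scheme.

```java
public class ToyMachine {

    // Invented operation codes for this toy instruction set.
    static final int HALT = 0, LOAD = 1, ADD = 2, STORE = 3;

    // Run a program against a small memory. The single accumulator
    // register mirrors classic single-address machine designs.
    static void run(int[] program, int[] memory) {
        int acc = 0;
        for (int instr : program) {
            int opcode = (instr >> 8) & 0xFF;   // what operation to perform
            int operand = instr & 0xFF;         // where the data is found or stored
            switch (opcode) {
                case LOAD  -> acc = memory[operand];
                case ADD   -> acc += memory[operand];
                case STORE -> memory[operand] = acc;
                case HALT  -> { return; }
            }
        }
    }

    public static void main(String[] args) {
        int[] memory = {5, 7, 0};
        // LOAD 0; ADD 1; STORE 2; HALT  -> memory[2] = 5 + 7
        int[] program = {
            (LOAD << 8),
            (ADD << 8) | 1,
            (STORE << 8) | 2,
            (HALT << 8)
        };
        run(program, memory);
        System.out.println(memory[2]); // prints 12
    }
}
```

Note how every instruction carries both parts: decoding splits off the operation code first, then uses the operand as an address, exactly as described above.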