Pat's DBMS Blog

Friday, April 3, 2009

Introducing Structured Query Language (SQL)

Structured Query Language (SQL) is the most common language used to retrieve information from a database. It allows the retrieval, insertion, updating, and deletion of data as well as permission setting for security purposes. SQL is sub-divided into several elements including clauses, expressions, predicates, conditions, Boolean truth values, queries and statements. Writing SQL statements and using the language the way in which it was intended takes time and practice, especially when dealing with many entities, complex queries, integrated criteria, among other complications. Below is a link to an image of SQL functions for Microsoft's SQL Server from cheat-sheets.org.

In accordance with the relational data model, the database is treated as a set of tables. Relationships are represented by values in tables, and data is retrieved by specifying a result table that can be derived from one or more base tables. Here are a few examples of what SQL looks like:

INSERT is used to add rows or tuples to an existing table:

INSERT INTO My_table
(field1, field2, field3)
VALUES
('test', 'N', NULL);

UPDATE is used to modify the values of a set of existing table rows:

UPDATE My_table
SET field1 = 'updated value'
WHERE field2 = 'N';

DELETE removes zero or more existing rows from a table:

DELETE FROM My_table
WHERE field2 = 'N';

The following query would render the student ID's, in numerical order, of those students who scored an A in a particular course.

SELECT DISTINCTROW STUDENT_ID, GRADE
FROM GRADES
WHERE GRADES="A"
ORDER BY STUDENT_ID

This is an example of a SELECT query that returns a list of expensive cars. The query retrieves all rows from the Car table in which the price column contains a value greater than 60,000.00. The result is sorted in ascending order by model. The asterisk (*) in the select list indicates that all columns of the Car table should be included in the result set.

SELECT *
FROM Car
WHERE price > 60,000.00
ORDER BY model

Many SQL tutorials are available online for those interested in learning the language. W3Schools.com and SQLcourse.com provide introductory tutorials, while other sites like Microsoft's SQL Server Training provide in-depth information for the use of their own SQL software package.

Sources:
Jessup, Valadich & Wade. (2006). Information Systems Today: Why IS Matters.
Toronto: Pearson—Prentice Hall.

IBM. "Structured Query Language (SQL)". Database Systems.
<http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/c0004100.htm>
Accessed April 3, 2009.

Metadata: Role and Importance

Metadata is essentially "data about data". It provides a context for data. It is used to elaborate on existing information and to improve the understanding of its own characteristics and/or use. Industry metadata standards exist in industry verticals such as insurance, banks, manufacturing. In somes contexts, such as database schema, there are hierarchal levels. In data processing, metadata provides information about or documentation of other data managed within an application or environment. Metadata schema usually include a limited number of elements being described, the name of each element and its meaning. Key metadata elements supporting access to published documents include the originator of a work, its title, when and where it was published and the subject areas it covers.

Some of the most popular metadata schemas include:
Dublin Core
AACR2 (Anglo-American Cataloging Rules)
GILS (Government Information Locator Service)
EAD (Encoded Archives Description)
IMS (IMS Global Learning Consortium)

While the syntax is not strictly part of the metadata schema, the data will be unusable, unless the encoding scheme understands the semantics of the metadata schema. The encoding allows the metadata to be processed by a computer program. Important schemes include:
HTML (Hyper-Text Markup Language)
SGML (Standard Generalised Markup Language)
XML (eXtensible Markup Language)
RDF (Resource Description Framework)
MARC (MAchine Readable Cataloging)
MIME (Multipurpose Internet Mail Extensions)

Metadata is about controlling the quality of data entering the data stream. Metadata enhances retrieval performance. Metadata can improve retrieval, establishing a context for individual descriptors. Appropriate metadata tags around the different data elements allow search engines to seek information in a more discriminating way. The presence of a subject field (metadata element) can be used as a prompt for entering key words, or for use of controlled indexing terms to describe the document. Knowing how metadata works provides information managers with a mechanism for indexing documents more precisely.

Metadata provides a way of managing electronic digital objects. Many software packages use metadata as a way of managing electronic resources, whether it is for records retention schedules or for digital preservation. Content management systems (CMS), for instance, use metadata to track when a digital object was last updated or verified, who was responsible for its creation and whether any special access conditions apply. Metadata provides a focus for the establishment of standard practices. It is the metadata associated with digital objects that provides a common format for management and manipulation of resources.

Metadata can help to determine the authenticity of data. Metadata provides an audit trail to establish ownership and authenticity of a digital object such as an electronic document or image. The history of what has happened to a document or record in its life becomes an important part of this. Metadata provides evidence about the provenance of a resource and this underpins good governance, transparency and accountability. This is increasingly important for the many organisations that depend on electronic records rather paper files. It becomes necessary to demonstrate that the electronic document has been kept securely, is a complete record, and has not been tampered with. Metadata provides evidence for the integrity of an electronic document. This is particularly important in a legal context where electronic documents or physical records may be used as evidence in legal proceedings.

Metadata is the key to interoperability. Interoperability depends on exchange of metadata between systems to establish the nature of the data being transferred and how it should be handled. E-commerce is one example of interoperating, where several different proprietary systems may need to exchange data. Access to metadata helps to establish the protocols of exchange of data and ways in which it might be exploited.

Metadata is the future. An increasing number of software and systems suppliers are working to metadata standards or are creating their own standards for metadata. The growth of e-commerce depends on metadata. Many industries are developing their own infrastructure to allow software from different suppliers to work together and exchange data. Metadata generated by content management systems is seeing a renaissance on the internet after its initial use for subject description. Metadata standards are being used by portal software and to provide access to the information content of websites.

Sources:
University of Queensland. "An Introduction to Metadata". UQ Library.
<http://www.library.uq.edu.au/iad/ctmeta4.html>
Accessed April 3, 2009.

Haynes, David. "Five Purposes of Metadata". Cilip.
<http://www.cilip.org.uk/publications/updatemagazine/archive/archive2004/july/update0407.htm>
Accessed April 3, 2009.

Data Warehousing Review. "An Introduction to Metadata". Data, Data Everywhere!
<http://www.dwreview.com/Articles/Metadata.html>.
Accessed April 3, 2009.

Data Base Management System Architecture

Architecture of a DBMS should be based on the business needs of the organization. Choice of DBMS should not be made by a single individual or group, but by a team consisting of business experts and IT experts. Business experts should include representatives from various levels of the firm. The firm needs to ensure that the DBMS selected is appropriate for the nature of the buesiness and intended type of processing. There are essentially four levels of DBMS architecture that can be chosen: enterprise, departmental, personal, and mobile.

Enterprise DBMS are designed for scalability and high performance. They must be capable of supporting very large databases, a large number of concurrent users, and multiple types of applications. Enterprise DBMS run on a large-scale machine, typically a mainframe or a high-end Unix, Linux, or Windows NT machine. In addition, enterprise DBMS offer all of the bells and whistles available from the DBMS vendor. Multi-processor support, support for parallel queries, clustering, and other advanced DBMS features should be included. Examples include Microsoft SQL Server 2005 and Oracle Database 11G.

Departmental DBMS support small to medium sized workgroups within an organization, and typically run on a Unix, Linux, or Windows 2000 (or NT) servers. Hardware and software upgrades often can allow a departmental DBMS to tackle tasks that previously could only be performed by an enterprise DBMS.

Personal DBMS are designed to be used by single users, usually on a low to medium-powered PC. Lotus Approach, Microsoft Access and dBase are examples of personal database software for personal use and small-scale information dissemination and retrieval.

Mobile DBMS (MDS) are specialized versions of a departmental or enterprise DBMS. They are designed to be used by remote users not connected to their networks. Mobile DBMS enable local database access and modification on a laptop or handheld device, such as a Blackberry, PDA or PocketPC. Mobile DBMS provide a mechanism for synchronizing remote database changes to a centralized, enterprise or DB server. MDS allow information dissemination via wireless channels.

Sources:
Mullins, Craig. "DBMS Architecture". DBMS Corner.
<http://www.craigsmullins.com/dbta_010.htm>.
Accessed April 3, 2009.

Hellerstein, Joseph. "Motivation and DBMS Architecture Overview". Readings in DB Systems.
<http://redbook.cs.berkeley.edu/redbook3/lec1.html>.
Accessed April 3, 2009.

Advantages of Using DBMS in Business and other Organizations

Implementing DMBS in business and other types of organizations carries with it many advantages. The flexbility of quality DBMS packages allows easy software updgrades to changing business and user needs. Security of information can easily be tailored to different users. Private data such as salary information or account details can be made only accessible to the appropriate parties. The DBMS can also log by whom an alteration of data was made, as well as the date and time of the edit. Data access can be restricted via database "views" that filter out sensitive data. Data redundancy can be significantly reduced by keeping a single copy of specific information, by doing so, consistency of data is improved.

A DBMS' centralized system allows for easier data control, deployment, access and restriction. In addition, it makes it easier to enforce standards for data creation, naming, modification and removal, thereby also improving the quality and consistency of the data itself. Information altered in the central database is replicated seemlessly throughout all applications supported by the database.

Businesses and organizations can benefit from the aforementioned advantages in several ways. Time is money and DBMS save a lot of it. Jobs previously reserved for data maintenance and updates have almost been eliminated, or at least the nature of it has been dramatically changed. As such, organizations and companies are able to compete with industry leaders or even maintain a competitive advantage over their peers. The security of information is a prevalent issue these days, and effective DBMS can ensure its access is limited only to those concerned. The database administrator (DBA) is responsible for the for the development and management of databases within an organization. He or she works with management and programmers as well as users to establish and enforce policies for the management of information and security. A good DBA is essential to justifying the expense of implementing database technology within a firm.

Sources:
Jessup, Valadich & Wade. (2006). Information Systems Today: Why IS Matters.
Toronto: Pearson—Prentice Hall.

DBMS.ca. “Data Security”. A Guide to Database Management Systems. <http://dbms.ca/concepts/security.html>.
Accessed April 3, 2009.

Introducing Data Base Management Systems (DBMS): History and Data Storage Evolution

Data base management systems (DBMSs) are software packages designed to allow the entry, creation, storage, organization, querying and retrieval of data from databases. Databases collect related data or in a way that facilitates the search for information. In this modern information age, database technology has been essential to organizational productivity and at the heart of competitive advantage. Databases are relied-upon to perform and integrate several different functions: anywhere from aggregating data, tracking sales, determining compensation, prospecting opportunities, paying bills, managing customers, developing models, processing transactions, fulfilling purchases orders, etc. Good DBMSs render the manipulation and analysis of data and information easy—directly contributing to business productivity and efficiency. Conversely, poorly-conceived, built, populated and/or managed databases can be detrimental to a company and result in higher costs, inadequate customer service and/or loss of any competitive advantage, among other significant issues facing businesses today.

In a DBMS, an individual database is the collection of related attributes about entities. An entity is something of particular interest for which data is collected. Entities are represented by tables with rows and columns wherein specific data is stored in individual cells. Rows, read left to right, are referred to as records, while the columns constitute the attributes or fields. The DBMS can distinguish between records of an entity by a unique identifier called a primary key. Attributes not used as the primary key are often called secondary keys. Associations between entities can be made by implementing the primary key from one entity into the table of another entity, as such this attribute is referred to as the foreign key.

Prior to the use of DBMSs in the 1960’s, the file processing approach was used to store and manipulate data electronically. Data had widely been stored in long, sequential computer files on magnetic tape. Disk storage was expensive in the early days of computing. Data was read from flat files with application systems running in batch mode. One of the biggest issues then was the redundancy of data. Similar information often was stored in several different places or files for different reasons. Any edits made to an address for example, had to be repeated in every location the address appeared, thus creating a time-consuming, redundant job prone to human error. With falling cost of disk storage, specialized DMBS packages were first developed throughout the 1960’s for the sole purpose of managing data. Improvements ever since have focused on interface, navigation, flexibility, efficiency and usability.

Sources:
Jessup, Valadich & Wade. (2006). Information Systems Today: Why IS Matters.
Toronto: Pearson—Prentice Hall.

Ramakrishnan, R., Gehrke, J. “Database Management Systems”. Database Management Systems.
<http://pages.cs.wisc.edu/~dbbook/openAccess/thirdEdition/slides/slides3ed.html>.
Accessed April 3, 2009.

DBMS.ca. “Evolution of DBMS”. A Guide to Database Management Systems.
<http://dbms.ca/concepts/evolution.html>.
Accessed April 3, 2009.