Programming part IV: A programmer does databases.
In the most recent installment of this series about becoming a professional programmer, I suggested that one of the hallmarks of professionalism is the programmer’s toolkit–essentially the knowledge and experience that make it possible to do the job easier, faster, and better. The toolkit contains some obvious things like knowledge of one or more programming language and familiarity with at least one software development framework (such as Java Foundation Classes). This article is really about another item that belongs in the toolkit–knowledge of data management.
Think for a moment about the applications you’ve used in business, not counting the familiar office suite programs such as word processing and spreadsheets. What were most of these programs about? Business information, right? Information like sales figures, customer lists, and inventory. Where did most of the information in these programs come from or go to? A database. I’m leading you through this rhetorical exercise because I’m often surprised by programmers who don’t understand what they are programming.
Consider what an accounting program contains, such as accounts receivable: There are lists of clients or customers and their pertinent information (names, addresses), lists of what they bought or services they used, information about each individual transaction (quantity, prices). The program then calculates totals and balances, and finally updates lists of accounts. What is all this? Manipulation of database information, of course, and the same can be said for at least 80 percent of all programs written for business.
Everybody knows that programmers need to learn a programming language; what’s not so widely understood is that all programmers will need to work with databases at some point in their careers. The touch of the database might be light, maybe just the knowledge of how to connect to a database server and ask it for some data; but more often than not in business software, programmers will be up to their keyboards in data manipulation.
This is not a casually acquired skill. There’s a lot of database theory and jargon that has to be truly understood: It’s hard to figure out how to use a child-parent relationship if you don’t understand what that means.
It’s not that every programmer must acquire the skills of a database administrator. These folks are specialists. They’re hired because they have been trained in how to optimize and troubleshoot a particular database management system. Programmers seldom need that kind of in-depth knowledge. But they often get called on to incorporate data from a database into their programs, and to do that, they need to understand quite a lot about the underlying database and the methods of retrieving data.
There’s a SQL to this story
Perhaps 30 percent of the world’s data is in a mainframe computer database, another 3 percent in an object-oriented database, the rest of it–about two-thirds–is in a relational database system. Relational rules. Consequently, if you’re a programmer who goes anywhere near business applications (or business programming of any kind), you’re going to encounter relational database management systems (RDBMS), and you need to know something about the Structured Query Language (SQL), which is the lingua franca of relational systems.
SQL (usually pronounced sequel) is not a programming language. It’s a database query language that over the years has been augmented and tweaked to do a variety of data management jobs. On the other hand, HTML isn’t really a programming language either, but for all practical purposes it’s treated like one. So it is with SQL. What’s important for the prospective programmer is that knowing SQL, at least in its rudiments, is even more useful than knowing HTML. Where HTML is a required skill for specific Web-based programming work, SQL is a required skill just about everywhere, including the Web.
One clue about SQL: You can buy books with nothing but SQL puzzles to solve, like crossword puzzles. SQL is a good case of a relatively small number of innocuous-looking command elements (such as SELECT, JOIN, WHERE, GROUP BY) that can be combined in a vast number of ways, and which produce results that can be wildly different. That is to say, you learn SQL, then you find out how to make it work. SQL statements that look great on paper may get you totally erroneous data. (You also have to know enough about what’s in the data to understand when you’ve screwed up.)
One good thing about SQL that’s in your favor: It’s relatively standardized. While there are significant differences between the implementation of SQL in IBM DB2, Oracle 9, and Microsoft SQL Server 2000, it won’t take you very long to learn any of them if you know SQL well. In any case, learning and practicing with SQL is a prerequisite for a lot of programming.
The decline of database programming
Despite what I’ve said about the importance to business applications of data and database management, we’ve seen a sharp decline in the database orientation of software development systems over the past five years. From the days when client/server systems were ascendant (most of them were database-oriented) and matched by mini and mainframe systems (which have always been database-oriented) we’ve gone into a period in which applications and databases have become a matter of data connections rather than data management. By that I mean we see much less emphasis on manipulating data within an application, and a lot more emphasis on having the database server do the data work and pass it on to the application.
What does this mean to the programmer? It generally means that instead of working in a development environment that’s heavily oriented toward data management (such as Visual FoxPro, PowerBuilder, or Clarion), you’ll be working indirectly with data through various kinds of connections (ODBC, JDBC). In these schemes, most database management will consist of passing the appropriate SQL statement. This approach works most of the time, although it is often clumsy. There are other times when you do not have enough control over the data, unless there is some “local” ability (usually on the part of the client) to manipulate the data.
Of course, if you’ve never worked with a database management programming language, you probably won’t know what you’re missing. Some of us old-timers remember, though. Our hope is that one day, when most of the problems of building applications for the Web have been solved, we may get back to development environments that provide more tools to handle the core of applications-the data.
Why not object-oriented databases?
Another area where databases and programming don’t seem to follow a very natural path is object orientation. Consistency would seem to favor the idea that object-oriented programming should be matched by object-oriented database systems. This is, in fact, an argument made by the hardy band of object-oriented database vendors.
Just like the OOP languages, an object-oriented database management system (OODBMS) structures data in terms of objects. If you’re programming a slick user interface to represent a telephone (buttons, display, volume dial, etc.), you can use an OODBMS to create a telephone data object containing all the information about the telephone that your program might need. It could also reference other data objects that contain data relevant for the telephone operations, such as various ringing sounds.
Object-oriented databases are very good at storing and retrieving multimedia (sounds, images, animation, etc.) because they don’t need the awkward overhead of rows, columns, and table relations used in relational systems. Relational systems often refer to multimedia data as blobs (binary large objects). Object-oriented databases are better able to store information about multimedia data, which makes it possible to be more flexible. It would seem that an OODBMS would be a perfect fit for Web applications, which often use a great deal of multimedia data.
Problem is, who said everything about programming and database management is consistent? Familiarity is often far more persuasive, as is pure inertia. By that I mean, relational database systems do the job, patched up and hung on the side of OOP as they are. The investment in relational systems is huge, and therefore, so is the resistance to change to object orientation.
Unless a particular project or application is started from scratch and can use a database management system that’s consistent with the programming model, you won’t get the opportunity to do that kind of programming. Of course, if you’re an opportunity seeker (or a risk taker) you might want to find out more about various OODBMSs such as those offered by Objectivity Inc., POET Software Corp., Versant Corp., and eXcelon Corp. Most of these systems use a combination of Java or C++ with their own embedded database commands doing the programming. They’re not easy to learn because you must master both the programming environment and the database system, but this could be highly prized skill-in some fairly narrow circles.
Can you lift that toolkit?
It is possible to make a career out of programming for and with database management systems. Far more programmers did this years ago; now it’s relatively rare. But even now, the ability to understand data–how it’s structured, what a data management system does, how to manipulate data, and how to use it in programming–is very close to being a requirement for a programmer. It’s certainly something that belongs in the bulging toolkit of a professional programmer.
XML as a database language
Extensible Markup Language (XML) continues the tradition of data handling, being something necessary but different. For those of you who are new to programming and considering it as a profession, XML is one of the newest languages, to use the term loosely. Like HTML, which XML is intended to supplement, XML is not a complete programming language. But in many respects, it’s used like one, and from the perspective of learning, it might as well be a language.
XML exists because HTML has minimal (some say nearly nonexistent) data management and display capability-only the first of many deficiencies. Yet as the Web became more and more the place to transact business and display dynamic information, the demand for database access and data display became overwhelming. A lot of companies tackled this problem by creating additional proprietary tags for HTML that, when passed through their own application servers, connected to databases and managed the flow of data. This approach works, but many felt the lack of standards in this area would be detrimental to the development of Web applications.
To make a longish story short, the perceived problems with HTML led to the creation of XML by the Web standards body known as the World Wide Web Consortium (W3C). It’s taken all of about two years for XML to go from a fledgling standard to the biggest wave in software development since Java. The core of XML is not too complicated. It’s a markup language that tells a Web server or browser how to handle data both in terms of content and display. It’s also a specification that spells out for a given document how the data should be interpreted. You could learn this aspect of XML very quickly.
But there’s more. XML has spawned (and continues to spawn) a number of related protocols such as: XLink and Xpointer to express relationships between documents; the Extensible Style Language (XSL) to specify how XML data can be displayed in different style sheets; Namespaces protocol to prevent conflicts between named elements of XML data; XML schemas to help developers define their own XML formats; and the Document Object Model (DOM) to specify how to manipulate an XML document with a programming language. Most of these additions are in response to all the tasks that XML is being called on to perform. These include supporting transfer data, formatting data in Web pages, supplying data to applications, and formatting documents.
It is already possible to make a career programming in XML, although like SQL and database management in general, it will not usually be an end in itself. Most programmers will need to understand what it does and how to use it-yet another major item for the programmer’s toolkit.