Unicode Data

Can your ability to communicate effectively depend on which character set you use on your computer?

Data in Customer Relationship Management (CRM) systems, ERP systems and data warehouses has proven its power to deepen relationships within a market; you can also use the same technology to broaden the markets themselves. Technology has opened the global marketplace to your company.

To penetrate these new markets, a business must be able to draw on accurate, relevant, and real-time information that transcends the barriers of language, culture, and geography. The challenges of maintaining quality in global data are formidable. First, consider just the issues that are likely to exist in any data set:

- Mixed business and customer data
- Disparate data in multiple formats
- Duplicate records across files and systems
- Data in free-form text fields
- Incomplete, inconsistent and misfielded data elements

Global business compounds these data challenges. Language variations are an obvious issue; however, a greater challenge lies in character set variations, particularly in the Asia/Pacific and EMEA markets. English is generally rendered in the Latin alphabet, which fits in a single-byte character set (SBCS) where each character occupies one byte. Japanese, Chinese and Korean, whose scripts contain thousands of characters, have traditionally been encoded in double-byte character sets (DBCS). Applications that understand only single-byte characters cannot process DBCS data: they omit, distort or corrupt it when they encounter it (much as some word processing programs omit, distort or corrupt text set in a non-Latin font).
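To make that failure mode concrete, here is a minimal Python sketch. It assumes the data arrives as Shift_JIS, a widely used Japanese encoding in which these kanji take two bytes each, and it models a single-byte-only application with two hypothetical helpers, `single_byte_view` and `truncate_field`; neither comes from any real product.

```python
# Minimal sketch: what a single-byte-only application does to double-byte data.
# Assumption: the data arrives as Shift_JIS; both helpers are hypothetical.

def single_byte_view(data: bytes) -> str:
    """Hypothetical SBCS-only reader: treats every byte as one character."""
    return data.decode("latin-1")  # Latin-1 maps each byte to exactly one character


def truncate_field(data: bytes, width: int) -> bytes:
    """Hypothetical fixed-width field that cuts data at a byte count."""
    return data[:width]


text = "東京都"                  # three characters ("Tokyo Metropolis")
sjis = text.encode("shift_jis")  # each kanji is two bytes in Shift_JIS

print(len(text), len(sjis))      # 3 6

# Reading byte by byte turns 3 characters into 6 unrelated ones.
garbled = single_byte_view(sjis)
print(len(garbled), garbled == text)  # 6 False

# Cutting the field at 5 bytes splits the last kanji in half; the dangling
# byte can only be shown as the replacement character U+FFFD.
cut = truncate_field(sjis, 5).decode("shift_jis", errors="replace")
print(cut)
```

The byte count, not the character count, is what a single-byte application sees, which is why both reading and field-length limits go wrong.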

Unicode provides a vendor-neutral solution to double-byte data processing. Instead of a separate character set for each language, the Unicode standard defines one character set covering many scripts, including Latin, Kanji, Hebrew, Greek, Arabic and many others that describe most of the world’s major languages. Unicode enables:

- Greater growth support: Unicode-enabled systems can accept international data without significant new development investment as business becomes more global and complex.
- Lower IT/IS demands: Unicode averts the common challenge of conflicting encoding systems.
- Higher data fidelity: as a single encoding, Unicode eliminates the risk of corruption that occurs whenever data is passed between different encodings or platforms.
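The data fidelity point can be illustrated with Python's built-in codecs. In this minimal sketch, the sample string, the use of Shift_JIS (a common Japanese double-byte encoding) as the source, and Latin-1 as the "other platform" are all illustrative assumptions. Pushing the data into a legacy single-byte encoding silently loses the Japanese characters, while exchanging it as Unicode (stored here as UTF-8) lets it come back byte for byte.

```python
# Minimal sketch: legacy-to-legacy conversion vs. Unicode as the common encoding.
# Assumptions: source data is Shift_JIS; the "other platform" only handles Latin-1.

source = "東京都 Tokyo".encode("shift_jis")  # bytes as a Japanese system stores them
as_unicode = source.decode("shift_jis")

# Path 1: hand the data to a Latin-1-only system. Characters it cannot
# represent are replaced with "?", and that information is gone for good.
latin1 = as_unicode.encode("latin-1", errors="replace")
print(latin1)  # b'??? Tokyo'

# Path 2: store and exchange the data as Unicode (UTF-8 here), then convert
# back to the original encoding when the source system needs it.
utf8 = as_unicode.encode("utf-8")
restored = utf8.decode("utf-8").encode("shift_jis")
print(restored == source)  # True: nothing was lost
```

The loss in the first path happens at the conversion step, not in storage, which is why a single shared encoding matters more than any one platform's choice.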

Of course, Unicode support is by no means universal. Enabling an application for Unicode generally requires profound recoding, which can exceed the willingness and resources of software vendors. Consequently, vendors offer different levels of Unicode support. Software buyers for global companies must understand the differences in DBCS support and how they affect software processes.

Basic Unicode enablement provides data field mapping, allowing double-byte characters to pass intact through processes. This type of support does not interpret characters, however, and doesn’t attempt to understand double-byte data.

More functional Unicode enablement provides context-sensitive character understanding. Fully Unicode-enabled software can interpret character meaning (analogous, for example, to using context to decide whether the English word “read” is in the present or past tense). To be effective, a data quality tool that supports Unicode must both handle double-byte characters correctly and understand the meaning of a character in context. Fully enabled Unicode software does the following:

- Multiplies the value of data quality investments, allowing software to be deployed across multiple platforms, languages and countries without re-engineering.
- Enables real-time data from global subsidiaries to support operations, since worldwide data can be centrally stored and processed.
- Empowers companies to create and enact stronger global visions, using worldwide data across systems to inform global strategy and foster better understanding across the enterprise.
- Reduces knowledge latency, allowing users to access and manipulate global information from anywhere in their preferred language.

A fully functional Unicode implementation is the natural choice for global companies, for today and for the future. In business across borders, particularly in Asia/Pacific and EMEA, the ability to understand data in context can make or break businesses.

DuBois is vice president of Marketing for Trillium Software of Billerica, Mass.
