ArcherPoint Microsoft Dynamics NAV Developer Digest - vol 19
In this blog post I’d like to discuss whitespace.
Not the graphic designer’s whitespace, where the lack of text and images draws attention to the designer’s message. Rather, I’m talking about the white space (the hidden characters) in text files.
Most developers are familiar with ASCII text (ASCII stands for the American Standard Code for Information Interchange). It is the standard underlying most plain text files, although it is not always applied in a standard way.
To understand some of the issues, realize that the ASCII standard had its beginnings back in the early 1960s, when computers were huge and expensive and the “latest” office tool was an electric typewriter (much better than those manual typewriters that fewer and fewer people remember actually using).
You see, back then the typewriters were limited in what they could print. For instance, to underline some text, you would type the text, then backspace and overprint the text with the underscore character. Then there was a lever on the side that would let you skip down a line and continue typing wherever the typewriter head was on the page. To begin a new line, you would push the lever all the way back; a bell would ring, indicating that the carriage had returned to the starting position so you could start your next line of text from the left margin.
Obviously, if you wanted a space or a tab, you would just hit the space or tab key.
Electric typewriters worked the same way, and when the “new” teletypes came along, they needed a way to do everything the typewriters were doing. Hence, we have ASCII characters for spaces and tabs, but also for backspaces (BS), line feeds (LF), carriage returns (CR), bells (BEL), and several other “non-printable” characters (non-printable because you don’t “print” a space or bell).
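To make those characters a little less invisible, here is a minimal Python sketch that prints the ASCII code for each of the characters mentioned above. The names CONTROL_CHARS and ascii_codes are my own, chosen just for this illustration.

```python
# The non-printable characters named above, written with Python escape sequences.
# CONTROL_CHARS and ascii_codes are illustrative names, not part of any library.
CONTROL_CHARS = {
    "BEL": "\a",    # bell
    "BS": "\b",     # backspace
    "TAB": "\t",    # horizontal tab
    "LF": "\n",     # line feed
    "CR": "\r",     # carriage return
    "SPACE": " ",   # space
}


def ascii_codes(chars=CONTROL_CHARS):
    """Return a mapping of character name to its decimal ASCII code."""
    return {name: ord(ch) for name, ch in chars.items()}


if __name__ == "__main__":
    for name, code in ascii_codes().items():
        print(f"{name:5} decimal {code:3}  hex 0x{code:02X}")
```

Running it shows, for example, that a line feed is code 10 and a carriage return is code 13, which are the two characters the rest of this story revolves around.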
A while back, I was developing software for the wireless telephone system. Our group was responsible for validating the caller, another group was responsible for the networking and switching, and still another group was responsible for settling the bill among the various wireless carriers that were involved in the call (if a person from New York is roaming in San Francisco, which phone company gets which part of the bill?).
Our group used UNIX systems; the billing group used Windows. Everything was working just fine until we were asked at one point to provide billing records for one of the services we provided. We set up the system to automatically send the billing group their information in the specified format at the specified time…and then everything went crazy. The billing reports were a mess and didn’t make sense. And let me tell you, when half the country’s wireless providers don’t get their billing information on time, it is not pretty.
We went over the specifications and everything matched – we looked at the data from our side and it looked good, but when they received the data from us, there was something wrong. Yet we showed we were sending and receiving just fine. So what happened?
It turns out, the problem we were having was with the way the various operating systems chose to implement the ASCII standard for text files…specifically, how they chose to implement a new line. “A new line is a new line, isn’t it?” or so we thought.
We finally sat down and looked at the binary data on both sides and saw the problem.
DOS, and subsequently Windows, chose to follow the typewriter method of issuing a carriage return followed by a line feed (“CR/LF”, or “\r\n” if you’re using escape characters). UNIX decided to simply issue a line feed (“\n”), while early Apple Macs used only a carriage return (“\r”).
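Here is roughly what we saw when we looked at the bytes, as a small Python sketch. The function name encode_lines and the sample record text are made up for illustration; only the three line-ending conventions come from the discussion above.

```python
# The three newline conventions described above, as raw bytes.
LINE_ENDINGS = {
    "windows": b"\r\n",     # carriage return + line feed
    "unix": b"\n",          # line feed only
    "classic_mac": b"\r",   # carriage return only
}


def encode_lines(lines, platform):
    """Join ASCII text lines using the given platform's line ending."""
    eol = LINE_ENDINGS[platform]
    return b"".join(line.encode("ascii") + eol for line in lines)


if __name__ == "__main__":
    sample = ["rec1", "rec2"]  # hypothetical records, not real billing data
    for platform in LINE_ENDINGS:
        data = encode_lines(sample, platform)
        # bytes.hex(sep) requires Python 3.8 or later
        print(f"{platform:12} {data.hex(' ')}")
```

The text is identical in all three cases; only the bytes 0d (CR) and 0a (LF) at the end of each line differ.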
This is why, if you’re on a Windows machine and you download a text file from a UNIX platform, you might open it in your text editor and see one long line of characters and spaces. The Windows machine never sees the carriage return it is expecting and just prints the characters one after the other.
Conversely, if you’re on a UNIX machine and you download a text file from a Windows platform, you get some strange characters at the end of each line, typically looking like ^M, which is the carriage return it is NOT expecting.
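If you suspect a file has the wrong (or mixed) line endings, you can count them before deciding what to do. This is a minimal sketch; count_line_endings is a hypothetical helper name, and it expects the raw bytes of the file (opened in binary mode) so that nothing is translated on the way in.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone line feeds, and lone carriage returns."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,                       # Windows-style
        "lf": data.count(b"\n") - crlf,     # UNIX-style (LF not preceded by CR)
        "cr": data.count(b"\r") - crlf,     # classic Mac-style (CR not followed by LF)
    }


if __name__ == "__main__":
    # In practice you would read the bytes with open(path, "rb").read().
    mixed = b"first\r\nsecond\nthird\rfourth\r\n"
    print(count_line_endings(mixed))
```

A file that reports more than one nonzero count has mixed line endings, which is usually a sign it has passed through more than one platform.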
There are tools to handle this situation for both platforms. One of the most common approaches is to FTP the text file to your local machine in ASCII mode (make sure it’s a text file, though, and not binary data!). In that mode, FTP converts the line endings to the native platform’s convention, so you won’t have to fix them later.
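If the file has already arrived with the wrong line endings, the conversion is simple enough to do yourself. The sketch below is not how FTP does it internally; it is just one straightforward way to get the same result, and convert_line_endings is a name I made up. As with the FTP advice, only run it on text, never on binary data.

```python
def convert_line_endings(data: bytes, target: bytes = b"\n") -> bytes:
    """Rewrite every line ending (CRLF, lone CR, or lone LF) as `target`."""
    # Step 1: collapse everything to LF. CRLF must be handled before lone CR,
    # otherwise each CRLF would turn into two line feeds.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Step 2: expand LF to the requested convention.
    return normalized.replace(b"\n", target)


if __name__ == "__main__":
    windows_text = b"line one\r\nline two\r\n"
    print(convert_line_endings(windows_text))             # to UNIX
    print(convert_line_endings(b"a\nb\n", b"\r\n"))       # to Windows
```

Normalizing to a single convention first keeps the function safe to run on files with mixed endings, and running it twice gives the same result as running it once.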
Some text editors and code writing tools will automatically interpret it correctly, or prompt you for how you want the text to be displayed.
To make a short story long, it is necessary to be aware of cross-platform issues and understand that not all text files are created equal (or, at least, the same way).