Read and write binary files in VBScript
The stream can be saved to a file and then closed. However, if you try to open the newly-created zip file, Windows will politely inform you that the file is corrupt. When you examine the newly-created file in your trusty hex editor, you will soon see why.
The ADO Stream object inserts two unwanted characters at the beginning of the file before we begin writing our characters. Since the ADO Stream object is the only thing we have available capable of reading binary input, we have to go back to it to fix the problem.
This time we open a new stream but we open it in binary mode. Reading a stream in binary mode will read the file contents into a true binary array. At least we have the binary data to work with.
The ADO Stream now contains the binary contents of our zip file. This piece of code is where the magic happens.
The stream object is somewhat similar to a recordset. We can read it one character at a time from beginning to end, and a marker keeps track of where we are. So by manually setting that marker position to 2, we can avoid reading the two unwanted characters at the beginning of our binary stream. Once the binary data has been read, we can set the cursor position back to 0, the beginning of the stream. This effectively empties the contents of the stream object. Now we can write the cleaned-up binary back to the file.
Stream method of creating binary files for quite some time before stumbling across this method. The FileSystemObject is not designed to work with binary files.

OpenTextFile strPath, 2, True.

In Python you can use any of the other file modes in binary form too: simply add a 'b', as in 'rb', 'wb' or 'ab'. Closing a binary file is no different from closing a text file: simply call the close method of the open file object.

Before we discuss how to access the data within a binary file, we need to consider how data is represented and stored on a computer.
All data is stored as a sequence of binary digits, or bits. These bits are grouped into sets of 8 or 16, called bytes or words respectively. (A group of 4 is sometimes called a nibble!) A byte can be any one of 256 different bit patterns, and these are given the values 0 to 255.

The information we manipulate in our programs (strings, numbers and so on) must all be converted into sequences of bytes. Thus the characters that we use in strings are each allocated a particular byte pattern. A newer encoding standard known as Unicode can use data words instead of bytes to represent characters, and allows for over 65,000 characters. A more recent extension of the specification has raised that to over a million! These characters can then be encoded into a more compact data stream. Unicode provides a number of different encodings, each of which defines which bytes represent each Unicode numerical value (or code point, in Unicode terms).

If you are thinking that this is complicated, you are right! It is the cost of building a global computer network that must work in lots of different languages. The good news, if you are an English speaker, is that for the most part you can ignore it! The exception is when reading data from a binary file, when you do need to know which encoding has been used in order to interpret the binary data successfully.

Python fully supports Unicode text, and we can specify which particular encoding we want to apply by inserting a special comment at the top of a source file. A string of encoded characters is considered to be a byte string and has the type bytes, whereas a string of unencoded text has the type str. The default encoding is usually UTF-8 but, in theory at least, could be different! I will not be covering the use of non UTF-8 encodings in this tutorial, but there is an extensive "How-To" document on the Python web site. The key thing to realize in all of this is that a binary stream of encoded Unicode text is treated as a string of bytes, and Python provides functions to convert between bytes and str values.
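As a small illustration of that conversion, here is a minimal sketch using the encode and decode methods. The sample text is my own choice, and UTF-8 is assumed throughout.

```python
# A str holds unencoded Unicode text; encode() turns it into bytes.
text = "caf\u00e9"              # four characters, the last one is non-ASCII
data = text.encode("utf-8")     # a bytes object holding the UTF-8 encoding

print(type(text).__name__, len(text))   # length counted in characters
print(type(data).__name__, len(data))   # length counted in bytes

# decode() goes the other way, provided we name the right encoding.
restored = data.decode("utf-8")
print(restored == text)
```

Notice that the two lengths differ: the accented character needs more than one byte in UTF-8.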
In the same way, numbers need to be converted to binary codings too. For small integers it is simple enough to use the byte values directly, but for numbers larger than 255, negative numbers or fractions, some additional work needs to be done. Over time various standard codings have emerged for numerical data, and most programming languages and operating systems use these. The point of all of this is that when we read a binary file we have to interpret the raw bit patterns as the correct type of data for our program.

It is perfectly possible to interpret a stream of bytes that were originally written as a character string as a set of floating point numbers. Of course the original meaning will have been lost, but the bit patterns could represent either. So when we read binary data it is extremely important that we convert it into the correct data type. With Python's struct module we provide a format string representing the data we are reading and apply it to the byte stream that we are trying to interpret. We can also use struct to convert a set of data to a byte stream for writing, either to a binary file or even to a communications line!
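To make the point about interpretation concrete, here is a small sketch that reads the same four bytes three different ways. The sample bytes and the "<" prefix (which forces little-endian byte order so the result is the same on every machine) are my additions, not part of the original example.

```python
import struct

raw = b"abcd"   # four bytes that were originally written as text

as_text = raw.decode("ascii")             # interpreted as characters
(as_int,) = struct.unpack("<i", raw)      # interpreted as one 4-byte integer
(as_float,) = struct.unpack("<f", raw)    # interpreted as one 4-byte float

print(as_text)
print(as_int)
print(as_float)
```

All three results come from identical bit patterns. Only the first one recovers what was originally written.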
There are many different conversion format codes, but we will only use the integer and string codes here. You can look up the others in the Python documentation for the struct module. The codes for integer and string are i and s respectively. The struct format strings consist of sequences of codes with numbers prepended to indicate how many of the items we need. The exception is the s code, where the prepended number means the length of the string. For example, 4s means a string of four characters (note: 4 characters, not 4 strings!).

Let's assume we wanted to write the address details from our Address Book program above as binary data, with the street number as an integer and the rest as a string. (This is a bad idea in practice, since street "numbers" sometimes include letters!)
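A short sketch of the i and s codes in action may help. The values packed here are arbitrary, and the "<" prefix is my addition to fix the byte order and switch off padding so the sizes are predictable.

```python
import struct

# One integer followed by a string of exactly four characters.
packed = struct.pack("<i4s", 10, b"Main")
number, word = struct.unpack("<i4s", packed)
print(len(packed), number, word)

# A count in front of i repeats the item: 2i means two integers.
two_ints = struct.pack("<2i", 1, 2)
print(struct.unpack("<2i", two_ints))
```

Note that the s code works with bytes values, not str, so text has to be encoded before it is packed.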
The format string would look like 'iNs', where N is the length of the string part of the address. To cope with multiple address lengths we could write a function to create the binary string. The length of the string part is the number we need in the struct format string, so we use the len function in conjunction with a normal format string to build a struct format string.
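The function described above might be sketched as follows. The name formatAddress is used later in the text, but the body here is my reconstruction under the stated design, and the sample address is purely illustrative.

```python
import struct

def formatAddress(address):
    """Pack 'number rest-of-address' into an integer plus a byte string."""
    fields = address.split()                     # break the address into words
    number = int(fields[0])                      # street number as an integer
    rest = " ".join(fields[1:]).encode("utf-8")  # everything else as bytes
    fmt = "i%ds" % len(rest)                     # build e.g. 'i16s'
    return struct.pack(fmt, number, rest)

data = formatAddress("10 Some St, Anytown")      # hypothetical address
print(len(data))
print(data)
```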
Now that we have our binary data, let's see how we can write it to a binary file and then read it back again. We need to open the file for writing in 'wb' mode, encode the data, write it to the file and then close the file.

If you open the resulting file in a text editor, the characters will be readable but the number will not look like 10! In fact it has disappeared! If you have an editor which can read binary files, you will see that the data starts with 4 bytes. The first of these may look like a newline character and the rest are zeros. Now it turns out that, just coincidentally, the numerical value of newline is 10!
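The writing steps can be sketched like this. The file name address.bin and the use of a temporary folder are my assumptions, chosen so that the example does not touch any real data.

```python
import os
import struct
import tempfile

# Encode the data: an integer followed by the rest of the address as bytes.
number, rest = 10, b"Some St, Anytown"           # illustrative values
data = struct.pack("i%ds" % len(rest), number, rest)

# Open in 'wb' mode, write the bytes, then close the file.
path = os.path.join(tempfile.mkdtemp(), "address.bin")   # hypothetical name
binfile = open(path, "wb")
binfile.write(data)
binfile.close()

print(path, os.path.getsize(path))
```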
Checking this at the Python prompt confirms that the first 4 bytes are 10, 0, 0, 0 in decimal, or 0xA, 0x0, 0x0, 0x0 in hexadecimal, the system usually used to display binary data since it is much more concise than pure binary. The struct code i normally occupies 4 bytes, the size of an integer on a 32 bit computer. So the integer value 10 has been converted by the struct module into the 4 byte sequence 10, 0, 0, 0. On Intel microprocessors the convention is to put the least significant byte first, so reading the sequence in reverse gives us the true "binary" value: 0, 0, 0, 10, which is the integer value 10 expressed as 4 decimal bytes.
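Both facts are easy to check for yourself. In this sketch the "<" and ">" prefixes are my addition: they ask struct for little-endian and big-endian order explicitly, rather than relying on whatever the local processor uses.

```python
import struct

print(ord("\n"))                  # 10: the numerical value of newline

little = struct.pack("<i", 10)    # least significant byte first (Intel style)
big = struct.pack(">i", 10)       # most significant byte first

print(list(little))               # [10, 0, 0, 0]
print(list(big))                  # [0, 0, 0, 10]
```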
The rest of the data is basically the original text string and so appears in its normal character format. Be sure not to save the file from within Notepad: although Notepad can load some binary files, it cannot save them as binary. It will try to convert the binary to text and can corrupt the data in the process!

It is worth pointing out here that the file extension has no effect on whether a file is binary or text. Some operating systems use the extension to determine which program they will use to open the file, but you can change the extension by simply renaming the file. The content will not change; it will still be binary or text, whichever it was originally. You can prove this by renaming a text file in Windows so that it has a different extension. If you now rename it back, the contents are exactly as they were.

To read our binary data back again we need to open the file in 'rb' mode, read the data into a sequence of bytes, close the file and finally unpack the data using a struct format string.
In general we would need to find the binary format from the file definition. There are several web sites which provide this information; for example, Adobe publish the definition of their common PDF binary format. In our case we know it must be like the one we created in formatAddress, namely 'iNs' where N is a variable number.
How do we determine the value of N? The struct module provides some helper functions that return the size of each data type, so by firing up the Python prompt and experimenting we can find out how many bytes of data we will get back for each data type. An integer occupies 4 bytes, so N will be the total length of the data minus 4. Using that, we can read our file. When rebuilding the address we have to convert rest to a string, since Python considers it to be of type bytes (see the sidebar above), which won't work with join.
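Putting the reading steps together gives something like the sketch below. It first writes a small sample file so that it can run on its own. The file name and address are illustrative, and I have used decode to turn the bytes into a str, which does the same job as str(rest, "utf-8").

```python
import os
import struct
import tempfile

# Create a sample file to read (hypothetical name and contents).
sample = b"Some St, Anytown"
path = os.path.join(tempfile.mkdtemp(), "address.bin")
with open(path, "wb") as out:
    out.write(struct.pack("i%ds" % len(sample), 10, sample))

int_size = struct.calcsize("i")      # how many bytes the i code occupies
print(int_size)

# Open in 'rb' mode, read everything, close, then unpack.
binfile = open(path, "rb")
data = binfile.read()
binfile.close()

fmt = "i%ds" % (len(data) - int_size)    # N = total length minus the integer
number, rest = struct.unpack(fmt, data)
address = " ".join((str(number), rest.decode("utf-8")))
print(address)
```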
And that's it on binary data files, or at least as much as I'm going to say on the subject. As you can see, using binary data introduces several complications, and unless you have a very good reason I don't recommend it. But at least if you do need to read a binary file, you can do it (provided you know what the data represented in the first place, of course!).

Random Access to Files

The last aspect of file handling that I'll consider is called random access. Random access means moving directly to a particular part of the file without reading all the intervening data. Some programming languages provide a special indexed file type that can do this very quickly, but in most languages it's built on top of the normal sequential file access that we have been using up till now.
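In Python, file objects offer the seek and tell methods for this purpose. Here is a minimal sketch, assuming a file of fixed-size records so that the position of any record can be calculated. The file name and record layout are invented for the example.

```python
import os
import tempfile

RECORD_SIZE = 4   # every record is exactly 4 bytes in this made-up layout

# Build a small file of four records to experiment with.
path = os.path.join(tempfile.mkdtemp(), "records.dat")
with open(path, "wb") as out:
    out.write(b"AAAABBBBCCCCDDDD")

with open(path, "rb") as inp:
    inp.seek(2 * RECORD_SIZE)        # jump straight to the third record
    third = inp.read(RECORD_SIZE)    # read just that record
    position = inp.tell()            # where the file marker is now

print(third, position)
```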
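Windows file paths contain backslashes, which Python normally treats as the start of an escape sequence inside a string. There are three solutions. The first is to put an r in front of the string. This tells Python to ignore any backslashes and treat it as a "raw" string. The second is to use forward slashes instead of backslashes. This has the added advantage of making your code portable to other operating systems too. The third is to double each backslash so that it is taken literally. Any of these will open our data file correctly.

One way to add data to an existing file would be to open the file for input, read the data into a list, append the new data to the list and then write the whole list out to a new version of the old file. If the file is short that's not a problem, but if the file is very large then you could run out of memory to hold the list.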
Fortunately there's another mode, "a", that we can pass to open, which allows us to append directly to an existing file just by writing. Even better, if the file doesn't exist it will open a new file just as if you'd specified "w". As an example, let's assume we have a log file that we use for capturing error messages. We don't want to delete the existing messages, so we choose to append each new error to the file.

A common technique is to create a filename based on the date. Thus, when the date changes we automatically create a new file, and it is easy for the maintainers of the system to find the errors for a particular day and to archive away old error files if they are not needed.
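Here is one way the error log might be sketched, combining append mode with a date-based filename. The function name logError, the folder parameter and the naming pattern are all my assumptions rather than anything fixed by the tutorial.

```python
import os
import tempfile
import time

def logError(msg, folder):
    """Append msg to today's error file inside folder and return its path."""
    name = time.strftime("errors-%Y%m%d.log")   # e.g. errors-20240131.log
    path = os.path.join(folder, name)
    err = open(path, "a")       # "a" creates the file if it does not exist
    err.write(msg + "\n")
    err.close()
    return path

folder = tempfile.mkdtemp()     # a scratch folder for the demonstration
first = logError("disk full", folder)
second = logError("network down", folder)
print(open(second).read())
```

Because the file is opened in "a" mode each time, the second call adds to the file rather than replacing the first message.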
Remember, from the menu example above, that the time module can be used to find out the current date.

Python also offers a more convenient way of working with files, particularly when iterating over their contents. This uses a construct known as with, which guarantees to close the file at the end of the with statement. This construct makes file handling a little bit more reliable and is the recommended way of opening files in Python v3.

The Address Book Revisited

You remember the address book program we introduced during the Raw Materials topic and then expanded in the Talking to the User topic?
Let's start to make it really useful by saving it to a file and, of course, reading the file at startup. We'll do this by writing some functions. So in this example we pull together several of the strands that we've covered in the last few topics.
The basic design will require a function to read the file at startup and another to write the file at the end of the program. We will also create a function to present the user with a menu of options and a separate function for each menu selection.

In the function that loads the address book, we import the os module, which we use to check that the file path actually exists before opening the file.
We defined the filename as a module level variable so we can use it both in loading and saving the data. We use rstrip to remove the new-line character from the end of the line.
Also notice the next function, which fetches the next line from the file within the loop. This effectively means we are reading two lines at a time as we progress through the loop.
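A loading function along the lines just described might be sketched as follows. The name readBook, the file name addbook.dat and the use of a dictionary for the book are assumptions on my part. I have also passed the path as a parameter (defaulting to the module level filename) to make the function easier to try out.

```python
import os

filename = "addbook.dat"   # module level variable (hypothetical file name)

def readBook(book, path=filename):
    """Load name/address pairs from path into the book dictionary."""
    if os.path.exists(path):                  # only open a file that exists
        with open(path, "r") as store:
            for line in store:
                name = line.rstrip()          # strip the trailing newline
                entry = next(store, "").rstrip()   # the following line
                book[name] = entry

book = {}
readBook(book)      # does nothing if the file has not been created yet
print(book)
```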
Saving the Address Book

def saveBook(book):

Note that we write two lines for each entry; this mirrors the fact that we processed two lines when reading the file.

Getting User Input

def getChoice(menu, length):

We receive a length parameter which tells us how many menu entries there are. This allows us to create a prompt that specifies the correct number range.
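The bodies of these two functions are not shown above, so what follows is only a sketch of how they might look. The signatures saveBook(book) and getChoice(menu, length) come from the text, while the extra path parameter, the prompt wording and the validation loop are my own assumptions.

```python
import os
import tempfile

def saveBook(book, path):
    """Write each entry as two lines: the name, then the address."""
    with open(path, "w") as store:
        for name, entry in book.items():
            store.write(name + "\n")
            store.write(entry + "\n")

def getChoice(menu, length):
    """Show the menu and keep asking until a number from 1 to length is given."""
    print(menu)
    prompt = "Select a choice (1-%d): " % length
    while True:
        reply = input(prompt)
        if reply.isdigit() and 1 <= int(reply) <= length:
            return int(reply)
        print("Please enter a number in that range.")

demo_path = os.path.join(tempfile.mkdtemp(), "addbook.dat")  # hypothetical
saveBook({"Fred": "1 High St"}, demo_path)
print(open(demo_path).read())
```

To try getChoice, call it with a menu string and the number of entries, for example getChoice("1) Add\n2) Quit", 2).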
Adding an Entry

def addEntry(book):

So the main program will look like this:

Now type all that code into a new text file and save it as addressbook. There are a couple of things we can do to improve it, which I'll cover in the next section, but even as it stands it's a reasonably useful little tool.

This is a security feature to ensure nobody can read your files when you innocently load a web page, but it does restrict their general usefulness.
There are also some helper objects, most notable of which is, for our purposes, the TextStream object. Basically we will create an instance of the FSO object, then use it to create our TextFile objects and from these in turn create TextStream objects to which we can read or write text. Type the following code into a file called testFiles.
Echo "What file name? Echo line ; outFile. WriteLine line Wend ' close both files inFile.

Opening and Closing Binary Files

The key difference between text files and binary files is that text files are composed of octets, or bytes, of binary data whereby each byte represents a character, and the end of the file is marked by a special byte pattern, known generically as end of file, or eof.
A binary file contains arbitrary binary data and thus no specific value can be used to identify end of file, thus a different mode of operation is required to read these files.
The end result of this is that when we open a binary file in Python (or indeed any other language) we must specify that it is being opened in binary mode, or risk the data being truncated at the first eof character that Python finds in the data. The way we do this in Python is to add a 'b' to the mode parameter, for example 'rb' to read a binary file.
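A minimal sketch of opening, using and closing a file in binary mode follows. The file name and the byte values are arbitrary choices of mine. The value 26 is included because it has historically served as an end-of-file marker in text files on some systems.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.bin")   # hypothetical name

# Write a few arbitrary byte values in binary mode.
outfile = open(path, "wb")
outfile.write(bytes([0, 26, 10, 255]))
outfile.close()

# Read them back in binary mode: we get a bytes object, byte for byte.
infile = open(path, "rb")
data = infile.read()
infile.close()

print(list(data))
```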