Chapter 10. What data formats do we have?

Nikolay (unDEFER) Krivchenkov

2009-07-26

In this article we ask the question: "What data formats do we have?" For what purpose were they designed? And for what purpose are they actually used?

The answer may seem strange, but the overwhelming majority of data formats are well suited only to storage and data transmission, that is, to archives. In fact, we have no widely known format for editing.

Let me explain what I mean. Take a plain text format, for example. To insert a string into the middle of a file, it is necessary first to rewrite the old data at its new place, which means copying the rest of the file up to the end. File systems are not suited to inserting data into a file.
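The cost described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper name insert_into_file is hypothetical, and the whole tail is read into memory, which a real editor would do in chunks. It shows that everything after the insertion point has to be written again.

```python
import os
import tempfile


def insert_into_file(path, offset, data):
    """Insert bytes at `offset` by rewriting everything after it.

    Hypothetical helper for illustration; returns the number of bytes
    that had to be moved.
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()      # everything after the insertion point
        f.seek(offset)
        f.write(data)        # new data goes where the tail used to start
        f.write(tail)        # the whole tail is written again, shifted
    return len(tail)


# Usage: a small temporary file stands in for a large document.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"Hello world")

moved = insert_into_file(path, 5, b",")

with open(path, "rb") as f:
    result = f.read()
os.remove(path)

print(result, moved)
```

One byte was inserted, yet the amount of data rewritten grows with the distance from the insertion point to the end of the file.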

No compressed file is suitable for editing. Even a small change makes it necessary to recompress the whole original file.
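As a rough illustration, the sketch below uses Python's standard zlib module on a made-up sample text. It inserts a single byte near the start of the data and compares the two compressed streams. How early the streams diverge depends on the compressor and the data, so no particular figure is claimed here; the point is only that the edited stream is obtained by compressing all of the edited data again.

```python
import zlib

# Made-up sample data standing in for a document.
original = b"The quick brown fox jumps over the lazy dog. " * 200
edited = original[:10] + b"X" + original[10:]   # one byte inserted near the start

packed_original = zlib.compress(original)
packed_edited = zlib.compress(edited)

# Count how many leading bytes the two compressed streams share.
common = 0
for a, b in zip(packed_original, packed_edited):
    if a != b:
        break
    common += 1

# The edited stream was produced by compressing all of `edited` again;
# decompressing it gives back the edited text.
roundtrip = zlib.decompress(packed_edited)

print(len(packed_original), len(packed_edited), common)
```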

The ODF format (the OpenOffice.org document format) is not only a zip archive; it is also bad even for simply viewing big documents. Just to get the cursor position in a document, it is necessary to read the document completely. The cursor is stored in the document body as a special sequence of symbols, like <cursor>.

Good formats for editing do exist, in fact. First of all, there are the structures used in RAM for document processing. Besides this, many applications designed to work with big documents keep cache files of this kind on disk. File systems and database files are all examples of formats suited to editing.
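One well-known example of such an in-memory structure is the gap buffer, which some text editors use. The sketch below is a minimal illustration and not the structure used by unDE or by any particular application; the class name GapBuffer and its methods are assumptions made for this example. It keeps a block of free space at the cursor, so typing at the cursor does not move the rest of the text.

```python
class GapBuffer:
    """Text buffer that keeps a gap of free slots at the cursor (illustrative)."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first free slot
        self.gap_end = len(self.buf)    # one past the last free slot

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_cursor(self, pos):
        """Move the gap so that it starts at character index `pos`."""
        pos = max(0, min(pos, len(self)))
        while self.gap_start > pos:     # shift characters from before the gap to after it
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:     # shift characters from after the gap to before it
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, s):
        """Insert `s` at the cursor; the buffer grows only when the gap is full."""
        for ch in s:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def _grow(self, extra=16):
        # Open a fresh gap of `extra` free slots at the cursor.
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


# Usage: insert in the middle without copying the whole text on every keystroke.
doc = GapBuffer("Hello world")
doc.move_cursor(5)
doc.insert(",")
print(doc.text())

small = GapBuffer("ab", gap_size=1)
small.move_cursor(1)
small.insert("XYZ")   # forces the gap to grow
print(small.text())
```

Only moving the cursor shifts characters, and only by the distance moved; consecutive insertions at the same place are cheap. A structure of this kind could also be kept on disk, which is the sort of editing format the text has in mind.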

However, the majority of such formats are not widely discussed and are not standardised. That is bad too. Someone simply decided that documents on disk must be stored in a format suitable for exchange. Meanwhile, files are seldom edited only once.

Doubtless the majority of files that come into the system from outside (in a format convenient for exchange) are never edited at all. But a file created by the user, or any other file that the user has edited once, will with high probability be edited many times. So there is no need to always keep it in a format suitable only for reading. The user has no need for questions about formats, and does not want to lose part of the information because of the chosen format. Just as the user is not interested in the formats of the structures used in the computer's main memory, he or she is hardly interested in which formats are used, so long as the document history is always visible and all the information entered into the computer is kept completely.

In unDE, editing will use formats that save as much of the user's time as possible later on. All of them will be well documented, and there will be no problem in writing third-party converters and other processors for these formats. The question of choosing a format will arise in unDE only when saving a file for long-term storage (archiving) and for data transmission. Thus, depending on the purpose of the conversion, the system can offer the formats most suitable for the task.
