Chapter 13. BerkeleyDB as a Universal Data Format

Nikolay (unDEFER) Krivchenkov


As stated earlier, modern data formats are poorly suited to editing. To make editing easy, unDE will make wide use of the BerkeleyDB file database.

BerkeleyDB is a high-performance database that essentially implements a hash, that is, a table of key/value records. Hash tables are convenient for storing many kinds of information. A BerkeleyDB database can grow to a huge size, and on modern computers its speed reaches millions of read/write operations per second.
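The key/value model can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard `dbm.dumb` module as a stand-in for BerkeleyDB (it is not libdb and does not share its performance), and the record keys `title` and `para:1` are hypothetical names for parts of a document.

```python
import dbm.dumb
import os
import tempfile


def edit_record(path, key, value):
    """Open the key/value file at `path` and overwrite a single record."""
    with dbm.dumb.open(path, "c") as db:
        db[key] = value


def read_record(path, key):
    """Return the value stored under `key`, or None if it is absent."""
    with dbm.dumb.open(path, "r") as db:
        return db.get(key)


# Hypothetical document layout: every part of the document is its own record.
workdir = tempfile.mkdtemp()
doc_path = os.path.join(workdir, "document")

edit_record(doc_path, b"title", b"Chapter 13")
edit_record(doc_path, b"para:1", b"First draft")

# Editing one paragraph touches only that record, not the whole document.
edit_record(doc_path, b"para:1", b"Second draft")
```

The point of the sketch is the editing pattern: a change is addressed by key, so the program never has to parse and rewrite the entire document to alter one part of it.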

The fact that Oracle bought the rights to BerkeleyDB clearly confirms how serious this project is.

Many people do not even suspect that BerkeleyDB is inside applications they use every day: for example, the rpm package manager, the Subversion version control system, and the MySQL relational database in versions 3.23 to 5.1.

BerkeleyDB makes it easy to implement transactions. Its locking mechanism also allows individual records to be locked, which is difficult to implement with the usual means of an operating system.
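What a transaction buys the application can be shown with a toy model. The sketch below is an assumption-laden illustration, not BerkeleyDB's mechanism: `ToyTransaction` is a hypothetical class that buffers writes in memory and applies them to a plain dict only on commit, while the real library's logging, locking and recovery are far more involved.

```python
class ToyTransaction:
    """Buffers writes and applies them to the store all at once on commit."""

    def __init__(self, store):
        self._store = store   # plain dict standing in for the database
        self._pending = {}    # writes not yet visible in the store

    def put(self, key, value):
        self._pending[key] = value

    def get(self, key):
        # A transaction sees its own uncommitted writes first.
        if key in self._pending:
            return self._pending[key]
        return self._store.get(key)

    def commit(self):
        self._store.update(self._pending)
        self._pending.clear()

    def abort(self):
        self._pending.clear()


# Hypothetical document: one paragraph record plus a table-of-contents record.
store = {"para:1": "First draft", "toc": "1 paragraph"}

# Adding a paragraph must update two records together or not at all.
txn = ToyTransaction(store)
txn.put("para:2", "New paragraph")
txn.put("toc", "2 paragraphs")
txn.abort()                      # nothing reaches the store
after_abort = dict(store)

txn = ToyTransaction(store)
txn.put("para:2", "New paragraph")
txn.put("toc", "2 paragraphs")
txn.commit()                     # both records change together
```

The document never ends up with a new paragraph and a stale table of contents: either both records change or neither does.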

Of course, a single BDB file will often not be enough to store everything about a document, and packing the files into an archive would be very inappropriate in this case.

So one of the unDE principles becomes "everything is a directory". It is more general than the principle accepted in modern systems, "everything is a file". Early versions of Opera saved all the files of a web page in one directory, instead of an HTML file plus a "<page name> + Files" directory as the Microsoft browser does. It is actually a fine idea for a whole web page to fit into one entity (whether that is a file or a directory does not matter). But since no browser could automatically treat a whole directory as a single entity, Opera soon began to save web pages either as a single file (a web archive) or as a file + directory pair.

So in unDE it will probably be possible to assign a handling application even to a directory.

Among other things, BerkeleyDB removes the need to think about saving documents to disk at all. Program code will work directly with database records, and libdb and the OS will decide when the data should be written to disk. Thanks to BerkeleyDB's use of mmap, even unrelated applications working with the same document will not fill memory with copies of the same data; they will share memory pages for reading.
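The sharing effect of mmap can be observed directly. The following is a minimal sketch using Python's standard `mmap` module rather than libdb itself: two independent mappings of one temporary file stand in for two applications that have the same document open. The file and variable names are assumptions made for the illustration.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Create a one-page file; an empty file cannot be mapped.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * PAGE)
os.close(fd)

with open(path, "r+b") as writer_file, open(path, "rb") as reader_file:
    # Two separate mappings of the same file, as two applications would hold.
    writer = mmap.mmap(writer_file.fileno(), PAGE)
    reader = mmap.mmap(reader_file.fileno(), PAGE, access=mmap.ACCESS_READ)

    writer[0:5] = b"hello"      # change memory only; no explicit write() call
    seen = bytes(reader[0:5])   # the read-only mapping sees the same page

    reader.close()
    writer.close()

os.remove(path)
```

Neither side copies the data into a private buffer or calls write() explicitly; the operating system backs both mappings with the same pages and decides when they reach the disk.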

So active use of BerkeleyDB will considerably simplify the development of unDE and give it many useful properties.