Why the log file is more important than the data file in Microsoft SQL Server
© All rights reserved.

The log file in database servers, also known as the transaction log, is the subject of some of the most critical misconceptions among DBMS administrators.

This applies both to data availability and to performance.

Let's find out why...

Typically, log files are used by applications to store information about events, which administrators can later use to analyze problem situations.

These can be various error messages or, on the contrary, reports about successfully completed operations.

In some cases, various utilities analyze the log and, for example, show statistics about the work of the application "in the past".

 

But in the case of SQL Server and some other databases, the transaction log is more than just a report of events.

The transaction log is the cornerstone of the overall system that maintains data integrity and keeps data-modification operations in the database fast.

To begin with, the data in the database files is stored in 8KB pages.

That is, the entire data file is just a set of pages: data, indexes, and various service structures.

A page (data, index, or service) is always read and written as a whole. To verify its integrity, the page itself stores service information: a checksum or torn-page detection bits. Therefore, even if you need to read or write the value of a single field in a single row, the whole page must be read or updated. This structure also optimizes disk operations and memory usage: a page always contains data belonging to one table or index, and in most cases reading 8192 bytes instead of a few dozen simply acts as a useful read-ahead.
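Page-level integrity checking is configured per database. A minimal sketch, assuming a database named MyDb (the name is a placeholder):

```sql
-- Enable checksum-based page verification (the default on modern versions).
ALTER DATABASE MyDb SET PAGE_VERIFY CHECKSUM;

-- Inspect the current setting.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = 'MyDb';
```

With CHECKSUM enabled, the server computes and stores a checksum over the whole 8 KB page on every write and verifies it on every read.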

Now consider the issue of data integrity.

To guarantee that changes made by a query are stored "forever", as the durability principle of transactional integrity (ACID) requires, any change must reach long-term physical media before the initiator of the request receives confirmation that the transaction has ended.

That is, write caching is unacceptable here: otherwise, a system failure would leave us, at best, with a database missing the committed changes and, more likely, with internally inconsistent structures.

Of course, changing only a small fraction of the data on a page is far more common than rewriting a whole page, and if the entire page had to be saved every time, the page-based organization would suffer badly: waiting for an 8 KB write when only a few bytes actually changed is a completely unnecessary load on the disk subsystem.

 

So, on the one hand, the page-based organization allows data to be read efficiently; on the other hand, it reduces the performance of change operations.

How is this dilemma solved in Microsoft SQL Server?

Put simply, data modification follows the algorithm below (it is very similar in SQL Server's "sibling" Sybase, and other DBMSs use comparable change protocols):

1. A transaction is opened.

2. If the data page is not in memory, it is read from disk (cached).

3. The query is executed, changing the necessary data on the pages in memory.

4. Information about the changes, and how the modified data looked before the change, is written to the transaction log.

5. The query completes, the transaction commits, and the client receives confirmation of successful execution.
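In T-SQL, the steps above correspond to an ordinary transaction. A sketch (the table and column names are illustrative):

```sql
BEGIN TRANSACTION;            -- step 1: open a transaction

-- steps 2-3: the referenced page is read into the buffer pool
-- (if not already cached) and modified in memory
UPDATE dbo.Accounts
SET Balance = Balance - 100
WHERE AccountId = 42;

-- steps 4-5: on COMMIT, the log records describing the change are
-- hardened to the transaction log before the client gets confirmation
COMMIT TRANSACTION;
```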

Step 4 (logging) is performed with special flags that tell both the OS and the disk subsystem to write to the medium synchronously; that is, the write request completes only after the information has reached the physical disk.

You can read more about the SQL Server transaction log in Microsoft's documentation.

 

So, this algorithm saves information about the changes into the log file, but does not save the changed pages themselves to disk. Writing modified pages to disk happens in the background, driven by the CHECKPOINT and lazy writer processes.
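A checkpoint can also be issued manually, and the backlog of modified pages can be observed through DMVs. A sketch (the database name is a placeholder):

```sql
USE MyDb;

-- Count modified ("dirty") pages currently held in the buffer pool
-- that have not yet been written back to the data file.
SELECT COUNT(*) AS dirty_pages
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID() AND is_modified = 1;

-- Force the background flush to run now.
CHECKPOINT;
```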

In the event of a failure during execution, or when a transaction is rolled back with the ROLLBACK command, the server "restores" the original versions of the data pages from the changes saved in the transaction log. Also, when the server starts, it runs each database through recovery: analyzing the transaction log to redo the changes of completed transactions whose data pages had not yet been written to disk, and to roll back transactions that were still open when the server last stopped. Traces of this process (Recovery) can be seen in the server's ordinary error log, which is a plain text file. (The messages from this log are also duplicated in the Windows Application log.)
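The recovery messages can be searched for directly from T-SQL. One way is sp_readerrorlog, an undocumented but long-standing system procedure (parameters: log file number, log type 1 = SQL Server error log, search string):

```sql
-- Search the current error log for recovery-related messages.
EXEC sp_readerrorlog 0, 1, N'Recovery';
```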


 

So, what conclusions can database administrators draw if they want to configure their server properly for fast and reliable operation?

1. Reliable media are required for placing the SQL Server transaction log files.

It is absolutely unacceptable to place the log file on a single disk, and even less so on a RAID 0 array. Failure of the disk holding the database log file leads, at a minimum, to the database being marked "suspect" (inaccessible), and in many such failures the database can only be recovered from the last remaining backup.

2. Fast media are required for placing transaction log files.

The speed of a data-modification query depends mostly on the speed of the disk holding the log file. Since data pages are written in the background, most systems have lower performance requirements for the disks serving the data files. Common guidance is that disk response time for log files should fall (roughly) within 5-10 milliseconds, and for data files within 10-20 milliseconds. Different sources give different specific values, but in any case fast writes are critical for the log file.
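Per-file I/O latency can be measured with the sys.dm_io_virtual_file_stats DMV, which lets you compare the log file against the data files directly. A sketch:

```sql
-- Average write latency (ms) per database file since server start.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.type_desc,                                          -- ROWS or LOG
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;
```

If avg_write_ms for LOG files regularly exceeds the guidance above, the log volume is the bottleneck for modification queries.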

3. If the database recovery model is set to FULL and regular backups of both the database and the log are configured, the database can be restored even if the data file itself is lost: you only need the last full backup and all subsequent log backups. The final transaction log entries can also be saved (a tail-log backup), as long as the log file itself is intact.
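A minimal sketch of that recovery path (the database name and file paths are placeholders):

```sql
-- Regular backups taken beforehand:
BACKUP DATABASE MyDb TO DISK = N'D:\backup\MyDb_full.bak';
BACKUP LOG MyDb TO DISK = N'D:\backup\MyDb_log1.trn';

-- After losing the data file: capture the tail of the log
-- (possible because the log file itself is intact).
BACKUP LOG MyDb TO DISK = N'D:\backup\MyDb_tail.trn'
    WITH NO_TRUNCATE;

-- Then restore the chain up to the point of failure.
RESTORE DATABASE MyDb FROM DISK = N'D:\backup\MyDb_full.bak' WITH NORECOVERY;
RESTORE LOG MyDb FROM DISK = N'D:\backup\MyDb_log1.trn' WITH NORECOVERY;
RESTORE LOG MyDb FROM DISK = N'D:\backup\MyDb_tail.trn' WITH RECOVERY;
```

With this sequence, no committed transaction is lost even though the data file is gone, which is precisely why the log deserves the more reliable (and faster) storage.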


 

Author(s): Alex Shapovalov
Published at: 19 Feb 2021 10:37 GMT
Original link (login required): https://ilde.upf.edu/pg/lds/view/211567/