Re: [boost] [log] Patch: rotating file collector

10 May 2013

      On Mon, May 6, 2013 at 4:09 PM, Alexander Arhipenko <arhipjan@gmail.com>wrote:
...
Hi Andrey,
Regarding points 5 and 6: after analyzing this case,
I've dropped support for datetime pattern in rollover collector.
You've also mentioned an option of using datetime placeholder in file sink:
...
... Maybe it's better to avoid
dealing with date/time placeholders in the collector at all and only
work with
file counters. But scan_for_files should be changed so that it is
possible to
set a pattern with date/time placeholders in the sink backend.
I liked that option, but existing implementation should be enhanced to
support this: currently during rollover phase files are simply renamed.
I.e., fileN -> fileN+1, fileN-1 -> fileN, ..., file -> file1.
Thus, datetime placeholder will be invalid after rollover -
it will point to the wrong date.
If you think this option is very nice to have - please tell.
Why, the date will stay relevant even after the collector renames the file
(assuming it doesn't change the date). Say, you have these files:

file-2013-05-01.3.log
file-2013-05-03.2.log
file-2013-05-05.1.log

and you rotate file-2013-05-07.log. After rotation you should have:

file-2013-05-01.4.log
file-2013-05-03.3.log
file-2013-05-05.2.log
file-2013-05-07.1.log

So the dates in the file names still indicate the beginning of the time
frame of the files.

I admit, however, that it may look strange to use both dates and file
counters with rollover collector. Dates are useful for lookup purposes and
it shouldn't be difficult to support them. On the other hand the current
collector already supports them. I have no strong opinion on whether to
support dates in rollover collector or not, I'll leave it to you.

In regards to a. - collectors repository.
...
I agree that only one file_collector should serve one target directory.
Regarding rollover collectors - they are different beasts.
No, they are not, actually. Let's see why.
...
Agree that Boost.Log should not detect any "external" file deletions or
additions. However, I don't see any problems of having 2 _rollover_
collectors targeting one directory.
Each of them will monitor own file series and manipulates (add, remove)
only on that series.
The only 1 point of "race condition" we can imagine -
that is minimum free space requirements.
However, this is still not a problem: even if both collectors perform
rotation at one point in time and minimum free space requirement is not
met,
each collector will just remove its file(s),
freeing a little bit more of free space.
Nope, that case is still broken. Suppose you have two sinks that have two
rollover collectors with the common target directory. Disregarding thread
contention, the library will feed records to sinks sequentially, always in
the same order. When the space limit for the target directory is reached,
only one of the collectors will typically invoke old files deletion, and
the other collector will typically not.

You see, storage limits are set for the target storage and not for the sink
or set of files. While maintaining these limits the collector should
typically ensure that the files are being deleted in a fair manner. Always
deleting the oldest file (regardless of the sink that file came from)
maintains this behavior, assuming that all sinks rotate files at a more or
less similar rate. This approach also works with rollover collector, IMHO,
and that is why I think only one collector should be managing any given
storage.

I think I can see why you want to make the collectors separate - this would
simplify managing the queues of files from different sinks. You can
implement multiple queues of files within a single collector, which will
still exclusively manage its target directory. You can also implement
additional limits for each file queue, if you like, but then you'll have to
come up with some rules that will choose the files to delete in a fair
manner, and this is also much easier to implement within a single collector.

Also, some words regarding Linux example.
...
On my Ubuntu 12.04 there are both files that reside in separate
directories:
/var/log/samba, /var/log/mongodb, /var/log/nginx
as well as in root log directory:
/var/log/syslog.log*, /var/log/dpkg.log*, /var/log/auth.log*.
That means that having files from different services in one directory
is existing practice. We need to give user an ability to cover this cases
in Boost.Log library.
Yes, some services allow that, although I suspect most if not all of the
files in /var/log are written by syslog daemon. My point was that storing
files in service-specific directories is a fairly common practice already.
...
However, I still think that having active log file with persistent name
will be a good benefit for many cases
(one of the examples - /var/log/*.log* files mentioned above).
After some thinking, I've came to conclusion that when
* file sink does not have placeholders (i.e., log file name is static),
* active log resides in collection directory (this point is not mandatory)
- then we can safely do not rotate such file during sink destruction.
So, I propose to implement the behavior described above, i.e.:
log file is not rotated during sink destruction when it's filename is
static.
I don't think it is safe to implicitly rely on configuration in this case.
After all, directories and file name patterns may change after application
restart. And such subtle change in behavior without being explicitly
requested by user also doesn't look like a good idea.

I don't really like this, but we could add an option to the sink (a flag,
rotate_on_destroy), which by default would be true to maintain the current
behavior. If the user knows what he's doing, he would be able to disable
rotation on destructor.

Re: [boost] [log] Patch: rotating file collector

Andrey Semashev