Re: [boost] [log] Patch: rotating file collector

1 May 2013

      Ok, I finally got to this, sorry for the delay.

On Wednesday 24 April 2013 18:09:37 Alexander Arhipenko wrote:
...
...
I'm prepared to rename the current file::make_collector method to better
describe the current behavior (I'm open to suggestions on the new name).
The new collector should be created with a different function, also
appropriately named (file::make_stacking_collector?). It's also probably
a good idea to extract collectors and the base file::collector interface
to separate headers.
I see 2 options here.
First, have 2 functions: make_collector, make_rotating_collector.
Second, have:
a) enum rotation_method { rotate_sequential, rotate_??? }
  (enum collection_method ?)
and
b) pass additional parameter rotation_method to make_collector function
Personally I would prefer second approach.
I would rather prefer different factory functions for different collectors so 
that the parameters are independent and have the appropriate meaning for each 
collector type.

I would also prefer another name for the new collector since I'm afraid 
"rotation" term would become too overloaded since it is used not with regard 
to rotating files in the sink backend. "Stacking" collector maybe? "Rolling" 
collector?
...
...
2. The new collector should also be kept in the collectors repository.
This
way it is possible to create multiple sinks collecting files in the common
target directory, without different collectors interferring with each
other. Basically, you'll have to move file_collector_hook and
enable_shared_from_this< file_collector > base classes to
file_collector_base, as well as m_pRepository.
This introduces an additional possible issue if the user tries to create
different sinks with different collectors targeting the same directory.
This case should be detected (e.g. by checking the type of the collector
in file_collector_repository::get_collector) and handle it with an
exception (log::setup_error).
The reason I haven't added rotating collector to repository
is the "possible issue" you are describing:
several collectors targeting one directory but different files.
In current implementation,
it's impossible to have more than 1 collector for one directory.
It is okay when the collector responsibilities are:
* files copying to target directory
* old files removal (according to max size and/or min free space setup)
...but does not work for rotating collector case.
To elaborate more, consider use case:
* /var/log - is a directory where all the system log files reside
* foo.log - foo service log file
* bar.log - bar service log file
Both foo and bar log files have to be rotated, e. g.:
foo.log (latest), foo.log.1 (previous), ..., foo.log.N (earliest).
Thus, user have to setup 2 collectors and will got log::setup_error.
The easiest way to make described use case working -
do not store rotating collectors in repository.
The more sophisticated approach would be:
use additional criteria for collector identification (file_name pattern?).
Please, comment on this.
Well, the rule of the thumb the library follows is that only one process is 
managing the target directory. This means:

a. If some other process manages the storage, you just don't set up the 
collector and leave it to this process to collect the rotated log files from 
the sink. In this case we do nothing in Boost.Log to help it.

b. If Boost.Log is managing the storage, you set up the collector and mostly 
rely on the fact that noone else is messing around with the storage. This 
means that we don't try to detect sudden file deletions or additions to the 
storage, although should that happen the collector should handle the situation 
gracefully. This means that if there are two services (foo and bar) running, 
they should be collecting their log files to separate directories. This is 
fairly common in Linux, at least, since many services would just create 
directories within /var/log and put their logs there. If this is not 
acceptable, then go for case (a).

Now, in (b) you still have to implement Boost.Log so that it is possible to 
create multiple sinks collecting to the same target directory, and this is 
solved by the collectors repository. The problem of having different 
collectors targeting the same directory can be solved by simply prohibiting 
such configuration, because the desired behavior is not clear in this case. So 
if you put the new collector to the repository and add a check for its type 
before adding, you should be fine.
...
...
4. I think, it should be possible to avoid the do_rollover method.
file_collector_base::store_file should be moved to file_collector, and
there should be a separate rotating_file_collector::store_file method as
well. Any common subroutines (like checking available free space, erasing
old files, etc.) can be extracted as distinct functions into
file_collector_base and called from store_file methods of the
derivatives.
I don't see any advantages of eliminating do_rollover method
in favour of common subroutine methods (as well as vice versa ;) ).
I don't mind to implement approach you are describing,
but maybe you can provide additional arguments why do we need this change?
This is a design clarity question. The do_rollover method is specific to the 
new collector, and by exposing it to the base class you're leaking these 
specifics to the base class, which is supposed to be oblivious to it. Also, 
the method is virtual and you already have a virtual table dispatch when 
store_file is called. Not that the performance really matters in this case, 
but in general this counts as a design flaw to me.
...
...
5. You don't seem to rename the collected files when the file pattern
contains date/time (see date_rollover_policy). Is that right? Obviously,
you will only have the date and time point of the rotation and not the
file creation.
You're right that collected files are not renamed for date/time patterns.
I don't have any strong opinion about whether file name should indicate
file creation or file rotation time.
However, gut feeling inclines me to the second case (file rotation time).
I think, most, if not all systems I had experience with (including proprietary 
developments) put file creation time in the file name. And this is quite 
expected, I think. I mean, if you're looking for a file that might contain 
logs for a specific time point, you would typically assume the file name to 
indicate the beginning of the time frame and not the end of it.

Of course, we could put both time stamps into the file name, but that's 
another story.
...
On the opposite, I have following real life use case.
User logins to system and wants to monitor events for day 23.
There are 2 log files in the log directory: file.log and
file.log.2013-04-21. If filename indicates creation time, then it's hard to
guess
what file to look at: records could be found in both files.
If 2013-04-21 indicates rotation time,
then it's clear that the record for day 23 is in file.log.
It will be interesting to hear opinions from you.
If I have a login date (20013-04-23), I would look into file.log.2013-04-21 
first. Call it whatever you like, but my intuition tells me that the file name 
indicates the date of the file creation. :) Also, the last modification time 
might give you a hint on when the file ends, although it depends on your file 
manager how apparent this attribute is.

You can avoid the confusion if you name the active file with its creation time 
as well (which is what Boost.Log does, currently). So if you have 
file.log.2013-04-21 and file.log.2013-04-25 then it's pretty obvoius which 
file contains the records you seek.
...
...
...
Also, I would like to discuss following question with you :
...
rotate_file invocation in ~text_file_backend -
I would prefer to have this call during text_file_backend construction
phase.
I'll have to take a closer look later to answer that but it seems odd to
have to rotate something when nothing have been written yet. The
rotate_file call is intended to place the written file to the storage
governed by the collector, and at construction this file hasn't been
created yet. Could you describe why you need this?
This question remains actual.
I'll try to explain why I'm requesting this.
Point 1 - when rotating collector is used,
file name that is currently being written is somewhat constant.
But when application shut down, it becames something like
file_name.log + ".1" (counter name pattern case).
This behavior is chosen to guarantee there are no files piling up in the 
directory where the sink backend writes them. Imagine there are timestamps in 
the file name, so the next time you start your application the file names will 
be different and the old files will be left unmanaged. So the files are 
collected when the sink is destroyed and then managed by the collector after 
the application restarts and runs scan_for_files.

Another problem can happen even with file counters only, if file appending is 
not enabled (the default). After scan_for_files doesn't find the last file's 
counter, the sink will reconstruct the file name that is the same as it was in 
the previous run and will overwrite the old file. This kind of gotcha I would 
like to avoid.
...
I would like to have file name the same for both cases:
when service is running and when not running.
Also, if user collects log files to e.g. /var/log/archive folder but
the log file itself resides in /var/log directory,
then latest application log should be seeked in 2 place which is annoying.
Well, right now I don't have ideas on how to support this without interferring 
with my previous points. You see, when file collectors are used, the file that 
the sink writes is a temporary, really.

Actually, I think if you open it on Windows while the sink attempts to rotate 
it, you can screw up the rotation and loose some records while the sink tries 
to recreate the file. I haven't tried this but if I'm right then you're 
basically not recommended to open the file while it's being written.
...
Point 2 - user changed collector configuration (decreased min free space).
After starting application he would expect that some files will be removed,
but this will occur only on next rotation phase.
So, I would re-phrase:
* it's preferable to *try* rotate_file
  during text_file_backend construction phase
You're not talking about file rotation here but rather about checking the 
target storage constraints when initializing the sink. This might be 
worthwhile although, as I said before, we don't expect someone else doing 
something with the storage behind the scene. Besides, the next time a file is 
stored, the collector will check the constraints anyway _before_ storing the 
file, so the additional check on the sink construction isn't really needed.
...
* there is no need to rotate_file during text_file_backend destruction
See my previous points.

Re: [boost] [log] Patch: rotating file collector

Andrey Semashev