Ok, I finally got to this, sorry for the delay. On Wednesday 24 April 2013 18:09:37 Alexander Arhipenko wrote:
I'm prepared to rename the current file::make_collector method to better describe the current behavior (I'm open to suggestions on the new name). The new collector should be created with a different function, also appropriately named (file::make_stacking_collector?). It's also probably a good idea to extract collectors and the base file::collector interface to separate headers. I see 2 options here. First, have 2 functions: make_collector, make_rotating_collector. Second, have: a) enum rotation_method { rotate_sequential, rotate_??? } (enum collection_method ?) and b) pass additional parameter rotation_method to make_collector function
Personally I would prefer second approach.
I would rather prefer different factory functions for different collectors so that the parameters are independent and have the appropriate meaning for each collector type. I would also prefer another name for the new collector since I'm afraid "rotation" term would become too overloaded since it is used not with regard to rotating files in the sink backend. "Stacking" collector maybe? "Rolling" collector?
2. The new collector should also be kept in the collectors repository. This way it is possible to create multiple sinks collecting files in the common target directory, without different collectors interferring with each other. Basically, you'll have to move file_collector_hook and enable_shared_from_this< file_collector > base classes to file_collector_base, as well as m_pRepository.
This introduces an additional possible issue if the user tries to create different sinks with different collectors targeting the same directory. This case should be detected (e.g. by checking the type of the collector in file_collector_repository::get_collector) and handle it with an exception (log::setup_error).
The reason I haven't added rotating collector to repository is the "possible issue" you are describing: several collectors targeting one directory but different files. In current implementation, it's impossible to have more than 1 collector for one directory. It is okay when the collector responsibilities are: * files copying to target directory * old files removal (according to max size and/or min free space setup) ...but does not work for rotating collector case. To elaborate more, consider use case: * /var/log - is a directory where all the system log files reside * foo.log - foo service log file * bar.log - bar service log file Both foo and bar log files have to be rotated, e. g.: foo.log (latest), foo.log.1 (previous), ..., foo.log.N (earliest). Thus, user have to setup 2 collectors and will got log::setup_error.
The easiest way to make described use case working - do not store rotating collectors in repository. The more sophisticated approach would be: use additional criteria for collector identification (file_name pattern?).
Please, comment on this.
Well, the rule of the thumb the library follows is that only one process is managing the target directory. This means: a. If some other process manages the storage, you just don't set up the collector and leave it to this process to collect the rotated log files from the sink. In this case we do nothing in Boost.Log to help it. b. If Boost.Log is managing the storage, you set up the collector and mostly rely on the fact that noone else is messing around with the storage. This means that we don't try to detect sudden file deletions or additions to the storage, although should that happen the collector should handle the situation gracefully. This means that if there are two services (foo and bar) running, they should be collecting their log files to separate directories. This is fairly common in Linux, at least, since many services would just create directories within /var/log and put their logs there. If this is not acceptable, then go for case (a). Now, in (b) you still have to implement Boost.Log so that it is possible to create multiple sinks collecting to the same target directory, and this is solved by the collectors repository. The problem of having different collectors targeting the same directory can be solved by simply prohibiting such configuration, because the desired behavior is not clear in this case. So if you put the new collector to the repository and add a check for its type before adding, you should be fine.
4. I think, it should be possible to avoid the do_rollover method. file_collector_base::store_file should be moved to file_collector, and there should be a separate rotating_file_collector::store_file method as well. Any common subroutines (like checking available free space, erasing old files, etc.) can be extracted as distinct functions into file_collector_base and called from store_file methods of the derivatives.
I don't see any advantages of eliminating do_rollover method in favour of common subroutine methods (as well as vice versa ;) ). I don't mind to implement approach you are describing, but maybe you can provide additional arguments why do we need this change?
This is a design clarity question. The do_rollover method is specific to the new collector, and by exposing it to the base class you're leaking these specifics to the base class, which is supposed to be oblivious to it. Also, the method is virtual and you already have a virtual table dispatch when store_file is called. Not that the performance really matters in this case, but in general this counts as a design flaw to me.
5. You don't seem to rename the collected files when the file pattern contains date/time (see date_rollover_policy). Is that right? Obviously, you will only have the date and time point of the rotation and not the file creation. You're right that collected files are not renamed for date/time patterns. I don't have any strong opinion about whether file name should indicate file creation or file rotation time. However, gut feeling inclines me to the second case (file rotation time).
I think, most, if not all systems I had experience with (including proprietary developments) put file creation time in the file name. And this is quite expected, I think. I mean, if you're looking for a file that might contain logs for a specific time point, you would typically assume the file name to indicate the beginning of the time frame and not the end of it. Of course, we could put both time stamps into the file name, but that's another story.
On the opposite, I have following real life use case. User logins to system and wants to monitor events for day 23. There are 2 log files in the log directory: file.log and file.log.2013-04-21. If filename indicates creation time, then it's hard to guess what file to look at: records could be found in both files. If 2013-04-21 indicates rotation time, then it's clear that the record for day 23 is in file.log.
It will be interesting to hear opinions from you.
If I have a login date (20013-04-23), I would look into file.log.2013-04-21 first. Call it whatever you like, but my intuition tells me that the file name indicates the date of the file creation. :) Also, the last modification time might give you a hint on when the file ends, although it depends on your file manager how apparent this attribute is. You can avoid the confusion if you name the active file with its creation time as well (which is what Boost.Log does, currently). So if you have file.log.2013-04-21 and file.log.2013-04-25 then it's pretty obvoius which file contains the records you seek.
Also, I would like to discuss following question with you :
rotate_file invocation in ~text_file_backend - I would prefer to have this call during text_file_backend construction phase.
I'll have to take a closer look later to answer that but it seems odd to have to rotate something when nothing have been written yet. The rotate_file call is intended to place the written file to the storage governed by the collector, and at construction this file hasn't been created yet. Could you describe why you need this?
This question remains actual.
I'll try to explain why I'm requesting this. Point 1 - when rotating collector is used, file name that is currently being written is somewhat constant. But when application shut down, it becames something like file_name.log + ".1" (counter name pattern case).
This behavior is chosen to guarantee there are no files piling up in the directory where the sink backend writes them. Imagine there are timestamps in the file name, so the next time you start your application the file names will be different and the old files will be left unmanaged. So the files are collected when the sink is destroyed and then managed by the collector after the application restarts and runs scan_for_files. Another problem can happen even with file counters only, if file appending is not enabled (the default). After scan_for_files doesn't find the last file's counter, the sink will reconstruct the file name that is the same as it was in the previous run and will overwrite the old file. This kind of gotcha I would like to avoid.
I would like to have file name the same for both cases: when service is running and when not running. Also, if user collects log files to e.g. /var/log/archive folder but the log file itself resides in /var/log directory, then latest application log should be seeked in 2 place which is annoying.
Well, right now I don't have ideas on how to support this without interferring with my previous points. You see, when file collectors are used, the file that the sink writes is a temporary, really. Actually, I think if you open it on Windows while the sink attempts to rotate it, you can screw up the rotation and loose some records while the sink tries to recreate the file. I haven't tried this but if I'm right then you're basically not recommended to open the file while it's being written.
Point 2 - user changed collector configuration (decreased min free space). After starting application he would expect that some files will be removed, but this will occur only on next rotation phase. So, I would re-phrase: * it's preferable to *try* rotate_file during text_file_backend construction phase
You're not talking about file rotation here but rather about checking the target storage constraints when initializing the sink. This might be worthwhile although, as I said before, we don't expect someone else doing something with the storage behind the scene. Besides, the next time a file is stored, the collector will check the constraints anyway _before_ storing the file, so the additional check on the sink construction isn't really needed.
* there is no need to rotate_file during text_file_backend destruction
See my previous points.