Re: [boost] [log] Patch: rotating file collector
On Tue, Apr 23, 2013 at 3:58 PM, Alexander Arhipenko
I've added implementation of rotating file collector - the patch against boost log trunk attached. This is not the final version, I'm going at least to finish most of the TODOs.
However, it would be nice to receive feedback and questions from library author as well as from people interested in this feature.
Andrey, could you please find several minutes to review the attached patch?
Thanks for the patch. I didn't look deeply into it yet but from the first glance it looks like you changed the existing file collector rather than introduce an alternative one. Am I right? If so, could this be made as a pure addition rather than modification? I would prefer to keep the current behavior also available for the users. Also, I would like to discuss following question with you :
rotate_file invocation in ~text_file_backend - I would prefer to have this call during text_file_backend construction phase.
I'll have to take a closer look later to answer that but it seems odd to have to rotate something when nothing have been written yet. The rotate_file call is intended to place the written file to the storage governed by the collector, and at construction this file hasn't been created yet. Could you describe why you need this?
On Tuesday 23 April 2013 16:18:44 you wrote:
On Tue, Apr 23, 2013 at 3:58 PM, Alexander Arhipenko
wrote: I've added implementation of rotating file collector - the patch against boost log trunk attached. This is not the final version, I'm going at least to finish most of the TODOs.
However, it would be nice to receive feedback and questions from library author as well as from people interested in this feature.
Andrey, could you please find several minutes to review the attached patch?
Thanks for the patch. I didn't look deeply into it yet but from the first glance it looks like you changed the existing file collector rather than introduce an alternative one. Am I right? If so, could this be made as a pure addition rather than modification? I would prefer to keep the current behavior also available for the users.
I took a closer look now and my first impression was wrong - the original behavior is still accessible, this is good. However, I have several notes: 1. I would still prefer the interface to be distinct for different types of collectors, and not based on the file name pattern presence. These are distinct objects with different behaviour and it should be apparent from the collector construction what behavior it will implement. I'm prepared to rename the current file::make_collector method to better describe the current behavior (I'm open to suggestions on the new name). The new collector should be created with a different function, also appropriately named (file::make_stacking_collector?). It's also probably a good idea to extract collectors and the base file::collector interface to separate headers. 2. The new collector should also be kept in the collectors repository. This way it is possible to create multiple sinks collecting files in the common target directory, without different collectors interferring with each other. Basically, you'll have to move file_collector_hook and enable_shared_from_this< file_collector > base classes to file_collector_base, as well as m_pRepository. This introduces an additional possible issue if the user tries to create different sinks with different collectors targeting the same directory. This case should be detected (e.g. by checking the type of the collector in file_collector_repository::get_collector) and handle it with an exception (log::setup_error). 3. In rotating_file_collector::scan_for_files, invalid file name pattern should also be handled with setup_error. 4. I think, it should be possible to avoid the do_rollover method. file_collector_base::store_file should be moved to file_collector, and there should be a separate rotating_file_collector::store_file method as well. Any common subroutines (like checking available free space, erasing old files, etc.) can be extracted as distinct functions into file_collector_base and called from store_file methods of the derivatives. 5. You don't seem to rename the collected files when the file pattern contains date/time (see date_rollover_policy). Is that right? Obviously, you will only have the date and time point of the rotation and not the file creation. 6. Actually, the date/time support for the new collector seems quite broken. The current behavior, when the file is originally named by the sink, ensures that the date/time in the file name corresponds to the first records in the file. If the date/time is added by the collector, the timestamp will correspond to the latest written records, which it counterintuitive. The check in rotating_file_collector::scan_for_files prevents users from setting date/time in the sink, so basically, it will only be usable if there is only a file counter placeholder in the file name pattern. I'm not sure what is the best way to solve this. Maybe it's better to avoid dealing with date/time placeholders in the collector at all and only work with file counters. But scan_for_files should be changed so that it is possible to set a pattern with date/time placeholders in the sink backend. In that case I think it's reasonable to remove the file name pattern from the file collector construction and simply always add file counters right before the extension. That way the file name pattern is only set in the sink backend, which simplifies the interface a lot. 7. You should probably rebase your patch to the Boost trunk, as the library has been merged recently. For now it's pretty close to the SourceForge SVN, so you should not have any problems. I'm going to work in Boost SVN now, so the SourceForge SVN will get outdated eventually.
Also, I would like to discuss following question with you :
rotate_file invocation in ~text_file_backend - I would prefer to have this call during text_file_backend construction phase.
I'll have to take a closer look later to answer that but it seems odd to have to rotate something when nothing have been written yet. The rotate_file call is intended to place the written file to the storage governed by the collector, and at construction this file hasn't been created yet. Could you describe why you need this?
This question remains actual.
Hi Andrey,
First of all, thanks for such a detailed review.
Please, find my answers/proposals below.
On Tue, Apr 23, 2013 at 11:27 PM, Andrey Semashev
On Tuesday 23 April 2013 16:18:44 you wrote:
On Tue, Apr 23, 2013 at 3:58 PM, Alexander Arhipenko
wrote: I've added implementation of rotating file collector - the patch against boost log trunk attached. [snip] ...
Thanks for the patch. I didn't look deeply into it yet but from the first glance it looks like you changed the existing file collector ... [snip] ...
I took a closer look now and my first impression was wrong - the original behavior is still accessible, this is good. However, I have several notes:
1. I would still prefer the interface to be distinct for different types of collectors, and not based on the file name pattern presence. These are distinct objects with different behaviour and it should be apparent from the collector construction what behavior it will implement.
Agree to that. I also had an intention to provide separate function for rotating collector, but decided to expand existing one just to save typing.
I'm prepared to rename the current file::make_collector method to better describe the current behavior (I'm open to suggestions on the new name). The new collector should be created with a different function, also appropriately named (file::make_stacking_collector?). It's also probably a good idea to extract collectors and the base file::collector interface to separate headers.
I see 2 options here. First, have 2 functions: make_collector, make_rotating_collector. Second, have: a) enum rotation_method { rotate_sequential, rotate_??? } (enum collection_method ?) and b) pass additional parameter rotation_method to make_collector function Personally I would prefer second approach.
2. The new collector should also be kept in the collectors repository. This way it is possible to create multiple sinks collecting files in the common target directory, without different collectors interferring with each other. Basically, you'll have to move file_collector_hook and enable_shared_from_this< file_collector > base classes to file_collector_base, as well as m_pRepository.
This introduces an additional possible issue if the user tries to create different sinks with different collectors targeting the same directory. This case should be detected (e.g. by checking the type of the collector in file_collector_repository::get_collector) and handle it with an exception (log::setup_error).
The reason I haven't added rotating collector to repository is the "possible issue" you are describing: several collectors targeting one directory but different files. In current implementation, it's impossible to have more than 1 collector for one directory. It is okay when the collector responsibilities are: * files copying to target directory * old files removal (according to max size and/or min free space setup) ...but does not work for rotating collector case. To elaborate more, consider use case: * /var/log - is a directory where all the system log files reside * foo.log - foo service log file * bar.log - bar service log file Both foo and bar log files have to be rotated, e. g.: foo.log (latest), foo.log.1 (previous), ..., foo.log.N (earliest). Thus, user have to setup 2 collectors and will got log::setup_error. The easiest way to make described use case working - do not store rotating collectors in repository. The more sophisticated approach would be: use additional criteria for collector identification (file_name pattern?). Please, comment on this.
3. In rotating_file_collector::scan_for_files, invalid file name pattern should also be handled with setup_error.
Agree, will do.
4. I think, it should be possible to avoid the do_rollover method. file_collector_base::store_file should be moved to file_collector, and there should be a separate rotating_file_collector::store_file method as well. Any common subroutines (like checking available free space, erasing old files, etc.) can be extracted as distinct functions into file_collector_base and called from store_file methods of the derivatives.
I don't see any advantages of eliminating do_rollover method in favour of common subroutine methods (as well as vice versa ;) ). I don't mind to implement approach you are describing, but maybe you can provide additional arguments why do we need this change?
5. You don't seem to rename the collected files when the file pattern contains date/time (see date_rollover_policy). Is that right? Obviously, you will only have the date and time point of the rotation and not the file creation.
You're right that collected files are not renamed for date/time patterns. I don't have any strong opinion about whether file name should indicate file creation or file rotation time. However, gut feeling inclines me to the second case (file rotation time). Let's consider existing logging components. E.g. Python's logging module provides TimedRotatingFileHandler to deal with time based rotation. This handler indicates file creation time in the rotated file name. But the files are rotated periodically (file sizes could differ). On the opposite, I have following real life use case. User logins to system and wants to monitor events for day 23. There are 2 log files in the log directory: file.log and file.log.2013-04-21. If filename indicates creation time, then it's hard to guess what file to look at: records could be found in both files. If 2013-04-21 indicates rotation time, then it's clear that the record for day 23 is in file.log. It will be interesting to hear opinions from you.
6. Actually, the date/time support for the new collector seems quite broken. The current behavior, when the file is originally named by the sink, ensures that the date/time in the file name corresponds to the first records in the file. If the date/time is added by the collector, the timestamp will correspond to the latest written records, which it counterintuitive. The check in rotating_file_collector::scan_for_files prevents users from setting date/time in the sink, so basically, it will only be usable if there is only a file counter placeholder in the file name pattern.
I'm not sure what is the best way to solve this. Maybe it's better to avoid dealing with date/time placeholders in the collector at all and only work with file counters. But scan_for_files should be changed so that it is possible to set a pattern with date/time placeholders in the sink backend. In that case I think it's reasonable to remove the file name pattern from the file collector construction and simply always add file counters right before the extension. That way the file name pattern is only set in the sink backend, which simplifies the interface a lot.
Idea of dealing with counters only seems reasonable to me. I need to analyze this a little bit more... Please, also consider my comments regarding point 5.
7. You should probably rebase your patch to the Boost trunk, as the library has been merged recently. For now it's pretty close to the SourceForge SVN, so you should not have any problems. I'm going to work in Boost SVN now, so the SourceForge SVN will get outdated eventually.
Will do.
Also, I would like to discuss following question with you :
rotate_file invocation in ~text_file_backend - I would prefer to have this call during text_file_backend construction phase.
I'll have to take a closer look later to answer that but it seems odd to have to rotate something when nothing have been written yet. The rotate_file call is intended to place the written file to the storage governed by the collector, and at construction this file hasn't been created yet. Could you describe why you need this?
This question remains actual.
I'll try to explain why I'm requesting this. Point 1 - when rotating collector is used, file name that is currently being written is somewhat constant. But when application shut down, it becames something like file_name.log + ".1" (counter name pattern case). I would like to have file name the same for both cases: when service is running and when not running. Also, if user collects log files to e.g. /var/log/archive folder but the log file itself resides in /var/log directory, then latest application log should be seeked in 2 place which is annoying. Point 2 - user changed collector configuration (decreased min free space). After starting application he would expect that some files will be removed, but this will occur only on next rotation phase. So, I would re-phrase: * it's preferable to *try* rotate_file during text_file_backend construction phase * there is no need to rotate_file during text_file_backend destruction Andrey, I will wait for you response and continue working on patch. Thanks and Regards
Ok, I finally got to this, sorry for the delay. On Wednesday 24 April 2013 18:09:37 Alexander Arhipenko wrote:
I'm prepared to rename the current file::make_collector method to better describe the current behavior (I'm open to suggestions on the new name). The new collector should be created with a different function, also appropriately named (file::make_stacking_collector?). It's also probably a good idea to extract collectors and the base file::collector interface to separate headers. I see 2 options here. First, have 2 functions: make_collector, make_rotating_collector. Second, have: a) enum rotation_method { rotate_sequential, rotate_??? } (enum collection_method ?) and b) pass additional parameter rotation_method to make_collector function
Personally I would prefer second approach.
I would rather prefer different factory functions for different collectors so that the parameters are independent and have the appropriate meaning for each collector type. I would also prefer another name for the new collector since I'm afraid "rotation" term would become too overloaded since it is used not with regard to rotating files in the sink backend. "Stacking" collector maybe? "Rolling" collector?
2. The new collector should also be kept in the collectors repository. This way it is possible to create multiple sinks collecting files in the common target directory, without different collectors interferring with each other. Basically, you'll have to move file_collector_hook and enable_shared_from_this< file_collector > base classes to file_collector_base, as well as m_pRepository.
This introduces an additional possible issue if the user tries to create different sinks with different collectors targeting the same directory. This case should be detected (e.g. by checking the type of the collector in file_collector_repository::get_collector) and handle it with an exception (log::setup_error).
The reason I haven't added rotating collector to repository is the "possible issue" you are describing: several collectors targeting one directory but different files. In current implementation, it's impossible to have more than 1 collector for one directory. It is okay when the collector responsibilities are: * files copying to target directory * old files removal (according to max size and/or min free space setup) ...but does not work for rotating collector case. To elaborate more, consider use case: * /var/log - is a directory where all the system log files reside * foo.log - foo service log file * bar.log - bar service log file Both foo and bar log files have to be rotated, e. g.: foo.log (latest), foo.log.1 (previous), ..., foo.log.N (earliest). Thus, user have to setup 2 collectors and will got log::setup_error.
The easiest way to make described use case working - do not store rotating collectors in repository. The more sophisticated approach would be: use additional criteria for collector identification (file_name pattern?).
Please, comment on this.
Well, the rule of the thumb the library follows is that only one process is managing the target directory. This means: a. If some other process manages the storage, you just don't set up the collector and leave it to this process to collect the rotated log files from the sink. In this case we do nothing in Boost.Log to help it. b. If Boost.Log is managing the storage, you set up the collector and mostly rely on the fact that noone else is messing around with the storage. This means that we don't try to detect sudden file deletions or additions to the storage, although should that happen the collector should handle the situation gracefully. This means that if there are two services (foo and bar) running, they should be collecting their log files to separate directories. This is fairly common in Linux, at least, since many services would just create directories within /var/log and put their logs there. If this is not acceptable, then go for case (a). Now, in (b) you still have to implement Boost.Log so that it is possible to create multiple sinks collecting to the same target directory, and this is solved by the collectors repository. The problem of having different collectors targeting the same directory can be solved by simply prohibiting such configuration, because the desired behavior is not clear in this case. So if you put the new collector to the repository and add a check for its type before adding, you should be fine.
4. I think, it should be possible to avoid the do_rollover method. file_collector_base::store_file should be moved to file_collector, and there should be a separate rotating_file_collector::store_file method as well. Any common subroutines (like checking available free space, erasing old files, etc.) can be extracted as distinct functions into file_collector_base and called from store_file methods of the derivatives.
I don't see any advantages of eliminating do_rollover method in favour of common subroutine methods (as well as vice versa ;) ). I don't mind to implement approach you are describing, but maybe you can provide additional arguments why do we need this change?
This is a design clarity question. The do_rollover method is specific to the new collector, and by exposing it to the base class you're leaking these specifics to the base class, which is supposed to be oblivious to it. Also, the method is virtual and you already have a virtual table dispatch when store_file is called. Not that the performance really matters in this case, but in general this counts as a design flaw to me.
5. You don't seem to rename the collected files when the file pattern contains date/time (see date_rollover_policy). Is that right? Obviously, you will only have the date and time point of the rotation and not the file creation. You're right that collected files are not renamed for date/time patterns. I don't have any strong opinion about whether file name should indicate file creation or file rotation time. However, gut feeling inclines me to the second case (file rotation time).
I think, most, if not all systems I had experience with (including proprietary developments) put file creation time in the file name. And this is quite expected, I think. I mean, if you're looking for a file that might contain logs for a specific time point, you would typically assume the file name to indicate the beginning of the time frame and not the end of it. Of course, we could put both time stamps into the file name, but that's another story.
On the opposite, I have following real life use case. User logins to system and wants to monitor events for day 23. There are 2 log files in the log directory: file.log and file.log.2013-04-21. If filename indicates creation time, then it's hard to guess what file to look at: records could be found in both files. If 2013-04-21 indicates rotation time, then it's clear that the record for day 23 is in file.log.
It will be interesting to hear opinions from you.
If I have a login date (20013-04-23), I would look into file.log.2013-04-21 first. Call it whatever you like, but my intuition tells me that the file name indicates the date of the file creation. :) Also, the last modification time might give you a hint on when the file ends, although it depends on your file manager how apparent this attribute is. You can avoid the confusion if you name the active file with its creation time as well (which is what Boost.Log does, currently). So if you have file.log.2013-04-21 and file.log.2013-04-25 then it's pretty obvoius which file contains the records you seek.
Also, I would like to discuss following question with you :
rotate_file invocation in ~text_file_backend - I would prefer to have this call during text_file_backend construction phase.
I'll have to take a closer look later to answer that but it seems odd to have to rotate something when nothing have been written yet. The rotate_file call is intended to place the written file to the storage governed by the collector, and at construction this file hasn't been created yet. Could you describe why you need this?
This question remains actual.
I'll try to explain why I'm requesting this. Point 1 - when rotating collector is used, file name that is currently being written is somewhat constant. But when application shut down, it becames something like file_name.log + ".1" (counter name pattern case).
This behavior is chosen to guarantee there are no files piling up in the directory where the sink backend writes them. Imagine there are timestamps in the file name, so the next time you start your application the file names will be different and the old files will be left unmanaged. So the files are collected when the sink is destroyed and then managed by the collector after the application restarts and runs scan_for_files. Another problem can happen even with file counters only, if file appending is not enabled (the default). After scan_for_files doesn't find the last file's counter, the sink will reconstruct the file name that is the same as it was in the previous run and will overwrite the old file. This kind of gotcha I would like to avoid.
I would like to have file name the same for both cases: when service is running and when not running. Also, if user collects log files to e.g. /var/log/archive folder but the log file itself resides in /var/log directory, then latest application log should be seeked in 2 place which is annoying.
Well, right now I don't have ideas on how to support this without interferring with my previous points. You see, when file collectors are used, the file that the sink writes is a temporary, really. Actually, I think if you open it on Windows while the sink attempts to rotate it, you can screw up the rotation and loose some records while the sink tries to recreate the file. I haven't tried this but if I'm right then you're basically not recommended to open the file while it's being written.
Point 2 - user changed collector configuration (decreased min free space). After starting application he would expect that some files will be removed, but this will occur only on next rotation phase. So, I would re-phrase: * it's preferable to *try* rotate_file during text_file_backend construction phase
You're not talking about file rotation here but rather about checking the target storage constraints when initializing the sink. This might be worthwhile although, as I said before, we don't expect someone else doing something with the storage behind the scene. Besides, the next time a file is stored, the collector will check the constraints anyway _before_ storing the file, so the additional check on the sink construction isn't really needed.
* there is no need to rotate_file during text_file_backend destruction
See my previous points.
On May 1, 2013, at 8:53 AM, Andrey Semashev
On Wednesday 24 April 2013 18:09:37 Alexander Arhipenko wrote:
[snip]
I would also prefer another name for the new collector since I'm afraid "rotation" term would become too overloaded since it is used not with regard to rotating files in the sink backend. "Stacking" collector maybe? "Rolling" collector?
I'm ignorant of you library terminology, but "rotation" is appropriate if you maintain a set of numbered files, so a given file's number changes as it ages. In that case, the contents rotates through the numbered filenames. OTOH, if the files are dated, then they are not renamed when a new file is created, so the log output spills over into a new file when the date changes. In that case, "rollover" makes sense.
5. You don't seem to rename the collected files when the file pattern contains date/time (see date_rollover_policy). Is that right? Obviously, you will only have the date and time point of the rotation and not the file creation. You're right that collected files are not renamed for date/time patterns. I don't have any strong opinion about whether file name should indicate file creation or file rotation time. However, gut feeling inclines me to the second case (file rotation time).
I think, most, if not all systems I had experience with (including proprietary developments) put file creation time in the file name. And this is quite expected, I think. I mean, if you're looking for a file that might contain logs for a specific time point, you would typically assume the file name to indicate the beginning of the time frame and not the end of it.
Of course, we could put both time stamps into the file name, but that's another story.
I agree that the creation date/time makes the most sense. Both creation and close dates/times would be ideal, but with a set of creation-dates/times-only files, the end date/time can be inferred from that on the next file.
On the opposite, I have following real life use case. User logins to system and wants to monitor events for day 23. There are 2 log files in the log directory: file.log and file.log.2013-04-21. If filename indicates creation time, then it's hard to guess what file to look at: records could be found in both files. If 2013-04-21 indicates rotation time, then it's clear that the record for day 23 is in file.log.
It will be interesting to hear opinions from you.
If I have a login date (20013-04-23), I would look into file.log.2013-04-21 first. Call it whatever you like, but my intuition tells me that the file name indicates the date of the file creation. :) Also, the last modification time might give you a hint on when the file ends, although it depends on your file manager how apparent this attribute is.
If the current file has no date/time in the name, that's a problem. An entry for the 23rd can only be located by searching. I'd start by running head on the most recent file. If the first entry is past the desired time, then I'd know to look in the file with 21 in the name.
You can avoid the confusion if you name the active file with its creation time as well (which is what Boost.Log does, currently). So if you have file.log.2013-04-21 and file.log.2013-04-25 then it's pretty obvoius which file contains the records you seek.
Right
Actually, I think if you open it on Windows while the sink attempts to rotate it, you can screw up the rotation and loose some records while the sink tries to recreate the file. I haven't tried this but if I'm right then you're basically not recommended to open the file while it's being written.
You must allow for following the current file or reading previous files, at all times, as much as possible. What's not possible just takes more effort. :) ___ Rob (Sent from my portable computation engine)
On Thursday 02 May 2013 08:09:13 Rob Stewart wrote:
On May 1, 2013, at 8:53 AM, Andrey Semashev
wrote: I would also prefer another name for the new collector since I'm afraid "rotation" term would become too overloaded since it is used not with regard to rotating files in the sink backend. "Stacking" collector maybe? "Rolling" collector? I'm ignorant of you library terminology, but "rotation" is appropriate if you maintain a set of numbered files, so a given file's number changes as it ages. In that case, the contents rotates through the numbered filenames.
OTOH, if the files are dated, then they are not renamed when a new file is created, so the log output spills over into a new file when the date changes. In that case, "rollover" makes sense.
The library supports both date/time stamped files and file counters. And when the file is rotated, it is simply put into the storage, and a new file name is generated (with the new timestamp and/or the next counter value). This process of switching the actively written file is called "rotation" (at least, that's the meaning I put into the term). English is not my native language, so there may be a better word for it. What Alexander suggested is a different mechanism of storing the files, when the previous files are renamed (their counters are incremented) and the new file is named so that it has the least counter value. So the storage behaves kind of like a stack of files, that's why I also suggested "stacking" collector. But in any case, whether or not this new mechanism is used, file rotation will still take place, so I think it is a good idea to name the collector differently to avoid the confusion.
Actually, I think if you open it on Windows while the sink attempts to rotate it, you can screw up the rotation and loose some records while the sink tries to recreate the file. I haven't tried this but if I'm right then you're basically not recommended to open the file while it's being written. You must allow for following the current file or reading previous files, at all times, as much as possible. What's not possible just takes more effort. :)
I agree. That shouldn't be a problem on systems when an open file can be deleted or renamed, which covers all UNIX-like systems, if I'm not mistaken. I have always been annoyed with the different behavior of Windows in this regard, especially in conjunction with some antivirus services that love to poke their noses into files. In fact, that was one reason I tried to minimize the number of file system operations that need to be performed on a file rotation. Anyway, I'll need to try this next time I'm running Windows to see the impact.
Hi Andrey, Again, thanks for you comments. I've fixed several items mentioned in code review. To be precise: 1, 3, 4. 1: factory function name, make_rollover_collector was chosen. 3:
3. In rotating_file_collector::scan_for_files, invalid file name pattern should also be handled with setup_error.
4:
4. I think, it should be possible to avoid the do_rollover method. file_collector_base::store_file should be moved to file_collector, and there should be a separate rotating_file_collector::store_file method as well. Any common subroutines (like checking available free space, erasing old files, etc.) can be extracted as distinct functions into file_collector_base and called from store_file methods of the derivatives.
Also, patch was rebased against boost trunk. Regarding points 5 and 6: after analyzing this case, I've dropped support for datetime pattern in rollover collector. Please, find some explanations below. First, using last write time for the file name pattern was not adopted unambiguously by community. Also, there is a counter example in existing component - python TimedRotatingFileHandler, that uses creation timestamp. Second, if user wants to use file creation timestamp in rotated file name, she can use existing make_collector function for that purpose. You've also mentioned an option of using datetime placeholder in file sink:
... Maybe it's better to avoid dealing with date/time placeholders in the collector at all and only work with file counters. But scan_for_files should be changed so that it is possible to set a pattern with date/time placeholders in the sink backend.
I liked that option, but existing implementation should be enhanced to support this: currently during rollover phase files are simply renamed. I.e., fileN -> fileN+1, fileN-1 -> fileN, ..., file -> file1. Thus, datetime placeholder will be invalid after rollover - it will point to the wrong date. If you think this option is very nice to have - please tell. There are still 2 points we need to reach an agreement on: a. > > 2. The new collector should also be kept in the collectors repository. b. > * there is no need to rotate_file during text_file_backend destruction. In regards to a. - collectors repository. I agree that only one file_collector should serve one target directory. Regarding rollover collectors - they are different beasts. They do not serve the whole directory, but only file series in that directory. So, I propose 2 alternatives: a) use target directory and file name pattern as collector "identity" or b) do not put rollover collector into repository Now quoting some of your arguments against:
b. If Boost.Log is managing the storage, you set up the collector and mostly rely on the fact that noone else is messing around with the storage. This means that we don't try to detect sudden file deletions or additions to the storage, although should that happen the collector should handle the situation gracefully. This means that if there are two services (foo and bar) running, they should be collecting their log files to separate directories. This is fairly common in Linux, at least, since many services would just create directories within /var/log and put their logs there. If this is not acceptable, then go for case (a). [snip] ... The problem of having different collectors targeting the same directory can be solved by simply prohibiting such configuration, because the desired behavior is not clear in this case.
Agree that Boost.Log should not detect any "external" file deletions or additions. However, I don't see any problems of having 2 _rollover_ collectors targeting one directory. Each of them will monitor own file series and manipulates (add, remove) only on that series. The only 1 point of "race condition" we can imagine - that is minimum free space requirements. However, this is still not a problem: even if both collectors perform rotation at one point in time and minimum free space requirement is not met, each collector will just remove its file(s), freeing a little bit more of free space. Also, some words regarding Linux example. On my Ubuntu 12.04 there are both files that reside in separate directories: /var/log/samba, /var/log/mongodb, /var/log/nginx as well as in root log directory: /var/log/syslog.log*, /var/log/dpkg.log*, /var/log/auth.log*. That means that having files from different services in one directory is existing practice. We need to give user an ability to cover this cases in Boost.Log library. In regards to b. - rotate_file during text_file_backend destruction. You've provided a good argument that I can't argue against:
[snip] ... Imagine there are timestamps in the file name, so the next time you start your application the file names will be different and the old files will be left unmanaged. So the files are collected when the sink is destroyed and then managed by the collector after the application restarts and runs scan_for_files.
However, I still think that having active log file with persistent name will be a good benefit for many cases (one of the examples - /var/log/*.log* files mentioned above). After some thinking, I've came to conclusion that when * file sink does not have placeholders (i.e., log file name is static), * active log resides in collection directory (this point is not mandatory) - then we can safely do not rotate such file during sink destruction. So, I propose to implement the behavior described above, i.e.: log file is not rotated during sink destruction when it's filename is static. Regards
On Mon, May 6, 2013 at 4:09 PM, Alexander Arhipenko
Hi Andrey,
Regarding points 5 and 6: after analyzing this case, I've dropped support for datetime pattern in rollover collector. You've also mentioned an option of using datetime placeholder in file sink:
... Maybe it's better to avoid dealing with date/time placeholders in the collector at all and only work with file counters. But scan_for_files should be changed so that it is possible to set a pattern with date/time placeholders in the sink backend.
I liked that option, but existing implementation should be enhanced to support this: currently during rollover phase files are simply renamed. I.e., fileN -> fileN+1, fileN-1 -> fileN, ..., file -> file1. Thus, datetime placeholder will be invalid after rollover - it will point to the wrong date. If you think this option is very nice to have - please tell.
Why, the date will stay relevant even after the collector renames the file (assuming it doesn't change the date). Say, you have these files: file-2013-05-01.3.log file-2013-05-03.2.log file-2013-05-05.1.log and you rotate file-2013-05-07.log. After rotation you should have: file-2013-05-01.4.log file-2013-05-03.3.log file-2013-05-05.2.log file-2013-05-07.1.log So the dates in the file names still indicate the beginning of the time frame of the files. I admit, however, that it may look strange to use both dates and file counters with rollover collector. Dates are useful for lookup purposes and it shouldn't be difficult to support them. On the other hand the current collector already supports them. I have no strong opinion on whether to support dates in rollover collector or not, I'll leave it to you. In regards to a. - collectors repository.
I agree that only one file_collector should serve one target directory. Regarding rollover collectors - they are different beasts.
No, they are not, actually. Let's see why.
Agree that Boost.Log should not detect any "external" file deletions or additions. However, I don't see any problems of having 2 _rollover_ collectors targeting one directory. Each of them will monitor own file series and manipulates (add, remove) only on that series. The only 1 point of "race condition" we can imagine - that is minimum free space requirements. However, this is still not a problem: even if both collectors perform rotation at one point in time and minimum free space requirement is not met, each collector will just remove its file(s), freeing a little bit more of free space.
Nope, that case is still broken. Suppose you have two sinks that have two rollover collectors with the common target directory. Disregarding thread contention, the library will feed records to sinks sequentially, always in the same order. When the space limit for the target directory is reached, only one of the collectors will typically invoke old files deletion, and the other collector will typically not. You see, storage limits are set for the target storage and not for the sink or set of files. While maintaining these limits the collector should typically ensure that the files are being deleted in a fair manner. Always deleting the oldest file (regardless of the sink that file came from) maintains this behavior, assuming that all sinks rotate files at a more or less similar rate. This approach also works with rollover collector, IMHO, and that is why I think only one collector should be managing any given storage. I think I can see why you want to make the collectors separate - this would simplify managing the queues of files from different sinks. You can implement multiple queues of files within a single collector, which will still exclusively manage its target directory. You can also implement additional limits for each file queue, if you like, but then you'll have to come up with some rules that will choose the files to delete in a fair manner, and this is also much easier to implement within a single collector. Also, some words regarding Linux example.
On my Ubuntu 12.04 there are both files that reside in separate directories: /var/log/samba, /var/log/mongodb, /var/log/nginx as well as in root log directory: /var/log/syslog.log*, /var/log/dpkg.log*, /var/log/auth.log*. That means that having files from different services in one directory is existing practice. We need to give user an ability to cover this cases in Boost.Log library.
Yes, some services allow that, although I suspect most if not all of the files in /var/log are written by syslog daemon. My point was that storing files in service-specific directories is a fairly common practice already.
However, I still think that having active log file with persistent name will be a good benefit for many cases (one of the examples - /var/log/*.log* files mentioned above). After some thinking, I've came to conclusion that when * file sink does not have placeholders (i.e., log file name is static), * active log resides in collection directory (this point is not mandatory) - then we can safely do not rotate such file during sink destruction. So, I propose to implement the behavior described above, i.e.: log file is not rotated during sink destruction when it's filename is static.
I don't think it is safe to implicitly rely on configuration in this case. After all, directories and file name patterns may change after application restart. And such subtle change in behavior without being explicitly requested by user also doesn't look like a good idea. I don't really like this, but we could add an option to the sink (a flag, rotate_on_destroy), which by default would be true to maintain the current behavior. If the user knows what he's doing, he would be able to disable rotation on destructor.
participants (3)
-
Alexander Arhipenko
-
Andrey Semashev
-
Rob Stewart