[Regex] How to match the "End of buffer"/"End of string"?
Hi,

I'm new to regex and this is my first post, so maybe the solution is obvious, but I couldn't find it on Google...

I need to parse the multiline output of a command. Every line ends with a \n except the last one, which actually ends with the end of the buffer (the "\0" character). The output I need to parse is something like:

"text1 this is a multiple-word text\n text2 another text" (the second line does not have a newline)

As a result I want only two sub-expressions per line, using a regex like:

(\w+)\s+([^\n]+)\n

The first submatch should be the first word ("text1" and "text2"), while the second submatch should be the rest of the line ("this is a multiple-word text" and "another text").

In my program I use regex_search with the boost::match_continuous option; all the other regex objects are created with the default options.

The first line matches the regex without any problem, but the second line does not, because it does not end with a "\n". I'm unable to find a regex that matches both possible "ends of line" (the \n or the \0 character). I've tried some expressions without success:

1.- First I tried to match \n or \0 using:

(\w+)\s+([^\n\x00]+)([\n\x00])

but it seems the \x00 is not part of the buffer, so the second line does not match.

2.- Then I tried to use "$", without success. (By the way, I assumed "$" would work like "\n", but it does not match the "end of line" character. When should I use it?)

3.- On Google I found that I should use "\z" or "\Z". I tried both, but they didn't work: the last line of the text never matches! (I suppose I need to set an additional option on the regex object for "\z" or "\Z" to work.)

Finally I found a workaround using the regex:

(\w+)\s+([^\n]+)\n*

and now it works, but I would still like to find a way to match the end of buffer/end of string.

Any ideas?

Thanks in advance,
Jordi
It depends on whether you want the match to include the newline character or not. If you do, then:

(\w+)\s+([^\n]+)(?:\n|$)

would do the trick. Otherwise, if you don't want the newline character (just the contents of each whole line), then:

^(\w+)\s+([^\n]+)$

used without the match_continuous flag would do it.

John.
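For reference, the first pattern can be tried outside of Boost. The following is only a sketch under a few assumptions: it uses std::regex (ECMAScript grammar) instead of Boost.Regex so that it compiles with the standard library alone, parse_lines is a hypothetical helper name that does not come from this thread, and each line of the input is assumed to start directly after the preceding \n. It walks the buffer with match_continuous, as described in the question, and relies on (?:\n|$) to accept either a newline or the end of the input as the line terminator.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the thread): splits every line of `buffer`
// into (first word, rest of line).
// Assumption: std::regex stands in for Boost.Regex here.
std::vector<std::pair<std::string, std::string>>
parse_lines(const std::string& buffer)
{
    // A line ends either with '\n' or, for the last line, at the end of input.
    static const std::regex line_re(R"((\w+)\s+([^\n]+)(?:\n|$))");

    std::vector<std::pair<std::string, std::string>> result;
    auto pos = buffer.cbegin();
    const auto end = buffer.cend();
    std::smatch m;

    // match_continuous: a match is only accepted if it starts exactly at `pos`.
    while (pos != end &&
           std::regex_search(pos, end, m, line_re,
                             std::regex_constants::match_continuous)) {
        result.emplace_back(m[1].str(), m[2].str());
        pos = m[0].second;  // resume right after the consumed line
    }
    return result;
}
```

Called on "text1 this is a multiple-word text\ntext2 another text", this should return two pairs: ("text1", "this is a multiple-word text") and ("text2", "another text"). The second pattern is left out of the sketch because how ^ and $ treat embedded newlines depends on the regex library and its options.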
Thanks for your reply.

I will try...

Best regards,
Jordi
participants (2)
- John Maddock
- jordi