Regex failing to match postcodes
*** Before acting on this e-mail or opening any attachment you are advised to read the disclaimer at the end of this e-mail ***

Hi all,

I'm having some difficulty getting my regular expression working. Basically, I need to make sure that a UK postcode is valid. The postcode that is passed to my function sometimes has extra things with it, such as:

Wakefield, WF1 3RD
Shrewsbury SY2 5PT Shropshire

It now seems to be failing to find the postcode in the above examples. Also, when I pass my function a postcode that I know is invalid, such as JG2 7L5, it matches it as G2 7L5 instead of failing to do the match.

The regular expression I'm using is below:

(?:(?:(^|\s)+ A[BL]|B[ABDHLNRST]?| C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]?|F[KY]|G[LUY]?| H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?| N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?| T[0ADFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE) \d(?:\d|[A-Z])?\s+\d[A-Z]{2}($|\s)+)

Can anyone tell me what's wrong with my expression? BTW I'm using Boost 1.31.0 on VC++ 7.1, Windows XP.

James Gunn
Software Developer
--LongSig
Computer Bureau
Communisis DM
Manston Lane
Crossgates
Leeds LS15 8AH
Telephone +44 (0)113 225 5306
Fax +44 (0)113 225 5921
Email James.Gunn@communisis-dm.co.uk

**********************************************************************
Please note: This e-mail and its attachments contain only the opinions of the sender and do not necessarily reflect the policy(s) of the communisis group in general. Employees of the communisis group are required not to make any defamatory statements and not to infringe or authorise any infringement of copyright or any other legal right by e-mail. Any such communication is therefore outside the scope of employment of the individual concerned. The communisis group will not accept any liability in respect of such a communication.
Confidentiality: This e-mail and any attachments, together with their contents, are confidential unless otherwise explicitly stated in writing by the sender of this e-mail and are for the intended recipient only. If they have come to you in error you must not take any action in respect of them, which includes but is not limited to reproducing, sending or storing them, other than to notifying the sender immediately of the mistake, and deleting the e-mail, any attachments and any reproductions made by replying to it.

Viruses: This e-mail and any attachments have been scanned for viruses but we cannot guarantee that they are virus free. The recipient should check this e-mail and any attachments for viruses. The communisis group accepts no responsibility for any damage caused by any virus transmitted by this e-mail or any of its attachments. In the event of any unauthorised copying or forwarding, the recipient will be required to indemnify the communisis group against any claim for loss or damage caused by any viruses or otherwise.
**********************************************************************
On Mon, Jun 14, 2004 at 10:00:55AM +0100, James Gunn wrote:
Hi all,
I'm having some difficulty getting my regular expression working. Basically, I need to make sure that a UK postcode is valid. The postcode that is passed to my function sometimes has extra things with it, such as:
Wakefield, WF1 3RD
Shrewsbury SY2 5PT Shropshire
Given that you've got messy data, have you considered matching everything that might be a valid postcode and then checking that against a symbol table? That would reduce the complexity of your regex a lot. You could also use Spirit, though the learning curve on that is a bit steep.
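A minimal sketch of that two-step approach, written with C++11 std::regex (which was modelled on Boost.Regex) rather than Boost itself; that substitution is an assumption. A deliberately loose pattern only checks the overall shape of a postcode, and the captured area letters are then looked up in a set. The names findPostcodes and kKnownAreas are hypothetical, and the table holds only three areas taken from this thread, not the full list of UK postcode areas.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical symbol table: a few real postcode areas, for illustration only.
// A real version would list every UK area (AB, AL, B, BA, ...).
static const std::set<std::string> kKnownAreas = {"LS", "SY", "WF"};

// Hypothetical helper: returns every postcode-shaped substring of `text`
// whose area letters appear in kKnownAreas.
std::vector<std::string> findPostcodes(const std::string& text) {
    // Step 1: a loose shape check. Group 1 captures the area (one or two
    // capitals); \b stops a candidate from starting in the middle of a word.
    static const std::regex candidate(
        R"(\b([A-Z]{1,2})[0-9][A-Z0-9]?\s+[0-9][A-Z]{2}\b)");

    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), candidate), end;
         it != end; ++it) {
        // Step 2: keep the candidate only if its area is in the table.
        if (kKnownAreas.count((*it)[1].str()) != 0) {
            found.push_back(it->str());
        }
    }
    return found;
}
```

With this sketch, findPostcodes("Wakefield, WF1 3RD") returns {"WF1 3RD"}, while a made-up input such as "JG2 7LB" passes the shape check but is rejected because JG is not in the table. Adding or removing valid areas then means editing a list rather than the regex.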
It now seems to be failing to find the postcode in the above examples. Also, when I pass my function a postcode that I know is invalid, such as JG2 7L5, it matches it as G2 7L5 instead of failing to do the match.
However, part of the problem with your regex is that J only matches as part of JE (it's listed between I[GMPV] and K[ATWY]).

Hope this helps. Here is the changed regex:

--- cut ---
(?:(?:(^|\s)+ A[BL]|B[ABDHLNRST]?| C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]?|F[KY]|G[LUY]?| H[ADGPRSUX]|I[GMPV]|J[GE]|K[ATWY]|L[ADELNSU]?|M[EKL]?| N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?| T[0ADFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE) \d(?:\d|[A-Z])?\s+\d[A-Z]{2}($|\s)+)
--- cut ---

-jbs
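Why JG2 comes back as G2 can be seen with a cut-down version of the area alternation. As the pattern was posted, the (^|\s)+ prefix belongs to the first alternative only (alternation has the lowest precedence), so it guards A[BL] and nothing else. A search then tries each starting position in turn: J cannot begin a match because only JE is listed, so the engine moves on one character and G2 matches. The sketch below uses C++11 std::regex; firstAreaMatch and the reduced pattern are hypothetical illustrations, not the full expression.

```cpp
#include <regex>
#include <string>

// Hypothetical, cut-down area alternation: J may only appear as "JE", while
// G may stand alone. Nothing ties the match to the start of a word.
// Returns the first substring regex_search finds, or "" if there is none.
std::string firstAreaMatch(const std::string& input) {
    static const std::regex area(R"((?:JE|G[LUY]?)[0-9])");
    std::smatch m;
    return std::regex_search(input, m, area) ? m.str() : std::string();
}
```

Here firstAreaMatch("JG2") yields "G2", starting one character into the input, which mirrors the behaviour described in the question.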
The regular expression I'm using is below:
(?:(?:(^|\s)+ A[BL]|B[ABDHLNRST]?| C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]?|F[KY]|G[LUY]?| H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?| N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]?| T[0ADFNQRSW]|UB|W[ACDFNRSV]?|YO|ZE) \d(?:\d|[A-Z])?\s+\d[A-Z]{2}($|\s)+)
Can anyone tell me what's wrong with my expression? BTW I'm using Boost 1.31.0 on VC++ 7.1, Windows XP.
I'm having some difficulty getting my regular expression working. Basically, I need to make sure that a UK postcode is valid. The postcode that is passed to my function sometimes has extra things with it, such as:
Have you looked at http://regexlib.com/REDetails.aspx?regexp_id=260, which seems to get a high rating?
It now seems to be failing to find the postcode in the above examples. Also, when I pass my function a postcode that I know is invalid, such as JG2 7L5, it matches it as G2 7L5 instead of failing to do the match.
Try prefixing with \< to insist that the match starts at a word boundary.

John
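A sketch of the word-boundary idea using C++11 std::regex. Its default ECMAScript grammar has no \< token, so this version substitutes \b (an assumption worth checking against your own engine): \b can also match at the end of a word, but because the token after it must be a letter, here it can only succeed where a word starts. The boundary goes in front of the whole (?:...) group; placed inside the first alternative, as (^|\s)+ was in the original pattern, it would again apply to that branch only. hasAreaAtWordStart is a hypothetical name, and the pattern is a reduced version of the area alternation, not the full expression.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the input contains an area code that begins
// at a word boundary. \b sits outside the (?:...) group so that it applies
// to every alternative, not just the first one.
bool hasAreaAtWordStart(const std::string& input) {
    static const std::regex area(R"(\b(?:JE|G[LUY]?)[0-9])");
    return std::regex_search(input, area);
}
```

With the boundary in place, "JG2" no longer produces a match, because there is no word boundary between J and G, while "JE2" and a standalone "G2" still match.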
participants (3)
- James Gunn
- John Maddock
- Joshua B. Smith