How to grep for groups of n digits, but no more than n?

















I'm learning Linux, and I have a challenge that I seem to fail to solve on my own. Here it is:




grep a line from a file which contains 4 numbers in a row but not more than 4.




I'm not sure how to approach this. I can search for specific numbers, but not for how many digits appear in a row.































  • Should a line like 1234a12345 be displayed, or not?

    – Eliah Kagan
    Oct 18 '14 at 22:17











  • @Buddha you need to explain your question along with an example.

    – Avinash Raj
    Oct 19 '14 at 2:14











  • If the numbers are preceded by a space or the start of the line, and followed by a space or the end of the line, then you could simply use word boundaries: \b\d{4}\b

    – Avinash Raj
    Oct 19 '14 at 2:22






  • This question differs from some questions about regular expressions by being explicitly about grep usage. Questions about using Unix utilities in Ubuntu, such as grep, sed, and awk, have always been considered fine here. Sometimes people ask how to do a job with the wrong tool; then a lack of context is a big problem, but that's not what's happening here. This is on-topic, clear enough to be usefully answered, helpful to our community, and there's no benefit in preventing further answers or pushing it toward deletion or migration. I'm voting to reopen it.

    – Eliah Kagan
    Oct 19 '14 at 16:34







  • Thank you guys so much, I had no idea I would get this much feedback. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file. The command must be able to pull a string like this (which it does): abc1234abcd99999

    – Buddha
    Oct 20 '14 at 16:20
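The word-boundary suggestion and the requirement in the last comment can disagree, because letters count as word characters: \b does not fire between c and 1 in abc1234abcd99999. A quick sketch to check this, assuming GNU grep built with -P (PCRE) support, as on Ubuntu:

```shell
#!/bin/sh
# Sketch of the word-boundary suggestion; assumes GNU grep with -P (PCRE).
# \b matches where a word character (letter, digit or underscore) meets a
# non-word character or the start/end of the line.
printf '%s\n' 'id 1234 end' 'abc1234abcd99999' | grep -P '\b\d{4}\b'
# Prints only "id 1234 end": in abc1234abcd99999 the 1234 sits between
# letters, so there is no word boundary on either side of it.
```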


















command-line grep text-processing














edited Oct 19 '14 at 4:05 by Eliah Kagan

asked Oct 18 '14 at 21:19 by Buddha





















4 Answers
















There are two ways to interpret this question; I'll address both cases. You might want to display lines:



  1. that contain a sequence of four digits that is itself not part of any longer sequence of digits, or

  2. that contain a four-digit sequence but no longer sequence of digits (not even a separate one).

For example, (1) would display 1234a56789, but (2) wouldn't.
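To see the two readings diverge on that example, here is a small sketch that applies both to the same lines. It uses the extended-regex commands that appear later in this answer, and assumes a POSIX shell and a grep that supports -E with interval expressions (GNU grep does); the function names isolated_run and no_longer_run are made up for illustration.

```shell
#!/bin/sh
# Sketch: the two readings of the question, applied to the same lines.
# Helper names are hypothetical; both functions filter stdin.

# (1) Some run of exactly four digits, bounded by a non-digit or the
#     start/end of the line. Longer runs elsewhere on the line are allowed.
isolated_run() {
    grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])'
}

# (2) A four-digit run somewhere, and no run of five or more digits anywhere.
no_longer_run() {
    grep -E '[0-9]{4}' | grep -Ev '[0-9]{5}'
}

sample='1234a56789
abc1234
12345'

echo '(1):'; printf '%s\n' "$sample" | isolated_run
echo '(2):'; printf '%s\n' "$sample" | no_longer_run
```

With these inputs, (1) prints 1234a56789 and abc1234, while (2) prints only abc1234.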




If you want to display all lines that contain a sequence of four digits that is itself not part of any longer sequence of digits, one way is:



grep -P '(?<!\d)\d{4}(?!\d)' file


This uses Perl regular expressions, which Ubuntu's grep (GNU grep) supports via -P. It won't match text like 12345, nor will it match the 1234 or 2345 that are part of it. But it will match the 1234 in 1234a56789.
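Those claims are easy to check with -o, which prints only the matched text. This sketch assumes GNU grep built with PCRE support (as on Ubuntu); show_isolated is a hypothetical helper name.

```shell
#!/bin/sh
# Print each four-digit run that is not part of a longer run of digits.
# Assumes GNU grep with -P (PCRE) support.
show_isolated() {
    grep -oP '(?<!\d)\d{4}(?!\d)'
}

# Nothing is printed: 1234 is followed by a digit, 2345 is preceded by one.
printf '12345\n' | show_isolated || echo 'no match'
printf '1234a56789\n' | show_isolated   # prints 1234
```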



In Perl regular expressions:




  • \d means any digit (it's a short way to say [0-9] or [[:digit:]]).


  • x{4} matches x four times. ({} syntax isn't specific to Perl regular expressions; it's in extended regular expressions via grep -E as well.) So \d{4} is the same as \d\d\d\d.


  • (?<!\d) is a zero-width negative look-behind assertion. It means "unless preceded by \d."


  • (?!\d) is a zero-width negative look-ahead assertion. It means "unless followed by \d."

(?<!\d) and (?!\d) don't match text outside the sequence of four digits; instead, they will (when used together) prevent a sequence of four digits from itself being matched if it is part of a longer sequence of digits.



Using just the look-behind or just the look-ahead is insufficient because the rightmost or leftmost four-digit subsequence would still be matched.
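The following sketch (GNU grep with -P assumed) runs each assertion on its own against 12345, showing that the look-behind alone still finds the leftmost run and the look-ahead alone still finds the rightmost one:

```shell
#!/bin/sh
# Sketch, assuming GNU grep with -P. Each assertion alone, applied to 12345.
printf '12345\n' | grep -oP '(?<!\d)\d{4}'    # look-behind only: prints 1234
printf '12345\n' | grep -oP '\d{4}(?!\d)'     # look-ahead only: prints 2345
# Both together: nothing matches, so grep exits with status 1.
printf '12345\n' | grep -oP '(?<!\d)\d{4}(?!\d)' || echo 'no match'
```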



One benefit of using look-behind and look-ahead assertions is that your pattern matches only the four-digit sequences themselves, and not the surrounding text. This is helpful when using color highlighting (with the --color option).



ek@Io:~$ grep -P '(?<!\d)\d{4}(?!\d)' <<< 12345abc789d0123e4
12345abc789d0123e4


By default in Ubuntu, each user has alias grep='grep --color=auto' in their ~/.bashrc file. So you get color highlighting automatically when you run a simple command starting with grep (this is when aliases are expanded) and standard output is a terminal (this is what --color=auto checks for). Matches are typically highlighted in a shade of red (close to vermilion); in the output above, only 0123 is highlighted. Here's a screenshot:
[Screenshot: that grep command, with 12345abc789d0123e4 as output and the 0123 highlighted in red.]



And you can even make grep print only matching text, and not the whole line, with -o:



ek@Io:~$ grep -oP '(?<!\d)\d{4}(?!\d)' <<< 12345abc789d0123e4
0123


Alternative Way, Without Look-Behind and Look-Ahead Assertions



However, if you:



  1. need a command that will also run on systems where grep doesn't support -P or otherwise don't want to use a Perl regular expression, and

  2. don't need to match the four digits specifically--which is usually the case if your goal is simply to display lines containing matches, and

  3. are okay with a solution that is a bit less elegant

...then you can achieve this with an extended regular expression instead:



grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file


This matches four digits and the non-digit character--or beginning or end of the line--surrounding them. Specifically:




  • [0-9] matches any digit (like [[:digit:]], or \d in Perl regular expressions) and {4} means "four times." So [0-9]{4} matches a four-digit sequence.


  • [^0-9] matches characters not in the range of 0 through 9. It is equivalent to [^[:digit:]] (or \D, in Perl regular expressions).


  • ^, when it doesn't appear in [ ] brackets, matches the beginning of a line. Similarly, $ matches the end of a line.


  • | means or and parentheses are for grouping (as in algebra). So (^|[^0-9]) matches the beginning of the line or a non-digit character, while ($|[^0-9]) matches the end of the line or a non-digit character.

So matches occur only in lines containing a four-digit sequence ([0-9]{4}) that is simultaneously:



  • at the beginning of the line or preceded by a non-digit ((^|[^0-9])), and

  • at the end of the line or followed by a non-digit (($|[^0-9])).
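To see point 2 from the list above in practice, this sketch prints only the matched text with -o, which shows that the match includes the bounding non-digit characters rather than just the digits. It assumes a POSIX shell and a grep with -E and -o (GNU grep has both); bounded_four is a hypothetical helper name.

```shell
#!/bin/sh
# Extended regex, no -P needed. -o reveals what the pattern actually matched.
bounded_four() {
    grep -oE '(^|[^0-9])[0-9]{4}($|[^0-9])'
}

printf 'x1234y\n' | bounded_four    # prints x1234y (delimiters included)
printf '1234\n'   | bounded_four    # prints 1234 (bounded by ^ and $)
printf 'x12345\n' | bounded_four || echo 'no match'
```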


If, on the other hand, you want to display all lines that contain a four-digit sequence, but do not contain any sequence of more than four digits (even one that is separate from another sequence of only four digits), then conceptually your goal is to find lines that match one pattern but not another.



Therefore, even if you know how to do it with a single pattern, I'd suggest using something like matt's second suggestion: grepping for the two patterns separately.



You don't strongly benefit from any of the advanced features of Perl regular expressions when doing that, so you might prefer not to use them. But in keeping with the above style, here's a shortening of matt's solution using \d (and braces) in place of [0-9]:



grep -P '\d{4}' file | grep -Pv '\d{5}'


Since it uses [0-9], matt's way is more portable--it will work on systems where grep doesn't support Perl regular expressions. If you use [0-9] (or [[:digit:]]) instead of \d, but continue to use {}, you get the portability of matt's way a bit more concisely:



grep -E '[0-9]{4}' file | grep -Ev '[0-9]{5}'
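Since the title asks about n digits in general, the portable two-grep version can be wrapped in a small function that takes n as an argument. This is a sketch: digits_exactly is a hypothetical name, and it assumes a POSIX shell, a positive integer argument, and a grep supporting -E with interval expressions.

```shell
#!/bin/sh
# Hypothetical helper: lines with a run of N digits but no run of N+1 or more.
# Usage: digits_exactly N < file   (N is assumed to be a positive integer)
digits_exactly() {
    grep -E "[0-9]{$1}" | grep -Ev "[0-9]{$(($1 + 1))}"
}

printf '%s\n' 'abc1234' '1234a56789' 'v12.345.6789' | digits_exactly 4
```

Here 1234a56789 is dropped because of its five-digit run 56789, so the command prints abc1234 and v12.345.6789.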


Alternative Way, With a Single Pattern



If you really do prefer a grep command that




  1. uses a single regular expression (not two greps separated by a pipe, as above)

  2. to display lines that contain at least one sequence of four digits,

  3. but no sequences of five (or more) digits,

  4. and you don't mind matching the whole line, not just the digits (you probably don't mind this)

...then you can use:



grep -Px '(\d{0,4}\D)*\d{4}(\D\d{0,4})*' file


The -x flag makes grep display only lines where the entire line matches (rather than any line containing a match).



I've used a Perl regular expression because I think the brevity of \d and \D substantially increases clarity in this case. But if you need something portable to systems where grep doesn't support -P, you can replace them with [0-9] and [^0-9] (or with [[:digit:]] and [^[:digit:]]):



grep -Ex '([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*' file


The way these regular expressions work is:



  • In the middle, \d{4} or [0-9]{4} matches one sequence of four digits. We may have more than one of these, but we need to have at least one.



  • On the left, (\d{0,4}\D)* or ([0-9]{0,4}[^0-9])* matches zero or more (*) instances of not more than four digits followed by a non-digit. Zero digits (i.e., nothing) is one possibility for "not more than four digits." This matches (a) the empty string or (b) any string ending in a non-digit and not containing any sequences of more than four digits.



    Since the text immediately to the left of the central \d{4} (or [0-9]{4}) must either be empty or end with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the left of them.




  • On the right, (\D\d{0,4})* or ([^0-9][0-9]{0,4})* matches zero or more (*) instances of a non-digit followed by not more than four digits (which, like before, could be four, three, two, one, or even none at all). This matches (a) the empty string or (b) any string beginning with a non-digit and not containing any sequences of more than four digits.



    Since the text immediately to the right of the central \d{4} (or [0-9]{4}) must either be empty or start with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the right of them.



This ensures a four-digit sequence is present somewhere, and that no sequence of five or more digits is present anywhere.
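One way to gain confidence in the single-pattern version is to compare it with the two-grep pipeline on a handful of tricky lines. This sketch assumes a POSIX shell and a grep with -E and -x (GNU grep has both); the function names single and piped are hypothetical.

```shell
#!/bin/sh
# Sketch: check that the single-pattern -Ex command selects the same lines
# as the two-grep pipeline.

# Whole line must match: at least one four-digit run, no run of five or more.
single() {
    grep -Ex '([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*'
}

# Lines with a four-digit run, minus lines with a five-digit run.
piped() {
    grep -E '[0-9]{4}' | grep -Ev '[0-9]{5}'
}

sample='1234
a1234b
12 3456 7
abc1234abcd99999
1234a56789
123
no digits'

a=$(printf '%s\n' "$sample" | single)
b=$(printf '%s\n' "$sample" | piped)

printf '%s\n' "$a"
if [ "$a" = "$b" ]; then
    echo 'both commands selected the same lines'
fi
```

On this sample both approaches select 1234, a1234b and 12 3456 7.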



It is not bad or wrong to do it this way. But perhaps the most important reason to consider this alternative is that it clarifies the benefit of using grep -P '\d{4}' file | grep -Pv '\d{5}' (or similar) instead, as suggested above and in matt's answer.



With that approach, it's clear your goal is to select lines that contain one thing but not another. Plus the syntax is simpler (so it may be more quickly understood by many readers/maintainers).




























  • Thank you! I really did not expect to get an entire article in response. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file. The command must be able to pull both 1234 and abc1234abc99999 from the data set, which your solution does perfectly. I'm still a little confused with the string anchors though.

    – Buddha
    Oct 20 '14 at 17:04



































This will show you lines with 4 digits in a row, but not more:



grep '[0-9][0-9][0-9][0-9][^0-9]' file


Note that ^ at the start of a bracket expression means "not" ([^0-9] matches any non-digit).



There is a problem with this, though, and I'm not sure how to fix it: if the number is at the end of the line, then it won't show up.



This uglier version, however, would work for that case:



grep '[0-9][0-9][0-9][0-9]' file | grep -v '[0-9][0-9][0-9][0-9][0-9]'
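The end-of-line problem, plus a second case (a12345b) where a five-digit run still gets through, can be reproduced with a small sketch. It assumes a POSIX shell; first is just a hypothetical wrapper around the first command above.

```shell
#!/bin/sh
# Edge cases for the first command (basic regex, so plain grep works).
first() {
    grep '[0-9][0-9][0-9][0-9][^0-9]'
}

printf 'abc1234\n' | first || echo 'missed: digits at end of line'
printf 'a12345b\n' | first    # still printed: 2345b satisfies the pattern

# The two-grep version handles both lines: prints only abc1234.
printf 'abc1234\na12345b\n' | grep '[0-9][0-9][0-9][0-9]' | grep -v '[0-9][0-9][0-9][0-9][0-9]'
```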
































  • oops, didn't need to be egrep - I've edited it

    – matt
    Oct 18 '14 at 22:23






  • The first one is wrong - it finds a12345b, because it matches 2345b.

    – Volker Siegel
    Oct 19 '14 at 1:28


































If grep doesn't support Perl regular expressions (-P), use the following shell command:



grep -w "$(printf '[0-9]%.0s' {1..4})" file


where printf '[0-9]%.0s' {1..4} will produce [0-9] four times. This method is useful when you're looking for long runs of digits and don't want to repeat the pattern by hand (just replace 4 with the number of digits to look for).



Using -w will look for whole words only. However, if you're interested in alphanumeric strings, such as 1234a, then add [^0-9] at the end of the pattern, e.g.



grep "$(printf '[0-9]%.0s' {1..4})[^0-9]" file


$() is command substitution. Check this post to see how printf repeats the pattern.
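The {1..4} brace expansion is a bash feature. Assuming seq is available (it ships with GNU coreutils and BusyBox, but is not POSIX), a sketch of the same printf trick for a plain POSIX shell might look like this; digit_class_pattern is a hypothetical name.

```shell
#!/bin/sh
# printf reuses its format once per argument; %.0s prints each argument
# with zero width, so only "[0-9]" is emitted for each of 1..N.
digit_class_pattern() {
    printf '[0-9]%.0s' $(seq "$1")   # unquoted on purpose: one argument per number
}

pattern=$(digit_class_pattern 4)
echo "$pattern"    # [0-9][0-9][0-9][0-9]

# -w: 1234a and 12345 are rejected because letters and digits are word characters.
printf '%s\n' 'id 1234 end' '1234a' '12345' | grep -w "$pattern"
```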






















































You can try the command below, replacing file with the actual file name on your system. You can also check this tutorial for more uses of the grep command:



grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file






    share|improve this answer


























      Your Answer








      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "89"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );














      draft saved

      draft discarded
















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f538730%2fhow-to-grep-for-groups-of-n-digits-but-no-more-than-n%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      51
















      There are two ways to interpret this question; I'll address both cases. You might want to display lines:



      1. that contain a sequence of four digits that is itself not part of any longer sequence of digits, or

      2. that contains a four-digit sequence but no longer sequence of digits (not even separately).

      For example, (1) would display 1234a56789, but (2) wouldn't.




      If you want to display all lines that contain a sequence of four digits that is itself not part of any longer sequence of digits, one way is:



      grep -P '(?<!d)d4(?!d)' file


      This uses Perl regular expressions, which Ubuntu's grep (GNU grep) supports via -P. It won't match text like 12345, nor will it match the 1234 or 2345 that are part of it. But it will match the 1234 in 1234a56789.



      In Perl regular expressions:




      • d means any digit (it's a short way to say [0-9] or [[:digit:]]).


      • x4 matches x 4 times. ( syntax isn't specific to Perl regular expressions; it's in extended regular expressions via grep -E as well.) So d4 is the same as dddd.


      • (?<!d) is a zero-width negative look-behind assertion. It means "unless preceded by d."


      • (?!d) is a zero-width negative look-ahead assertion. It means "unless followed by d."

      (?<!d) and (?!d) don't match text outside the sequence of four digits; instead, they will (when used together) prevent a sequence of four digits from itself being matched if it is part of a longer sequence of digits.



      Using just the look-behind or just the look-ahead is insufficient because the rightmost or leftmost four-digit subsequence would still be matched.



      One benefit of using look-behind and look-ahead assertions is that your pattern matches only the four-digit sequences themselves, and not the surrounding text. This is helpful when using color highlighting (with the --color option).



      ek@Io:~$ grep -P '(?&lt!d)d4(?!d)' &lt&lt&lt 12345abc789d0123e4
      12345abc789d0123e4


      By default in Ubuntu, each user has alias grep='grep --color=auto' in their ~.bashrc file. So you get color highlighting automatically when you run a simple command starting with grep (this is when aliases are expanded) and standard output is a terminal (this is what --color=auto checks for). Matches are typically highlighted in a shade of red (close to vermilion), but I've shown it in italicized bold. Here's a screenshot:
      Screenshot showing that grep command, with 12345abc789d0123e4 as output, with the 0123 highlighted in red.



      And you can even make grep print only matching text, and not the whole line, with -o:



      ek@Io:~$ grep -oP '(?<!d)d4(?!d)' <<< 12345abc789d0123e4
      0123


      Alternative Way, Without Look-Behind and Look-Ahead Assertions



      However, if you:



      1. need a command that will also run on systems where grep doesn't support -P or otherwise don't want to use a Perl regular expression, and

      2. don't need to match the four digits specifically--which is usually the case if your goal is simply to display lines containing matches, and

      3. are okay with a solution that is a bit less elegant

      ...then you can achieve this with an extended regular expression instead:



      grep -E '(^|[^0-9])[0-9]4($|[^0-9])' file


      This matches four digits and the non-digit character--or beginning or end of the line--surrounding them. Specifically:




      • [0-9] matches any digit (like [[:digit:]], or d in Perl regular expressions) and 4 means "four times." So [0-9]4 matches a four-digit sequence.


      • [^0-9] matches characters not in the range of 0 through 9. It is equivalent to [^[:digit:]] (or D, in Perl regular expressions).


      • ^, when it doesn't appear in [ ] brackets, matches the beginning of a line. Similarly, $ matches the end of a line.


      • | means or and parentheses are for grouping (as in algebra). So (^|[^0-9]) matches the beginning of the line or a non-digit character, while ($|[^0-9]) matches the end of the line or a non-digit character.

      So matches occur only in lines containing a four-digit sequence ([0-9]4) that is simultaneously:



      • at the beginning of the line or preceded by a non-digit ((^|[^0-9])), and

      • at the end of the line or followed by a non-digit (($|[^0-9])).


      If, on the other hand, you want to display all lines that contain a four-digit sequence, but do not contain any sequence of more than four digits (even one that is separate from another sequence of only four digits), then conceptually your goal is to find lines that match one pattern but not another.



      Therefore, even if you know how to do it with a single pattern, I'd suggest using something like matt's second suggestion, greping for the two patterns separately.



      You don't strongly benefit from any of the advanced features of Perl regular expressions when doing that, so you might prefer not to use them. But in keeping with the above style, here's a shortening of matt's solution using d (and braces) in place of [0-9]:



      grep -P 'd4' file | grep -Pv 'd5'


      Since it uses [0-9], matt's way is more portable--it will work on systems where grep doesn't support Perl regular expressions. If you use [0-9] (or [[:digit:]]) instead of d, but continue to use , you get the portability of matt's way a bit more concisely:



      grep -E '[0-9]4' file | grep -Ev '[0-9]5'


      Alternative Way, With a Single Pattern



      If you really do prefer a grep command that




      1. uses a single regular expression (not two greps separated by a pipe, as above)

      2. to display lines that contain at least one sequence of four digits,

      3. but no sequences of five (or more) digits,

      4. and you don't mind matching the whole line, not just the digits (you probably don't mind this)

      ...then you can use:



      grep -Px '(d0,4D)*d4(Dd0,4)*' file


      The -x flag makes grep display only lines where the entire line matches (rather than any line containing a match).



      I've used a Perl regular expression because I think the brevity of d and D substantially increase clarity in this case. But if you need something portable to systems where grep doesn't support -P, you can replace them with [0-9] and [^0-9] (or with [[:digit:]] and [^[:digit]]):



      grep -Ex '([0-9]0,4[^0-9])*[0-9]4([^0-9][0-9]0,4)*' file


      The way these regular expressions work is:



      • In the middle, d4 or [0-9]4 matches one sequence of four digits. We may have more than one of these, but we need to have at least one.



      • On the left, (d0,4D)* or ([0-9]0,4[^0-9])* matches zero or more (*) instances of not more than four digits followed by a non-digit. Zero digits (i.e., nothing) is one possibility for "not more than four digits." This matches (a) the empty string or (b) any string ending in a non-digit and not containing any sequences of more than four digits.



        Since the text immediately to the left of the central \d{4} (or [0-9]{4}) must either be empty or end with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the left of them.




      • On the right, (\D\d{0,4})* or ([^0-9][0-9]{0,4})* matches zero or more (*) instances of a non-digit followed by not more than four digits (which, like before, could be four, three, two, one, or even none at all). This matches (a) the empty string or (b) any string beginning in a non-digit and not containing any sequences of more than four digits.



        Since the text immediately to the right of the central \d{4} (or [0-9]{4}) must either be empty or start with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the right of them.



      This ensures a four-digit sequence is present somewhere, and that no sequence of five or more digits is present anywhere.
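To see the left and right guards at work, the following sketch feeds a handful of made-up strings, one at a time, to the portable single-pattern version and reports which ones it accepts. It assumes a POSIX shell and a grep that supports -E and -x (GNU grep does):

```shell
#!/bin/sh
# Portable single-pattern version from above; -x requires the whole line to match.
pattern='([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*'

# Made-up test strings: a fifth digit on the right (12345), a fifth digit on the
# left (51234), two runs that are both short enough (1234a5678), a separate run
# that is too long (1234a56789, abc1234abc99999), and no four-digit run (a1b2c3).
for s in 1234 12345 51234 1234a5678 1234a56789 abc1234abc99999 a1b2c3; do
  if printf '%s\n' "$s" | grep -Eqx "$pattern"; then
    echo "match:    $s"
  else
    echo "no match: $s"
  fi
done
```

Only 1234 and 1234a5678 are reported as matches; every other string either has no four-digit run at all or has a run of five or more digits somewhere on the line.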



      It is not bad or wrong to do it this way. But perhaps the most important reason to consider this alternative is that it clarifies the benefit of using grep -P '\d{4}' file | grep -Pv '\d{5}' (or similar) instead, as suggested above and in matt's answer.



      With that approach, it's clear your goal is to select lines that contain one thing but not another. Plus, the syntax is simpler (so it may be more quickly understood by many readers and maintainers).






      edited Apr 13 '17 at 12:23 by Community
      answered Oct 18 '14 at 23:36 by Eliah Kagan (90.8k reputation, 23 gold, 251 silver and 397 bronze badges)










      • Thank you! I really did not expect to get an entire article in response. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file. The command must be able to pull both 1234 and abc1234abc99999 from the data set, which your solution does perfectly. I'm still a little confused with the string anchors though.

        – Buddha
        Oct 20 '14 at 17:04





























      This will show you four digits in a row, but not more:



      grep '[0-9][0-9][0-9][0-9][^0-9]' file


      Note that ^ inside the brackets means "not".



      There is a problem with this, though, that I'm not sure how to fix: if the number is at the end of the line, it won't show up.



      This uglier version, however, works for that case:



      grep '[0-9][0-9][0-9][0-9]' file | grep -v '[0-9][0-9][0-9][0-9][0-9]'
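A small sketch reproducing both points on made-up input: the first command misses a number at the end of a line and, as Volker Siegel's comment on this answer notes, still matches a12345b, while the two-grep version handles both lines. The pattern given to grep -v is quoted here so the shell cannot treat it as a filename glob. It assumes a POSIX shell with mktemp:

```shell
#!/bin/sh
# Hypothetical sample lines.
sample=$(mktemp)
printf '%s\n' 'year 2014' '2014 was a year' a12345b > "$sample"

# First command: four digits followed by one non-digit character.
first=$(grep '[0-9][0-9][0-9][0-9][^0-9]' "$sample")
# Second command: lines with four digits, minus lines with five.
second=$(grep '[0-9][0-9][0-9][0-9]' "$sample" | grep -v '[0-9][0-9][0-9][0-9][0-9]')

printf 'first:\n%s\n\nsecond:\n%s\n' "$first" "$second"
rm -f "$sample"
```

The first command prints "2014 was a year" and "a12345b" (missing "year 2014"); the second prints "year 2014" and "2014 was a year".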





      edited Oct 18 '14 at 22:22

      answered Oct 18 '14 at 21:44

      matt

      • Oops, it didn't need to be egrep; I've edited it.

        – matt
        Oct 18 '14 at 22:23

      • The first one is wrong: it finds a12345b, because it matches 2345b.

        – Volker Siegel
        Oct 19 '14 at 1:28

















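      If your grep doesn't support Perl regular expressions (-P), use the following shell command:



      grep -w "$(printf '[0-9]%.0s' {1..4})" file


      where printf '[0-9]%.0s' {1..4} produces [0-9] four times: the bash brace expansion {1..4} supplies four arguments, and %.0s prints each of them with zero precision (that is, prints nothing), so only the literal [0-9] remains. This method is useful when you are looking for long runs of digits and don't want to repeat the pattern by hand (just replace 4 with the number of digits to look for).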
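Bash brace expansion such as {1..4} cannot take a variable, because bash performs brace expansion before variable expansion, so {1..$n} stays literal. A minimal sketch, assuming seq is available, wraps the same printf trick in a function so the digit count can be a variable; digits_pattern is a hypothetical name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: emit "[0-9]" n times (n must be at least 1).
# printf reuses its format once per argument; '%.0s' prints each argument
# with precision 0 (nothing), so only the literal "[0-9]" is output.
# seq supplies the n arguments, which also works when n is a variable.
digits_pattern() {
  printf '[0-9]%.0s' $(seq 1 "$1")
}

n=4
digits_pattern "$n"; echo        # prints: [0-9][0-9][0-9][0-9]

# Whole-word match for exactly n digits on hypothetical sample lines.
# prints: 1234 / x 5678 y
printf '%s\n' '1234' '12345' 'x 5678 y' 'a1234' | grep -w "$(digits_pattern "$n")"
```

If your grep accepts extended regular expressions, the POSIX interval syntax (grep -E '[0-9]{4}') repeats a class without any printf at all.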



      Using -w restricts matches to whole words. However, if you're also interested in alphanumeric strings such as 1234a (which -w rejects, since letters count as word characters), drop -w and add [^0-9] to the end of the pattern instead, e.g.



      grep "$(printf '[0-9]%.0s' {1..4})[^0-9]" file


      $() is command substitution: the shell runs printf and pastes its output into the grep pattern. Check this post to see how printf repeats the pattern.
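To illustrate how -w and the [^0-9] suffix differ: GNU grep treats letters, digits and underscore as word characters, so -w rejects a four-digit run glued to a letter or an underscore, while the [^0-9] suffix needs some non-digit right after the run and therefore misses a run at the end of the line. The sample lines are hypothetical, and the pattern is written out literally so the sketch also runs in shells without brace expansion.

```shell
#!/bin/sh
# The pattern written out literally (what the printf trick expands to for 4).
pat='[0-9][0-9][0-9][0-9]'

# -w: letters, digits and underscore count as word characters.
# prints: 1234 / x-1234
printf '%s\n' '1234' '1234a' 'id_1234' 'x-1234' | grep -w "$pat"

# [^0-9] suffix: needs a non-digit right after the run.
# prints: 1234a
printf '%s\n' '1234' '1234a' 'id_1234' 'x-1234' | grep "${pat}[^0-9]"
```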






          answered Mar 3 '18 at 17:18

          kenorb
























              You can try the command below, replacing file with the actual file name on your system. You can also check this tutorial for more uses of the grep command:



              grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file
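A minimal sketch that wraps the same extended regular expression in a function, with the interval {$n} making the digit count a parameter; grep_n_digits is a hypothetical helper name. The (^|[^0-9]) and ([^0-9]|$) groups demand a non-digit or a line boundary on both sides of the run, so it also catches runs at the end of the line and runs glued to letters.

```shell
#!/bin/sh
# Hypothetical helper: print lines containing a run of exactly n digits.
# Usage: grep_n_digits N [FILE...]   (reads stdin when no file is given)
grep_n_digits() {
  n=$1
  shift
  # (^|[^0-9]) and ([^0-9]|$) require a non-digit or a line boundary on
  # each side of the run; \$ keeps the shell from touching the anchor.
  grep -E "(^|[^0-9])[0-9]{$n}([^0-9]|\$)" "$@"
}

# prints: 1234 / x1234y / both 1234 and 12345
printf '%s\n' '1234' 'a12345b' 'x1234y' 'both 1234 and 12345' '123' | grep_n_digits 4
```

Because the boundary characters are part of the match, grep -o would print them too, so this form is better suited to selecting lines than to extracting the bare digits.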






                  answered Apr 17 at 17:53

                  Mike Tyson






























