How to grep for groups of n digits, but no more than n?
I'm learning Linux, and I have a challenge that I can't seem to solve on my own. Here it is:
grep lines from a file that contain 4 digits in a row, but not more than 4.
I'm not sure how to approach this. I can search for specific numbers, but not for how many of them appear in a row.
command-line grep text-processing
Should a line like 1234a12345 be displayed, or not?
– Eliah Kagan
Oct 18 '14 at 22:17
@Buddha you need to explain your question along with an example.
– Avinash Raj
Oct 19 '14 at 2:14
If the numbers are preceded by a space or the start-of-line anchor, and followed by a space or the end-of-line anchor, then you could simply use word boundaries: \b\d{4}\b
– Avinash Raj
Oct 19 '14 at 2:22
This question differs from some questions about regular expressions by being explicitly about grep usage. Questions about using Unix utilities in Ubuntu, such as grep, sed, and awk, have always been considered fine here. Sometimes people ask how to do a job with the wrong tool; then a lack of context is a big problem, but that's not what's happening here. This is on-topic, clear enough to be usefully answered, helpful to our community, and there's no benefit in preventing further answers or pushing it toward deletion or migration. I'm voting to reopen it.
– Eliah Kagan
Oct 19 '14 at 16:34
Thank you guys so much, I had no idea I would get this much feedback. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file. The command must be able to pull a string like this (which it does): abc1234abcd99999
– Buddha
Oct 20 '14 at 16:20
edited Oct 19 '14 at 4:05
Eliah Kagan
asked Oct 18 '14 at 21:19
Buddha
4 Answers
There are two ways to interpret this question; I'll address both cases. You might want to display lines:
1. that contain a sequence of four digits that is itself not part of any longer sequence of digits, or
2. that contain a four-digit sequence but no longer sequence of digits (not even separately).
For example, (1) would display 1234a56789, but (2) wouldn't.
If you want to display all lines that contain a sequence of four digits that is itself not part of any longer sequence of digits, one way is:
grep -P '(?<!\d)\d{4}(?!\d)' file
This uses Perl regular expressions, which Ubuntu's grep (GNU grep) supports via -P. It won't match text like 12345, nor will it match the 1234 or 2345 that are part of it. But it will match the 1234 in 1234a56789.
In Perl regular expressions:
- \d means any digit (it's a short way to say [0-9] or [[:digit:]]).
- x{4} matches x 4 times. ({} syntax isn't specific to Perl regular expressions; it's in extended regular expressions via grep -E as well.) So \d{4} is the same as \d\d\d\d.
- (?<!\d) is a zero-width negative look-behind assertion. It means "unless preceded by \d."
- (?!\d) is a zero-width negative look-ahead assertion. It means "unless followed by \d."
(?<!\d) and (?!\d) don't match any text outside the sequence of four digits; instead, when used together, they prevent a sequence of four digits from being matched if it is part of a longer sequence of digits.
Using just one of them is insufficient: with only the look-behind, the leftmost four digits of a longer run (the 1234 in 12345) would still be matched, and with only the look-ahead, the rightmost four (the 2345) would.
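A minimal sketch of the point above, assuming a grep built with PCRE support (GNU grep's -P, as on Ubuntu) and a POSIX sh. The three function names are hypothetical helpers, not part of grep; each runs grep -o so you can see exactly which four digits each pattern accepts:

```shell
#!/bin/sh
# Assumes GNU grep with -P (PCRE) support.
# Hypothetical helpers: each prints the four-digit runs its pattern accepts.

# Both assertions: only runs of exactly four digits.
match_both() { printf '%s\n' "$1" | grep -oP '(?<!\d)\d{4}(?!\d)'; }

# Look-behind only: still accepts the leftmost four digits of a longer run.
match_lookbehind_only() { printf '%s\n' "$1" | grep -oP '(?<!\d)\d{4}'; }

# Look-ahead only: still accepts the rightmost four digits of a longer run.
match_lookahead_only() { printf '%s\n' "$1" | grep -oP '\d{4}(?!\d)'; }

match_lookbehind_only 12345    # prints 1234
match_lookahead_only 12345     # prints 2345
match_both 12345 || echo "no exact four-digit run in 12345"
match_both 12345abc789d0123e4  # prints 0123
```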
One benefit of using look-behind and look-ahead assertions is that your pattern matches only the four-digit sequences themselves, and not the surrounding text. This is helpful when using color highlighting (with the --color option).
ek@Io:~$ grep -P '(?<!\d)\d{4}(?!\d)' <<< 12345abc789d0123e4
12345abc789d0123e4
By default in Ubuntu, each user has alias grep='grep --color=auto' in their ~/.bashrc file. So you get color highlighting automatically when you run a simple command starting with grep (this is when aliases are expanded) and standard output is a terminal (this is what --color=auto checks for). Matches are typically highlighted in a shade of red (close to vermilion); in the output above, only 0123 would be highlighted.
And you can even make grep print only the matching text, not the whole line, with -o:
ek@Io:~$ grep -oP '(?<!\d)\d{4}(?!\d)' <<< 12345abc789d0123e4
0123
Alternative Way, Without Look-Behind and Look-Ahead Assertions
However, if you:
- need a command that will also run on systems where grep doesn't support -P, or otherwise don't want to use a Perl regular expression, and
- don't need to match the four digits specifically (which is usually the case if your goal is simply to display lines containing matches), and
- are okay with a solution that is a bit less elegant
...then you can achieve this with an extended regular expression instead:
grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file
This matches four digits together with the non-digit character (or the beginning or end of the line) on each side of them. Specifically:
- [0-9] matches any digit (like [[:digit:]], or \d in Perl regular expressions) and {4} means "four times." So [0-9]{4} matches a four-digit sequence.
- [^0-9] matches characters not in the range of 0 through 9. It is equivalent to [^[:digit:]] (or \D, in Perl regular expressions).
- ^, when it doesn't appear in [ ] brackets, matches the beginning of a line. Similarly, $ matches the end of a line.
- | means "or," and parentheses are for grouping (as in algebra). So (^|[^0-9]) matches the beginning of the line or a non-digit character, while ($|[^0-9]) matches the end of the line or a non-digit character.
So matches occur only in lines containing a four-digit sequence ([0-9]{4}) that is simultaneously:
- at the beginning of the line or preceded by a non-digit ((^|[^0-9])), and
- at the end of the line or followed by a non-digit (($|[^0-9])).
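To see which lines this selects, here is a small sketch that pipes a few hypothetical sample lines (including 1234 and abc1234abc99999, the two cases Buddha mentions in the comments) through the same extended regular expression. four_digit_lines is an illustrative name, and the script assumes only a POSIX sh and a grep that supports -E and -o:

```shell
#!/bin/sh
# Hypothetical helper wrapping the ERE from above; reads lines on stdin.
four_digit_lines() { grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])'; }

# Prints 1234 and abc1234abc99999; skips 12345, a12345b and 123.
printf '%s\n' 1234 abc1234abc99999 12345 a12345b 123 | four_digit_lines

# With -o you can see that the neighboring non-digits are part of the match:
# this prints c1234a rather than 1234.
printf 'abc1234abc\n' | grep -oE '(^|[^0-9])[0-9]{4}($|[^0-9])'
```

That last command is why the Perl version with look-around assertions is nicer for highlighting: there, -o prints just the digits.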
If, on the other hand, you want to display all lines that contain a four-digit sequence, but do not contain any sequence of more than four digits (even one that is separate from another sequence of only four digits), then conceptually your goal is to find lines that match one pattern but not another.
Therefore, even if you know how to do it with a single pattern, I'd suggest using something like matt's second suggestion, grepping for the two patterns separately.
You don't strongly benefit from any of the advanced features of Perl regular expressions when doing that, so you might prefer not to use them. But in keeping with the above style, here's a shortening of matt's solution using \d (and braces) in place of [0-9]:
grep -P '\d{4}' file | grep -Pv '\d{5}'
Since it uses [0-9], matt's way is more portable: it will work on systems where grep doesn't support Perl regular expressions. If you use [0-9] (or [[:digit:]]) instead of \d, but continue to use {}, you get the portability of matt's way a bit more concisely:
grep -E '[0-9]{4}' file | grep -Ev '[0-9]{5}'
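The difference between the two interpretations shows up on a line like 1234a56789. This sketch (POSIX sh and grep -E; the function names are hypothetical) runs the single extended regular expression from the previous section and the two-grep pipeline over the same two lines:

```shell
#!/bin/sh
# Interpretation 1: some run of exactly four digits (other runs may be longer).
exact_run() { grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])'; }

# Interpretation 2: four digits in a row somewhere, but never five or more.
no_longer_run() { grep -E '[0-9]{4}' | grep -Ev '[0-9]{5}'; }

printf '%s\n' 1234a56789 1234a5678 | exact_run      # prints both lines
printf '%s\n' 1234a56789 1234a5678 | no_longer_run  # prints only 1234a5678
```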
Alternative Way, With a Single Pattern
If you really do prefer a grep command that:
- uses a single regular expression (not two greps separated by a pipe, as above)
- to display lines that contain at least one sequence of four digits,
- but no sequences of five (or more) digits,
- and you don't mind matching the whole line, not just the digits (you probably don't mind this)
...then you can use:
grep -Px '(\d{0,4}\D)*\d{4}(\D\d{0,4})*' file
The -x flag makes grep display only lines where the entire line matches (rather than any line containing a match).
I've used a Perl regular expression because I think the brevity of \d and \D substantially increases clarity in this case. But if you need something portable to systems where grep doesn't support -P, you can replace them with [0-9] and [^0-9] (or with [[:digit:]] and [^[:digit:]]):
grep -Ex '([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*' file
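If you want some reassurance that the single-pattern version selects the same lines as the two-grep pipeline, a quick check like the following compares them on a handful of hypothetical edge cases. It is a spot check on those sample lines, not a proof, and assumes only a POSIX sh and a grep supporting -E and -x:

```shell
#!/bin/sh
# Hypothetical sample lines exercising the edge cases discussed here.
samples='1234
a1234b12
12a1234
12345
1234 12345
123
abc
99999a1234'

pipeline() { grep -E '[0-9]{4}' | grep -Ev '[0-9]{5}'; }
single()   { grep -Ex '([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*'; }

a=$(printf '%s\n' "$samples" | pipeline)
b=$(printf '%s\n' "$samples" | single)

if [ "$a" = "$b" ]; then
  echo "same lines selected:"
  printf '%s\n' "$a"   # 1234, a1234b12 and 12a1234
else
  echo "results differ" >&2
  exit 1
fi
```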
The way these regular expressions work is:
- In the middle, \d{4} or [0-9]{4} matches one sequence of four digits. We may have more than one of these, but we need to have at least one.
- On the left, (\d{0,4}\D)* or ([0-9]{0,4}[^0-9])* matches zero or more (*) instances of not more than four digits followed by a non-digit. Zero digits (i.e., nothing) is one possibility for "not more than four digits." This matches (a) the empty string or (b) any string ending in a non-digit and not containing any sequences of more than four digits. Since the text immediately to the left of the central \d{4} (or [0-9]{4}) must either be empty or end with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the left of them.
- On the right, (\D\d{0,4})* or ([^0-9][0-9]{0,4})* matches zero or more (*) instances of a non-digit followed by not more than four digits (which, like before, could be four, three, two, one, or even none at all). This matches (a) the empty string or (b) any string beginning with a non-digit and not containing any sequences of more than four digits. Since the text immediately to the right of the central \d{4} (or [0-9]{4}) must either be empty or start with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the right of them.
This ensures a four-digit sequence is present somewhere, and that no sequence of five or more digits is present anywhere.
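To see why the {0,4} limits in the side groups matter, compare the pattern with a deliberately loosened, hypothetical variant that allows any number of digits there. The loosened one accepts a line containing a five-digit run as long as a separate four-digit run exists elsewhere (POSIX sh and grep -E/-x assumed):

```shell
#!/bin/sh
# The portable single pattern from above.
strict() { grep -Ex '([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*'; }

# Hypothetical broken variant: {0,4} replaced by * in both side groups.
loose() { grep -Ex '([0-9]*[^0-9])*[0-9]{4}([^0-9][0-9]*)*'; }

printf '%s\n' 12345a1234 | strict || echo "strict: no match"  # 12345 is rejected
printf '%s\n' 12345a1234 | loose                             # prints 12345a1234
```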
It is not bad or wrong to do it this way. But perhaps the most important reason to consider this alternative is that it clarifies the benefit of using grep -P '\d{4}' file | grep -Pv '\d{5}' (or similar) instead, as suggested above and in matt's answer.
With that way, it's clear your goal is to select lines that contain one thing but not another. Plus the syntax is simpler (so it may be more quickly understood by many readers/maintainers).
Thank you! I really did not expect to get an entire article in response. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file. The command must be able to pull both 1234 and abc1234abc99999 from the data set, which your solution does perfectly. I'm still a little confused by the string anchors, though.
– Buddha
Oct 20 '14 at 17:04
This will show you 4 numbers in a row, but not more:
grep '[0-9][0-9][0-9][0-9][^0-9]' file
Note that the ^ inside the brackets means "not".
There is a problem with this, though, and I'm not sure how to fix it: if the number is at the end of the line, then it won't show up.
This uglier version, however, would work for that case:
grep '[0-9][0-9][0-9][0-9]' file | grep -v '[0-9][0-9][0-9][0-9][0-9]'
Oops, it didn't need to be egrep; I've edited it.
– matt
Oct 18 '14 at 22:23
The first one is wrong: it finds a12345b, because it matches 2345b.
– Volker Siegel
Oct 19 '14 at 1:28
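Volker's point, and the end-of-line problem matt mentions, are easy to reproduce. In this sketch the two function names are hypothetical wrappers around matt's first and second commands, reading from standard input instead of file (POSIX sh and grep assumed):

```shell
#!/bin/sh
# matt's first command: four digits followed by a non-digit.
first() { grep '[0-9][0-9][0-9][0-9][^0-9]'; }

# matt's second command: four digits somewhere, but no run of five.
second() { grep '[0-9][0-9][0-9][0-9]' | grep -v '[0-9][0-9][0-9][0-9][0-9]'; }

printf '%s\n' a12345b 1234 | first   # prints a12345b (via 2345b) and misses 1234
printf '%s\n' a12345b 1234 | second  # prints only 1234
```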
If grep doesn't support Perl regular expressions (-P), use the following shell command:
grep -w "$(printf '[0-9]%.0s' {1..4})" file
where printf '[0-9]%.0s' {1..4} will produce [0-9] four times. This method is useful when you're looking for long runs of digits and don't want to repeat the pattern by hand (just replace 4 with the number of digits to look for).
Using -w will look for whole words. However, if you're interested in alphanumeric strings, such as 1234a, then add [^0-9] at the end of the pattern, e.g.
grep "$(printf '[0-9]%.0s' {1..4})[^0-9]" file
$(...) is command substitution. Check this post to see how printf repeats the pattern.
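Brace expansion such as {1..4} is a bash feature (some other shells, such as zsh, support it too), so the printf trick above won't expand in a strictly POSIX shell like dash. A minimal sketch of a portable alternative, assuming only POSIX sh; digit_pattern is a hypothetical helper name:

```shell
#!/bin/sh
# digit_pattern N: hypothetical helper that prints "[0-9]" N times, like
# printf '[0-9]%.0s' {1..N} does in bash, but without brace expansion.
digit_pattern() {
  n=$1
  pattern=
  while [ "$n" -gt 0 ]; do
    pattern="${pattern}[0-9]"
    n=$((n - 1))
  done
  printf '%s\n' "$pattern"
}

digit_pattern 4   # prints [0-9][0-9][0-9][0-9]

# Same -w search as above, with the count passed as an argument.
# prints: id 1234 here
printf '%s\n' 'id 1234 here' 'id 12345 here' 'abc1234' | grep -w "$(digit_pattern 4)"
```

If grep -E is available, the interval expression [0-9]{4} gets the same effect without generating the pattern at all, e.g. grep -Ew '[0-9]{4}' file.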
You can try the command below, replacing file with the actual file name on your system. You can also check this tutorial for more uses of the grep command:
grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file
add a comment
|
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "89"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f538730%2fhow-to-grep-for-groups-of-n-digits-but-no-more-than-n%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
There are two ways to interpret this question; I'll address both cases. You might want to display lines:
- that contain a sequence of four digits that is itself not part of any longer sequence of digits, or
- that contains a four-digit sequence but no longer sequence of digits (not even separately).
For example, (1) would display 1234a56789
, but (2) wouldn't.
If you want to display all lines that contain a sequence of four digits that is itself not part of any longer sequence of digits, one way is:
grep -P '(?<!d)d4(?!d)' file
This uses Perl regular expressions, which Ubuntu's grep
(GNU grep) supports via -P
. It won't match text like 12345
, nor will it match the 1234
or 2345
that are part of it. But it will match the 1234
in 1234a56789
.
In Perl regular expressions:
d
means any digit (it's a short way to say[0-9]
or[[:digit:]]
).x4
matchesx
4 times. (syntax isn't specific to Perl regular expressions; it's in extended regular expressions via
grep -E
as well.) Sod4
is the same asdddd
.(?<!d)
is a zero-width negative look-behind assertion. It means "unless preceded byd
."(?!d)
is a zero-width negative look-ahead assertion. It means "unless followed byd
."
(?<!d)
and (?!d)
don't match text outside the sequence of four digits; instead, they will (when used together) prevent a sequence of four digits from itself being matched if it is part of a longer sequence of digits.
Using just the look-behind or just the look-ahead is insufficient because the rightmost or leftmost four-digit subsequence would still be matched.
One benefit of using look-behind and look-ahead assertions is that your pattern matches only the four-digit sequences themselves, and not the surrounding text. This is helpful when using color highlighting (with the --color
option).
ek@Io:~$ grep -P '(?<!d)d4(?!d)' <<< 12345abc789d0123e4
12345abc789d0123e4
By default in Ubuntu, each user has alias grep='grep --color=auto'
in their ~.bashrc
file. So you get color highlighting automatically when you run a simple command starting with grep
(this is when aliases are expanded) and standard output is a terminal (this is what --color=auto
checks for). Matches are typically highlighted in a shade of red (close to vermilion), but I've shown it in italicized bold. Here's a screenshot:
And you can even make grep
print only matching text, and not the whole line, with -o
:
ek@Io:~$ grep -oP '(?<!d)d4(?!d)' <<< 12345abc789d0123e4
0123
Alternative Way, Without Look-Behind and Look-Ahead Assertions
However, if you:
- need a command that will also run on systems where
grep
doesn't support-P
or otherwise don't want to use a Perl regular expression, and - don't need to match the four digits specifically--which is usually the case if your goal is simply to display lines containing matches, and
- are okay with a solution that is a bit less elegant
...then you can achieve this with an extended regular expression instead:
grep -E '(^|[^0-9])[0-9]4($|[^0-9])' file
This matches four digits and the non-digit character--or beginning or end of the line--surrounding them. Specifically:
[0-9]
matches any digit (like[[:digit:]]
, ord
in Perl regular expressions) and4
means "four times." So[0-9]4
matches a four-digit sequence.[^0-9]
matches characters not in the range of0
through9
. It is equivalent to[^[:digit:]]
(orD
, in Perl regular expressions).^
, when it doesn't appear in[
]
brackets, matches the beginning of a line. Similarly,$
matches the end of a line.|
means or and parentheses are for grouping (as in algebra). So(^|[^0-9])
matches the beginning of the line or a non-digit character, while($|[^0-9])
matches the end of the line or a non-digit character.
So matches occur only in lines containing a four-digit sequence ([0-9]4
) that is simultaneously:
- at the beginning of the line or preceded by a non-digit (
(^|[^0-9])
), and - at the end of the line or followed by a non-digit (
($|[^0-9])
).
If, on the other hand, you want to display all lines that contain a four-digit sequence, but do not contain any sequence of more than four digits (even one that is separate from another sequence of only four digits), then conceptually your goal is to find lines that match one pattern but not another.
Therefore, even if you know how to do it with a single pattern, I'd suggest using something like matt's second suggestion, grep
ing for the two patterns separately.
You don't strongly benefit from any of the advanced features of Perl regular expressions when doing that, so you might prefer not to use them. But in keeping with the above style, here's a shortening of matt's solution using d
(and braces) in place of [0-9]
:
grep -P 'd4' file | grep -Pv 'd5'
Since it uses [0-9]
, matt's way is more portable--it will work on systems where grep
doesn't support Perl regular expressions. If you use [0-9]
(or [[:digit:]]
) instead of d
, but continue to use
, you get the portability of matt's way a bit more concisely:
grep -E '[0-9]4' file | grep -Ev '[0-9]5'
Alternative Way, With a Single Pattern
If you really do prefer a grep
command that
uses a single regular expression (not twogrep
s separated by a pipe, as above)- to display lines that contain at least one sequence of four digits,
- but no sequences of five (or more) digits,
- and you don't mind matching the whole line, not just the digits (you probably don't mind this)
...then you can use:
grep -Px '(d0,4D)*d4(Dd0,4)*' file
The -x
flag makes grep
display only lines where the entire line matches (rather than any line containing a match).
I've used a Perl regular expression because I think the brevity of d
and D
substantially increase clarity in this case. But if you need something portable to systems where grep
doesn't support -P
, you can replace them with [0-9]
and [^0-9]
(or with [[:digit:]]
and [^[:digit]]
):
grep -Ex '([0-9]0,4[^0-9])*[0-9]4([^0-9][0-9]0,4)*' file
The way these regular expressions work is:
In the middle,
d4
or[0-9]4
matches one sequence of four digits. We may have more than one of these, but we need to have at least one.On the left,
(d0,4D)*
or([0-9]0,4[^0-9])*
matches zero or more (*
) instances of not more than four digits followed by a non-digit. Zero digits (i.e., nothing) is one possibility for "not more than four digits." This matches (a) the empty string or (b) any string ending in a non-digit and not containing any sequences of more than four digits.Since the text immediately to the left of the central
d4
(or[0-9]4
) must either be empty or end with a non-digit, this prevents the centrald4
from matching four digits that have a another (fifth) digit just to the left of them.On the right,
(Dd0,4)*
or([^0-9][0-9]0,4)*
matches zero or more (*
) instances of a non-digit followed by not more than four digits (which, like before, could be four, three, two, one, or even none at all). This matches (a) the empty string or (b) any string beginning in a non-digit and not containing any sequences of more than four digits.Since the text immediately to the right of the central
d4
(or[0-9]4
) must either be empty or start with a non-digit, this prevents the centrald4
from matching four digits that have another (fifth) digit just to the right of them.
This ensures a four-digit sequence is present somewhere, and that no sequence of five or more digits is present anywhere.
It is not bad or wrong to do it this way. But perhaps the most important reason to consider this alternative is that it clarifies the benefit of using grep -P 'd4' file | grep -Pv 'd5'
(or similar) instead, as suggested above and in matt's answer.
With that way, it's clear your goal is to select lines that contain one thing but not another. Plus the syntax is simpler (so it may be more quickly understood by many readers/maintainers).
1
Thank you! I really did not expect to get an entire article in response. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]4($|[^0-9])' file. The command must be able to pull both 1234 and abc1234abc99999 from the data set, which your solution does perfectly. I'm still a little confused with the string anchors though.
– Buddha
Oct 20 '14 at 17:04
add a comment
|
There are two ways to interpret this question; I'll address both cases. You might want to display lines:
- that contain a sequence of four digits that is itself not part of any longer sequence of digits, or
- that contains a four-digit sequence but no longer sequence of digits (not even separately).
For example, (1) would display 1234a56789
, but (2) wouldn't.
If you want to display all lines that contain a sequence of four digits that is itself not part of any longer sequence of digits, one way is:
grep -P '(?<!d)d4(?!d)' file
This uses Perl regular expressions, which Ubuntu's grep
(GNU grep) supports via -P
. It won't match text like 12345
, nor will it match the 1234
or 2345
that are part of it. But it will match the 1234
in 1234a56789
.
In Perl regular expressions:
d
means any digit (it's a short way to say[0-9]
or[[:digit:]]
).x4
matchesx
4 times. (syntax isn't specific to Perl regular expressions; it's in extended regular expressions via
grep -E
as well.) Sod4
is the same asdddd
.(?<!d)
is a zero-width negative look-behind assertion. It means "unless preceded byd
."(?!d)
is a zero-width negative look-ahead assertion. It means "unless followed byd
."
(?<!d)
and (?!d)
don't match text outside the sequence of four digits; instead, they will (when used together) prevent a sequence of four digits from itself being matched if it is part of a longer sequence of digits.
Using just the look-behind or just the look-ahead is insufficient because the rightmost or leftmost four-digit subsequence would still be matched.
One benefit of using look-behind and look-ahead assertions is that your pattern matches only the four-digit sequences themselves, and not the surrounding text. This is helpful when using color highlighting (with the --color
option).
ek@Io:~$ grep -P '(?<!d)d4(?!d)' <<< 12345abc789d0123e4
12345abc789d0123e4
By default in Ubuntu, each user has alias grep='grep --color=auto' in their ~/.bashrc file. So you get color highlighting automatically when you run a simple command starting with grep (this is when aliases are expanded) and standard output is a terminal (this is what --color=auto checks for). Matches are typically highlighted in a shade of red (close to vermilion); in the example above, only 0123 would be highlighted.
And you can even make grep print only matching text, and not the whole line, with -o:
ek@Io:~$ grep -oP '(?<!\d)\d{4}(?!\d)' <<< 12345abc789d0123e4
0123
Alternative Way, Without Look-Behind and Look-Ahead Assertions
However, if you:
- need a command that will also run on systems where grep doesn't support -P, or otherwise don't want to use a Perl regular expression, and
- don't need to match the four digits specifically (which is usually the case if your goal is simply to display lines containing matches), and
- are okay with a solution that is a bit less elegant,
...then you can achieve this with an extended regular expression instead:
grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file
This matches four digits and the non-digit character (or beginning or end of the line) surrounding them. Specifically:
- [0-9] matches any digit (like [[:digit:]], or \d in Perl regular expressions) and {4} means "four times." So [0-9]{4} matches a four-digit sequence.
- [^0-9] matches characters not in the range of 0 through 9. It is equivalent to [^[:digit:]] (or \D, in Perl regular expressions).
- ^, when it doesn't appear in [ ] brackets, matches the beginning of a line. Similarly, $ matches the end of a line.
- | means "or", and parentheses are for grouping (as in algebra). So (^|[^0-9]) matches the beginning of the line or a non-digit character, while ($|[^0-9]) matches the end of the line or a non-digit character.
So matches occur only in lines containing a four-digit sequence ([0-9]{4}) that is simultaneously:
- at the beginning of the line or preceded by a non-digit ((^|[^0-9])), and
- at the end of the line or followed by a non-digit (($|[^0-9])).
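A small sketch of both points, with made-up input and a hypothetical file sample.txt (GNU grep assumed): -o shows the neighboring non-digits being consumed, and on whole lines the extended regular expression selects the same lines as the -P version.

```shell
# -o reveals that the ERE also matches the non-digit on each side.
echo 'ab1234cd' | grep -oE '(^|[^0-9])[0-9]{4}($|[^0-9])'

# At the start and end of a line, ^ and $ stand in for the missing neighbor.
echo '1234' | grep -oE '(^|[^0-9])[0-9]{4}($|[^0-9])'

# Hypothetical sample data: the selected lines match the -P version's.
printf '%s\n' 1234 12345 1234a56789 abc1234abc99999 > sample.txt
grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' sample.txt
```

The first command prints b1234c, the second prints 1234, and the last prints 1234, 1234a56789 and abc1234abc99999.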
If, on the other hand, you want to display all lines that contain a four-digit sequence, but do not contain any sequence of more than four digits (even one that is separate from another sequence of only four digits), then conceptually your goal is to find lines that match one pattern but not another.
Therefore, even if you know how to do it with a single pattern, I'd suggest using something like matt's second suggestion, grepping for the two patterns separately.
You don't strongly benefit from any of the advanced features of Perl regular expressions when doing that, so you might prefer not to use them. But in keeping with the above style, here's a shortening of matt's solution using \d (and braces) in place of [0-9]:
grep -P '\d{4}' file | grep -Pv '\d{5}'
Since it uses [0-9], matt's way is more portable: it will work on systems where grep doesn't support Perl regular expressions. If you use [0-9] (or [[:digit:]]) instead of \d, but continue to use { }, you get the portability of matt's way a bit more concisely:
grep -E '[0-9]{4}' file | grep -Ev '[0-9]{5}'
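Here is a sketch of interpretation (2) on made-up data (the file name sample.txt is hypothetical; GNU grep assumed). Unlike the single-sequence patterns above, it rejects any line containing a five-digit run anywhere:

```shell
# Hypothetical sample data covering both interpretations.
printf '%s\n' 1234 12345 1234a56789 1234a5678 abc1234abc99999 > sample.txt

# Keep lines with four digits in a row, then drop any line that has five.
grep -E '[0-9]{4}' sample.txt | grep -Ev '[0-9]{5}'
```

This should print only 1234 and 1234a5678; abc1234abc99999 and 1234a56789 are dropped because of their five-digit runs.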
Alternative Way, With a Single Pattern
If you really do prefer a grep command that
- uses a single regular expression (not two greps separated by a pipe, as above)
- to display lines that contain at least one sequence of four digits,
- but no sequences of five (or more) digits,
- and you don't mind matching the whole line, not just the digits (you probably don't mind this),
...then you can use:
grep -Px '(\d{0,4}\D)*\d{4}(\D\d{0,4})*' file
The -x flag makes grep display only lines where the entire line matches (rather than any line containing a match).
I've used a Perl regular expression because I think the brevity of \d and \D substantially increases clarity in this case. But if you need something portable to systems where grep doesn't support -P, you can replace them with [0-9] and [^0-9] (or with [[:digit:]] and [^[:digit:]]):
grep -Ex '([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*' file
The way these regular expressions work is:
- In the middle, \d{4} or [0-9]{4} matches one sequence of four digits. We may have more than one of these, but we need to have at least one.
- On the left, (\d{0,4}\D)* or ([0-9]{0,4}[^0-9])* matches zero or more (*) instances of not more than four digits followed by a non-digit. Zero digits (i.e., nothing) is one possibility for "not more than four digits." This matches (a) the empty string or (b) any string ending in a non-digit and not containing any sequences of more than four digits.
Since the text immediately to the left of the central \d{4} (or [0-9]{4}) must either be empty or end with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the left of them.
- On the right, (\D\d{0,4})* or ([^0-9][0-9]{0,4})* matches zero or more (*) instances of a non-digit followed by not more than four digits (which, like before, could be four, three, two, one, or even none at all). This matches (a) the empty string or (b) any string beginning with a non-digit and not containing any sequences of more than four digits.
Since the text immediately to the right of the central \d{4} (or [0-9]{4}) must either be empty or start with a non-digit, this prevents the central \d{4} from matching four digits that have another (fifth) digit just to the right of them.
This ensures a four-digit sequence is present somewhere, and that no sequence of five or more digits is present anywhere.
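As a rough check of this reasoning (a test on a few lines, not a proof), this sketch runs the portable single-pattern command and the two-grep pipeline on the same hypothetical sample file and compares what they select. The names sample.txt, single and piped are made up, and GNU grep is assumed:

```shell
# Hypothetical sample data, the same kind of lines as before.
printf '%s\n' 1234 12345 1234a56789 1234a5678 abc1234abc99999 > sample.txt

# Single whole-line pattern versus the two-step pipeline.
single=$(grep -Ex '([0-9]{0,4}[^0-9])*[0-9]{4}([^0-9][0-9]{0,4})*' sample.txt)
piped=$(grep -E '[0-9]{4}' sample.txt | grep -Ev '[0-9]{5}')

# Show the single-pattern result, then confirm both select the same lines.
printf '%s\n' "$single"
[ "$single" = "$piped" ] && echo 'same lines selected'
```

On this data it should print 1234 and 1234a5678, followed by "same lines selected".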
It is not bad or wrong to do it this way. But perhaps the most important reason to consider this alternative is that it clarifies the benefit of using grep -P '\d{4}' file | grep -Pv '\d{5}' (or similar) instead, as suggested above and in matt's answer.
With that way, it's clear your goal is to select lines that contain one thing but not another. Plus, the syntax is simpler (so it may be more quickly understood by many readers and maintainers).
edited Apr 13 '17 at 12:23 by Community
answered Oct 18 '14 at 23:36 by Eliah Kagan
Thank you! I really did not expect to get an entire article in response. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file. The command must be able to pull both 1234 and abc1234abc99999 from the data set, which your solution does perfectly. I'm still a little confused with the string anchors though.
– Buddha
Oct 20 '14 at 17:04
This is meant to show four digits in a row, but not more:
grep '[0-9][0-9][0-9][0-9][^0-9]' file
Note that ^ at the start of a bracket expression means "not".
There is a problem with this, though, and I'm not sure how to fix it: if the number is at the end of the line, then it won't show up.
This uglier version, however, would work for that case:
grep '[0-9][0-9][0-9][0-9]' file | grep -v '[0-9][0-9][0-9][0-9][0-9]'
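A quick sketch, on made-up input lines and assuming GNU grep, of the end-of-line problem described above, of another quirk of the first command, and of how the two-step version handles both:

```shell
# Four digits at the very end of a line: the first command finds nothing.
echo 'abc1234' | grep '[0-9][0-9][0-9][0-9][^0-9]' || echo 'no match'

# Five digits followed by a letter: it still matches, via "2345b".
echo 'a12345b' | grep -o '[0-9][0-9][0-9][0-9][^0-9]'

# The two-step version keeps abc1234 and rejects the five-digit line.
printf '%s\n' abc1234 a12345b | grep '[0-9][0-9][0-9][0-9]' | grep -v '[0-9][0-9][0-9][0-9][0-9]'
```

This should print "no match", then 2345b, then abc1234. Quoting the grep -v pattern keeps the shell from treating the brackets as a filename glob.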
edited Oct 18 '14 at 22:22
answered Oct 18 '14 at 21:44 by matt
Oops, didn't need to be egrep; I've edited it.
– matt
Oct 18 '14 at 22:23
The first one is wrong: it finds a12345b, because it matches 2345b.
– Volker Siegel
Oct 19 '14 at 1:28
If grep doesn't support Perl regular expressions (-P), use the following shell command:
grep -w "$(printf '[0-9]%.0s' {1..4})" file
where printf '[0-9]%.0s' {1..4} (using bash brace expansion) prints [0-9] four times. This method is useful when you're looking for long runs of digits and don't want to repeat the pattern by hand (just replace 4 with the number of digits to look for).
Using -w restricts matches to whole words, so the digits can't be directly next to a letter, digit, or underscore. However, if you're interested in alphanumeric strings, such as 1234a, then add [^0-9] at the end of the pattern, e.g.:
grep "$(printf '[0-9]%.0s' {1..4})[^0-9]" file
Note that this version also matches the 2345a in 12345a, and it won't find four digits at the very end of a line.
Using $() is command substitution. Check this post to see how printf repeats the pattern.
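A sketch of the same idea with the digit count in a variable. The names n, pattern and sample.txt are hypothetical, and using seq (from GNU coreutils, present on Ubuntu) in place of {1..4} is this sketch's assumption: in bash, a brace expansion such as {1..$n} is performed before variables are expanded, so it would not work.

```shell
# n is a hypothetical digit count; change it to the length you need.
n=4

# seq supplies the arguments 1..n; printf repeats its format once per argument,
# and %.0s prints each argument with zero width, leaving only "[0-9]".
pattern=$(printf '[0-9]%.0s' $(seq "$n"))
echo "$pattern"

# Hypothetical sample lines to try the -w search on.
printf '%s\n' 1234 12345 'id 1234 ok' abc1234abc > sample.txt
grep -w "$pattern" sample.txt
```

With n=4 this should print [0-9][0-9][0-9][0-9], then 1234 and "id 1234 ok"; 12345 and abc1234abc are skipped because the digits touch other word characters.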
answered Mar 3 '18 at 17:18 by kenorb
You can try the command below, replacing file with the actual file name on your system. You can also check this tutorial for more uses of the grep command:
grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file
answered Apr 17 at 17:53 by Mike Tyson
Should a line like 1234a12345 be displayed, or not?
– Eliah Kagan
Oct 18 '14 at 22:17
@Buddha you need to explain your question along with an example.
– Avinash Raj
Oct 19 '14 at 2:14
If the numbers are preceded by a space or the start of the line, and followed by a space or the end of the line, then you could simply use word boundaries:
\b\d{4}\b
– Avinash Raj
Oct 19 '14 at 2:22
This question differs from some questions about regular expressions by being explicitly about grep usage. Questions about using Unix utilities in Ubuntu, such as grep, sed, and awk, have always been considered fine here. Sometimes people ask how to do a job with the wrong tool; then a lack of context is a big problem, but that's not what's happening here. This is on-topic, clear enough to be usefully answered, helpful to our community, and there's no benefit in preventing further answers or pushing it toward deletion or migration. I'm voting to reopen it.
– Eliah Kagan
Oct 19 '14 at 16:34
Thank you guys so much, I had no idea I would get this much feedback. This is the answer I was looking for: grep -E '(^|[^0-9])[0-9]{4}($|[^0-9])' file. The command must be able to pull a string like this (which it does): abc1234abcd99999
– Buddha
Oct 20 '14 at 16:20