Extract specific characters from each line
I have a text file, and I want to extract the string that follows "OS=" on each line.
Input file lines:
    A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
    K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
Desired output:
    OS=Arundo donax
    OS=Setaria italica
or
    Arundo donax
    Setaria italica
Tags: text-processing awk perl
edited Sep 6 at 9:54 by Jeff Schaller♦
asked Sep 5 at 14:04 by shahzad
Comments:
Are there always 2 words to print after OS=, or do you want all words between OS= and OX=? – oliv, Sep 5 at 14:13
I need only two words. – shahzad, Sep 5 at 14:17
This is a work order, not a question. No demonstrated effort. – Peter Mortensen, Sep 6 at 8:25
4 Answers
Use GNU grep (or compatible) with an extended regex:
    grep -Eo "OS=\w+ \w+" file
or with a basic regex (you need to escape the +):
    grep -o "OS=\w\+ \w\+" file
    # or
    grep -o "OS=\w* \w*" file
To get everything from OS= up to OX=, you can use grep with a Perl-compatible regex (PCRE, the -P option) if available, and a lookahead:
    grep -Po "OS=.*(?= OX=)" file
    # to also leave out "OS=", use a lookbehind
    grep -Po "(?<=OS=).*(?= OX=)" file
    # or the keep-out \K
    grep -Po "OS=\K.*(?= OX=)" file
or use grep to match up to and including OX=, then remove it with sed afterwards:
    grep -o "OS=.*\( OX=\)" file | sed 's/ OX=$//'
Output (for the variants that keep OS=):
    OS=Arundo donax
    OS=Setaria italica
edited Sep 6 at 6:25, answered Sep 5 at 14:20 by pLumo
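As a rough check of the grep approach, the sketch below runs the extended-regex variant on the two sample lines from the question. It also shows one way to drop the OS= prefix when -P is not available, by piping through cut; that step is an addition for illustration, not part of the answer. The temporary file and the variable names are placeholders, and \w in grep -E is assumed to be supported (it is a GNU extension).

```shell
# Sample lines copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
EOF

# Keep the "OS=" prefix: match OS= followed by two runs of word characters.
with_prefix=$(grep -Eo 'OS=\w+ \w+' "$sample")

# Drop the prefix without -P: keep only what follows the "=" in each match.
without_prefix=$(grep -Eo 'OS=\w+ \w+' "$sample" | cut -d= -f2)

printf '%s\n' "$with_prefix" "$without_prefix"
rm -f "$sample"
```

The cut step works here only because each match contains exactly one = sign.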
In Perl, two non-whitespace "words":
    $ perl -lne 'print $1 if /OS=(\S+ \S+)/' input
or everything up to OX=:
    $ perl -lne 'print $1 if /OS=(.*?) OX=/' input
or everything up to the next something=:
    $ perl -lne 'print $1 if /OS=(.*?) (\w+)=/' input
With your sample input, they all give the same output, but the output would differ with an input like this:
    ABC=something here OS=foo bar doo PE=3 OX=1234
answered Sep 5 at 14:25 by ilkkachu
A more robust way is to use sed to capture the full value up to the word containing the next =. That way it works for a value of any length (e.g. if the value has one word or three words).
    sed 's/.*OS=\([^=]*\).*/\1/;s/ [^ ]*$//'
The first part, .*OS=, consumes everything up to and including OS=. The capture group (delimited by \( and \)) then matches up to the next = and can be referred to in the replacement as \1. The second substitution removes the last word, which is a fragment of the next assignment.
Note: the ^ inside [] negates the character class, so [^=] matches any character that is not an = sign.
edited Sep 5 at 15:03, answered Sep 5 at 14:54 by A.Danischewski
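The two sed substitutions can be traced step by step. The sketch below uses made-up input lines (hypothetical, not from the question) whose OS value has one, two and three words, and shows the intermediate result after the first substitution alone.

```shell
# Hypothetical lines whose OS value has one, two and three words.
lines='P1 some protein OS=Oryza OX=1 PE=1
P2 some protein OS=Arundo donax OX=35708 PE=4
P3 some protein OS=Zea mays subsp OX=2 PE=3'

# Step 1 only: the capture runs up to the next "=", so the key "OX" is still attached.
step1=$(printf '%s\n' "$lines" | sed 's/.*OS=\([^=]*\).*/\1/')

# Both steps: the second substitution strips that trailing key.
final=$(printf '%s\n' "$lines" | sed 's/.*OS=\([^=]*\).*/\1/;s/ [^ ]*$//')

printf 'after step 1:\n%s\nafter step 2:\n%s\n' "$step1" "$final"
```

Note that a line without OS= is not skipped by this command; it passes through with only its last word removed, so filtering such lines out first may be needed.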
    awk '{print $(NF-4), $(NF-3)}' file
    OS=Arundo donax
    OS=Setaria italica
or
    awk -F= '{sub(/OX/,""); print $(NF-3)}' file
    Arundo donax
    Setaria italica
edited Sep 7 at 17:43, answered Sep 6 at 22:58 by Claes Wikner
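The awk commands above count fields from the end of the line, so they assume that exactly three key=value pairs follow the species name. As an alternative sketch (not from the answer, and only one of several possible approaches), awk can search for the field that starts with OS= and collect the following fields until the next one containing an = sign. The variable names are placeholders.

```shell
sample='A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1'

species=$(printf '%s\n' "$sample" | awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^OS=/) {
            out = substr($i, 4)              # text after "OS="
            # append following fields until one looks like key=value
            for (j = i + 1; j <= NF && $j !~ /[=]/; j++)
                out = out " " $j
            break
        }
    }
    if (out != "") print out                 # skip lines without OS=
}')

printf '%s\n' "$species"
```

This version does not depend on how many fields precede or follow the OS= value, at the cost of being longer than the one-liners.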