Extract specific characters from each line

I have a text file, and I want to extract the string that follows "OS=" on each line.

Input file lines:

A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1

Desired output:

OS=Arundo donax
OS=Setaria italica

or

Arundo donax
Setaria italica






text-processing awk perl






edited Sep 6 at 9:54 by Jeff Schaller
asked Sep 5 at 14:04 by shahzad

  • Are there always 2 words to print after OS= or do you want all words between OS= and OX=? – oliv, Sep 5 at 14:13

  • I need only two words. – shahzad, Sep 5 at 14:17

  • This is a work order, not a question. No demonstrated effort. – Peter Mortensen, Sep 6 at 8:25

4 Answers
Use GNU grep (or compatible) with extended regex:

grep -Eo "OS=\w+ \w+" file

or with basic regex (where you need to escape the +):

grep -o "OS=\w\+ \w\+" file
# or
grep -o "OS=\w* \w*" file

To get everything from OS= up to OX=, you can use grep with Perl-compatible regular expressions (PCRE, the -P option), if available, and a lookahead:

grep -Po "OS=.*(?= OX=)" file

# to also leave out "OS=", use a lookbehind
grep -Po "(?<=OS=).*(?= OX=)" file
# or \K, which drops everything matched so far from the output
grep -Po "OS=\K.*(?= OX=)" file

Or match up to and including OX= with grep and remove it with sed afterwards:

grep -o "OS=.*\( OX=\)" file | sed 's/ OX=$//'

Output:

OS=Arundo donax
OS=Setaria italica

answered Sep 5 at 14:20 by pLumo (edited Sep 6 at 6:25)
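
As a rough sketch of the two-word grep approach for systems where -P is not available, the following produces the second desired output form (without "OS="). The function name extract_os_two_words is hypothetical. [[:alnum:]_] is used as a portable spelling of \w, and -o is assumed to be supported (it is a common extension in GNU and BSD grep).

```shell
#!/bin/sh
# Hypothetical helper: print the two words that follow "OS=" on each input line.
extract_os_two_words() {
    # [[:alnum:]_] is the portable spelling of \w; -o prints only the matched part.
    # cut then keeps what follows the "=" of the matched "OS=word word".
    grep -Eo 'OS=[[:alnum:]_]+ [[:alnum:]_]+' | cut -d= -f2
}

# Sample lines taken from the question.
extract_os_two_words <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
EOF
```

Reading from standard input keeps the helper usable in a pipeline; to process a file, redirect it, as in extract_os_two_words < file.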


























In Perl, two non-whitespace "words":

$ perl -lne 'print $1 if /OS=(\S+ \S+)/' input

or everything up to OX=:

$ perl -lne 'print $1 if /OS=(.*?) OX=/' input

or everything up to the next something=:

$ perl -lne 'print $1 if /OS=(.*?) (\w+)=/' input

With your sample input, they all give the same output, but the output would differ with, e.g., an input like this:

ABC=something here OS=foo bar doo PE=3 OX=1234

answered Sep 5 at 14:25 by ilkkachu
























A more robust way is to use sed to capture the full value up to the word containing the next =. That way it works for a value of any length (e.g. one word or three words).

sed 's/.*OS=\([^=]*\).*/\1/;s/ [^ ]*$//'

In the first substitution, .*OS= consumes everything up to and including OS=; the capture group (delimited by \( and \)) then matches up to the next = and is referred to in the replacement as \1. The second substitution removes the last word, which is a fragment of the next assignment.

Note: a ^ at the start of a bracket expression negates it, so [^=] matches any character that is not an = sign.

answered Sep 5 at 14:54 by A.Danischewski (edited Sep 5 at 15:03)
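
One thing to be aware of with the two-step sed command is that a line without OS= still goes through the second substitution and is printed minus its last word. The following is a sketch of a single-substitution variant that prints only lines where OS= is followed by another key=. It is not part of the original answer; the function name os_value and the extra sample lines are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the OS= value up to the next key=, skipping other lines.
os_value() {
    # \([^=]*\) captures the value; " [^ =]*=" then consumes the next key and its "=".
    # -n with the p flag prints only lines where the substitution succeeded.
    sed -n 's/.*OS=\([^=]*\) [^ =]*=.*/\1/p'
}

os_value <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
X1 Some protein OS=Zea OX=4577 PE=3 SV=1
ABC=something here OS=foo bar doo PE=3 OX=1234
a line without the marker
EOF
```

A trade-off of this variant is that a line where OS= is the last assignment produces no output, because the pattern requires a following key=.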
























awk '{print $(NF-4), $(NF-3)}' file

OS=Arundo donax
OS=Setaria italica

or

awk -F= '{sub(/OX/,""); print $(NF-3)}' file

Arundo donax
Setaria italica

answered Sep 6 at 22:58 by Claes Wikner (edited Sep 7 at 17:43)
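
Both awk commands count fields from the end of the line, so they rely on exactly three assignments (OX, PE, SV) following the OS value. As a sketch of an alternative that does not depend on the position, the loop below looks for the field that starts with OS= and prints it together with the next word. The function name os_two_words is hypothetical, and the two-word assumption comes from the asker's comment.

```shell
#!/bin/sh
# Hypothetical helper: find the field starting with "OS=" and print it plus the next word.
os_two_words() {
    awk '{
        for (i = 1; i < NF; i++)
            if (index($i, "OS=") == 1) {       # field begins with OS=
                print substr($i, 4), $(i + 1)  # drop the 3-character prefix
                break
            }
    }'
}

os_two_words <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
EOF
```

The loop stops at NF-1 so that $(i + 1) always refers to an existing field.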






























