How to delete random lines from a file?


I have a file which contains 10000 lines and I want to delete 5 randomly determined lines from it. How can I do that?










  • related: Python: Choose random line from file, then delete that line

    – jfs
    Apr 13 at 17:44











  • To close-voters: text-processing questions are perfectly on topic, this is part of using and administering an Ubuntu system as defined on askubuntu.com/help/on-topic.

    – dessert
    Apr 15 at 16:20












  • @jfs You’re welcome to add a python answer here as well, even if you just copy it that’s very helpful! Or do you want me to add another copypasta answer? ;)

    – dessert
    Apr 15 at 22:52

















command-line text-processing






edited Apr 15 at 16:14 by dessert

asked Apr 13 at 13:54 by Pravin Gaddam







5 Answers














You can probably solve it more efficiently than with a for-loop that needs to process the whole file once per line to remove.



filename="/PATH/TO/FILE"
number=5

line_count="$(wc -l < "$filename")"
line_nums_to_delete="$(shuf -i "1-$line_count" -n "$number")"
sed_script="$(printf '%dd;' $line_nums_to_delete)"

sed -i.bak -e "$sed_script" "$filename"


Or in one line (after defining the filename and number variables or replacing them manually):



sed -i.bak -e "$(printf '%dd;' $(shuf -i "1-$(wc -l < "$filename")" -n "$number"))" "$filename"


The -i.bak switch tells sed to edit/replace the input file immediately, but keep a backup copy of the original data, named like the input file but with .bak appended to the file name. If you don't want it to make a copy, just write -i.
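The effect of -i.bak can be tried on a throwaway file first. This is only a small sketch: the temporary file and its three lines are made up for the demonstration, and it assumes GNU sed.

```shell
# Throwaway demo (assumed example data, not from the question):
# delete line 2 of a temporary file in place and keep a backup.
demo=$(mktemp)
printf '%s\n' one two three > "$demo"

sed -i.bak -e '2d' "$demo"

cat "$demo"        # the edited file: "one" and "three"
cat "$demo.bak"    # the untouched backup: "one", "two", "three"
```

If the result is not what you wanted, the .bak file can simply be moved back over the edited one.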



Btw, you don't have to use variables as I did. You can also directly replace "$number" and both occurrences of "$filename" with the appropriate values. I just did it this way for clarity.




To break down and explain the rest of the command:



sed -e "SCRIPT" "$filename"


runs the text processing tool sed on the file specified by the filename variable, applying the instructions given as SCRIPT argument.



Our SCRIPT is dynamically generated in the lines above it, which run commands and assign their outputs to variables. Here we use these commands:




  • wc -l < "$filename" reads in the file specified by the filename variable and outputs the number of lines this file contains.



    • In your case, this should return roughly 10000 according to the size you mentioned in the question.



  • shuf -i "1-$line_count" -n "$number" returns as many unique random numbers as specified by the number variable in the range 1 to $line_count (both boundaries inclusive).



    • For example, shuf -i 1-6 -n 2 would pick two different numbers from 1 to 6, like throwing two regular six-sided dice that never show the same face.



  • printf '%dd;' ARGUMENTS returns a formatted string, taking in all ARGUMENTS (not quoted this time to treat each random number as a separate argument). The format string %dd; will be repeated while there are arguments left, and %d will be replaced with the argument represented as a decimal number.



    • Therefore, e.g. an input of 1 7 42 would result in an output of 1d;7d;42d;.
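The way printf repeats its format string can be checked in isolation. A minimal sketch, where build_sed_script is a hypothetical helper name that is not part of the answer:

```shell
# build_sed_script is a hypothetical helper, not part of the original answer.
# printf reuses the format '%dd;' once per argument, so every line number
# turns into one sed delete command.
build_sed_script() {
    printf '%dd;' "$@"
}

build_sed_script 1 7 42   # prints 1d;7d;42d;
echo
```

Note that printf still prints the format once when it gets no arguments at all, so an empty list of numbers would yield 0d;, which GNU sed rejects as an invalid address.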


The resulting $sed_script is finally our SCRIPT for sed. A plain number is treated as an address, i.e. the line number on which to apply an action, starting at 1 for the first line of the input file. d is the command to delete the addressed line, and ; separates multiple sed script commands.



All together, the whole command first examines your input file as specified in the filename variable and counts its lines. Then it generates as many unique random numbers as the number variable specifies, in the range 1 to the number of lines, and constructs a sed script out of these to delete each of those lines. Finally sed runs that script on the file, modifying it.
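If you need this more than once, the same steps could be wrapped in a function. The following is only a sketch: the name delete_random_lines and the argument check are my additions, and it assumes GNU sed and shuf.

```shell
# delete_random_lines FILE COUNT
# Hypothetical wrapper around the commands above; the function name and the
# range check are assumptions added for illustration.
delete_random_lines() (
    file=$1
    count=$2
    total=$(wc -l < "$file") || exit 1
    # shuf cannot return more distinct numbers than the range holds, and an
    # empty list would make printf emit "0d;", which sed rejects.
    if [ "$count" -lt 1 ] || [ "$count" -gt "$total" ]; then
        echo "count must be between 1 and $total" >&2
        exit 1
    fi
    # Unquoted on purpose: each number must become a separate printf argument.
    script=$(printf '%dd;' $(shuf -i "1-$total" -n "$count"))
    sed -i.bak -e "$script" "$file"
)
```

It would be called as delete_random_lines /PATH/TO/FILE 5. The body runs in a subshell (round brackets instead of braces), so its variables do not leak into your shell.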






    You can use a for loop to get a random number and the sed command to delete the line.



    for i in {0..5};
    do sed -i "$((1 + RANDOM % 10000))d" filename;
    done





    • {0..5} expands to 0 1 2 3 4 5, so this deletes six lines, you probably mean {1..5}. More importantly: What if it tries to delete e.g. line 10000 as the second one, or 9999 as the third… ?

      – dessert
      Apr 14 at 16:15
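One possible way to address both points from the comment, as a rough sketch: count the remaining lines before every deletion, so the random number can never point past the end of the file. The function name delete_one_at_a_time is made up, and shuf is swapped in for $RANDOM here (that is my substitution, not the answer's method) because $RANDOM only goes up to 32767.

```shell
# delete_one_at_a_time FILE COUNT
# Hypothetical variant of the loop; the name is invented for this sketch and
# shuf replaces $RANDOM so files longer than 32767 lines are covered too.
delete_one_at_a_time() (
    file=$1
    remaining=$2
    while [ "$remaining" -gt 0 ]; do
        total=$(wc -l < "$file")
        # Stop early once the file is empty.
        [ "$total" -gt 0 ] || break
        # The range shrinks with the file, so the number always hits a line.
        sed -i -e "$(shuf -i "1-$total" -n 1)d" "$file"
        remaining=$((remaining - 1))
    done
)
```

This still rewrites the whole file once per deleted line, so it is slower than building a single sed script.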


















    Similar to Shivaditya's answer, but without a loop, and it deletes lines from the whole file, not just the first 10 lines:



    sed -i "$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d" filename


    Will select five random numbers between 1 and 10000 and delete those lines in a single operation.






    • What if two or more of these random numbers are the same?

      – dessert
      Apr 14 at 16:16
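To illustrate the concern: if two of the random numbers are equal, sed gets the same delete command twice and the file loses fewer than five lines. The small sketch below compares drawing without and with repetition using GNU shuf; count_distinct is a hypothetical helper name, and the range 1-5 is chosen small only to make repeats likely.

```shell
# count_distinct (hypothetical helper) prints how many different values
# it reads on standard input.
count_distinct() {
    sort -u | wc -l
}

# Five draws without repetition from 1..5 are always five different numbers.
shuf -i 1-5 -n 5 | count_distinct       # prints 5

# Five draws with repetition (-r) may contain duplicates, so this can
# print a smaller number.
shuf -r -i 1-5 -n 5 | count_distinct
```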


















    An answer on U&L has this nice awk solution for the problem:





    <file awk -v p=5 -v n=$(<file wc -l) '
    BEGIN {srand()}
    rand() * n-- < p {p--; next}
    {print}'


    Explanation




    • -v p=5 – set variable p holding the number of lines to delete


    • -v n=$(<file wc -l) – set variable n holding the line count of the file


    • BEGIN {srand()} – before processing the file, set the seed for generating random numbers, that’s the prerequisite for using rand() to get truly™ random numbers


    • rand() * n-- < p {…} – A condition followed by an action in braces that runs if the condition is true. rand() returns a random number between 0 (inclusive) and 1 (exclusive); this is multiplied by n, the number of lines not yet processed, and n is decreased by 1 afterwards. If the product is smaller than p, the condition is true.


    • {p--; next} – decrease p by 1 and proceed to the next line ignoring subsequent commands


    • {print} – print the currently processed line

    The second and last line of the awk script are run for every line of the input file, so on every line there’s a chance of p / n for the line to be skipped and not printed, while the default action is to just print the line.
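Because the chance p / n reaches 1 as soon as the lines still to delete equal the lines still to come, exactly p lines are dropped whenever the file has at least p lines. A small sketch to check this, where awk_drop_random and the 20-line sample file are hypothetical names and data added for the test:

```shell
# awk_drop_random FILE P  (hypothetical wrapper name)
# Prints FILE to standard output with P randomly chosen lines left out,
# using the awk program from the answer.
awk_drop_random() {
    awk -v p="$2" -v n="$(wc -l < "$1")" '
        BEGIN { srand() }
        # Skip this line with probability p/n, then count it as seen.
        rand() * n-- < p { p--; next }
        { print }
    ' "$1"
}

sample=$(mktemp)
seq 1 20 > "$sample"
awk_drop_random "$sample" 5 | wc -l    # prints 15
```

Unlike the sed variants, this writes the result to standard output and leaves the input file unchanged, so redirect it to a new file if you want to keep it.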



    Example run



    I created a file with the letters a–e, each on its own line, with



    printf '%s\n' {a..e} >file


    and set p=1 to delete one line randomly. I changed the code to also print the values of n and p for each line before any of them is decreased.



    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
    n=5 p=1
    n=4 p=0 b
    n=3 p=0 c
    n=2 p=0 d
    n=1 p=0 e
    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
    n=5 p=1 a
    n=4 p=1 b
    n=3 p=1
    n=2 p=0 d
    n=1 p=0 e
    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
    n=5 p=1 a
    n=4 p=1 b
    n=3 p=1 c
    n=2 p=1 d
    n=1 p=1


    Further reading



    • The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions





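      With gawk, drop the following code into a file (called, say, del_random)



      function randint(n) {
          return int(n * rand()) + 1
      }

      BEGIN { srand() }

      BEGINFILE {
          # count the lines of the file about to be processed
          command = sprintf("wc -l <\"%s\"", FILENAME)
          command | getline n
          close(command)
          # assumed reconstruction (this step is missing from the text):
          # pick lines_to_del distinct line numbers in 1..n
          delete arr
          while (length(arr) < lines_to_del && length(arr) < n)
              arr[randint(n)]
      }

      # print only the lines whose number was not picked
      !(FNR in arr)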

      and then execute it as



      gawk -i inplace -f del_random lines_to_del=5 file1 lines_to_del=20 file2


      Any number of files can be passed (file1, file2, ...) and the number of lines to be deleted can be specified on a per-file basis via the lines_to_del parameter, as shown.
      The -i inplace option is gawk's equivalent of sed's -i.



      On the other hand, if the same number of lines needs to be deleted from each file, you can set lines_to_del once as follows:



      gawk -i inplace -v lines_to_del=5 -f del_random file1 file2





            Answer using shuf and sed: answered Apr 13 at 15:16 by Byte Commander, edited Apr 14 at 11:47























                Answer using the for loop: answered Apr 13 at 14:40 by Shivaditya, edited Apr 15 at 22:57 by dessert












                • 0..5 expands to 0 1 2 3 4 5, so this deletes six lines, you probably mean 1..5. More importantly: What if it tries to delete e.g. line 10000 as the second one, or 9999 as the third… ?

                  – dessert
                  Apr 14 at 16:15
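
Both points in the comment above can be handled by re-counting the file before every deletion, so that the random line number always exists. The following is only a sketch: the function name delete_random_lines is made up for illustration, and it assumes GNU shuf and GNU sed (for -i), as shipped with Ubuntu.

```shell
# Hypothetical helper: delete COUNT random lines from FILE, one at a time.
# Usage: delete_random_lines FILE COUNT
delete_random_lines() {
    file=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Re-count on every pass: the file shrinks by one line per deletion.
        total=$(wc -l < "$file")
        # Stop early if nothing is left to delete.
        [ "$total" -gt 0 ] || break
        # Pick one line number between 1 and the current line count.
        line=$(shuf -i "1-$total" -n 1)
        sed -i "${line}d" "$file"
        i=$((i + 1))
    done
}
```

For example, delete_random_lines filename 5 removes exactly five lines as long as the file has at least five. The price is that the file is rewritten once per deleted line.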




























                5














                Similar to Shivaditya's answer, but without a loop, and it deletes lines from the whole file, not just the first 10 lines:



                sed -i "$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d" filename


                This selects five random numbers between 1 and 10000 and deletes those lines in a single operation.






                edited Apr 15 at 22:58 by dessert
                answered Apr 13 at 19:01 by Jesse_b







                • 2





                  What if two or more of these random numbers are the same?

                  – dessert
                  Apr 14 at 16:16
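
If two of the random numbers coincide, sed deletes that line only once, so fewer than five lines are removed. One way around this, sketched below, is to let shuf pick distinct line numbers and turn them into a single sed script. The function name delete_distinct_random_lines is hypothetical, and GNU shuf and GNU sed are assumed.

```shell
# Hypothetical helper: delete COUNT distinct random lines from FILE in one sed call.
# Usage: delete_distinct_random_lines FILE COUNT
delete_distinct_random_lines() {
    file=$1
    count=$2
    total=$(wc -l < "$file")
    # Nothing to do for an empty file.
    [ "$total" -gt 0 ] || return 0
    # shuf prints COUNT distinct numbers from 1..total, one per line;
    # appending "d" to each turns the list into a sed script such as "17d".
    script=$(shuf -i "1-$total" -n "$count" | sed 's/$/d/')
    [ -n "$script" ] || return 0
    # All deletions refer to the original line numbers, so one pass is enough.
    sed -i "$script" "$file"
}
```

Because the line numbers come from the actual line count, none of them can point past the end of the file either.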























                3














                An answer on U&L has this nice awk solution for the problem:





                <file awk -v p=5 -v n=$(<file wc -l) '
                BEGIN {srand()}
                rand() * n-- < p {p--; next}
                {print}'


                Explanation




                • -v p=5 – set variable p holding the number of lines to delete

                • -v n=$(<file wc -l) – set variable n holding the line count of the file

                • BEGIN {srand()} – before processing the file, seed the random number generator (with the current time by default), so that rand() produces a different sequence on each run

                • rand() * n-- < p {…} – a conditional expression; the part in braces runs if it is true. rand() returns a random number between 0 (inclusive) and 1 (exclusive), which is multiplied by n, the number of lines not yet processed; n is then decreased by 1. If the product is smaller than p, the expression is true.

                • p--; next – decrease p by 1 and proceed to the next line, ignoring subsequent commands

                • {print} – print the currently processed line

                The second and last lines of the awk script are run for every line of the input file, so each line has a chance of p / n of being skipped and not printed, where p is the number of deletions still outstanding and n the number of lines left; otherwise the line is printed. Once p reaches 0 nothing more is deleted, and when p equals n every remaining line is deleted, so exactly the requested number of lines is removed (provided the file has at least that many).



                Example run



                I created a file with the letters a–e, each on its own line, with



                printf '%s\n' {a..e} >file


                and set p=1 to delete one line randomly. I changed the code to also print the values of n and p for each line before any of them is decreased.



                $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
                n=5 p=1
                n=4 p=0 b
                n=3 p=0 c
                n=2 p=0 d
                n=1 p=0 e
                $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
                n=5 p=1 a
                n=4 p=1 b
                n=3 p=1
                n=2 p=0 d
                n=1 p=0 e
                $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN {srand()} {printf "n="n" p="p" "} rand() * n-- < p {p--; print ""; next} {print}'
                n=5 p=1 a
                n=4 p=1 b
                n=3 p=1 c
                n=2 p=1 d
                n=1 p=1
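
Note that the awk command only prints the remaining lines to standard output; file itself is not changed. A minimal sketch of applying it in place through a temporary file follows. The function name delete_n_random_lines and the use of mktemp are assumptions for illustration, not part of the original answer.

```shell
# Hypothetical wrapper: delete COUNT random lines from FILE in place.
# Usage: delete_n_random_lines FILE COUNT
delete_n_random_lines() {
    file=$1
    count=$2
    tmp=$(mktemp) || return 1
    if awk -v p="$count" -v n="$(wc -l < "$file")" '
        # Seed the generator so each run differs.
        BEGIN { srand() }
        # Skip the current line with probability p / n.
        rand() * n-- < p { p--; next }
        # Otherwise keep it.
        { print }
    ' "$file" > "$tmp"; then
        # Overwrite the contents rather than replacing the file,
        # so the file keeps its permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

Calling delete_n_random_lines file 5 then has the same effect as the command above, but modifies file directly.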


                Further reading



                • The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions





                    edited Apr 15 at 23:01
                    answered Apr 15 at 22:25 by dessert





















                        2














                        With gawk, drop the following code into a file (called, say, del_random):



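                        function randint(n) {
                            return int(n * rand()) + 1
                        }

                        BEGIN { srand() }

                        BEGINFILE {
                            # Assumption: reconstructed body; picks lines_to_del distinct random line numbers into arr.
                            delete arr
                            command = sprintf("wc -l <\"%s\"", FILENAME)
                            command | getline total
                            close(command)
                            for (left = (lines_to_del < total ? lines_to_del : total); left > 0; )
                                if (!((r = randint(total)) in arr)) { arr[r]; left-- }
                        }

                        !(FNR in arr)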


                        and then execute it as



                        gawk -i inplace -f del_random lines_to_del=5 file1 lines_to_del=20 file2


                        Any number of files can be passed (file1, file2, ...), and the number of lines to delete can be specified per file via the lines_to_del parameter, as shown.
                        The -i inplace option is gawk's equivalent of sed's -i.



                        On the other hand, if the same number of lines needs to be deleted from each file, you can set lines_to_del once, as follows:



                        gawk -i inplace -v lines_to_del=5 -f del_random file1 file2





                            edited Apr 15 at 22:59 by dessert
                            answered Apr 13 at 20:06 by iruvar


























