How to delete random lines from a file?
I have a file which contains 10000 lines and I want to delete 5 randomly determined lines from it. How can I do that?
command-line text-processing
Comments:

- related: Python: Choose random line from file, then delete that line – jfs, Apr 13 at 17:44
- To close-voters: text-processing questions are perfectly on topic, this is part of using and administering an Ubuntu system as defined on askubuntu.com/help/on-topic. – dessert, Apr 15 at 16:20
- @jfs You’re welcome to add a python answer here as well, even if you just copy it that’s very helpful! Or do you want me to add another copypasta answer? ;) – dessert, Apr 15 at 22:52
Question asked Apr 13 at 13:54 by Pravin Gaddam, edited Apr 15 at 16:14 by dessert.
5 Answers
You can probably solve it more efficiently than with a for-loop that needs to process the whole file once per line to remove.

    filename="/PATH/TO/FILE"
    number=5

    line_count="$(wc -l < "$filename")"
    line_nums_to_delete="$(shuf -i "1-$line_count" -n "$number")"
    sed_script="$(printf '%dd;' $line_nums_to_delete)"
    sed -i.bak -e "$sed_script" "$filename"

Or in one line (after defining the `filename` and `number` variables or replacing them manually):

    sed -i.bak -e "$(printf '%dd;' $(shuf -i "1-$(wc -l < "$filename")" -n "$number"))" "$filename"

The `-i.bak` switch tells `sed` to edit the input file in place, but keep a backup copy of the original data, named like the input file but with `.bak` appended to the file name. If you don't want it to make a copy, just write `-i`.

By the way, you don't have to use variables as I did. You can also directly replace `"$number"` and both occurrences of `"$filename"` with the appropriate values. I just did it this way for clarity.
To break down and explain the rest of the command:

`sed -e "SCRIPT" "$filename"` runs the text processing tool `sed` on the file specified by the `filename` variable, applying the instructions given as the `SCRIPT` argument.

Our `SCRIPT` is dynamically generated in the lines above it, which run commands and assign their outputs to variables. Here we use these commands:

- `wc -l < "$filename"` reads the file specified by the `filename` variable and outputs the number of lines this file contains.
    - In your case, this should return roughly 10000 according to the size you mentioned in the question.
- `shuf -i "1-$line_count" -n "$number"` returns as many unique random numbers as specified by the `number` variable in the range 1 to `$line_count` (both boundaries inclusive).
    - For example, `shuf -i 1-6 -n 2` would emulate throwing two regular six-sided dice, except that the two numbers can never be equal.
- `printf '%dd;' ARGUMENTS` returns a formatted string, taking in all `ARGUMENTS` (not quoted this time, so that each random number is treated as a separate argument). The format string `%dd;` is repeated while there are arguments left, and `%d` is replaced with the argument represented as a decimal number.
    - Therefore, e.g. an input of `1 7 42` would result in an output of `1d;7d;42d;`.

The resulting `$sed_script` is finally our `SCRIPT` for `sed`. A plain number is treated as an address, i.e. the line number on which to apply an action, starting at 1 for the first line of the input file. `d` is the command to delete the specified line, and `;` separates multiple `sed` script commands.
All together, the whole command first examines your input file as specified in the `filename` variable and counts its lines. Then it generates `number` unique random numbers in the range 1 to the number of lines and constructs a `sed` script out of these to delete each of those lines. Finally, `sed` runs that script on the file, modifying it.
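
The steps above could be wrapped in a small function. This is only a sketch and not part of the original answer: the function name `delete_random_lines` and its argument checks are assumptions added for illustration, and it relies on GNU `shuf` and GNU `sed -i`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and checks are illustrative assumptions):
# delete COUNT distinct random lines from FILE in place, keeping FILE.bak.
delete_random_lines() {
    local file=$1 count=$2 total script
    total=$(wc -l < "$file")
    # Refuse to delete more lines than the file contains.
    if (( count > total )); then
        printf 'cannot delete %d of %d lines\n' "$count" "$total" >&2
        return 1
    fi
    # With no numbers to format, printf would emit "0d;", which sed rejects.
    (( count == 0 )) && return 0
    # Build a script such as "3d;17d;42d;" from unique line numbers.
    script=$(printf '%dd;' $(shuf -i "1-$total" -n "$count"))
    sed -i.bak -e "$script" "$file"
}
```

It would be called as `delete_random_lines /PATH/TO/FILE 5`. The early return for a count of zero is there because an empty argument list makes `printf '%dd;'` print `0d;`, and line 0 is not a valid address for `d`.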
You can use a for loop to get a random number and use the `sed` command to delete the line:

    for i in {0..5}; do
        sed -i "$((1 + RANDOM % 10000))d" filename
    done

Comment: `{0..5}` expands to `0 1 2 3 4 5`, so this deletes six lines; you probably mean `{1..5}`. More importantly: What if it tries to delete e.g. line 10000 as the second one, or 9999 as the third…? – dessert, Apr 14 at 16:15
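
One way to address the shrinking-file problem raised in that comment is to re-count the lines before every deletion. The following is a sketch under assumptions, not the answerer's code: the function name `delete_lines_one_by_one` is made up, and it swaps `$RANDOM` for GNU `shuf` so that files longer than 32767 lines are covered too. It still rewrites the file once per deleted line.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the loop: re-count the lines before every deletion,
# so the random line number never points past the end of the shrinking file.
delete_lines_one_by_one() {
    local file=$1 count=$2 total i
    for (( i = 0; i < count; i++ )); do
        total=$(wc -l < "$file")
        (( total > 0 )) || return 1    # nothing left to delete
        # shuf picks one number in 1..total; $RANDOM alone stops at 32767.
        sed -i "$(shuf -i "1-$total" -n 1)d" "$file"
    done
}
```

Calling `delete_lines_one_by_one filename 5` removes exactly five lines as long as the file has at least five.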
Similar to Shivaditya's answer, but without a loop, and it deletes lines from the whole file, not just the first 10 lines:

    sed -i "$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d;$((1+RANDOM%10000))d" filename

This selects five random numbers between 1 and 10000 and deletes those lines in a single operation.
Comment: What if two or more of these random numbers are the same? – dessert, Apr 14 at 16:16
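
If you want to stay with `$RANDOM`, the duplicate problem from that comment could be handled by redrawing until enough distinct numbers have been collected. This is a sketch only: the function name `unique_random_numbers` is hypothetical, it needs bash 4 or later for associative arrays, and the modulo still introduces a slight bias because `$RANDOM` ranges from 0 to 32767.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print COUNT distinct random numbers in 1..MAX,
# one per line, in no particular order. Requires bash 4+ and
# MAX <= 32768, because $RANDOM only yields values from 0 to 32767.
unique_random_numbers() {
    local max=$1 count=$2 n
    local -A seen=()
    (( count >= 1 && count <= max && max <= 32768 )) || return 1
    while (( ${#seen[@]} < count )); do
        n=$(( 1 + RANDOM % max ))
        seen[$n]=1          # a repeated value just overwrites its own key
    done
    printf '%s\n' "${!seen[@]}"
}
```

The result can be fed into the same kind of `sed` script, for example `sed -i "$(printf '%dd;' $(unique_random_numbers 10000 5))" filename`.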
An answer on U&L has this nice `awk` solution for the problem:

    <file awk -v p=5 -v n=$(<file wc -l) '
      BEGIN {srand()}
      rand() * n-- < p {p--; next}
      {print}'

Explanation

- `-v p=5` – set variable `p` holding the number of lines to delete
- `-v n=$(<file wc -l)` – set variable `n` holding the line count of the file
- `BEGIN {srand()}` – before processing the file, set the seed for generating random numbers, that’s the prerequisite for using `rand()` to get truly™ random numbers
- `rand() * n-- < p {…}` – a conditional expression running the part in braces if it is true. `rand()` creates a random number between (including) 0 and (excluding) 1, this is multiplied with the line count `n`, and `n` is decreased by 1 afterwards. If the result is smaller than `p`, the expression is true.
- `p--; next` – decrease `p` by 1 and proceed to the next line ignoring subsequent commands
- `{print}` – print the currently processed line

The second and last line of the `awk` script are run for every line of the input file, so on every line there’s a chance of `p / n` for the line to be skipped and not printed, while the default action is to just print the line. Here `n` is the number of lines not yet processed and `p` the number of lines still to delete.
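
The `awk` command prints the result instead of changing the file. A possible wrapper that writes the result back is sketched below; the function name `awk_delete_random_lines` and the temporary-file handling are assumptions for illustration, not part of the quoted answer.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the one-pass awk approach: collect the
# surviving lines in a temporary file, then copy them back over FILE.
awk_delete_random_lines() {
    local file=$1 count=$2 tmp
    tmp=$(mktemp) || return 1
    awk -v p="$count" -v n="$(wc -l < "$file")" '
        BEGIN { srand() }
        rand() * n-- < p { p--; next }   # skip this line with chance p/n
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

Copying the content back with `cat` keeps the original file's permissions. Note that `srand()` without an argument seeds from the current time, so two runs within the same second can pick the same lines.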
Example run

I created a file with the letters a–e, each on its own line, with

    printf '%s\n' {a..e} >file

and set `p=1` to delete one line randomly. I changed the code to also print the values of `n` and `p` for each line before any of them is decreased.

    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN{srand()}{printf "n="n" p="p" "}rand() * n-- < p{p--; print ""; next}{print}'
    n=5 p=1 
    n=4 p=0 b
    n=3 p=0 c
    n=2 p=0 d
    n=1 p=0 e
    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN{srand()}{printf "n="n" p="p" "}rand() * n-- < p{p--; print ""; next}{print}'
    n=5 p=1 a
    n=4 p=1 b
    n=3 p=1 
    n=2 p=0 d
    n=1 p=0 e
    $ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN{srand()}{printf "n="n" p="p" "}rand() * n-- < p{p--; print ""; next}{print}'
    n=5 p=1 a
    n=4 p=1 b
    n=3 p=1 c
    n=2 p=1 d
    n=1 p=1 

Further reading
- The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions
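With `gawk`, drop the following code into a file (called, say, `del_random`):

    function randint(n) {
        return int(n * rand()) + 1
    }

    BEGINFILE {
        # Assumption: the source of this getline was lost in extraction;
        # counting the lines with wc -l is one way to fill total_lines.
        cmd = "wc -l < \"" FILENAME "\""
        cmd | getline total_lines
        close(cmd)
        srand()
        delete arr
        while (length(arr) < lines_to_del) {
            val = randint(total_lines)
            if (val in arr)
                continue
            arr[val] = 1
        }
    }

    !(FNR in arr)
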
and then execute it as

    gawk -i inplace -f del_random lines_to_del=5 file1 lines_to_del=20 file2

Any number of files can be passed (`file1`, `file2`, ...) and the number of lines to be deleted can be specified on a per-file basis via the `lines_to_del` parameter as shown.

The `-i inplace` is the `gawk` equivalent to `sed`'s `-i`.

On the other hand, if the same number of lines needs to be deleted from each file, you can set `lines_to_del` once as follows:

    gawk -i inplace -v lines_to_del=5 -f del_random file1 file2

The `shuf` and `sed` answer was posted Apr 13 at 15:16 by Byte Commander and edited Apr 14 at 11:47.
The for-loop answer was posted Apr 13 at 14:40 by Shivaditya and edited Apr 15 at 22:57 by dessert.
The single `sed` command answer was posted Apr 13 at 19:01 by Jesse_b and edited Apr 15 at 22:58 by dessert.
2
What if two or more of these random numbers are the same?
– dessert
Apr 14 at 16:16
add a comment
|
2
What if two or more of these random numbers are the same?
– dessert
Apr 14 at 16:16
2
2
What if two or more of these random numbers are the same?
– dessert
Apr 14 at 16:16
What if two or more of these random numbers are the same?
– dessert
Apr 14 at 16:16
add a comment
|
An answer on U&L has this nice awk
solution for the problem:
<file awk -v p=5 -v n=$(<file wc -l) '
BEGIN srand()
rand() * n-- < p p--; next
print'
Explanation
-v p=5
– set variablep
holding the number of lines to delete-v n=$(<file wc -l)
– set variablen
holding the line count of the fileBEGIN srand()
– before processing the file, set the seed for generating random numbers, that’s the prerequisite for usingrand()
to get truly™ random numbersrand() * n-- < p …
– A conditional expression running the part in braces if it is true.rand()
creates a random number between (including) 0 and (excluding) 1, this is multiplied with the line countn
, which is decreased by 1. If the result is smaller thanp
, the expression is true.p--; next
– decreasep
by 1 and proceed to the next line ignoring subsequent commandsprint
– print the currently processed line
The second and last line of the awk
script are run for every line of the input file, so on every line there’s a chance of p / n
for the line to be skipped and not printed, while the default action is to just print the line.
Example run
I created a file with the letters a–e each in an own line with
printf '%sn' a..e >file
and set p=1
to delete one line randomly. I changed the code to also print the values of n
and p
for each line before any of them is decreased.
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN srand() printf "n="n" p="p" " rand() * n-- < p p--; print ""; next print'
n=5 p=1
n=4 p=0 b
n=3 p=0 c
n=2 p=0 d
n=1 p=0 e
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN srand() printf "n="n" p="p" " rand() * n-- < p p--; print ""; next print'
n=5 p=1 a
n=4 p=1 b
n=3 p=1
n=2 p=0 d
n=1 p=0 e
$ <file awk -v n=$(<file wc -l) -v p=1 'BEGIN srand() printf "n="n" p="p" " rand() * n-- < p p--; print ""; next print'
n=5 p=1 a
n=4 p=1 b
n=3 p=1 c
n=2 p=1 d
n=1 p=1
Further reading
- The GNU Awk User’s Guide: Chapter 9.1.2 Numeric Functions
edited Apr 15 at 23:01
answered Apr 15 at 22:25 by dessert
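With gawk, drop the following code into a file (called, say, del_random):
    function randint(n) {
        return int(n * rand()) + 1
    }
    BEGINFILE {
        # Count the lines of the current file (assumed here to be done with
        # wc -l; file names containing single quotes are not handled).
        cmd = "wc -l < '" FILENAME "'"
        cmd | getline total_lines
        close(cmd)
        srand()
        delete arr
        # Collect distinct random line numbers until enough are chosen.
        while (length(arr) < lines_to_del) {
            val = randint(total_lines)
            if (val in arr)
                continue
            arr[val] = 1
        }
    }
    !(FNR in arr)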
and then execute it as
    gawk -i inplace -f del_random lines_to_del=5 file1 lines_to_del=20 file2
Any number of files can be passed (file1, file2, ...), and the number of lines to delete can be specified per file via the lines_to_del parameter, as shown. lines_to_del must not exceed the number of lines in the file, otherwise the while loop never terminates.
-i inplace is the gawk equivalent of sed's -i.
If, on the other hand, the same number of lines needs to be deleted from each file, you can set lines_to_del once as follows:
    gawk -i inplace -v lines_to_del=5 -f del_random file1 file2
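Where gawk and its inplace extension are not available, the same idea (draw distinct random line numbers, then skip those lines) can be approximated with any POSIX awk and a temporary file. This is a rough sketch under stated assumptions rather than the answer's method: the function name del_random_inplace and the variable names k, total and picked are hypothetical, and mktemp is assumed to exist.

```shell
#!/bin/sh
# del_random_inplace COUNT FILE: remove COUNT random lines from FILE in place.
# Hypothetical helper; uses a temporary file instead of gawk's inplace extension.
del_random_inplace() {
    count=$1 file=$2
    # Arithmetic expansion strips any padding some wc implementations print.
    total=$(( $(wc -l < "$file") ))
    # Refuse impossible requests; the rejection loop below would never finish.
    if [ "$count" -gt "$total" ]; then
        echo "cannot delete $count of $total lines" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    awk -v k="$count" -v total="$total" '
        BEGIN {
            srand()
            # Draw distinct line numbers in 1..total until k are chosen.
            while (picked < k) {
                val = int(total * rand()) + 1
                if (val in arr) continue
                arr[val] = 1
                picked++
            }
        }
        !(FNR in arr)   # print only lines whose number was not drawn
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

It would be called as del_random_inplace 5 file1. Note that mv replaces the file, so the result takes the permissions of the temporary file; writing back with cat "$tmp" > "$file" would be one way to keep the original ones.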
edited Apr 15 at 22:59 by dessert
answered Apr 13 at 20:06 by iruvar
related: Python: Choose random line from file, then delete that line
– jfs
Apr 13 at 17:44
To close-voters: text-processing questions are perfectly on topic, this is part of using and administering an Ubuntu system as defined on askubuntu.com/help/on-topic.
– dessert
Apr 15 at 16:20
@jfs You’re welcome to add a python answer here as well, even if you just copy it that’s very helpful! Or do you want me to add another copypasta answer? ;)
– dessert
Apr 15 at 22:52