Merge two files depending on multiple matching columnsMerging/joining a lot of csv files with numeric digits in the file namebash - merging 2 files using 2 common columns and add up the values of the 3rd columnawk - compare files and print lines from both filesawk - compare 2 files and print columns from both filesIf two columns partially match, replace third with awkHow to compare and replace the value at particular location with awkget specific lines from File1, others from File2 and print them in File3Awk-compare 2 files using multiple columns and print lines from both filescompare two lines and print unmatched words from two files
how do you value what your leisure time is worth?
What do you call the fallacy of thinking that some action A will guarantee some outcome B, when in reality B depends on multiple other conditions?
Prefill webform with civicrm activity data
Should I reveal productivity tricks to peers, or keep them to myself in order to be more productive than the others?
Does UPDATE without WHERE clause lock a table in PostgreSQL?
How long could a human survive completely without the immune system?
Why does allocating a single 2D array take longer than a loop allocating multiple 1D arrays of the same total size and shape?
Can massive damage kill you while at 0 HP?
How do lasers measure short distances (<1cm) when electronics are too slow for time-of-flight to work?
Does the Creighton Method of Natural Family Planning have a failure rate of 3.2% or less?
Would it be easier to colonise a living world or a dead world?
What is the next number in the series: 21, 21, 23, 20, 5, 25, 31, 24,?
What is /dev/null and why can't I use hx on it?
Transiting through Switzerland by coach with lots of cash
A sentient carnivorous species trying to preserve life. How could they find a new food source?
D&D Monsters and Copyright
What Supreme Court cases, other than Nixon v. United States, have directly applied or interpreted U.S. Const. Art. I, Section 3, Clause 6?
Coffee Grounds and Gritty Butter Cream Icing
Landing Hero: Product snippets VS illustrations
What if a quote contains an error
Why can I ping 10.0.0.0/8 addresses from a 192.168.1.0/24 subnet?
3x3 self-descriptive squares
Was post-exposure perspective correction ever practiced in the darkroom?
How to Skip One Bullet in ITEMIZE?
Merge two files depending on multiple matching columns
Merging/joining a lot of csv files with numeric digits in the file namebash - merging 2 files using 2 common columns and add up the values of the 3rd columnawk - compare files and print lines from both filesawk - compare 2 files and print columns from both filesIf two columns partially match, replace third with awkHow to compare and replace the value at particular location with awkget specific lines from File1, others from File2 and print them in File3Awk-compare 2 files using multiple columns and print lines from both filescompare two lines and print unmatched words from two files
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;
Im trying to join two files depending on multiple matching columns. I want to compare columns 1,2,4,5 from file 1 with columns 1,2,4,5 from file 2 and then merge matching lines in file 3 with column 3 of file 1 and all columns from files 2.
I've already tried several awk command. For example : 1) awk 'BEGINFS=OFS=","NR==FNRa[$1$2$4$5]=$3;next $1$2$4$5 in aprint $0, a[$1$2$4$5]' file2 file1 > file3
2) awk 'NR==FNR a[$1$2$4$5] = $3; next $1$2$4$5 in a' file2 file1 >file3
but i'm getting empty output. Could anyone help me with this issue ? Thank you
An example of what my files look like
File 1 :
CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G
File 2 :
Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
Wanted Output :
ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
awk annotation
add a comment
|
Im trying to join two files depending on multiple matching columns. I want to compare columns 1,2,4,5 from file 1 with columns 1,2,4,5 from file 2 and then merge matching lines in file 3 with column 3 of file 1 and all columns from files 2.
I've already tried several awk command. For example : 1) awk 'BEGINFS=OFS=","NR==FNRa[$1$2$4$5]=$3;next $1$2$4$5 in aprint $0, a[$1$2$4$5]' file2 file1 > file3
2) awk 'NR==FNR a[$1$2$4$5] = $3; next $1$2$4$5 in a' file2 file1 >file3
but i'm getting empty output. Could anyone help me with this issue ? Thank you
An example of what my files look like
File 1 :
CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G
File 2 :
Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
Wanted Output :
ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
awk annotation
add a comment
|
Im trying to join two files depending on multiple matching columns. I want to compare columns 1,2,4,5 from file 1 with columns 1,2,4,5 from file 2 and then merge matching lines in file 3 with column 3 of file 1 and all columns from files 2.
I've already tried several awk command. For example : 1) awk 'BEGINFS=OFS=","NR==FNRa[$1$2$4$5]=$3;next $1$2$4$5 in aprint $0, a[$1$2$4$5]' file2 file1 > file3
2) awk 'NR==FNR a[$1$2$4$5] = $3; next $1$2$4$5 in a' file2 file1 >file3
but i'm getting empty output. Could anyone help me with this issue ? Thank you
An example of what my files look like
File 1 :
CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G
File 2 :
Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
Wanted Output :
ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
awk annotation
Im trying to join two files depending on multiple matching columns. I want to compare columns 1,2,4,5 from file 1 with columns 1,2,4,5 from file 2 and then merge matching lines in file 3 with column 3 of file 1 and all columns from files 2.
I've already tried several awk command. For example : 1) awk 'BEGINFS=OFS=","NR==FNRa[$1$2$4$5]=$3;next $1$2$4$5 in aprint $0, a[$1$2$4$5]' file2 file1 > file3
2) awk 'NR==FNR a[$1$2$4$5] = $3; next $1$2$4$5 in a' file2 file1 >file3
but i'm getting empty output. Could anyone help me with this issue ? Thank you
An example of what my files look like
File 1 :
CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G
File 2 :
Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
Wanted Output :
ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
awk annotation
awk annotation
edited Apr 17 at 14:20
user68186
18.2k8 gold badges53 silver badges77 bronze badges
18.2k8 gold badges53 silver badges77 bronze badges
asked Apr 17 at 13:43
Rokhaya BARokhaya BA
111 bronze badge
111 bronze badge
add a comment
|
add a comment
|
1 Answer
1
active
oldest
votes
The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.
What you seem to want is
awk -F, '
BEGINOFS=FS
NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next
ind = $1 FS $2 FS $4 FS $5
ind in a print $3, a[ind]
' file2 file1 > file3
(separating the fields with FS
i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef"
and "abc""def"
).
Ex.
$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.
– Rokhaya BA
Apr 17 at 14:40
@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (withdos2unix
for example) or change the awk record separator accordingly e.g.BEGINRS="rn"; OFS=FS
– steeldriver
Apr 17 at 15:47
Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.
– Rokhaya BA
Apr 18 at 8:27
add a comment
|
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "89"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1134662%2fmerge-two-files-depending-on-multiple-matching-columns%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.
What you seem to want is
awk -F, '
BEGINOFS=FS
NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next
ind = $1 FS $2 FS $4 FS $5
ind in a print $3, a[ind]
' file2 file1 > file3
(separating the fields with FS
i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef"
and "abc""def"
).
Ex.
$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.
– Rokhaya BA
Apr 17 at 14:40
@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (withdos2unix
for example) or change the awk record separator accordingly e.g.BEGINRS="rn"; OFS=FS
– steeldriver
Apr 17 at 15:47
Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.
– Rokhaya BA
Apr 18 at 8:27
add a comment
|
The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.
What you seem to want is
awk -F, '
BEGINOFS=FS
NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next
ind = $1 FS $2 FS $4 FS $5
ind in a print $3, a[ind]
' file2 file1 > file3
(separating the fields with FS
i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef"
and "abc""def"
).
Ex.
$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.
– Rokhaya BA
Apr 17 at 14:40
@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (withdos2unix
for example) or change the awk record separator accordingly e.g.BEGINRS="rn"; OFS=FS
– steeldriver
Apr 17 at 15:47
Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.
– Rokhaya BA
Apr 18 at 8:27
add a comment
|
The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.
What you seem to want is
awk -F, '
BEGINOFS=FS
NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next
ind = $1 FS $2 FS $4 FS $5
ind in a print $3, a[ind]
' file2 file1 > file3
(separating the fields with FS
i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef"
and "abc""def"
).
Ex.
$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.
What you seem to want is
awk -F, '
BEGINOFS=FS
NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next
ind = $1 FS $2 FS $4 FS $5
ind in a print $3, a[ind]
' file2 file1 > file3
(separating the fields with FS
i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef"
and "abc""def"
).
Ex.
$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
answered Apr 17 at 14:13
steeldriversteeldriver
79.8k12 gold badges131 silver badges214 bronze badges
79.8k12 gold badges131 silver badges214 bronze badges
Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.
– Rokhaya BA
Apr 17 at 14:40
@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (withdos2unix
for example) or change the awk record separator accordingly e.g.BEGINRS="rn"; OFS=FS
– steeldriver
Apr 17 at 15:47
Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.
– Rokhaya BA
Apr 18 at 8:27
add a comment
|
Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.
– Rokhaya BA
Apr 17 at 14:40
@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (withdos2unix
for example) or change the awk record separator accordingly e.g.BEGINRS="rn"; OFS=FS
– steeldriver
Apr 17 at 15:47
Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.
– Rokhaya BA
Apr 18 at 8:27
Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.
– Rokhaya BA
Apr 17 at 14:40
Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.
– Rokhaya BA
Apr 17 at 14:40
@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with
dos2unix
for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS
– steeldriver
Apr 17 at 15:47
@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with
dos2unix
for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS
– steeldriver
Apr 17 at 15:47
Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.
– Rokhaya BA
Apr 18 at 8:27
Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.
– Rokhaya BA
Apr 18 at 8:27
add a comment
|
Thanks for contributing an answer to Ask Ubuntu!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1134662%2fmerge-two-files-depending-on-multiple-matching-columns%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown