Merge two files depending on multiple matching columnsMerging/joining a lot of csv files with numeric digits in the file namebash - merging 2 files using 2 common columns and add up the values of the 3rd columnawk - compare files and print lines from both filesawk - compare 2 files and print columns from both filesIf two columns partially match, replace third with awkHow to compare and replace the value at particular location with awkget specific lines from File1, others from File2 and print them in File3Awk-compare 2 files using multiple columns and print lines from both filescompare two lines and print unmatched words from two files

how do you value what your leisure time is worth?

What do you call the fallacy of thinking that some action A will guarantee some outcome B, when in reality B depends on multiple other conditions?

Prefill webform with civicrm activity data

Should I reveal productivity tricks to peers, or keep them to myself in order to be more productive than the others?

Does UPDATE without WHERE clause lock a table in PostgreSQL?

How long could a human survive completely without the immune system?

Why does allocating a single 2D array take longer than a loop allocating multiple 1D arrays of the same total size and shape?

Can massive damage kill you while at 0 HP?

How do lasers measure short distances (<1cm) when electronics are too slow for time-of-flight to work?

Does the Creighton Method of Natural Family Planning have a failure rate of 3.2% or less?

Would it be easier to colonise a living world or a dead world?

What is the next number in the series: 21, 21, 23, 20, 5, 25, 31, 24,?

What is /dev/null and why can't I use hx on it?

Transiting through Switzerland by coach with lots of cash

A sentient carnivorous species trying to preserve life. How could they find a new food source?

D&D Monsters and Copyright

What Supreme Court cases, other than Nixon v. United States, have directly applied or interpreted U.S. Const. Art. I, Section 3, Clause 6?

Coffee Grounds and Gritty Butter Cream Icing

Landing Hero: Product snippets VS illustrations

What if a quote contains an error

Why can I ping 10.0.0.0/8 addresses from a 192.168.1.0/24 subnet?

3x3 self-descriptive squares

Was post-exposure perspective correction ever practiced in the darkroom?

How to Skip One Bullet in ITEMIZE?

Merge two files depending on multiple matching columns

Merging/joining a lot of csv files with numeric digits in the file namebash - merging 2 files using 2 common columns and add up the values of the 3rd columnawk - compare files and print lines from both filesawk - compare 2 files and print columns from both filesIf two columns partially match, replace third with awkHow to compare and replace the value at particular location with awkget specific lines from File1, others from File2 and print them in File3Awk-compare 2 files using multiple columns and print lines from both filescompare two lines and print unmatched words from two files

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;

Im trying to join two files depending on multiple matching columns. I want to compare columns 1,2,4,5 from file 1 with columns 1,2,4,5 from file 2 and then merge matching lines in file 3 with column 3 of file 1 and all columns from files 2.
I've already tried several awk command. For example : 1) awk 'BEGINFS=OFS=","NR==FNRa[$1$2$4$5]=$3;next $1$2$4$5 in aprint $0, a[$1$2$4$5]' file2 file1 > file3 2) awk 'NR==FNR a[$1$2$4$5] = $3; next $1$2$4$5 in a' file2 file1 >file3
but i'm getting empty output. Could anyone help me with this issue ? Thank you

An example of what my files look like

File 1 :

CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G

File 2 :

Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
 1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7

Wanted Output :

ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

add a comment
|

An example of what my files look like

File 1 :

CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G

File 2 :

Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
 1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7

Wanted Output :

ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

add a comment
|

An example of what my files look like

File 1 :

CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G

File 2 :

Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
 1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7

Wanted Output :

ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

An example of what my files look like

File 1 :

CHROM,POS,ID,REF,ALT
1,69270,rs201219564,A,G
1,69428,rs140739101,T,G
1,69496,rs150690004,G,A
1,69511,rs75062661,A,G

File 2 :

Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
1,69511,69511,A,C,0.031,0.450,D,0.002,0.090,B,0.001,0.040,B,-1.52,0.370,N,2.8350.402,21.5
1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047
 1,69511,69511,A,T,1.0,0.010,T,0.0,0.026,B,0.0,0.013,B,2.06,0.004,N,-1.639,0.014,0.002
1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
1,69496,69496,G,C,0.348,0.124,T,0.988,0.604,D,0.865,0.597,P,-2.54,0.550,D,2.6520.380,20.5
1,69496,69496,G,T,0.177,0.222,T,0.999,0.764,D,0.96,0.686,D,-2.69,0.574,D,1.4030.232,12.80
1,69428,69428,T,A,0.878,0.026,T,0.981,0.577,D,0.899,0.620,P,-0.16,0.095,N,0.0240.096,2.821
1,69428,69428,T,C,0.001,0.784,D,0.995,0.657,D,0.969,0.703,D,-5.34,0.846,D,4.5330.607,24.3
1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7

Wanted Output :

ID,Chr,Start,End,Ref,Alt,SIFT_score,SIFT_converted_rankscore,SIFT_pred,Polyphen2_HDIV_score,Polyphen2_HDIV_rankscore,Polyphen2_HDIV_pred,Polyphen2_HVAR_score,Polyphen2_HVAR_rankscore,Polyphen2_HVAR_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,CADD_raw,CADD_raw_rankscore,CADD_phred
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

awk annotation

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

edited Apr 17 at 14:20

user68186

18.2k8 gold badges53 silver badges77 bronze badges

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

asked Apr 17 at 13:43

Rokhaya BA

111 bronze badge

add a comment
|

1 Answer
1

active

oldest

votes

The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.

What you seem to want is

awk -F, '
 BEGINOFS=FS 
 NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next 
 ind = $1 FS $2 FS $4 FS $5 
 ind in a print $3, a[ind]
' file2 file1 > file3

(separating the fields with FS i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef" and "abc""def").

Ex.

$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.

– Rokhaya BA
Apr 17 at 14:40

@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with dos2unix for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS

– steeldriver
Apr 17 at 15:47

Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.

– Rokhaya BA
Apr 18 at 8:27

add a comment
|

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "89"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1134662%2fmerge-two-files-depending-on-multiple-matching-columns%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.

What you seem to want is

awk -F, '
 BEGINOFS=FS 
 NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next 
 ind = $1 FS $2 FS $4 FS $5 
 ind in a print $3, a[ind]
' file2 file1 > file3

(separating the fields with FS i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef" and "abc""def").

Ex.

$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.

– Rokhaya BA
Apr 17 at 14:40

@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with dos2unix for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS

– steeldriver
Apr 17 at 15:47

Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.

– Rokhaya BA
Apr 18 at 8:27

add a comment
|

The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.

What you seem to want is

awk -F, '
 BEGINOFS=FS 
 NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next 
 ind = $1 FS $2 FS $4 FS $5 
 ind in a print $3, a[ind]
' file2 file1 > file3

(separating the fields with FS i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef" and "abc""def").

Ex.

$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.

– Rokhaya BA
Apr 17 at 14:40

@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with dos2unix for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS

– steeldriver
Apr 17 at 15:47

Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.

– Rokhaya BA
Apr 18 at 8:27

add a comment
|

The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.

What you seem to want is

awk -F, '
 BEGINOFS=FS 
 NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next 
 ind = $1 FS $2 FS $4 FS $5 
 ind in a print $3, a[ind]
' file2 file1 > file3

(separating the fields with FS i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef" and "abc""def").

Ex.

$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

The most obvious thing you're missing is that your files are comma separated, but you use the default (whitespace) field separator.

What you seem to want is

awk -F, '
 BEGINOFS=FS 
 NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next 
 ind = $1 FS $2 FS $4 FS $5 
 ind in a print $3, a[ind]
' file2 file1 > file3

(separating the fields with FS i the associative array key string just guards against false matches; if you just concatenate fields you can't distinguish between "abcdef" and "abc""def").

Ex.

$ awk -F, 'BEGINOFS=FS NR==FNRa[$1 FS $2 FS $4 FS $5] = $0; next ind = $1 FS $2 FS $4 FS $5 ind in a print $3, a[ind]' file2 file1
rs140739101,1,69428,69428,T,G,0.013,0.538,D,0.999,0.764,D,0.984,0.745,D,-5.05,0.827,D,4.1000.550,23.7
rs150690004,1,69496,69496,G,A,0.404,0.103,T,0.401,0.331,B,0.061,0.254,B,-1.39,0.344,N,0.9700.187,10.49
rs75062661,1,69511,69511,A,G,0.652,0.048,T,0.0,0.026,B,0.0,0.013,B,1.54,0.007,N,-0.784,0.041,0.047

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

answered Apr 17 at 14:13

steeldriver

79.8k12 gold badges131 silver badges214 bronze badges

Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.

– Rokhaya BA
Apr 17 at 14:40

@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with dos2unix for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS

– steeldriver
Apr 17 at 15:47

Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.

– Rokhaya BA
Apr 18 at 8:27

add a comment
|

Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.

– Rokhaya BA
Apr 17 at 14:40

@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with dos2unix for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS

– steeldriver
Apr 17 at 15:47

Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.

– Rokhaya BA
Apr 18 at 8:27

Thank you for your answer. Actually i did try to specify the separator but i get the same result. I also tried to delete end lines and then sorted files. But it doesnt change anything. I still get empty output.

– Rokhaya BA
Apr 17 at 14:40

@RokhayaBA do your files have DOS-style (CRLF) line endings by any chance? if so, either convert them to Unix style (with dos2unix for example) or change the awk record separator accordingly e.g. BEGINRS="rn"; OFS=FS

– steeldriver
Apr 17 at 15:47

Thank you very much. That was the problem. I was trying to delete line endings for each files first (tr 'r' 'n' < file1 > file1new) before applying awk command. But changing the awk record directly was definitely the solution.

– Rokhaya BA
Apr 18 at 8:27

add a comment
|

draft saved

draft discarded

Thanks for contributing an answer to Ask Ubuntu!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

DzXdPna7 HUd juFrwLtCFs,ckFSAGiKCKxc jvRs2,9tJYY,7zeKiWRurpdS,FXe2Az,TeQOu1D6n4,TKKUNcZ2n kj7 DIjaB,jiXv1SDfd

搜尋此網誌

Bsrgvty

1 Answer
1

Your Answer

Post as a guest

1 Answer
1

1 Answer
1

Post as a guest

Popular posts from this blog

Tamil (spriik) Luke uk diar | Nawigatjuun

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

1 Answer 1

1 Answer 1

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Tamil (spriik) Luke uk diar | Nawigatjuun

1 Answer
1

1 Answer
1

1 Answer
1