What does “fetching by region is not available for SAM files” mean? The 2019 Stack Overflow Developer Survey Results Are In Announcing the arrival of Valued Associate #679: Cesar Manara Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)How to safely and efficiently convert subset of bam to fastq?How to you find read statistics for bam files with terminal mismatch?Pepsin digest (cleavage) does not work using RE?running a samtools command for multiple bam files from 1000 genomes projectSAM format: Does the BAM “Integer or numeric array” field no longer exist? Why?python does not quit when the input file is too bigconvert supplementary reads to primary in sam or bamI have 23andme text files and would like to convert to SAM/BAM formatRunning htseq-count over BAM filesAccess base aligned to particular reference position
What was the last x86 CPU that did not have the x87 floating-point unit built in?
Road tyres vs "Street" tyres for charity ride on MTB Tandem
Cooking pasta in a water boiler
What is this lever in Argentinian toilets?
Scientific Reports - Significant Figures
When did F become S in typeography, and why?
system() function string length limit
ELI5: Why do they say that Israel would have been the fourth country to land a spacecraft on the Moon and why do they call it low cost?
Who or what is the being for whom Being is a question for Heidegger?
Why is superheterodyning better than direct conversion?
Match Roman Numerals
Is above average number of years spent on PhD considered a red flag in future academia or industry positions?
Do warforged have souls?
How can I define good in a religion that claims no moral authority?
Python - Fishing Simulator
Take groceries in checked luggage
Wall plug outlet change
First use of “packing” as in carrying a gun
Did God make two great lights or did He make the great light two?
Why does the Event Horizon Telescope (EHT) not include telescopes from Africa, Asia or Australia?
Does Parliament need to approve the new Brexit delay to 31 October 2019?
Can the prologue be the backstory of your main character?
What's the point in a preamp?
Single author papers against my advisor's will?
What does “fetching by region is not available for SAM files” mean?
The 2019 Stack Overflow Developer Survey Results Are In
Announcing the arrival of Valued Associate #679: Cesar Manara
Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern)How to safely and efficiently convert subset of bam to fastq?How to you find read statistics for bam files with terminal mismatch?Pepsin digest (cleavage) does not work using RE?running a samtools command for multiple bam files from 1000 genomes projectSAM format: Does the BAM “Integer or numeric array” field no longer exist? Why?python does not quit when the input file is too bigconvert supplementary reads to primary in sam or bamI have 23andme text files and would like to convert to SAM/BAM formatRunning htseq-count over BAM filesAccess base aligned to particular reference position
$begingroup$
I am used to gzip
/biopython
solutions when dealing with sequencing data, but now I wish to switch to more elegant pysam
. So I looked at the manual, but ran into quite bizarre troubles with the first couple of lines using my bam file
import pysam
samfile = pysam.AlignmentFile("3_Tms_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tms_b3v08_scaf000159'):
print(read)
samfile.close()
returns ValueError: fetching by region is not available for SAM files
. Well, the file is bam
. I tried to google the error but the only hits I found are the lines in the source code of pysam that check if the file is bam/cram or sam, so somehow pysam
thinks that my bam is a sam. How can I convince it otherwise? I have also noticed that the manual is for python 2.7, that's maybe where the problem comes from...
If you wish to reproduce my problem, here is my bam file.
python bam pysam
$endgroup$
add a comment |
$begingroup$
I am used to gzip
/biopython
solutions when dealing with sequencing data, but now I wish to switch to more elegant pysam
. So I looked at the manual, but ran into quite bizarre troubles with the first couple of lines using my bam file
import pysam
samfile = pysam.AlignmentFile("3_Tms_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tms_b3v08_scaf000159'):
print(read)
samfile.close()
returns ValueError: fetching by region is not available for SAM files
. Well, the file is bam
. I tried to google the error but the only hits I found are the lines in the source code of pysam that check if the file is bam/cram or sam, so somehow pysam
thinks that my bam is a sam. How can I convince it otherwise? I have also noticed that the manual is for python 2.7, that's maybe where the problem comes from...
If you wish to reproduce my problem, here is my bam file.
python bam pysam
$endgroup$
add a comment |
$begingroup$
I am used to gzip
/biopython
solutions when dealing with sequencing data, but now I wish to switch to more elegant pysam
. So I looked at the manual, but ran into quite bizarre troubles with the first couple of lines using my bam file
import pysam
samfile = pysam.AlignmentFile("3_Tms_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tms_b3v08_scaf000159'):
print(read)
samfile.close()
returns ValueError: fetching by region is not available for SAM files
. Well, the file is bam
. I tried to google the error but the only hits I found are the lines in the source code of pysam that check if the file is bam/cram or sam, so somehow pysam
thinks that my bam is a sam. How can I convince it otherwise? I have also noticed that the manual is for python 2.7, that's maybe where the problem comes from...
If you wish to reproduce my problem, here is my bam file.
python bam pysam
$endgroup$
I am used to gzip
/biopython
solutions when dealing with sequencing data, but now I wish to switch to more elegant pysam
. So I looked at the manual, but ran into quite bizarre troubles with the first couple of lines using my bam file
import pysam
samfile = pysam.AlignmentFile("3_Tms_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tms_b3v08_scaf000159'):
print(read)
samfile.close()
returns ValueError: fetching by region is not available for SAM files
. Well, the file is bam
. I tried to google the error but the only hits I found are the lines in the source code of pysam that check if the file is bam/cram or sam, so somehow pysam
thinks that my bam is a sam. How can I convince it otherwise? I have also noticed that the manual is for python 2.7, that's maybe where the problem comes from...
If you wish to reproduce my problem, here is my bam file.
python bam pysam
python bam pysam
edited 2 days ago
terdon♦
4,7752830
4,7752830
asked Apr 10 at 15:51
Kamil S JaronKamil S Jaron
2,942742
2,942742
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
$begingroup$
Your 3_Tms_1_mapped.bam file, despite its filename extension, is in fact a bgzipped SAM file. You can verify this using htsfile
, which is a small utility packaged with HTSlib:
$ htsfile 3_Tms_1_mapped.bam
3_Tms_1_mapped.bam: SAM version 1.3 BGZF-compressed sequence data
(For files that really are in BAM format, it reports BAM version 1 compressed sequence data.)
So the error message is accurate in this case.
$endgroup$
$begingroup$
Hmm, does it mean that this link is wrong? biopython.org/DIST/docs/api/Bio.bgzf-module.html I sort of thought that it's the same thing...
$endgroup$
– Kamil S Jaron
Apr 10 at 21:52
1
$begingroup$
No. (Was there a particular part of that you think might be wrong?) But there is a difference between BGZF-compressing the plain text format, and a BAM file (whose decompressed underlying stream is a bespoke binary format that is different from the plain SAM text).
$endgroup$
– John Marshall
Apr 10 at 21:58
$begingroup$
I think I get it now. I got confused by the aims to this module, they write "In addition to being required for random access to and writing of BAM files", so I got an impression that I can simply use this module to write BAM files. This was actually really helpful answer, thanks!
$endgroup$
– Kamil S Jaron
2 days ago
1
$begingroup$
@KamilSJaron you might want to consider accepting this answer instead of mine since this was the real issue.
$endgroup$
– terdon♦
2 days ago
add a comment |
$begingroup$
That isn't actually a bam file as John Marshall figured out. I am keeping the rest of my answer since it could be useful to someone else, but the issue here was that you had a compressed (bgzipped) sam file and not an actual bam file and that's why you were getting that error. When I sorted your file in preparation for indexing it, I converted to a bam which is why the rest of my answer worked.
You don't have the index file for the bam file you're using. I used this script on the file you linked to (changing the name so that it corresponds to the right file and a contig in that file):
#!/usr/bin/env python3
import pysam
samfile = pysam.AlignmentFile("3_Tce_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
print(read)
samfile.close()
The directory I ran it in had:
$ ls 3*
3_Tce_1_mapped.bam
And I got the error you described:
$ foo.py
Traceback (most recent call last):
File "/home/terdon/scripts/foo.py", line 5, in <module>
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
File "pysam/libcalignmentfile.pyx", line 1107, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetching by region is not available for SAM files
However, indexing the bam file fixed it:
$ samtools sort 3_Tce_1_mapped.bam > 3_Tce_1_mapped.sorted.bam
$ mv 3_Tce_1_mapped.sorted.bam 3_Tce_1_mapped.bam
$ samtools index 3_Tce_1_mapped.bam
$ ls 3*
3_Tce_1_mapped.bam 3_Tce_1_mapped.bam.bai
$ foo.py | wc
227 2724 16725
So just sort and index your files before attempting to seek in them. Which makes sense since the index's job is primarily to allow seeking.
$endgroup$
$begingroup$
I was thinking about indexing, but then I thought that if that would be the problem pysam would complain about sorting/indexing, not about file being sam. Do yo think it would be worth opening them a ticket to make the error message bit more meaningful?
$endgroup$
– Kamil S Jaron
Apr 10 at 18:37
$begingroup$
Seeking, i.e. random access, requires an index. Note that this is not the only access pattern in pysam. You can access reads in a streaming fashion (for read in samfile:
) without sorting, and you can access read pileups base-by-base (for column in samfile.pileup('chr1')
), which does require an index if I understand correctly.
$endgroup$
– Daniel Standage
Apr 10 at 18:40
$begingroup$
@KamilSJaron probably, yes. It obviously caused you some pain, and I'm sure you won't be alone. I only thought of this because I have seen similar cryptic errors and was wondering how it could seek without an index anyway.
$endgroup$
– terdon♦
Apr 10 at 18:40
1
$begingroup$
@KamilSJaron Yes, I think it's an unhelpful error message, and it has caused me confusion in the past as well. If you open a ticket on Github I'd be happy to "pile on." :-)
$endgroup$
– Daniel Standage
Apr 10 at 18:41
$begingroup$
@DanielStandage I am this close to suspending you for that horrible pun! I would too if I could stop chuckling for long enough!
$endgroup$
– terdon♦
Apr 10 at 18:42
|
show 4 more comments
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "676"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fbioinformatics.stackexchange.com%2fquestions%2f7436%2fwhat-does-fetching-by-region-is-not-available-for-sam-files-mean%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Your 3_Tms_1_mapped.bam file, despite its filename extension, is in fact a bgzipped SAM file. You can verify this using htsfile
, which is a small utility packaged with HTSlib:
$ htsfile 3_Tms_1_mapped.bam
3_Tms_1_mapped.bam: SAM version 1.3 BGZF-compressed sequence data
(For files that really are in BAM format, it reports BAM version 1 compressed sequence data.)
So the error message is accurate in this case.
$endgroup$
$begingroup$
Hmm, does it mean that this link is wrong? biopython.org/DIST/docs/api/Bio.bgzf-module.html I sort of thought that it's the same thing...
$endgroup$
– Kamil S Jaron
Apr 10 at 21:52
1
$begingroup$
No. (Was there a particular part of that you think might be wrong?) But there is a difference between BGZF-compressing the plain text format, and a BAM file (whose decompressed underlying stream is a bespoke binary format that is different from the plain SAM text).
$endgroup$
– John Marshall
Apr 10 at 21:58
$begingroup$
I think I get it now. I got confused by the aims to this module, they write "In addition to being required for random access to and writing of BAM files", so I got an impression that I can simply use this module to write BAM files. This was actually really helpful answer, thanks!
$endgroup$
– Kamil S Jaron
2 days ago
1
$begingroup$
@KamilSJaron you might want to consider accepting this answer instead of mine since this was the real issue.
$endgroup$
– terdon♦
2 days ago
add a comment |
$begingroup$
Your 3_Tms_1_mapped.bam file, despite its filename extension, is in fact a bgzipped SAM file. You can verify this using htsfile
, which is a small utility packaged with HTSlib:
$ htsfile 3_Tms_1_mapped.bam
3_Tms_1_mapped.bam: SAM version 1.3 BGZF-compressed sequence data
(For files that really are in BAM format, it reports BAM version 1 compressed sequence data.)
So the error message is accurate in this case.
$endgroup$
$begingroup$
Hmm, does it mean that this link is wrong? biopython.org/DIST/docs/api/Bio.bgzf-module.html I sort of thought that it's the same thing...
$endgroup$
– Kamil S Jaron
Apr 10 at 21:52
1
$begingroup$
No. (Was there a particular part of that you think might be wrong?) But there is a difference between BGZF-compressing the plain text format, and a BAM file (whose decompressed underlying stream is a bespoke binary format that is different from the plain SAM text).
$endgroup$
– John Marshall
Apr 10 at 21:58
$begingroup$
I think I get it now. I got confused by the aims to this module, they write "In addition to being required for random access to and writing of BAM files", so I got an impression that I can simply use this module to write BAM files. This was actually really helpful answer, thanks!
$endgroup$
– Kamil S Jaron
2 days ago
1
$begingroup$
@KamilSJaron you might want to consider accepting this answer instead of mine since this was the real issue.
$endgroup$
– terdon♦
2 days ago
add a comment |
$begingroup$
Your 3_Tms_1_mapped.bam file, despite its filename extension, is in fact a bgzipped SAM file. You can verify this using htsfile
, which is a small utility packaged with HTSlib:
$ htsfile 3_Tms_1_mapped.bam
3_Tms_1_mapped.bam: SAM version 1.3 BGZF-compressed sequence data
(For files that really are in BAM format, it reports BAM version 1 compressed sequence data.)
So the error message is accurate in this case.
$endgroup$
Your 3_Tms_1_mapped.bam file, despite its filename extension, is in fact a bgzipped SAM file. You can verify this using htsfile
, which is a small utility packaged with HTSlib:
$ htsfile 3_Tms_1_mapped.bam
3_Tms_1_mapped.bam: SAM version 1.3 BGZF-compressed sequence data
(For files that really are in BAM format, it reports BAM version 1 compressed sequence data.)
So the error message is accurate in this case.
edited 2 days ago
answered Apr 10 at 20:43
John MarshallJohn Marshall
56826
56826
$begingroup$
Hmm, does it mean that this link is wrong? biopython.org/DIST/docs/api/Bio.bgzf-module.html I sort of thought that it's the same thing...
$endgroup$
– Kamil S Jaron
Apr 10 at 21:52
1
$begingroup$
No. (Was there a particular part of that you think might be wrong?) But there is a difference between BGZF-compressing the plain text format, and a BAM file (whose decompressed underlying stream is a bespoke binary format that is different from the plain SAM text).
$endgroup$
– John Marshall
Apr 10 at 21:58
$begingroup$
I think I get it now. I got confused by the aims to this module, they write "In addition to being required for random access to and writing of BAM files", so I got an impression that I can simply use this module to write BAM files. This was actually really helpful answer, thanks!
$endgroup$
– Kamil S Jaron
2 days ago
1
$begingroup$
@KamilSJaron you might want to consider accepting this answer instead of mine since this was the real issue.
$endgroup$
– terdon♦
2 days ago
add a comment |
$begingroup$
Hmm, does it mean that this link is wrong? biopython.org/DIST/docs/api/Bio.bgzf-module.html I sort of thought that it's the same thing...
$endgroup$
– Kamil S Jaron
Apr 10 at 21:52
1
$begingroup$
No. (Was there a particular part of that you think might be wrong?) But there is a difference between BGZF-compressing the plain text format, and a BAM file (whose decompressed underlying stream is a bespoke binary format that is different from the plain SAM text).
$endgroup$
– John Marshall
Apr 10 at 21:58
$begingroup$
I think I get it now. I got confused by the aims to this module, they write "In addition to being required for random access to and writing of BAM files", so I got an impression that I can simply use this module to write BAM files. This was actually really helpful answer, thanks!
$endgroup$
– Kamil S Jaron
2 days ago
1
$begingroup$
@KamilSJaron you might want to consider accepting this answer instead of mine since this was the real issue.
$endgroup$
– terdon♦
2 days ago
$begingroup$
Hmm, does it mean that this link is wrong? biopython.org/DIST/docs/api/Bio.bgzf-module.html I sort of thought that it's the same thing...
$endgroup$
– Kamil S Jaron
Apr 10 at 21:52
$begingroup$
Hmm, does it mean that this link is wrong? biopython.org/DIST/docs/api/Bio.bgzf-module.html I sort of thought that it's the same thing...
$endgroup$
– Kamil S Jaron
Apr 10 at 21:52
1
1
$begingroup$
No. (Was there a particular part of that you think might be wrong?) But there is a difference between BGZF-compressing the plain text format, and a BAM file (whose decompressed underlying stream is a bespoke binary format that is different from the plain SAM text).
$endgroup$
– John Marshall
Apr 10 at 21:58
$begingroup$
No. (Was there a particular part of that you think might be wrong?) But there is a difference between BGZF-compressing the plain text format, and a BAM file (whose decompressed underlying stream is a bespoke binary format that is different from the plain SAM text).
$endgroup$
– John Marshall
Apr 10 at 21:58
$begingroup$
I think I get it now. I got confused by the aims to this module, they write "In addition to being required for random access to and writing of BAM files", so I got an impression that I can simply use this module to write BAM files. This was actually really helpful answer, thanks!
$endgroup$
– Kamil S Jaron
2 days ago
$begingroup$
I think I get it now. I got confused by the aims to this module, they write "In addition to being required for random access to and writing of BAM files", so I got an impression that I can simply use this module to write BAM files. This was actually really helpful answer, thanks!
$endgroup$
– Kamil S Jaron
2 days ago
1
1
$begingroup$
@KamilSJaron you might want to consider accepting this answer instead of mine since this was the real issue.
$endgroup$
– terdon♦
2 days ago
$begingroup$
@KamilSJaron you might want to consider accepting this answer instead of mine since this was the real issue.
$endgroup$
– terdon♦
2 days ago
add a comment |
$begingroup$
That isn't actually a bam file as John Marshall figured out. I am keeping the rest of my answer since it could be useful to someone else, but the issue here was that you had a compressed (bgzipped) sam file and not an actual bam file and that's why you were getting that error. When I sorted your file in preparation for indexing it, I converted to a bam which is why the rest of my answer worked.
You don't have the index file for the bam file you're using. I used this script on the file you linked to (changing the name so that it corresponds to the right file and a contig in that file):
#!/usr/bin/env python3
import pysam
samfile = pysam.AlignmentFile("3_Tce_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
print(read)
samfile.close()
The directory I ran it in had:
$ ls 3*
3_Tce_1_mapped.bam
And I got the error you described:
$ foo.py
Traceback (most recent call last):
File "/home/terdon/scripts/foo.py", line 5, in <module>
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
File "pysam/libcalignmentfile.pyx", line 1107, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetching by region is not available for SAM files
However, indexing the bam file fixed it:
$ samtools sort 3_Tce_1_mapped.bam > 3_Tce_1_mapped.sorted.bam
$ mv 3_Tce_1_mapped.sorted.bam 3_Tce_1_mapped.bam
$ samtools index 3_Tce_1_mapped.bam
$ ls 3*
3_Tce_1_mapped.bam 3_Tce_1_mapped.bam.bai
$ foo.py | wc
227 2724 16725
So just sort and index your files before attempting to seek in them. Which makes sense since the index's job is primarily to allow seeking.
$endgroup$
$begingroup$
I was thinking about indexing, but then I thought that if that would be the problem pysam would complain about sorting/indexing, not about file being sam. Do yo think it would be worth opening them a ticket to make the error message bit more meaningful?
$endgroup$
– Kamil S Jaron
Apr 10 at 18:37
$begingroup$
Seeking, i.e. random access, requires an index. Note that this is not the only access pattern in pysam. You can access reads in a streaming fashion (for read in samfile:
) without sorting, and you can access read pileups base-by-base (for column in samfile.pileup('chr1')
), which does require an index if I understand correctly.
$endgroup$
– Daniel Standage
Apr 10 at 18:40
$begingroup$
@KamilSJaron probably, yes. It obviously caused you some pain, and I'm sure you won't be alone. I only thought of this because I have seen similar cryptic errors and was wondering how it could seek without an index anyway.
$endgroup$
– terdon♦
Apr 10 at 18:40
1
$begingroup$
@KamilSJaron Yes, I think it's an unhelpful error message, and it has caused me confusion in the past as well. If you open a ticket on Github I'd be happy to "pile on." :-)
$endgroup$
– Daniel Standage
Apr 10 at 18:41
$begingroup$
@DanielStandage I am this close to suspending you for that horrible pun! I would too if I could stop chuckling for long enough!
$endgroup$
– terdon♦
Apr 10 at 18:42
|
show 4 more comments
$begingroup$
That isn't actually a bam file as John Marshall figured out. I am keeping the rest of my answer since it could be useful to someone else, but the issue here was that you had a compressed (bgzipped) sam file and not an actual bam file and that's why you were getting that error. When I sorted your file in preparation for indexing it, I converted to a bam which is why the rest of my answer worked.
You don't have the index file for the bam file you're using. I used this script on the file you linked to (changing the name so that it corresponds to the right file and a contig in that file):
#!/usr/bin/env python3
import pysam
samfile = pysam.AlignmentFile("3_Tce_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
print(read)
samfile.close()
The directory I ran it in had:
$ ls 3*
3_Tce_1_mapped.bam
And I got the error you described:
$ foo.py
Traceback (most recent call last):
File "/home/terdon/scripts/foo.py", line 5, in <module>
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
File "pysam/libcalignmentfile.pyx", line 1107, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetching by region is not available for SAM files
However, indexing the bam file fixed it:
$ samtools sort 3_Tce_1_mapped.bam > 3_Tce_1_mapped.sorted.bam
$ mv 3_Tce_1_mapped.sorted.bam 3_Tce_1_mapped.bam
$ samtools index 3_Tce_1_mapped.bam
$ ls 3*
3_Tce_1_mapped.bam 3_Tce_1_mapped.bam.bai
$ foo.py | wc
227 2724 16725
So just sort and index your files before attempting to seek in them. Which makes sense since the index's job is primarily to allow seeking.
$endgroup$
$begingroup$
I was thinking about indexing, but then I thought that if that would be the problem pysam would complain about sorting/indexing, not about file being sam. Do yo think it would be worth opening them a ticket to make the error message bit more meaningful?
$endgroup$
– Kamil S Jaron
Apr 10 at 18:37
$begingroup$
Seeking, i.e. random access, requires an index. Note that this is not the only access pattern in pysam. You can access reads in a streaming fashion (for read in samfile:
) without sorting, and you can access read pileups base-by-base (for column in samfile.pileup('chr1')
), which does require an index if I understand correctly.
$endgroup$
– Daniel Standage
Apr 10 at 18:40
$begingroup$
@KamilSJaron probably, yes. It obviously caused you some pain, and I'm sure you won't be alone. I only thought of this because I have seen similar cryptic errors and was wondering how it could seek without an index anyway.
$endgroup$
– terdon♦
Apr 10 at 18:40
1
$begingroup$
@KamilSJaron Yes, I think it's an unhelpful error message, and it has caused me confusion in the past as well. If you open a ticket on Github I'd be happy to "pile on." :-)
$endgroup$
– Daniel Standage
Apr 10 at 18:41
$begingroup$
@DanielStandage I am this close to suspending you for that horrible pun! I would too if I could stop chuckling for long enough!
$endgroup$
– terdon♦
Apr 10 at 18:42
|
show 4 more comments
$begingroup$
That isn't actually a bam file as John Marshall figured out. I am keeping the rest of my answer since it could be useful to someone else, but the issue here was that you had a compressed (bgzipped) sam file and not an actual bam file and that's why you were getting that error. When I sorted your file in preparation for indexing it, I converted to a bam which is why the rest of my answer worked.
You don't have the index file for the bam file you're using. I used this script on the file you linked to (changing the name so that it corresponds to the right file and a contig in that file):
#!/usr/bin/env python3
import pysam
samfile = pysam.AlignmentFile("3_Tce_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
print(read)
samfile.close()
The directory I ran it in had:
$ ls 3*
3_Tce_1_mapped.bam
And I got the error you described:
$ foo.py
Traceback (most recent call last):
File "/home/terdon/scripts/foo.py", line 5, in <module>
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
File "pysam/libcalignmentfile.pyx", line 1107, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetching by region is not available for SAM files
However, indexing the bam file fixed it:
$ samtools sort 3_Tce_1_mapped.bam > 3_Tce_1_mapped.sorted.bam
$ mv 3_Tce_1_mapped.sorted.bam 3_Tce_1_mapped.bam
$ samtools index 3_Tce_1_mapped.bam
$ ls 3*
3_Tce_1_mapped.bam 3_Tce_1_mapped.bam.bai
$ foo.py | wc
227 2724 16725
So just sort and index your files before attempting to seek in them. Which makes sense since the index's job is primarily to allow seeking.
$endgroup$
That isn't actually a bam file as John Marshall figured out. I am keeping the rest of my answer since it could be useful to someone else, but the issue here was that you had a compressed (bgzipped) sam file and not an actual bam file and that's why you were getting that error. When I sorted your file in preparation for indexing it, I converted to a bam which is why the rest of my answer worked.
You don't have the index file for the bam file you're using. I used this script on the file you linked to (changing the name so that it corresponds to the right file and a contig in that file):
#!/usr/bin/env python3
import pysam
samfile = pysam.AlignmentFile("3_Tce_1_mapped.bam", "rb")
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
print(read)
samfile.close()
The directory I ran it in had:
$ ls 3*
3_Tce_1_mapped.bam
And I got the error you described:
$ foo.py
Traceback (most recent call last):
File "/home/terdon/scripts/foo.py", line 5, in <module>
for read in samfile.fetch('3_Tce_b3v08_scaf005149'):
File "pysam/libcalignmentfile.pyx", line 1107, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetching by region is not available for SAM files
However, indexing the bam file fixed it:
$ samtools sort 3_Tce_1_mapped.bam > 3_Tce_1_mapped.sorted.bam
$ mv 3_Tce_1_mapped.sorted.bam 3_Tce_1_mapped.bam
$ samtools index 3_Tce_1_mapped.bam
$ ls 3*
3_Tce_1_mapped.bam 3_Tce_1_mapped.bam.bai
$ foo.py | wc
227 2724 16725
So just sort and index your files before attempting to seek in them. Which makes sense since the index's job is primarily to allow seeking.
edited 2 days ago
answered Apr 10 at 16:10
terdon♦terdon
4,7752830
4,7752830
$begingroup$
I was thinking about indexing, but then I thought that if that would be the problem pysam would complain about sorting/indexing, not about file being sam. Do yo think it would be worth opening them a ticket to make the error message bit more meaningful?
$endgroup$
– Kamil S Jaron
Apr 10 at 18:37
$begingroup$
Seeking, i.e. random access, requires an index. Note that this is not the only access pattern in pysam. You can access reads in a streaming fashion (for read in samfile:
) without sorting, and you can access read pileups base-by-base (for column in samfile.pileup('chr1')
), which does require an index if I understand correctly.
$endgroup$
– Daniel Standage
Apr 10 at 18:40
$begingroup$
@KamilSJaron probably, yes. It obviously caused you some pain, and I'm sure you won't be alone. I only thought of this because I have seen similar cryptic errors and was wondering how it could seek without an index anyway.
$endgroup$
– terdon♦
Apr 10 at 18:40
1
$begingroup$
@KamilSJaron Yes, I think it's an unhelpful error message, and it has caused me confusion in the past as well. If you open a ticket on Github I'd be happy to "pile on." :-)
$endgroup$
– Daniel Standage
Apr 10 at 18:41
$begingroup$
@DanielStandage I am this close to suspending you for that horrible pun! I would too if I could stop chuckling for long enough!
$endgroup$
– terdon♦
Apr 10 at 18:42
|
show 4 more comments
$begingroup$
I was thinking about indexing, but then I thought that if that would be the problem pysam would complain about sorting/indexing, not about file being sam. Do yo think it would be worth opening them a ticket to make the error message bit more meaningful?
$endgroup$
– Kamil S Jaron
Apr 10 at 18:37
$begingroup$
Seeking, i.e. random access, requires an index. Note that this is not the only access pattern in pysam. You can access reads in a streaming fashion (for read in samfile:
) without sorting, and you can access read pileups base-by-base (for column in samfile.pileup('chr1')
), which does require an index if I understand correctly.
$endgroup$
– Daniel Standage
Apr 10 at 18:40
$begingroup$
@KamilSJaron probably, yes. It obviously caused you some pain, and I'm sure you won't be alone. I only thought of this because I have seen similar cryptic errors and was wondering how it could seek without an index anyway.
$endgroup$
– terdon♦
Apr 10 at 18:40
1
$begingroup$
@KamilSJaron Yes, I think it's an unhelpful error message, and it has caused me confusion in the past as well. If you open a ticket on Github I'd be happy to "pile on." :-)
$endgroup$
– Daniel Standage
Apr 10 at 18:41
$begingroup$
@DanielStandage I am this close to suspending you for that horrible pun! I would too if I could stop chuckling for long enough!
$endgroup$
– terdon♦
Apr 10 at 18:42
$begingroup$
I was thinking about indexing, but then I thought that if that would be the problem pysam would complain about sorting/indexing, not about file being sam. Do yo think it would be worth opening them a ticket to make the error message bit more meaningful?
$endgroup$
– Kamil S Jaron
Apr 10 at 18:37
$begingroup$
I was thinking about indexing, but then I thought that if that would be the problem pysam would complain about sorting/indexing, not about file being sam. Do yo think it would be worth opening them a ticket to make the error message bit more meaningful?
$endgroup$
– Kamil S Jaron
Apr 10 at 18:37
$begingroup$
Seeking, i.e. random access, requires an index. Note that this is not the only access pattern in pysam. You can access reads in a streaming fashion (
for read in samfile:
) without sorting, and you can access read pileups base-by-base (for column in samfile.pileup('chr1')
), which does require an index if I understand correctly.$endgroup$
– Daniel Standage
Apr 10 at 18:40
$begingroup$
Seeking, i.e. random access, requires an index. Note that this is not the only access pattern in pysam. You can access reads in a streaming fashion (
for read in samfile:
) without sorting, and you can access read pileups base-by-base (for column in samfile.pileup('chr1')
), which does require an index if I understand correctly.$endgroup$
– Daniel Standage
Apr 10 at 18:40
$begingroup$
@KamilSJaron probably, yes. It obviously caused you some pain, and I'm sure you won't be alone. I only thought of this because I have seen similar cryptic errors and was wondering how it could seek without an index anyway.
$endgroup$
– terdon♦
Apr 10 at 18:40
$begingroup$
@KamilSJaron probably, yes. It obviously caused you some pain, and I'm sure you won't be alone. I only thought of this because I have seen similar cryptic errors and was wondering how it could seek without an index anyway.
$endgroup$
– terdon♦
Apr 10 at 18:40
1
1
$begingroup$
@KamilSJaron Yes, I think it's an unhelpful error message, and it has caused me confusion in the past as well. If you open a ticket on Github I'd be happy to "pile on." :-)
$endgroup$
– Daniel Standage
Apr 10 at 18:41
$begingroup$
@KamilSJaron Yes, I think it's an unhelpful error message, and it has caused me confusion in the past as well. If you open a ticket on Github I'd be happy to "pile on." :-)
$endgroup$
– Daniel Standage
Apr 10 at 18:41
$begingroup$
@DanielStandage I am this close to suspending you for that horrible pun! I would too if I could stop chuckling for long enough!
$endgroup$
– terdon♦
Apr 10 at 18:42
$begingroup$
@DanielStandage I am this close to suspending you for that horrible pun! I would too if I could stop chuckling for long enough!
$endgroup$
– terdon♦
Apr 10 at 18:42
|
show 4 more comments
Thanks for contributing an answer to Bioinformatics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fbioinformatics.stackexchange.com%2fquestions%2f7436%2fwhat-does-fetching-by-region-is-not-available-for-sam-files-mean%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown