How do we know for sure a transliteration is lossless?
Looking at this it says it's lossless (Wylie Transliteration).
ག ga
ང nga
ཉ nya
ན na
What if you had sequences like ནག (ng, or is it naga)? Is it lossless because we can guarantee that every consonant (or consonant bundle as in some of the letters) is separated by a vowel? (I don't really know Tibetan yet, so please excuse my ignorance).
IAST for Sanskrit is another lossless one.
So we have:
त t
ह h
थ th
There are many more that have this (seemingly) same problem. You can write the same thing multiple ways.
तह th
थ th
So if you had this in the original Sanskrit:
थथथथथथथथ
You would maybe transliterate it as:
thththththththth
Then you might do this to go back:
तहतहतहतहतहतहतहतह
Or any of these combos:
थतहतहतहतहतहतहथ
थतहतहतहतहतहथथ
...
What you end up with is not necessarily what you started with. How do they say this is lossless? Does it have the same property that every consonant/letter is separated by a vowel?
Is there never a "t + h" sound (t followed by standalone "h") in Sanskrit, as opposed to a "th" sound (aspirated t)? What if we say there isn't, but then later we discover one? This is where I'm lost: it seems that such systems aren't really lossless.
Can one explain how these are actually lossless? How can you prove that it's lossless, maybe not so far as a mathematical proof, but a thought experiment or something perhaps?
It would also be nice to know which languages have lossless transliterations out there available, I would like to check them out :)
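To make the worry above concrete, here is a minimal Python sketch that counts how many ways a romanized string can be split back into letters. It assumes the naive three-entry table from the question (t, h, th), which deliberately ignores the inherent vowel and the virama; the names `TABLE` and `decodings` are made up for illustration.

```python
from functools import lru_cache

# Toy table taken from the question: Latin token -> Devanagari letter.
# Assumption: no inherent vowel and no virama, exactly as in the question.
TABLE = {"t": "\u0924", "h": "\u0939", "th": "\u0925"}


def decodings(latin: str) -> list[str]:
    """Return every way to read `latin` as a sequence of TABLE tokens."""

    @lru_cache(maxsize=None)
    def go(i: int) -> tuple[str, ...]:
        # Reaching the end of the string means one complete reading.
        if i == len(latin):
            return ("",)
        found = []
        for token, letter in TABLE.items():
            if latin.startswith(token, i):
                found.extend(letter + rest for rest in go(i + len(token)))
        return tuple(found)

    return list(go(0))


results = decodings("thth")
print(len(results), results)
```

Under this toy table every "th" can be read two ways, so the number of readings doubles with each repetition. A scheme is lossless only if something (a rule of the language, or an extra mark) removes all but one of them.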
cross-linguistic transliteration
1
If there were no consonant clusters it would be simple, but I'm seeing pages say that Sanskrit does have some clusters, so a lossless transcription would be tricky. Maybe the clusters you identified just don't occur?
– curiousdannii
Sep 28 at 13:35
Question asked Sep 28 at 11:03 by Lance Pollard (edited Sep 28 at 13:52)
3 Answers
A transliteration system is usually either designed to be lossless, or not. To know whether it is or not, you have to know the target language.
Lossless transliteration systems generally have to use one of four methods to stay unambiguous:
- Don't use digraphs at all. Write every phoneme with a single character: ŋ instead of ng, x instead of kh, þ instead of th, etc.
- Use a letter for digraphs that never appears elsewhere. Some transliteration systems for Russian reserve h for digraph use, so that kh, ch, sh are unambiguous: there's no such thing as an h on its own.
- Use digraphs that are illegal consonant clusters in the language. Ancient Greek (Attic dialect at least) had all of /t/, /h/, and /tʰ/, but transcribing them t, h, th is unambiguous, since /th/ can never occur (depending on your analysis of words like μέθοδος).
- Add a special way to disambiguate between the two. Swahili has both /ⁿg/ and /ŋ/; the former is written ng, the latter ng'. The "library transliteration" of Arabic uses th for /θ/, and t'h for /th/.
The first is sometimes considered cleanest, but tends to very quickly exceed the limits of ASCII.
The second works well for certain digraphs, less well for others: a language with /ŋ/ probably also has /n/ and /g/ already.
The third works great until you discover that your assumptions about what's illegal were wrong! This happened famously in Inuktitut: the orthography was designed with the assumption that the sequence /nŋ/ was illegal, so they use the equivalent of nng for a geminate /ŋŋ/. Except then some lesser-known dialects do have /nŋ/, and they had to retrofit in an awkward solution. Oops.
The fourth is the easiest to retrofit onto an existing system, and is fairly widespread. If you aren't already using the apostrophe for something, it's an easy way to fix pretty much any ambiguities that come up.
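As a rough illustration of the fourth method, here is a small Python sketch modelled on the th / t'h convention mentioned above. It is not any real library scheme: the four-symbol inventory (t, h, θ, a) and the function names `encode` and `decode` are assumptions made for the example, and it assumes the apostrophe is not otherwise used.

```python
from itertools import product


def encode(phonemes: list[str]) -> str:
    """Romanize a phoneme list; θ becomes 'th', and t+h becomes t'h."""
    out = []
    for i, p in enumerate(phonemes):
        if p == "θ":
            out.append("th")
            continue
        out.append(p)
        following = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "t" and following == "h":
            out.append("'")  # separator keeps t+h apart from the digraph
    return "".join(out)


def decode(text: str) -> list[str]:
    """Invert encode(): read 'th' as θ and skip the separator."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("th", i):
            out.append("θ")
            i += 2
        elif text[i] == "'":
            i += 1
        else:
            out.append(text[i])
            i += 1
    return out


# Brute-force round-trip check over every sequence of up to 4 phonemes.
lossless = all(
    decode(encode(list(seq))) == list(seq)
    for n in range(5)
    for seq in product("thθa", repeat=n)
)
print(encode(["t", "h", "a"]), encode(["θ", "a"]), lossless)
```

The exhaustive loop is the "thought experiment" in executable form: for a small inventory you can simply try every short sequence and confirm that decoding undoes encoding. It only covers the toy inventory, so it says nothing about a real language until the full inventory and its rules are written down.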
1
#3 is exactly what I was thinking would be the problem of every transliteration. How can you guarantee this is the case for Sanskrit and Tibetan?
– Lance Pollard
Sep 28 at 17:24
1
@LancePollard Be a fluent/native speaker, ideally. The problem with Inuktitut is that they wanted a system that would work for every dialect, but didn't realize a couple outlying dialects allowed this particular consonant cluster—the people working on it were fluent, and knew well that there was no such cluster in Nunavut.
– Draconis
Sep 28 at 17:25
@LancePollard in the case of languages with extensive written corpora (IMHO Sanskrit might qualify) and sizeable dictionaries you can test and verify such assumptions with simple string search/regular expressions.
– Peteris
Sep 29 at 7:41
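A minimal sketch of the kind of string search suggested in this comment, in Python. It looks for a written t+h cluster in Devanagari text, i.e. TA followed by VIRAMA followed by HA, as opposed to the single letter THA. The three-word `sample` list is only a stand-in; a real check would load a dictionary or corpus, which is not shown here.

```python
import re

# TA (U+0924) + VIRAMA (U+094D) + HA (U+0939): a spelled-out t+h cluster.
# The single aspirate letter THA is U+0925 and does not match this pattern.
CLUSTER = re.compile("\u0924\u094D\u0939")


def words_with_cluster(words):
    """Return the words that contain the TA + VIRAMA + HA sequence."""
    return [w for w in words if CLUSTER.search(w)]


# Illustrative stand-in for a word list (assumption: one word per entry).
sample = ["\u0915\u0925\u093E", "\u0924\u0939", "\u0930\u0925"]
hits = words_with_cluster(sample)
print(hits)
```

An empty result over a large corpus is evidence that the cluster does not occur, not a proof. As the Inuktitut case above shows, a dialect or text outside the corpus can still break the assumption.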
1
If anyone's interested, the "awkward solution" @Draconis refers to is described here: tusaalanga.ca/node/2520. Essentially they replaced the digraph with <ŋ> so that we have <nŋ> and <ŋŋ> to distinguish the two cases.
– jogloran
Sep 29 at 21:13
2
@jogloran Indeed! It's awkward mostly because it breaks compatibility with the other dialects. And, more controversially, they decided to cease the official use of syllabics in those dialects until they could find a workaround—which breaks compatibility even worse. Eventually I'm assuming the "ng"-ligature used in standard syllabics will get replaced with a new "ŋ" character, and maybe the awkward r/q problems will get fixed too—but it won't be a quick, or easy, change at this point.
– Draconis
Sep 29 at 21:17
Regarding Wylie, the problem you describe is part of the issue noted in the Wikipedia article you linked:
Wylie's original scheme is not capable of transliterating all Tibetan-script texts. In particular, it has no correspondences for most Tibetan punctuation symbols, and lacks the ability to represent non-Tibetan words written in Tibetan script...
I believe the EWTS variant addresses most of the ambiguities, but transliterating to/from Wylie inherently requires knowledge of Tibetan orthographic rules for identification of root letter and which stacks, prefixes, and first- and second-suffixes are valid. For at least a few three-letter words with no vowel mark, the rule is essentially just a special-case for the particular word.
So indeed, this kind of transliteration system is very limited and suffers from the problems you expected. Other transliteration systems like Romaji (at least as I understand it) don't; the difference is ability to preserve character boundaries unambiguously.
Arguably, such transliteration systems are obsolete in the age of Unicode anyway.
IAST for Sanskrit is lossless. तह is romanized taha, because a bare consonant letter carries an inherent a, and there is no cluster th distinct from the aspirated consonant tʰ, which is romanized as th and spelled थ. You omitted the virama in your spelling of bare "t", i.e. त्.
You can't later discover that there is "t+h" in Sanskrit because there isn't, though you could wonder, how would the Sanskrit grammarians doing fieldwork on Arabic render the cluster "th". Maybe they would write त्ह. It is possible that problems arise in transcribing grammarian metalanguage, which massively violates the rules of Sanskrit. Since I guess you didn't know that there is no t+h cluster in Sanskrit, that relates to how you'd know if a system is lossless – you have to know the target language, and compare the facts of the language to what you know about spelling. I conjecture that North Saami is lossless w.r.t. pronunciation of written words, up to the point of social indeterminacy (are Norwegian u and y adopted into the language with the same vowel or different vowels?).
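The point about the inherent vowel and the virama can be sketched in a few lines of Python. This is a toy romanizer covering only three consonant letters and the virama, not an implementation of IAST; the names `CONSONANTS` and `romanize` are assumptions for the example.

```python
# Toy subset: TA, THA, HA. Real IAST covers far more (vowel signs, etc.).
CONSONANTS = {"\u0924": "t", "\u0925": "th", "\u0939": "h"}
VIRAMA = "\u094D"  # suppresses the inherent vowel of the preceding letter


def romanize(text: str) -> str:
    """Romanize a string made only of the toy consonants and the virama."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in CONSONANTS:
            raise ValueError(f"unsupported character: {ch!r}")
        out.append(CONSONANTS[ch])
        if i + 1 < len(text) and text[i + 1] == VIRAMA:
            i += 2  # bare consonant: no inherent vowel
        else:
            out.append("a")  # inherent vowel
            i += 1
    return "".join(out)


plain = romanize("\u0924\u0939")           # TA HA
aspirate = romanize("\u0925")              # THA
cluster = romanize("\u0924\u094D\u0939")   # TA VIRAMA HA
print(plain, aspirate, cluster)
```

In this toy, TA HA comes out as taha, so the question's example never collides with the aspirate. The only collision is between THA and the spelled-out cluster TA + VIRAMA + HA, which both come out as tha; the scheme stays reversible only as long as that cluster does not occur in the texts being romanized.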
The answer listing the four methods was posted Sep 28 at 16:53 by Draconis (edited Sep 28 at 17:47).
1
#3 is exactly what I was thinking would be the problem of every transliteration. How can you guarantee this is the case for Sanskrit and Tibetan?
– Lance Pollard
Sep 28 at 17:24
1
@LancePollard Be a fluent/native speaker, ideally. The problem with Inuktitut is that they wanted a system that would work for every dialect, but didn't realize a couple outlying dialects allowed this particular consonant cluster—the people working on it were fluent, and knew well that there was no such cluster in Nunavut.
– Draconis
Sep 28 at 17:25
@LancePollard in the case of languages with extensive written corpora (IMHO Sanskrit might qualify) and sizeable dictionaries you can test and verify such assumptions with simple string search/regular expressions.
– Peteris
Sep 29 at 7:41
1
If anyone's interested, the "awkward solution" @Draconis refers to is described here: tusaalanga.ca/node/2520. Essentially they replaced the digraph with <ŋ> so that we have <nŋ> and <ŋŋ> to distinguish the two cases.
– jogloran
Sep 29 at 21:13
2
@jogloran Indeed! It's awkward mostly because it breaks compatibility with the other dialects. And, more controversially, they decided to cease the official use of syllabics in those dialects until they could find a workaround—which breaks compatibility even worse. Eventually I'm assuming the "ng"-ligature used in standard syllabics will get replaced with a new "ŋ" character, and maybe the awkward r/q problems will get fixed too—but it won't be a quick, or easy, change at this point.
– Draconis
Sep 29 at 21:17
|
show 1 more comment
1
#3 is exactly what I was thinking would be the problem of every transliteration. How can you guarantee this is the case for Sanskrit and Tibetan?
– Lance Pollard
Sep 28 at 17:24
1
@LancePollard Be a fluent/native speaker, ideally. The problem with Inuktitut is that they wanted a system that would work for every dialect, but didn't realize a couple outlying dialects allowed this particular consonant cluster—the people working on it were fluent, and knew well that there was no such cluster in Nunavut.
– Draconis
Sep 28 at 17:25
@LancePollard in the case of languages with extensive written corpora (IMHO Sanskrit might qualify) and sizeable dictionaries you can test and verify such assumptions with simple string search/regular expressions.
– Peteris
Sep 29 at 7:41
1
If anyone's interested, the "awkward solution" @Draconis refers to is described here: tusaalanga.ca/node/2520. Essentially they replaced the digraph with <ŋ> so that we have <nŋ> and <ŋŋ> to distinguish the two cases.
– jogloran
Sep 29 at 21:13
2
@jogloran Indeed! It's awkward mostly because it breaks compatibility with the other dialects. And, more controversially, they decided to cease the official use of syllabics in those dialects until they could find a workaround—which breaks compatibility even worse. Eventually I'm assuming the "ng"-ligature used in standard syllabics will get replaced with a new "ŋ" character, and maybe the awkward r/q problems will get fixed too—but it won't be a quick, or easy, change at this point.
– Draconis
Sep 29 at 21:17
1
1
#3 is exactly what I was thinking would be the problem of every transliteration. How can you guarantee this is the case for Sanskrit and Tibetan?
– Lance Pollard
Sep 28 at 17:24
#3 is exactly what I was thinking would be the problem of every transliteration. How can you guarantee this is the case for Sanskrit and Tibetan?
– Lance Pollard
Sep 28 at 17:24
1
1
@LancePollard Be a fluent/native speaker, ideally. The problem with Inuktitut is that they wanted a system that would work for every dialect, but didn't realize a couple outlying dialects allowed this particular consonant cluster—the people working on it were fluent, and knew well that there was no such cluster in Nunavut.
– Draconis
Sep 28 at 17:25
Regarding Wylie, the problem you describe is part of the issue noted in the Wikipedia article you linked:
Wylie's original scheme is not capable of transliterating all Tibetan-script texts. In particular, it has no correspondences for most Tibetan punctuation symbols, and lacks the ability to represent non-Tibetan words written in Tibetan script...
I believe the EWTS variant addresses most of the ambiguities, but transliterating to or from Wylie inherently requires knowledge of Tibetan orthographic rules: you have to identify the root letter and know which stacks, prefixes, and first and second suffixes are valid. For at least a few three-letter words with no vowel mark, the rule is essentially just a special case for the particular word.
So indeed, this kind of transliteration system is very limited and suffers from the problems you expected. Other transliteration systems like Romaji (at least as I understand it) don't; the difference is the ability to preserve character boundaries unambiguously.
Arguably, such transliteration systems are obsolete in the age of Unicode anyway.
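To make the point about character boundaries concrete, here is a rough Python sketch that counts how many ways a romanized string can be split into units of a given inventory. The inventories below are hypothetical toy sets, not Wylie, EWTS, or Romaji; more than one split means the boundaries cannot be recovered from the romanized text alone.

```python
from functools import lru_cache

def count_segmentations(text, tokens):
    """Count the ways `text` can be split into units drawn from `tokens`."""
    # Drop empty strings and duplicates so each split is counted once.
    units = tuple({t for t in tokens if t})

    @lru_cache(maxsize=None)
    def count_from(i):
        if i == len(text):
            return 1  # reached the end: one complete segmentation
        return sum(count_from(i + len(u)) for u in units if text.startswith(u, i))

    return count_from(0)

# Toy inventory with a digraph: "ng" competes with "n" followed by "g".
print(count_segmentations("anga", ["a", "n", "g", "ng"]))  # → 2
# Replacing the digraph with a single letter removes the ambiguity.
print(count_segmentations("aŋa", ["a", "n", "g", "ŋ"]))     # → 1
```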
answered Sep 29 at 3:52
R..
Sanskrit is lossless. तह is romanized taha, and there is no cluster th distinct from the aspirated consonant tʰ, which is romanized as th and spelled थ. You omitted the virama in your spelling of bare "t", i.e. त्.
You can't later discover that there is "t+h" in Sanskrit because there isn't, though you could wonder, how would the Sanskrit grammarians doing fieldwork on Arabic render the cluster "th". Maybe they would write त्ह. It is possible that problems arise in transcribing grammarian metalanguage, which massively violates the rules of Sanskrit. Since I guess you didn't know that there is no t+h cluster in Sanskrit, that relates to how you'd know if a system is lossless – you have to know the target language, and compare the facts of the language to what you know about spelling. I conjecture that North Saami is lossless w.r.t. pronunciation of written words, up to the point of social indeterminacy (are Norwegian u and y adopted into the language with the same vowel or different vowels?).
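As a rough illustration of why the absence of the t+h cluster matters, here is a Python sketch with a deliberately tiny, hypothetical romanizer covering only त, थ, ह, the inherent vowel a, and the virama. It is not a full IAST implementation. It shows that तह and थ stay distinct, while the constructed spelling त्ह would collide with थ if it ever occurred.

```python
from collections import defaultdict

# Toy table (an assumption for illustration): त, थ, ह only.
CONSONANTS = {"\u0924": "t", "\u0925": "th", "\u0939": "h"}
VIRAMA = "\u094D"

def romanize(text):
    """Romanize a string made only of the toy consonants and the virama."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in CONSONANTS:
            raise ValueError(f"unsupported character: {ch!r}")
        out.append(CONSONANTS[ch])
        if i + 1 < len(text) and text[i + 1] == VIRAMA:
            i += 2  # the virama suppresses the inherent vowel
        else:
            out.append("a")  # inherent vowel
            i += 1
    return "".join(out)

def collisions(sources):
    """Group source spellings by romanization; keep groups with 2+ members."""
    groups = defaultdict(list)
    for s in sources:
        groups[romanize(s)].append(s)
    return {r: g for r, g in groups.items() if len(g) > 1}

print(romanize("तह"))            # → taha
print(collisions(["तह", "थ"]))  # → {}
```

Adding त्ह to the input list produces a collision under "tha", which is the sense in which losslessness depends on facts about the language rather than on the mapping table alone.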
answered Sep 28 at 16:06
user6726
1
If there were no consonant clusters it would be simple, but I'm seeing pages say that Sanskrit does have some clusters, so a lossless transcription would be tricky. Maybe the clusters you identified just don't occur?
– curiousdannii
Sep 28 at 13:35