How do we know for sure a transliteration is lossless?

The Wikipedia article on Wylie transliteration says the scheme is lossless.

ག ga
ང nga
ཉ nya
ན na

What if you had sequences like ནག (ng, or is it naga)? Is it lossless because we can guarantee that every consonant (or consonant bundle as in some of the letters) is separated by a vowel? (I don't really know Tibetan yet, so please excuse my ignorance).

IAST for Sanskrit is another lossless one.

So we have:

त t
ह h
थ th

There are many more that have this (seemingly) same problem: you can write the same thing multiple ways.

तह th
थ th

So if you had this in the original Sanskrit:

थथथथथथथथ

You would maybe transliterate it as:

thththththththth

Then you might do this to go back:

तहतहतहतहतहतहतहतह

Or any of these combos:

थतहतहतहतहतहतहथ
थतहतहतहतहतहथथ
...

What you end up with is not necessarily what you started with. How do they say this is lossless? Does it have the same property that every consonant/letter is separated by a vowel?
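The ambiguity described above can be made concrete with a minimal Python sketch. It assumes the naive consonant-only table from the question (त = t, ह = h, थ = th) and deliberately ignores inherent vowels and the virama; the names `NAIVE_TABLE` and `all_decodings` are made up for illustration and do not come from any transliteration library.

```python
# Naive table from the question: romanization -> Devanagari letter.
# Assumption: inherent vowels and the virama are ignored on purpose.
NAIVE_TABLE = {"t": "\u0924", "h": "\u0939", "th": "\u0925"}  # त, ह, थ


def all_decodings(roman, table):
    """Return every string obtained by splitting `roman` into keys of `table`."""
    if not roman:
        return [""]
    results = []
    for key, glyph in table.items():
        if roman.startswith(key):
            # Decode the remainder recursively and prepend this glyph.
            for rest in all_decodings(roman[len(key):], table):
                results.append(glyph + rest)
    return results


decodings = all_decodings("thth", NAIVE_TABLE)
print(len(decodings))  # 4 distinct readings of the same romanized string
for candidate in decodings:
    print(candidate)
```

Under this naive table a mapping is lossless only if every romanized string has exactly one decoding, so a count above 1 is the failure the question is worried about.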

Is there never a "t + h" sound (t followed by a standalone "h") in Sanskrit, as opposed to a "th" sound (aspirated t)? What if we say there isn't, but then later we discover one? This is where I'm lost: it seems that such systems aren't really lossless.

Can one explain how these are actually lossless? How can you prove that it's lossless, maybe not so far as a mathematical proof, but a thought experiment or something perhaps?

It would also be nice to know which languages have lossless transliterations available. I would like to check them out :)

edited Sep 28 at 13:52

asked Sep 28 at 11:03

Lance Pollard

2,343 reputation, 6 silver badges, 20 bronze badges

1

If there were no consonant clusters it would be simple, but I'm seeing pages say that Sanskrit does have some clusters, so a lossless transcription would be tricky. Maybe the clusters you identified just don't occur?

– curiousdannii
Sep 28 at 13:35

cross-linguistic transliteration


3 Answers

A transliteration system is usually either designed to be lossless, or not. To know whether it is or not, you have to know the target language.

Lossless transliteration systems generally have to use one of four methods to stay unambiguous:

1. Don't use digraphs at all. Write every phoneme with a single character: ŋ instead of ng, x instead of kh, þ instead of th, etc.

2. Use a letter for digraphs that never appears elsewhere. Some transliteration systems for Russian reserve h for digraph use, so that kh, ch, sh are unambiguous: there's no such thing as an h on its own.

3. Use digraphs that are illegal consonant clusters in the language. Ancient Greek (Attic dialect at least) had all of /t/, /h/, and /tʰ/, but transcribing them t, h, th is unambiguous, since /th/ can never occur (depending on your analysis of words like μέθοδος).

4. Add a special way to disambiguate between the two. Swahili has both /ⁿg/ and /ŋ/; the former is written ng, the latter ng'. The "library transliteration" of Arabic uses th for /θ/, and t'h for /th/.

The first is sometimes considered cleanest, but tends to very quickly exceed the limits of ASCII.

The second works well for certain digraphs, less well for others: a language with /ŋ/ probably also has /n/ and /g/ already.

The third works great until you discover that your assumptions about what's illegal were wrong! This happened famously in Inuktitut: the orthography was designed with the assumption that the sequence /nŋ/ was illegal, so they use the equivalent of nng for a geminate /ŋŋ/. Except then some lesser-known dialects do have /nŋ/, and they had to retrofit in an awkward solution. Oops.

The fourth is the easiest to retrofit onto an existing system, and is fairly widespread. If you aren't already using the apostrophe for something, it's an easy way to fix pretty much any ambiguities that come up.
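The fourth method can be sketched as a round-trip check in Python. This is only an illustration under stated assumptions: the four-phoneme inventory, the `ROMAN` table, and the `encode`/`decode` helpers are hypothetical, loosely modelled on the th versus t'h convention mentioned above, and are not any real library's API.

```python
# Hypothetical inventory: /t/, /h/, /θ/ (written with the digraph "th"), /a/.
ROMAN = {"t": "t", "h": "h", "θ": "th", "a": "a"}


def encode(phonemes, disambiguate=True):
    """Romanize a phoneme list, optionally marking a real t+h sequence."""
    out = []
    prev = None
    for p in phonemes:
        # Insert an apostrophe so t followed by h is never read as the digraph.
        if disambiguate and prev == "t" and p == "h":
            out.append("'")
        out.append(ROMAN[p])
        prev = p
    return "".join(out)


def decode(text):
    """Invert `encode`: 'th' is the digraph, an apostrophe is only a separator."""
    phonemes = []
    i = 0
    while i < len(text):
        if text.startswith("th", i):
            phonemes.append("θ")
            i += 2
        elif text[i] == "'":
            i += 1  # separator carries no phoneme of its own
        else:
            phonemes.append(text[i])
            i += 1
    return phonemes


cluster = ["a", "t", "h", "a"]
fricative = ["a", "θ", "a"]
print(encode(cluster), encode(fricative))  # at'ha atha
print(decode(encode(cluster)) == cluster)  # True
print(encode(cluster, disambiguate=False) == encode(fricative))  # True: collision
```

With the separator switched off, both phoneme sequences collapse to the same string, which is the loss of information the apostrophe is there to prevent.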

edited Sep 28 at 17:47

answered Sep 28 at 16:53

Draconis

28.6k reputation, 2 gold badges, 54 silver badges, 107 bronze badges

1

#3 is exactly what I was thinking would be the problem of every transliteration. How can you guarantee this is the case for Sanskrit and Tibetan?

– Lance Pollard
Sep 28 at 17:24

1

@LancePollard Be a fluent/native speaker, ideally. The problem with Inuktitut is that they wanted a system that would work for every dialect, but didn't realize a couple outlying dialects allowed this particular consonant cluster—the people working on it were fluent, and knew well that there was no such cluster in Nunavut.

– Draconis
Sep 28 at 17:25

@LancePollard In the case of languages with extensive written corpora (IMHO Sanskrit might qualify) and sizeable dictionaries, you can test and verify such assumptions with a simple string search or regular expressions.

– Peteris
Sep 29 at 7:41
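The corpus check suggested in the comment above might look like the following Python sketch. It assumes the text is Devanagari in Unicode and searches for the conjunct t + virama + h; the `sample` list is a made-up stand-in for a real corpus, and `find_cluster` is a hypothetical helper name.

```python
import re

VIRAMA = "\u094D"  # Devanagari sign virama: suppresses the inherent vowel
TA = "\u0924"      # त
HA = "\u0939"      # ह

# The written cluster t+h would appear as TA, VIRAMA, HA in sequence.
CLUSTER = re.compile(TA + VIRAMA + HA)


def find_cluster(words):
    """Return the words that contain the t+h conjunct."""
    return [w for w in words if CLUSTER.search(w)]


# Placeholder word list, NOT real corpus data: तह, थ, and the conjunct त्ह.
sample = [TA + HA, "\u0925", TA + VIRAMA + HA]
hits = find_cluster(sample)
print(len(hits))  # 1: only the last entry contains the conjunct
```

An empty result over a large corpus would support the assumption that the cluster is illegal, though it cannot prove it for dialects or texts the corpus does not cover.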

1

If anyone's interested, the "awkward solution" @Draconis refers to is described here: tusaalanga.ca/node/2520. Essentially they replaced the digraph with <ŋ> so that we have <nŋ> and <ŋŋ> to distinguish the two cases.

– jogloran
Sep 29 at 21:13

2

@jogloran Indeed! It's awkward mostly because it breaks compatibility with the other dialects. And, more controversially, they decided to cease the official use of syllabics in those dialects until they could find a workaround—which breaks compatibility even worse. Eventually I'm assuming the "ng"-ligature used in standard syllabics will get replaced with a new "ŋ" character, and maybe the awkward r/q problems will get fixed too—but it won't be a quick, or easy, change at this point.

– Draconis
Sep 29 at 21:17


Regarding Wylie, the problem you describe is part of the issue noted in the Wikipedia article you linked:

Wylie's original scheme is not capable of transliterating all Tibetan-script texts. In particular, it has no correspondences for most Tibetan punctuation symbols, and lacks the ability to represent non-Tibetan words written in Tibetan script...

I believe the EWTS variant addresses most of the ambiguities, but transliterating to/from Wylie inherently requires knowledge of Tibetan orthographic rules for identification of root letter and which stacks, prefixes, and first- and second-suffixes are valid. For at least a few three-letter words with no vowel mark, the rule is essentially just a special-case for the particular word.

So indeed, this kind of transliteration system is very limited and suffers from the problems you expected. Other transliteration systems like Romaji (at least as I understand it) don't; the difference is the ability to preserve character boundaries unambiguously.

Arguably, such transliteration systems are obsolete in the age of Unicode anyway.

answered Sep 29 at 3:52

R..

1212 bronze badges


The Sanskrit romanization is lossless. तह is romanized taha, and there is no cluster t+h distinct from the aspirated consonant tʰ, which is spelled थ and romanized th. You omitted the virama in your spelling of bare "t", which is त्.

You can't later discover that there is "t+h" in Sanskrit, because there isn't, though you could wonder how Sanskrit grammarians doing fieldwork on Arabic would render the cluster "th". Maybe they would write त्ह. It is possible that problems arise in transcribing grammarian metalanguage, which massively violates the rules of Sanskrit.

Since I guess you didn't know that there is no t+h cluster in Sanskrit, that relates to how you'd know whether a system is lossless: you have to know the target language, and compare the facts of the language to what you know about spelling. I conjecture that North Saami is lossless w.r.t. pronunciation of written words, up to the point of social indeterminacy (are Norwegian u and y adopted into the language with the same vowel or different vowels?).
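The role of the inherent vowel and the virama can be illustrated with a small Python sketch. It is a deliberate simplification, not an implementation of IAST: it assumes only the three consonant letters from the question, treats every bare consonant as carrying an inherent a, and ignores vowel signs entirely. The names `CONSONANTS` and `romanize` are hypothetical.

```python
VIRAMA = "\u094D"  # Devanagari sign virama: removes the inherent vowel
CONSONANTS = {"\u0924": "t", "\u0939": "h", "\u0925": "th"}  # त, ह, थ


def romanize(text):
    """Romanize a string made only of त, ह, थ and virama (simplified model)."""
    out = []
    i = 0
    while i < len(text):
        out.append(CONSONANTS[text[i]])  # KeyError for letters outside the toy table
        if i + 1 < len(text) and text[i + 1] == VIRAMA:
            i += 2  # virama follows: bare consonant, no vowel
        else:
            out.append("a")  # no virama: the letter carries inherent a
            i += 1
    return "".join(out)


print(romanize("\u0924\u0939"))        # taha  (तह)
print(romanize("\u0925"))              # tha   (थ)
print(romanize("\u0924\u094D"))        # t     (त्)
print(romanize("\u0924\u094D\u0939"))  # tha   (hypothetical conjunct त्ह)
```

In this toy model the hypothetical conjunct त्ह also comes out as tha, the same as थ, so the round trip is only safe as long as that cluster never occurs, which is exactly the claim made in this answer.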

answered Sep 28 at 16:06

user6726

42.1k reputation, 1 gold badge, 28 silver badges, 84 bronze badges
