Map unique raw words to a list of code words


Problem

Write a function that replaces the words in raw with the words in code_words such that the first occurrence of each word in raw is assigned the first unassigned word in code_words. If the code_words list is too short, raise an error. code_words may contain duplicates, in which case the function should ignore/skip them.

Examples:

encoder(["a"], ["1", "2", "3", "4"]) → ["1"]
encoder(["a", "b"], ["1", "2", "3", "4"]) → ["1", "2"]
encoder(["a", "b", "a"], ["1", "1", "2", "3", "4"]) → ["1", "2", "1"]

Solution

def encoder(raw, code_words):
    cw = iter(code_words)
    code_by_raw = {}  # map of raw item to code item
    result = []
    seen = set()  # for ignoring duplicate code_words
    for r in raw:
        if r not in code_by_raw:
            for code in cw:  # cw is iter(code_words), "persistent pointer"
                if code not in seen:
                    seen.add(code)
                    break
            else:  # nobreak; ran out of code_words
                raise ValueError("not enough code_words")
            code_by_raw[r] = code
        result.append(code_by_raw[r])
    return result

Questions

My main concern is the use of cw as a "persistent pointer". Specifically, might people be confused when they see for code in cw?

What should be the typical best practices in this case?

Might it be better if I used the following instead?

try:
    code = next(cw)
    while code in seen:
        code = next(cw)
except StopIteration:
    raise ValueError("not enough code_words")
else:
    seen.add(code)

asked Sep 30 at 6:31 by nehcsivart; edited Sep 30 at 16:07 by Mast

Tags: python, iterator, iteration, generator

1 Answer

My main concern is the use of cw as a "persistent pointer". Specifically, might people be confused when they see for code in cw?

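No. You can even remove the line cw = iter(code_words), as long as code_words is a native container such as a list: the inner loop then restarts from the beginning each time and the seen set skips the code words already handed out, at the cost of rescanning them. "Persistent pointer" isn't a thing in Python, because all Python knows are names; cw is simply a name bound to an iterator, and an iterator remembers its position.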
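To illustrate why for code in cw keeps its place: an iterator remembers how far it has been consumed, whereas looping over a list starts over every time. A small sketch with made-up data (the names words, resumed and restarted are only for illustration):

```python
words = ["1", "1", "2", "3"]

# Looping over an iterator: a later loop resumes where the earlier one stopped.
cw = iter(words)
for code in cw:
    break                 # consumes only the first item
resumed = list(cw)        # whatever the iterator has left

# Looping over the list itself: every loop starts again from the first item.
for code in words:
    break
restarted = list(words)

print(resumed)    # ['1', '2', '3']
print(restarted)  # ['1', '1', '2', '3']
```

This is the behaviour the original encoder relies on when it breaks out of the inner loop and later re-enters it.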

What should be the typical best practices in this case?

That would be building a dictionary and using it for the actual translation. You're basically already doing this with your code_by_raw, if a bit more verbosely than others might. The only real difference is that, in my opinion, it would be better to first establish the translation and then create the result.

Except for your premature result generation, I would say your current function isn't bad. It does what it needs to do, and it does so without wasted work, but it's not very readable. As is often said, you need to factor out a bit of code: specifically, the bit that handles the fact that your inputs don't have to yield unique values, and how duplicates are dealt with.

I would suggest a generator to handle that. This simplifies the main function a ton. (A comment pointed me towards the unique_everseen recipe in the itertools documentation, which is a slightly broader function. We don't quite need all its functionality, but it might be worth the effort if you need some more flexibility.)

def unique(iterable):
    """Generator that "uniquifies" an iterable. Values equal to ones already yielded are skipped."""
    past = set()
    for entry in iterable:
        if entry in past:
            continue
        past.add(entry)
        yield entry

def encoder(raw_words, code_words):
    # Create mapping dictionary:
    code_by_raw = dict(zip(unique(raw_words), unique(code_words)))
    # Check if we had sufficient code_words (one per distinct raw word):
    if len(code_by_raw) < len(set(raw_words)):
        raise ValueError("not enough code_words")
    # Do translation and return the result
    return [code_by_raw[raw] for raw in raw_words]

I can't completely tell your experience level with Python. For result creation, I'm using a list comprehension here.
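The unique_everseen recipe mentioned earlier differs mainly in accepting an optional key function that decides what counts as "the same" value. A minimal sketch of that idea follows; it is not the recipe verbatim, and the name unique_by is made up for this example:

```python
def unique_by(iterable, key=None):
    """Yield items whose key has not been seen before, keeping first occurrences."""
    seen = set()
    for item in iterable:
        # Without a key function, the item itself decides uniqueness.
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Case-insensitive de-duplication: "a" and "B" are dropped as repeats.
print(list(unique_by(["A", "a", "b", "B", "c"], key=str.lower)))  # ['A', 'b', 'c']
# Without a key it behaves like a plain unique() generator.
print(list(unique_by(["1", "1", "2"])))  # ['1', '2']
```

Like the set-based generator, this assumes the keys are hashable.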

Might it be better if I used the following instead?

Functionally, a structure like that would not be bad, but I still find it ugly (opinions may differ). It basically does the same as my unique() generator above.

answered Sep 30 at 7:06 by Gloweye; edited Sep 30 at 19:08

Also it might be worth having a look at the unique_everseen function in the itertools recipes, which has some performance improvements and an optional key by which to determine uniqueness (but is otherwise the same as your unique function). – Graipher, Sep 30 at 7:40

Yeah, that's worth mentioning. I put it in. I'll keep my unique() around for ease of spotting what it does. – Gloweye, Sep 30 at 7:44

Beware that it is just a recipe, though. Unfortunately you cannot just do from itertools import unique_everseen. – Graipher, Sep 30 at 7:48

Ah, OK. Didn't pay attention to the header. – Gloweye, Sep 30 at 7:50

I think dict.fromkeys(iterable) serves more or less the same functionality (for Python version >= 3.6) as unique(iterable). – GZ0, Sep 30 at 21:23
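The dict.fromkeys suggestion can be sketched as follows. This is one possible variant rather than the answer's code; it assumes raw_words is a re-iterable container such as a list, that all items are hashable, and that dicts preserve insertion order (an implementation detail of CPython 3.6, guaranteed by the language from Python 3.7):

```python
def encoder(raw_words, code_words):
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    unique_raw = list(dict.fromkeys(raw_words))
    unique_codes = list(dict.fromkeys(code_words))
    if len(unique_codes) < len(unique_raw):
        raise ValueError("not enough code_words")
    # zip stops at the shorter input, i.e. after the last distinct raw word.
    code_by_raw = dict(zip(unique_raw, unique_codes))
    return [code_by_raw[raw] for raw in raw_words]


print(encoder(["a", "b", "a"], ["1", "1", "2", "3", "4"]))  # ['1', '2', '1']
```

Unlike the generator-based version, this reads all of code_words up front, which matters only if code_words is very long or lazily produced.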
