How to correctly write regular expression to match ASCII control chars


I would like to create a regular expression in elisp (in the standard 'read' form) that matches extended ASCII chars the same way PCRE does:

^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$

I'm currently confused about \x7f-\xff. Is there a way to set a range using something like \xhh?










  • I think the answer depends on whether you're matching against unibyte or multibyte strings. Do you think À (which is undefined in ASCII, 0xC0 in latin-1 and Unicode, but encoded as 0xC380 in UTF-8) falls into the range 0x7F-0xFF?

    – npostavs
    Apr 14 at 15:34











  • I think so. At least PCRE matched À as a char in 0x7F-0xFF range. I need the same behavior for standard Elisp regular expression.

    – serghei
    Apr 14 at 16:13












  • And as I can see À is defined in ASCII: ascii-code.com. 0xC0 is between 0x7F and 0xFF

    – serghei
    Apr 14 at 16:19
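
The unibyte/multibyte distinction raised in the comments above can be checked directly in Emacs. The following is only a sketch; the function names `my/codepoint` and `my/utf8-bytes` are made up for illustration. It shows that À has the single codepoint #xC0 in a multibyte string, but becomes the two bytes #xC3 #x80 once the string is encoded as UTF-8.

```elisp
;; Hypothetical helper names, used only for this illustration.

(defun my/codepoint (string)
  "Return the codepoint of the first character of STRING."
  (string-to-char string))

(defun my/utf8-bytes (string)
  "Return the UTF-8 encoding of STRING as a list of byte values."
  ;; `encode-coding-string' returns a unibyte string; converting it
  ;; to a list yields one integer (0-255) per byte.
  (append (encode-coding-string string 'utf-8) nil))

;; "\u00c0" is À written with a Unicode escape, which makes the
;; string multibyte.
(my/codepoint "\u00c0")   ; the codepoint, #xC0
(my/utf8-bytes "\u00c0")  ; the two UTF-8 bytes, #xC3 and #x80
```

So whether À "falls into 0x7F-0xFF" depends on whether the match runs against characters (multibyte) or against encoded bytes (unibyte).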















regular-expressions






asked Apr 14 at 14:54 by serghei
edited Apr 14 at 15:59 by serghei












1 Answer
You can use the literal characters DEL-ÿ instead of \x7f-\xff. The first character of that range, which StackExchange prints as a space (or not at all), is DEL, which has codepoint 127 (decimal), #o177 (octal), and #x7f (hexadecimal).

That is, you can just insert the characters themselves in the regexp pattern.

One way to input such characters is to use C-x 8 RET. To search for any char in the range #x7f through #xff you would type this at the C-M-s prompt (without the spaces):

[ C-x 8 RET # x 7 f RET - C-x 8 RET # x f f RET ]
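
In Lisp code the same range can be written without typing the raw characters, by using Unicode escapes in the string. Below is a minimal sketch, assuming the text being matched is a multibyte string or buffer. The names `my/identifier-regexp` and `my/identifier-p` are hypothetical, and the \` and \' anchors (whole-string match) are my substitution for the ^ and $ of the PCRE pattern in the question.

```elisp
;; Hypothetical names, for illustration only.

(defconst my/identifier-regexp
  ;; \u007f is DEL and \u00ff is the character at codepoint #xff.
  ;; The \u escapes make this a multibyte string, so the range covers
  ;; the characters U+007F..U+00FF rather than raw bytes.
  "\\`[a-zA-Z_\u007f-\u00ff][a-zA-Z0-9_\u007f-\u00ff]*\\'"
  "Regexp sketch for identifiers using letters, digits, underscore and U+007F..U+00FF.")

(defun my/identifier-p (string)
  "Return t if STRING as a whole matches `my/identifier-regexp'."
  ;; Bind `case-fold-search' to nil so case folding cannot widen the range.
  (let ((case-fold-search nil))
    (and (string-match-p my/identifier-regexp string) t)))

;; Example calls:
(my/identifier-p "foo_bar1")   ; plain ASCII identifier
(my/identifier-p "\u00c0bc")   ; starts with À (U+00C0), inside the range
(my/identifier-p "9abc")       ; starts with a digit, so it is rejected
```

This matches by codepoint, so it only reproduces the PCRE behaviour for text that has been decoded into characters. Matching raw, undecoded bytes is a different case.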






answered Apr 14 at 17:46 by Drew
edited Apr 14 at 17:52


























