Is there a reliable way to hide/convey a message in vocal expressions (speech, song,…)A language made of silenceNatural gas based communicationIntercepting and Faking Radio CommunicationHow would the Lilim live?Social engineering: Design a new language for a totalitarian state obsessed with scientific advancementRecognizable natural numbers for alien message?How can people adjust spoken language to adapt for the discovery that they live on a non-orientable surface?

Special case of filling between curves

Does Airplane Mode allow GPS location to pass through?

How to remove solidified paste from toothbrush

How to handle a colleague who appears helpful in front of manager but doesn't help in private?

Earth magnetic field space elevator. No cable

How does 还 function in 快下雪了,我还以为今天是好天气?

Python's .split() implemented in C

Why does this process map every fraction to the golden ratio?

Is there a way to randomly distribute points within a circle on the surface of a sphere?

How big could a meteor crater be without causing significant secondary effects?

Balancing empathy and deferring to the syllabus in teaching responsibilities

Mistake with Whole/Half step intervals problem

How does mounted combatant interact with total cover?

How to calculate number of sets in Sigma Algebra

There exists a prime p such that p | n for all n ∈ N, n > 1

Can Veil of Summer force an opponent to target their own creatures with Swift End when Lucky Clover is on the battlefield?

What is the correct way for pilots to say the time?

"Startup" working hours - is it normal to be asked to work 11 hours/ day?

How to get the parameter values of a stored procedure that's in the middle of execution?

Is it a mistake to use a password that has previously been used (by anyone ever)?

The Meaning of a Musical Idea

Are these breakers suitable for a GE panel?

My PhD defense is next week and I am having negative thoughts about my work and knowledge. Any advice on how to tackle this?

Pre industrial form of rapid mass transit over land



Is there a reliable way to hide/convey a message in vocal expressions (speech, song,…)


A language made of silenceNatural gas based communicationIntercepting and Faking Radio CommunicationHow would the Lilim live?Social engineering: Design a new language for a totalitarian state obsessed with scientific advancementRecognizable natural numbers for alien message?How can people adjust spoken language to adapt for the discovery that they live on a non-orientable surface?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;

.everyonelovesstackoverflowposition:absolute;height:1px;width:1px;opacity:0;top:0;left:0;pointer-events:none;








4















$begingroup$


Assume Bob. Bob wants to convey the following, simple message (an example) to Cassandra:



Walk 5 feet forward, turn around 90 degrees clockwise, walk another 4 feet and dig 3 feet into the ground.



Easy, right? Now here's the twist: I require Bob to not state the message outright, but by somehow hide or convey it within other vocal expressions. This could be him talking or singing. Bob and Cassandra had the prior opportunity to agree on a code scheme, and that's what I am after.



  • there should not be any link to the hidden message within the words of Bobs utterance. So something like "use every first word of every second sentence" is not viable. The meaning of the actual spoken/uttered words can not play any role within the scheme.

  • They do not know beforehand, if a song or talking will be used, so both modes need to be viable. Bonus Points if even random screaming could be used.

  • The scheme should allow for an almost mathematical precision. There should be no doubt if Bob meant 3 or 4 feet.

  • Assume that Cassandra, the recipent, can hear the message clearly. Audio Transfer is not my point, I am only looking for an encoding scheme.

  • I imagine, that some parameters of human voice or soundwaves in general could be used. I am unsure hovewer which one. Volume shouldn't have any meaning, so amplitude is out, right? Frequency?

  • Ease of use is not a primary concern. If both need to be geniuses and have absolute pitch for your idea to work, so be it. If Cassandra needs to know "oh, boy thats 120 Hz right now", so be it.

Given my requirements, goals and constraints, is there a way to use some acoustic property of the human voice as an "additional channel" to convey a second (hidden) message? How would such a mapping work?










share|improve this question











$endgroup$














  • $begingroup$
    Number of syllables sound nice! I will think it trough. Care to expand it into an answer? I also expanded my constraints thanks to your comment.
    $endgroup$
    – user6415
    Sep 10 at 22:28










  • $begingroup$
    I had a similar idea once.
    $endgroup$
    – Renan
    Sep 11 at 0:48










  • $begingroup$
    How do you determine "best answer" here. (Too interesting to shut down as idea genereation...)
    $endgroup$
    – Guran
    Sep 11 at 5:51






  • 1




    $begingroup$
    ♫And I would walk 500 feet, turn right and walk 400 more, and I would dig 300 feet...♫ (The trick is knowing to divide everything by 100...)
    $endgroup$
    – Darrel Hoffman
    Sep 11 at 13:24







  • 1




    $begingroup$
    Why is volume ruled out? Amplitude modulation in real life uses the "volume" of a radio signal just fine. It's not the absolute amplitude, but the change in amplitude, that carries the information.
    $endgroup$
    – MichaelS
    Sep 12 at 2:18

















4















$begingroup$


Assume Bob. Bob wants to convey the following, simple message (an example) to Cassandra:



Walk 5 feet forward, turn around 90 degrees clockwise, walk another 4 feet and dig 3 feet into the ground.



Easy, right? Now here's the twist: I require Bob to not state the message outright, but by somehow hide or convey it within other vocal expressions. This could be him talking or singing. Bob and Cassandra had the prior opportunity to agree on a code scheme, and that's what I am after.



  • there should not be any link to the hidden message within the words of Bobs utterance. So something like "use every first word of every second sentence" is not viable. The meaning of the actual spoken/uttered words can not play any role within the scheme.

  • They do not know beforehand, if a song or talking will be used, so both modes need to be viable. Bonus Points if even random screaming could be used.

  • The scheme should allow for an almost mathematical precision. There should be no doubt if Bob meant 3 or 4 feet.

  • Assume that Cassandra, the recipent, can hear the message clearly. Audio Transfer is not my point, I am only looking for an encoding scheme.

  • I imagine, that some parameters of human voice or soundwaves in general could be used. I am unsure hovewer which one. Volume shouldn't have any meaning, so amplitude is out, right? Frequency?

  • Ease of use is not a primary concern. If both need to be geniuses and have absolute pitch for your idea to work, so be it. If Cassandra needs to know "oh, boy thats 120 Hz right now", so be it.

Given my requirements, goals and constraints, is there a way to use some acoustic property of the human voice as an "additional channel" to convey a second (hidden) message? How would such a mapping work?










share|improve this question











$endgroup$














  • $begingroup$
    Number of syllables sound nice! I will think it trough. Care to expand it into an answer? I also expanded my constraints thanks to your comment.
    $endgroup$
    – user6415
    Sep 10 at 22:28










  • $begingroup$
    I had a similar idea once.
    $endgroup$
    – Renan
    Sep 11 at 0:48










  • $begingroup$
    How do you determine "best answer" here. (Too interesting to shut down as idea genereation...)
    $endgroup$
    – Guran
    Sep 11 at 5:51






  • 1




    $begingroup$
    ♫And I would walk 500 feet, turn right and walk 400 more, and I would dig 300 feet...♫ (The trick is knowing to divide everything by 100...)
    $endgroup$
    – Darrel Hoffman
    Sep 11 at 13:24







  • 1




    $begingroup$
    Why is volume ruled out? Amplitude modulation in real life uses the "volume" of a radio signal just fine. It's not the absolute amplitude, but the change in amplitude, that carries the information.
    $endgroup$
    – MichaelS
    Sep 12 at 2:18













4













4









4


1



$begingroup$


Assume Bob. Bob wants to convey the following, simple message (an example) to Cassandra:



Walk 5 feet forward, turn around 90 degrees clockwise, walk another 4 feet and dig 3 feet into the ground.



Easy, right? Now here's the twist: I require Bob to not state the message outright, but by somehow hide or convey it within other vocal expressions. This could be him talking or singing. Bob and Cassandra had the prior opportunity to agree on a code scheme, and that's what I am after.



  • there should not be any link to the hidden message within the words of Bobs utterance. So something like "use every first word of every second sentence" is not viable. The meaning of the actual spoken/uttered words can not play any role within the scheme.

  • They do not know beforehand, if a song or talking will be used, so both modes need to be viable. Bonus Points if even random screaming could be used.

  • The scheme should allow for an almost mathematical precision. There should be no doubt if Bob meant 3 or 4 feet.

  • Assume that Cassandra, the recipent, can hear the message clearly. Audio Transfer is not my point, I am only looking for an encoding scheme.

  • I imagine, that some parameters of human voice or soundwaves in general could be used. I am unsure hovewer which one. Volume shouldn't have any meaning, so amplitude is out, right? Frequency?

  • Ease of use is not a primary concern. If both need to be geniuses and have absolute pitch for your idea to work, so be it. If Cassandra needs to know "oh, boy thats 120 Hz right now", so be it.

Given my requirements, goals and constraints, is there a way to use some acoustic property of the human voice as an "additional channel" to convey a second (hidden) message? How would such a mapping work?










share|improve this question











$endgroup$




Assume Bob. Bob wants to convey the following, simple message (an example) to Cassandra:



Walk 5 feet forward, turn around 90 degrees clockwise, walk another 4 feet and dig 3 feet into the ground.



Easy, right? Now here's the twist: I require Bob to not state the message outright, but by somehow hide or convey it within other vocal expressions. This could be him talking or singing. Bob and Cassandra had the prior opportunity to agree on a code scheme, and that's what I am after.



  • there should not be any link to the hidden message within the words of Bobs utterance. So something like "use every first word of every second sentence" is not viable. The meaning of the actual spoken/uttered words can not play any role within the scheme.

  • They do not know beforehand, if a song or talking will be used, so both modes need to be viable. Bonus Points if even random screaming could be used.

  • The scheme should allow for an almost mathematical precision. There should be no doubt if Bob meant 3 or 4 feet.

  • Assume that Cassandra, the recipent, can hear the message clearly. Audio Transfer is not my point, I am only looking for an encoding scheme.

  • I imagine, that some parameters of human voice or soundwaves in general could be used. I am unsure hovewer which one. Volume shouldn't have any meaning, so amplitude is out, right? Frequency?

  • Ease of use is not a primary concern. If both need to be geniuses and have absolute pitch for your idea to work, so be it. If Cassandra needs to know "oh, boy thats 120 Hz right now", so be it.

Given my requirements, goals and constraints, is there a way to use some acoustic property of the human voice as an "additional channel" to convey a second (hidden) message? How would such a mapping work?







humans language communication






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Sep 10 at 22:30

























asked Sep 10 at 22:10







user6415user6415





















  • $begingroup$
    Number of syllables sound nice! I will think it trough. Care to expand it into an answer? I also expanded my constraints thanks to your comment.
    $endgroup$
    – user6415
    Sep 10 at 22:28










  • $begingroup$
    I had a similar idea once.
    $endgroup$
    – Renan
    Sep 11 at 0:48










  • $begingroup$
    How do you determine "best answer" here. (Too interesting to shut down as idea genereation...)
    $endgroup$
    – Guran
    Sep 11 at 5:51






  • 1




    $begingroup$
    ♫And I would walk 500 feet, turn right and walk 400 more, and I would dig 300 feet...♫ (The trick is knowing to divide everything by 100...)
    $endgroup$
    – Darrel Hoffman
    Sep 11 at 13:24







  • 1




    $begingroup$
    Why is volume ruled out? Amplitude modulation in real life uses the "volume" of a radio signal just fine. It's not the absolute amplitude, but the change in amplitude, that carries the information.
    $endgroup$
    – MichaelS
    Sep 12 at 2:18
















  • $begingroup$
    Number of syllables sound nice! I will think it trough. Care to expand it into an answer? I also expanded my constraints thanks to your comment.
    $endgroup$
    – user6415
    Sep 10 at 22:28










  • $begingroup$
    I had a similar idea once.
    $endgroup$
    – Renan
    Sep 11 at 0:48










  • $begingroup$
    How do you determine "best answer" here. (Too interesting to shut down as idea genereation...)
    $endgroup$
    – Guran
    Sep 11 at 5:51






  • 1




    $begingroup$
    ♫And I would walk 500 feet, turn right and walk 400 more, and I would dig 300 feet...♫ (The trick is knowing to divide everything by 100...)
    $endgroup$
    – Darrel Hoffman
    Sep 11 at 13:24







  • 1




    $begingroup$
    Why is volume ruled out? Amplitude modulation in real life uses the "volume" of a radio signal just fine. It's not the absolute amplitude, but the change in amplitude, that carries the information.
    $endgroup$
    – MichaelS
    Sep 12 at 2:18















$begingroup$
Number of syllables sound nice! I will think it trough. Care to expand it into an answer? I also expanded my constraints thanks to your comment.
$endgroup$
– user6415
Sep 10 at 22:28




$begingroup$
Number of syllables sound nice! I will think it trough. Care to expand it into an answer? I also expanded my constraints thanks to your comment.
$endgroup$
– user6415
Sep 10 at 22:28












$begingroup$
I had a similar idea once.
$endgroup$
– Renan
Sep 11 at 0:48




$begingroup$
I had a similar idea once.
$endgroup$
– Renan
Sep 11 at 0:48












$begingroup$
How do you determine "best answer" here. (Too interesting to shut down as idea genereation...)
$endgroup$
– Guran
Sep 11 at 5:51




$begingroup$
How do you determine "best answer" here. (Too interesting to shut down as idea genereation...)
$endgroup$
– Guran
Sep 11 at 5:51




1




1




$begingroup$
♫And I would walk 500 feet, turn right and walk 400 more, and I would dig 300 feet...♫ (The trick is knowing to divide everything by 100...)
$endgroup$
– Darrel Hoffman
Sep 11 at 13:24





$begingroup$
♫And I would walk 500 feet, turn right and walk 400 more, and I would dig 300 feet...♫ (The trick is knowing to divide everything by 100...)
$endgroup$
– Darrel Hoffman
Sep 11 at 13:24





1




1




$begingroup$
Why is volume ruled out? Amplitude modulation in real life uses the "volume" of a radio signal just fine. It's not the absolute amplitude, but the change in amplitude, that carries the information.
$endgroup$
– MichaelS
Sep 12 at 2:18




$begingroup$
Why is volume ruled out? Amplitude modulation in real life uses the "volume" of a radio signal just fine. It's not the absolute amplitude, but the change in amplitude, that carries the information.
$endgroup$
– MichaelS
Sep 12 at 2:18










14 Answers
14






active

oldest

votes


















19

















$begingroup$

Morse code and syllable length - that is, use syllable length to encode a message in Morse code so a long syllable for a dash, a short one of a dot. Easy to decode if intercepted - yes, absolutely. But you didn't mention the possibility that anyone was listening in to find a hidden message, just that it needed to be encoded.



Without any training, it'll be slightly noticeable to be sure, though an excuse like 'My cadence varies when I'm nervous' could help. With training, though, you'll be able to keep the dash and dot syllables only slightly varied from true syllables, and thus perfectly viable.






share|improve this answer










$endgroup$














  • $begingroup$
    Reminds me about the this military operation in which Colombian army sent a hidden message to hostages… using a pop song
    $endgroup$
    – Mat J
    Sep 12 at 6:10






  • 1




    $begingroup$
    How about Morse code with swear words? Anything with 4 letters is a dot, anything longer is a dash. Relatively easy to fit one letter into each normal sentence without having to think too hard, easy to decode and almost indistinguishable from a normal soldier's conversation.
    $endgroup$
    – Robin Bennett
    Sep 12 at 14:05










  • $begingroup$
    @RobinBennett That could work if you don't mind swearing like a sailor.
    $endgroup$
    – Halfthawed
    Sep 12 at 17:11










  • $begingroup$
    Assuming the numbers are spelled out and commas omitted, but spaces need to be included, you need to encode 117 characters. Assuming you have some specific way of indicating a space between words, for Morse there will be 376 individual "characters" that have to be transmitted; dots, dashes, the space between letters, and the spaces between words.
    $endgroup$
    – Keith Morrison
    Sep 12 at 20:22


















16

















$begingroup$

You can scream if you like, because it's not the sounds that matter, it's the silences between them.



The gaps between the words are what counts, whether you choose to encode in Morse or otherwise. The cadence of speech includes the gaps as well as the sounds and a bit of careful timing will allow you to pass the message.



The major downside of anything related to Morse code is that you have a very low information density. There's going to be a lot of screaming for you to get your message across.






share|improve this answer












$endgroup$






















    12

















    $begingroup$

    Breath and word count



    The simplest form of encoding I can come up with is this:



    Wether Bob speaks or sings, pay attention to when he breathes. Count the number of words between each breath. It will be a number between one and eight. (If not, it is meaningless noice, a filler)



    By combining two such numbers the scheme allows for 64 characters, more than enough for A-Z, numbers and space.



    Granted, this will be very hard to encode/decode on the fly, but given some minimal preparation Bob can easily disguise any message in speech or song.






    share|improve this answer










    $endgroup$






















      10

















      $begingroup$

      You could hide the message in the word order where grammar allows it (obviously that works the better the more the used language allows to reorder words).



      For example, consider the sentence:




      Today I'll have pizza for lunch.




      You can move "today" and "for lunch" to many different positions:




      I'll have pizza today for lunch.

      I'll have pizza for lunch today.

      For lunch I'll have pizza today.

      For lunch today, I'll have pizza.

      Today for lunch, I'll have pizza.




      So you can encode a digit between 1 and 6 in the word order. Note that the hidden message is independent of the obvious message; the very same digit can be hidden in sentences like




      Yesterday I watched Doctor Who after work.

      Sometimes I go swimming in the morning.




      Now clearly there are sentences with different number of possible word orders. But it should be possible to come up with a code that works with arbitrary sentences (except that sentences with fixed word order won't be able to give any information).



      A possible coding strategy could be as follows:



      For each sentence, determine the number of possible word orders; let's call it the sentence capacity. For example, the example sentence above would have a capacity of 6. Then take as many sentences that the product of their capacities is at least 27 (enough to encode 26 letters and a space). These sentences give a code group.



      Next, assign each sentence of the code group a number by lexicographically ordering the possible sentences and numbering them starting at zero. The example sentence is last in the order, therefore it would get the number 5.



      Then, calculate the value of the code group by multiplying the number of each sentence by the capacities of all following sentences, and then adding it all together.



      If the resulting value is zero, it is a space, if it is between 1 and 26, it describes a letter, and if it is larger, then the encoder made an error.



      For example, consider the following text:




      I'll have Pizza for lunch today. I bought it this morning. For dessert I plan to eat strawberries or cherries.




      The first sentence has a capacity of 6, the second has a capacity of 2, the third has a capacity of 4 ("for dessert" can be put at the beginning or end, and also the order of strawberries and cherries can be flipped without changing the meaning).



      The product of the capacities is 6×2×4=48, clearly larger than 27, but the first two sentences only give 12, so the code group consists of those three sentences.



      The first sentence has two other possible orders preceding it in lexicographical order (the two variants starting with "for lunch"), so it gets the value 2. The second sentence is first in the list of possible word orders, so it gets the value 0. And the third sentence has only the one with strawberries and cherries switched preceding it lexicographically, thus it gets the value 1.



      Thus the value of the code group is 2×2×4 + 0×4 + 1 = 17, which corresponds to a Q.






      share|improve this answer










      $endgroup$






















        6

















        $begingroup$

        Steganography is the encoding and decoding of information hidden in plain sight within pictures, audio files, whatever. It's much easier if you allow hardware in the mix - for example, encoding a message in the noise below the audible level, or adjusting the frequency of each tone by just enough that the difference can be measured but not heard.



        But you want to be able to do this by just singing or talking. For singing, one possible encoding scheme for a very good singer would be vibrato. You could send numerically encoded messages by controlling how many 'beats' of vibrato you use for each phrase.



        But you want to be able to use talking as well, and the encoding can't be in the words according to your criteria. So that leaves things like pitch, duration, volume, and non-word sounds like breath intakes, duration between words, 'umm's and 'awws', etc. Skip volume, as it's too hard to determine absolute volume and you probably want it to work for varying levels of background noise,



        For more complexity add them together. For example, taking in a breath then saying, 'um, we should go' could mean something completely different than just saying 'we should go', which could be different than sighing then saying the same thing. It's not the 'we should go' that matters, it's the patterns of speech around the words.



        So, 'breath intake + um + rising tone at end of sentence' (as in a question) means one thing. 'breath exhale + 2ums in sentence + flat tone' means something else. Make up as many different combinations as you need to encode all the information.



        The nice thing about encoding your message in the 'metadata' of talking instead of words or syllable lengths or something tied to specific words is that you could make it work with any text. What matters is not the text itself, but how you say it or sing it. You could read the phone book this way and still get your message across.






        share|improve this answer










        $endgroup$






















          5

















          $begingroup$

          Something like Jeremiah Denton? As a POW, in a televised interview he blinked t-o-r-t-u-r-e in morse code.
          https://youtu.be/rufnWLVQcKg






          share|improve this answer










          $endgroup$














          • $begingroup$
            Why I downvoted (no offense): While this is a interesting bit of information, it completly ignores all constraints and requirements
            $endgroup$
            – user6415
            Sep 11 at 12:20


















          5

















          $begingroup$

          Summary



          You could use amplitude or frequency modulation of the voice to transfer data between 5 WPM (realistic maximum) and 20 WPM (probably superhuman). You could go arbitrarily high with synthetics or cybernetic implants (up to around 5 billion WPM).



          Essentially, it involves slightly (or greatly, if stealth isn't an issue) raising and lowering your volume or pitch with reasonably accurate timing. These changes are interpreted by your partner as binary data, which can encode text or other information using a variety of formats.



          A 5-bit, 32-character ASCII-like code is probably the fastest option, with Morse code being a little slower, but less prone to errors.



          You could also use AM and FM at the same time (this is called quadrature amplitude modulation, or QAM0), but that would be extremely difficult for normal people to pull off, and I haven't discussed it here.



          Amplitude and Frequency Modulation (AM and FM)



          The obvious aspects of the human voice that translate into basic radio transmission schemes are amplitude and frequency. By rapidly changing either property you can encode lower-frequency sound waves in the higher-frequency carrier wave.



          Animation of a simple base signal modulated with amplitude and frequency modulation.
          AM / FM Examples.1



          AM / FM Sound Waves



          Instead of translating to radio waves and back, you can simply alter the amplitude or frequency of sound at very high speeds (about twice the speed of the maximum frequency you're representing). This might be possible for a synthetic, but would likely be impossible for a normal human in real time.



          However, you can always encode the signal in non-real time. I'm not finding any data on the fastest larynx variations humans are capable of, but I'd guess it's no greater than 10 Hz, and probably less than that. Various studies have shown large muscles can twitch in 100 to 300 ms2, which roughly translates to 3 to 10 Hz, anti-respectively.



          One second of a 200 Hz signal would then take 40 or more seconds to encode. It would also likely be quite difficult to encode and decode.



          On-Off and Frequency-Shift Keying



          On-off keying3 is a simple way to modulate binary data using extreme amplitude shifts. The presence of noise represents logic high, while the absence of noise represents logic low. We can extend this by using two different, but present, amplitudes. Or using two different frequencies, known as frequency-shift keying4.



          Image showing binary frequency-shift keying.
          Binary Frequency-Shift Keying.5
          Image showing amplitude-shift keying.
          Binary Amplitude-Shift Keying.6



          Encoding a Message in Binary



          At this point, you have a simple, binary alphabet. You can use this alphabet to encode any kind of data you want. You can have fixed-length words that represent specific letters in a traditional alphabet (ASCII7 or Unicode8), variable-length words representing an intermediate set of symbols(Morse code9), binary data representing sound levels or image information, etc.



          The biggest problem here is just the human factor. The more complex your information, the harder it is to feasibly encode and decode it. At some point, there's a physical limit. Your best case is likely to be something like ASCII, reduced to 5 bits, or 32 characters. At 2.5 Hz, each character takes two seconds.



          Binary-encoded Morse code (BEMC) could also be used, but it takes about 6.2 bits per character (25% more).



          (I wrote a simple C program10 to convert an input string to BEMC. To test normal-ish English, I picked a random Wikipedia article, got the article for "Lake Chub"11, stripped newlines from the text, and used the contents as input for the program. The program processes alpha and numeric characters, along with spaces. Input consisted of 4972 characters, of which 4739 were processed and converted to 29690 bits. On average, this encoding used 6.27 bits per character. Processing only alpha characters (to compare to 5-bit alpha-only encoding), 4696 were processed and converted to 29024 bits, which used 6.18 bits per character.)



          The advantage of BEMC is that every dash and dot has both a high-to-low and a low-to-high transition for every character, so it's relatively easy to keep track of timing.



          Technically, you have to mentally encode twice (once from alphabet to Morse, then again from Morse to binary), but in practice there's little distinction between BEMC and just using ternary (dash, dot, space) Morse code directly -- dashes are long periods of logic high with a short period of logic low, dots are short periods of logic high with a short period of logic low, and spaces are medium periods of logic low.



          The average word length of typical writing is 4.8 characters12. Add 1 character for spaces between words for 5.8 characters per word. An average text message is 7 words long13, or about 40 characters. At 5 bits per character, a text message takes 200 bits, or 80 seconds at 2.5 Hz. 20 seconds at 10 Hz.



          Alternately, this equates to 29 bits per word, 5.2 WPM at 2.5 Hz, or 21 WPM at 10 Hz.



          Using this in Practice



          Obviously, all of this is impractical for normal purposes. There's a perfectly good way to communicate with the human vocal apparatus: speech.



          But if you want to get short messages across, you could do so. The trick is shifting frequencies or amplitudes enough for the other person to consistently decode the data without errors, but not so much other people hear the difference.



          I doubt there's any way to prevent others from realizing your speech or singing is weird, but they wouldn't immediately know what you were doing. Further, they wouldn't necessarily know your exact encoding, though they could easily figure it out from a recording.



          The biggest problem here is likely to be keeping reasonably consistent timing over minute-long durations, but I'd guess it's doable.



          Synthetics or Cybernetic Implants



          A synthetic person, or a person with cybernetic implants, could plausibly use these techniques to reach much higher data transfer rates. The maximum frequency achievable in air is about 5 GHz14, limiting us to about 2.5 Gb/s, which is 86 million words per second, 5 billion WPM, 12 million texts per second, or 81 nanoseconds per text.



          But in all likelihood, you could do much better with non-acoustic data transfer methods if you had access to these levels of electronics.



          References



          0 An Electronic Notes article, What is QAM: quadrature amplitude modulation. https://www.electronics-notes.com/articles/radio/modulation/quadrature-amplitude-modulation-what-is-qam-basics.php
          1 Taken from Wikipedia under the Creative Commons license. https://en.wikipedia.org/wiki/File:Amfm3-en-de.gif
          2 Research study, Fast and slow twitch units in a human muscle from 1971. Found at the Journal of Neurology, Neurosurgery, and Psychiatry website. https://jnnp.bmj.com/content/jnnp/34/2/113.full.pdf
          3 A Wikipedia article, On-Off keying. https://en.wikipedia.org/wiki/On%E2%80%93off_keying
          4 A Wikipedia article, Frequency-shift keying. https://en.wikipedia.org/wiki/Frequency-shift_keying
          5 Taken from Wikipedia under the Creative Commons license. https://commons.wikimedia.org/w/index.php?curid=635074
          6 A modified version of (5), submitted under the original license.
          7 Table of ASCII codes. http://www.asciitable.com/
          8Unicode Consortium's overview of Unicode. https://home.unicode.org/basic-info/overview/
          9 babou's very awesome cs.stackexchange answer to Is Morse Code binary, ternary or quinary?. https://cs.stackexchange.com/a/39922
          10 My C program hosted at the OnlineGDB C compiler. https://onlinegdb.com/BkMqmOvUS
          11 A Wikipedia article, Lake chub. https://en.wikipedia.org/wiki/Lake_chub
          12 A Peter@Norvig.com article, English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU. http://norvig.com/mayzner.html
          13 A Crushh article, K, Wrap It Up Mom. https://crushhapp.com/blog/k-wrap-it-up-mom
          14 Ron Maimon's physics.stackexchange answer to Is there an upper frequency limit to ultrasound?. https://physics.stackexchange.com/a/23427/90152








          share|improve this answer










          $endgroup$














          • $begingroup$
            Wow thanks michael! Doing such great and thorough work when the proverbial cake (the answer point bonus :) ) is already eaten is something I can admire!
            $endgroup$
            – user6415
            Sep 12 at 8:59


















          4

















          $begingroup$

          My first thought was that you can use principles of steganography here. but you've rejected the simpler patterns in the first bullet (there should not be any link to the hidden message within the words of Bobs utterance).



          So, next option. Most people speak from the mouth, not nasally. You can try to have Bob speak regular words nasally. Choose any two type of words here - example monosyllable and disyllable. Based on this, you can now convert these sounds into a morse code for english character.



          EDIT:



          I realise now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.






          share|improve this answer












          $endgroup$














          • $begingroup$
            Sorry, I did not understand how the nasal voice would matter in this context?
            $endgroup$
            – user6415
            Sep 10 at 22:36










          • $begingroup$
            @openend I think I had misunderstood your question the first time to mean using an acoustic property in addition to regular speech. SO, I wanted to suggest using both nasal + regular voice for communication, with nasal being used to promote the scheme for morse. BUT after re-reading your question, I understand now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.
            $endgroup$
            – mu 無
            Sep 10 at 22:43



















          4

















          $begingroup$

          Many have covered encoding messages into speaking or writing, but I didn't see any covering singing.



          There are many languages like Mandarin or Cantonese that use tonality to change the meaning of a word. Using tonality in a language that doesn't naturally ascribe meaning to it, especially during song, is a great way to hide messages.



          For example, "forward" is a neutral tone, "turn right" is a rising tone, and "turn left" is a falling tone. A dipping tone (neutral, lower, back to neutral) could indicate down, while the opposite of a dipping tone could indicate up. Multiple words sung in a particular tone means "that direction for as many units as there are words. This schema allows encoding direction and magnitude into a song.



          The phrase "when the bird flies high" sung in a rising tone indicates "turn right, then go forward 5 units". There can also be a pre-existing vocabulary for other important things, like mentioning a dog means "guards", describing how beautiful something is means "look out for something with description". For more examples of this, you can read letters sent from WW2 POWs that have hidden messages encoded into rather innocuous statements.



          There could be a physical signal or even another tonal encoding to indicate "these are instructions", such as a certain chord or part of a song, like the bridge leading into the final chorus.




          The time has finally come - neutral, walk 5 feet forward



          for us to strike - rising tone, turn right, walk 4 feet



          our enemies down - dipping tone, 3 feet below where you are




          The combination of key words, tonality, and word count can be used to discretely convey hidden messages in song without tipping off casual listeners that there is a hidden message.






          share|improve this answer










          $endgroup$














          • $begingroup$
            Cool answer, but what if rising is right, dipping is below, and neutral is forward, how do I turn left or around, tell someone to look up? Or is every left turn sung as three rights, rising tone, in a row with 4 words each? Still not sure on up... Singing was my first thought as well. "Follow the drinking gourd" and other underground railroad references are prevalent here, even if they don't exactly match the requirements (bolded meaning of words clause)
            $endgroup$
            – TCooper
            Sep 12 at 22:23











          • $begingroup$
            You're asking questions I feel are already addressed by my answer. I already identify how to turn left and how to encode meanings to phrases. If you're proposing a language that doesn't have a "turn left" direction, then you're going to have to combine other existing instructions to create one.
            $endgroup$
            – JRodge01
            Sep 13 at 11:25










          • $begingroup$
            You're right, I skimmed too much and thought I had the full picture/focused too much on the implementation of this case. I would still question the slight difference between a falling and dipping tone - I think it would make this near impossible to use in a real life application for reliable directions. Unless there are very very very clear delineations for the end of a line in the song(cus you can't see them)
            $endgroup$
            – TCooper
            Sep 13 at 16:36










          • $begingroup$
            but thanks for answering/my apologies on questioning what I hadn't read thoroughly
            $endgroup$
            – TCooper
            Sep 13 at 16:37


















          3

















          $begingroup$

          Morse is what probably comes to mind first, but, as pointed out, it is incredibly low density.



          If you can have the pitch of the voice change by 16 distinguishable levels, you can use it to encode hex codes, which are reasonably denser way of encoding something than Morse.



          Since you probably only need to encode 26 letters + 10 numbers, it gives us 36 hex numbers, which fits neatly into two hex codes:



          Pitch. Hex
          Pitch1 - 0
          Pitch2 - 1
          ...
          Pitch10 - A
          Pitch16 - F


          Letters A...Z - pitch combos from 00-1A



          Numbers 0-9 - pitch combos from 1B-24



          Actually there is so much surplus, that you might wish to either reduce alphabet to 16 values (to use single hex digit for encode) or use a higher digit number encoding, if you can expect to distinguish between 36 voice pitches). For example you could reduce 0 to O, 1 to I, use U for V, I for J and Y, get rid of Q etc. Some of this was used in early typewriters, for example. It all depends on how smart you can assume the intended recipient to be so that he can decode:



          Walk5ftforwardturn9odegreesclockwisewalkanother4ftanddig3ftintoground.



          Additionally you could have pitch ordering different so that lowest and highest pitches encode less used values, to reduce attention to pitch changes.



          So now you have pitch modulated hex encoding, which could naturally be tied to syllables. Every 2 syllables encode one letter of your alphabet (including numbers).



          Message is unimportant as are usually spaces between text.






          share|improve this answer












          $endgroup$






















            2

















            $begingroup$

            Singing: Encoded in nonsense



            There are a bunch of answers that cover spoken word. Many of them while slightly noticeable in normal conversation/speech would be extremely obvious when sung. There's no reason to not have two (or three if you have something else planned for screaming) different means of encoding your message.



            Many songs have nonsense lyrics such as La La La or Na Na Na. For example Police's De Do Do Do, De Da Da Da has the chorus:



            De do do do de da da da Is all I want to say to you 
            De do do do de da da da Their innocence will pull me through
            De do do do de da da da Is all I want to say to you
            De do do do de da da da They're meaningless and all that's true


            Which could easily be altered to encode your message, either using Morse code and varying between DO and DA or with some other scheme. And in fact you'll probably want to try and find some other scheme, because Morse code uses between 1 & 4 bits per letter, and the Police song for example only has 32 bits per chorus (96 with three repeats), meaning you probably only have around 40-50 letters for your message. That said you could probably
            find a song with more nonsense syllables in it (maybe Goldfrapp's Oh La La @ ~110).



            All that said, you will probably want to shorten the message some. Just removing the spaces, gives a 258 character message once converted to Morse code (190 if you discount spaces). You could shorten that considerably by not using proper full English. For example "Go N 5 f W 4 f Dig 3 f" is only 58 characters once converted Morse code which should easily fit in the Police song.



            Remember that brevity is fairly important when encoding a message within another unless you're planning an hour long speech or a full concert. Almost any kind of encoding is going to have a slower transmission speed then the medium using your requirements.






            share|improve this answer










            $endgroup$






















              2

















              $begingroup$

              Steganography has already been brought up, but not the best method for it. Modern Steganography usually involves encoding binary into the least significant values of a media file which can be extracted by comparing a secret original file to the modified one to extract the differences. This creates such small variances that they are indistinguishable from normal noise, while also being 100% error proof.



              How this applies to an audio file:



              An audio file decompresses to a continuous string of values that can be represented by numbers. Let's say for this example that you are using an 8-bit depth. Your first clip might look like [122,143,201,203,198,152,100,84,...] and your second clip looks like [122,144,202,203,199,153,101,84,...]. To the human ear, these two tracks are indistinguishable, but by subtracting one from the other, you get the following binary [0,1,1,0,1,1,1,0] which is the letter "n" in ASCII.



              Most modern audio files have a 44,100Hz sample rate with a 16-bit depth meaning that you can pack your entire example message into an audio file by adding a 0.15% fluctuation to 0.02 seconds of the audio file. (good luck hearing that!)



              Normally, this type of steganography is impossible to decode without the source file, meaning you can not use it ad hoc thereby violating your second bullet point, but audio files have the unique quality of often being saved in stereo. It is very common for mono-track audio to be transmitted in stereo where both tracks are just the same track repeated. But in this case, you make one stereo track the source, and the other the encoded message so you can just extract one from the other. This is less secure than using a secret source file in that someone looking for an encoded message may be able to find it; so, you will need to make sure the message is encrypted before you encode it.



              In this manner you can send a long and detailed message in any audio file (even just some random screaming). As long as the recipient has the right software and decryption key, they can read it.






              share|improve this answer












              $endgroup$






















                2

















                $begingroup$

                I'm not sure exactly what you're trying to rule out with your first bullet. Are you just ruling out simply jumbling the words or adding a bunch of null words, i.e. you don't want someone to be able to see the clear words in the song? Or do you mean that the spoken words cannot be used in any way?



                Because there are lots of ways to encode a message in a cover text. Sure, if you say "take every other word", someone who is expecting a message of this sort could see suspicious words in the song. I recall a code that was really used to pass a message to a captured spy once (I forget the circumstances, have to look it up) where the rule was, "take the 3rd letter after every punctuation mark". Punctuation wouldn't work well for a verbal message, of course, but you could have a rule like "take every fifth letter" or "take the second letter after every 'e'" or some such.



                You could apply some formula to the text. Like take the number of letters in each word and run these through some sort of formula. Or one syllable words count as a dot and multi-syllable words as a dash and use Morse code. Etc etc.



                Use a true code as opposed to a cypher. Make up a list of code words in advance. Like if he's going to sing a love song, say that "love" means "clockwise" and "beautiful" means "counter-clockwise" and "sunset" means "feet" and "dress" means "dig" and so on. Then he translates the message into the code words and fills in words that are not code words to make a coherent song.



                If you mean that the words cannot be used in any way to convey the message:



                You say both people can be assumed to be extremely perceptive. So okay, let's say the song covers 2 octaves. That's 16 notes. Any combination of 2 notes has 256 possibilities. Let 26 of those combinations map to the letters of the alphabet, maybe a few more map to other symbols you need, like spaces and essential punctuation. Then translate the message into these notes. A catch to this is that a random collection of notes wouldn't sound like a real song and might be impossible to sing. But if you're only using 30 or so of 256 note-pairs, you could use the other 226 freely as fillers to try to "smooth out" the tune. Or you could say that only every 4th note counts or some such. I haven't tried this and I don't know how practical it would be.



                Similarly, code the message in the lengths of the notes. Say use binary: A quarter note is 1, a half note is 2, a quarter note is 4, an eigth note is 8, a sixteenth note is 16. A rest marks the end of a group of notes making up a number. Then A=1, B=2, C=3, etc. Again, coding a message like this would make a highly discordant tune, you'd have to be able to add lots of nulls to smooth it out.






                share|improve this answer










                $endgroup$






















                  1

















                  $begingroup$

                  Take any well known tune ("Ode to Joy" came to mind to me for its simplicity).
                  Sing the tune with nonsense words - or perhaps alternate incorrect instructions - it is not the words that matter.



                  Treat the notes as binary - A 0 will be "in tune" and a 1 will be "off key".



                  Has the prerequisite that the encoder has reasonable singing ability (otherwise the decryption may come out all 1s).



                  An unfortunate side effect is that anyone familiar with the decryption method may read a lot of unintended nonsense from passing a karaoke bar






                  share|improve this answer










                  $endgroup$
















                    Your Answer








                    StackExchange.ready(function()
                    var channelOptions =
                    tags: "".split(" "),
                    id: "579"
                    ;
                    initTagRenderer("".split(" "), "".split(" "), channelOptions);

                    StackExchange.using("externalEditor", function()
                    // Have to fire editor after snippets, if snippets enabled
                    if (StackExchange.settings.snippets.snippetsEnabled)
                    StackExchange.using("snippets", function()
                    createEditor();
                    );

                    else
                    createEditor();

                    );

                    function createEditor()
                    StackExchange.prepareEditor(
                    heartbeatType: 'answer',
                    autoActivateHeartbeat: false,
                    convertImagesToLinks: false,
                    noModals: true,
                    showLowRepImageUploadWarning: true,
                    reputationToPostImages: null,
                    bindNavPrevention: true,
                    postfix: "",
                    imageUploader:
                    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                    allowUrls: true
                    ,
                    noCode: true, onDemand: true,
                    discardSelector: ".discard-answer"
                    ,immediatelyShowMarkdownHelp:true
                    );



                    );














                    draft saved

                    draft discarded
















                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fworldbuilding.stackexchange.com%2fquestions%2f154862%2fis-there-a-reliable-way-to-hide-convey-a-message-in-vocal-expressions-speech-s%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown

























                    14 Answers
                    14






                    active

                    oldest

                    votes








                    14 Answers
                    14






                    active

                    oldest

                    votes









                    active

                    oldest

                    votes






                    active

                    oldest

                    votes









                    19

















                    $begingroup$

                    Morse code and syllable length - that is, use syllable length to encode a message in Morse code so a long syllable for a dash, a short one of a dot. Easy to decode if intercepted - yes, absolutely. But you didn't mention the possibility that anyone was listening in to find a hidden message, just that it needed to be encoded.



                    Without any training, it'll be slightly noticeable to be sure, though an excuse like 'My cadence varies when I'm nervous' could help. With training, though, you'll be able to keep the dash and dot syllables only slightly varied from true syllables, and thus perfectly viable.






                    share|improve this answer










                    $endgroup$














                    • $begingroup$
                      Reminds me about the this military operation in which Colombian army sent a hidden message to hostages… using a pop song
                      $endgroup$
                      – Mat J
                      Sep 12 at 6:10






                    • 1




                      $begingroup$
                      How about Morse code with swear words? Anything with 4 letters is a dot, anything longer is a dash. Relatively easy to fit one letter into each normal sentence without having to think too hard, easy to decode and almost indistinguishable from a normal soldier's conversation.
                      $endgroup$
                      – Robin Bennett
                      Sep 12 at 14:05










                    • $begingroup$
                      @RobinBennett That could work if you don't mind swearing like a sailor.
                      $endgroup$
                      – Halfthawed
                      Sep 12 at 17:11










                    • $begingroup$
                      Assuming the numbers are spelled out and commas omitted, but spaces need to be included, you need to encode 117 characters. Assuming you have some specific way of indicating a space between words, for Morse there will be 376 individual "characters" that have to be transmitted; dots, dashes, the space between letters, and the spaces between words.
                      $endgroup$
                      – Keith Morrison
                      Sep 12 at 20:22















                    19

















                    $begingroup$

                    Morse code and syllable length - that is, use syllable length to encode a message in Morse code so a long syllable for a dash, a short one of a dot. Easy to decode if intercepted - yes, absolutely. But you didn't mention the possibility that anyone was listening in to find a hidden message, just that it needed to be encoded.



                    Without any training, it'll be slightly noticeable to be sure, though an excuse like 'My cadence varies when I'm nervous' could help. With training, though, you'll be able to keep the dash and dot syllables only slightly varied from true syllables, and thus perfectly viable.






                    share|improve this answer










                    $endgroup$














                    • $begingroup$
                      Reminds me about the this military operation in which Colombian army sent a hidden message to hostages… using a pop song
                      $endgroup$
                      – Mat J
                      Sep 12 at 6:10






                    • 1




                      $begingroup$
                      How about Morse code with swear words? Anything with 4 letters is a dot, anything longer is a dash. Relatively easy to fit one letter into each normal sentence without having to think too hard, easy to decode and almost indistinguishable from a normal soldier's conversation.
                      $endgroup$
                      – Robin Bennett
                      Sep 12 at 14:05










                    • $begingroup$
                      @RobinBennett That could work if you don't mind swearing like a sailor.
                      $endgroup$
                      – Halfthawed
                      Sep 12 at 17:11










                    • $begingroup$
                      Assuming the numbers are spelled out and commas omitted, but spaces need to be included, you need to encode 117 characters. Assuming you have some specific way of indicating a space between words, for Morse there will be 376 individual "characters" that have to be transmitted; dots, dashes, the space between letters, and the spaces between words.
                      $endgroup$
                      – Keith Morrison
                      Sep 12 at 20:22













                    19















                    19











                    19







                    $begingroup$

                    Morse code and syllable length - that is, use syllable length to encode a message in Morse code so a long syllable for a dash, a short one of a dot. Easy to decode if intercepted - yes, absolutely. But you didn't mention the possibility that anyone was listening in to find a hidden message, just that it needed to be encoded.



                    Without any training, it'll be slightly noticeable to be sure, though an excuse like 'My cadence varies when I'm nervous' could help. With training, though, you'll be able to keep the dash and dot syllables only slightly varied from true syllables, and thus perfectly viable.






                    share|improve this answer










                    $endgroup$



                    Morse code and syllable length - that is, use syllable length to encode a message in Morse code so a long syllable for a dash, a short one of a dot. Easy to decode if intercepted - yes, absolutely. But you didn't mention the possibility that anyone was listening in to find a hidden message, just that it needed to be encoded.



                    Without any training, it'll be slightly noticeable to be sure, though an excuse like 'My cadence varies when I'm nervous' could help. With training, though, you'll be able to keep the dash and dot syllables only slightly varied from true syllables, and thus perfectly viable.







                    share|improve this answer













                    share|improve this answer




                    share|improve this answer










                    answered Sep 10 at 22:43









                    HalfthawedHalfthawed

                    12.7k1 gold badge16 silver badges50 bronze badges




                    12.7k1 gold badge16 silver badges50 bronze badges














                    • $begingroup$
                      Reminds me about the this military operation in which Colombian army sent a hidden message to hostages… using a pop song
                      $endgroup$
                      – Mat J
                      Sep 12 at 6:10






                    • 1




                      $begingroup$
                      How about Morse code with swear words? Anything with 4 letters is a dot, anything longer is a dash. Relatively easy to fit one letter into each normal sentence without having to think too hard, easy to decode and almost indistinguishable from a normal soldier's conversation.
                      $endgroup$
                      – Robin Bennett
                      Sep 12 at 14:05










                    • $begingroup$
                      @RobinBennett That could work if you don't mind swearing like a sailor.
                      $endgroup$
                      – Halfthawed
                      Sep 12 at 17:11










                    • $begingroup$
                      Assuming the numbers are spelled out and commas omitted, but spaces need to be included, you need to encode 117 characters. Assuming you have some specific way of indicating a space between words, for Morse there will be 376 individual "characters" that have to be transmitted; dots, dashes, the space between letters, and the spaces between words.
                      $endgroup$
                      – Keith Morrison
                      Sep 12 at 20:22
















                    • $begingroup$
                      Reminds me about the this military operation in which Colombian army sent a hidden message to hostages… using a pop song
                      $endgroup$
                      – Mat J
                      Sep 12 at 6:10






                    • 1




                      $begingroup$
                      How about Morse code with swear words? Anything with 4 letters is a dot, anything longer is a dash. Relatively easy to fit one letter into each normal sentence without having to think too hard, easy to decode and almost indistinguishable from a normal soldier's conversation.
                      $endgroup$
                      – Robin Bennett
                      Sep 12 at 14:05










                    • $begingroup$
                      @RobinBennett That could work if you don't mind swearing like a sailor.
                      $endgroup$
                      – Halfthawed
                      Sep 12 at 17:11










                    • $begingroup$
                      Assuming the numbers are spelled out and commas omitted, but spaces need to be included, you need to encode 117 characters. Assuming you have some specific way of indicating a space between words, for Morse there will be 376 individual "characters" that have to be transmitted; dots, dashes, the space between letters, and the spaces between words.
                      $endgroup$
                      – Keith Morrison
                      Sep 12 at 20:22















                    $begingroup$
                    Reminds me about the this military operation in which Colombian army sent a hidden message to hostages… using a pop song
                    $endgroup$
                    – Mat J
                    Sep 12 at 6:10




                    $begingroup$
                    Reminds me about the this military operation in which Colombian army sent a hidden message to hostages… using a pop song
                    $endgroup$
                    – Mat J
                    Sep 12 at 6:10




                    1




                    1




                    $begingroup$
                    How about Morse code with swear words? Anything with 4 letters is a dot, anything longer is a dash. Relatively easy to fit one letter into each normal sentence without having to think too hard, easy to decode and almost indistinguishable from a normal soldier's conversation.
                    $endgroup$
                    – Robin Bennett
                    Sep 12 at 14:05




                    $begingroup$
                    How about Morse code with swear words? Anything with 4 letters is a dot, anything longer is a dash. Relatively easy to fit one letter into each normal sentence without having to think too hard, easy to decode and almost indistinguishable from a normal soldier's conversation.
                    $endgroup$
                    – Robin Bennett
                    Sep 12 at 14:05












                    $begingroup$
                    @RobinBennett That could work if you don't mind swearing like a sailor.
                    $endgroup$
                    – Halfthawed
                    Sep 12 at 17:11




                    $begingroup$
                    @RobinBennett That could work if you don't mind swearing like a sailor.
                    $endgroup$
                    – Halfthawed
                    Sep 12 at 17:11












                    $begingroup$
                    Assuming the numbers are spelled out and commas omitted, but spaces need to be included, you need to encode 117 characters. Assuming you have some specific way of indicating a space between words, for Morse there will be 376 individual "characters" that have to be transmitted; dots, dashes, the space between letters, and the spaces between words.
                    $endgroup$
                    – Keith Morrison
                    Sep 12 at 20:22




                    $begingroup$
                    Assuming the numbers are spelled out and commas omitted, but spaces need to be included, you need to encode 117 characters. Assuming you have some specific way of indicating a space between words, for Morse there will be 376 individual "characters" that have to be transmitted; dots, dashes, the space between letters, and the spaces between words.
                    $endgroup$
                    – Keith Morrison
                    Sep 12 at 20:22













                    16

















                    $begingroup$

                    You can scream if you like, because it's not the sounds that matter, it's the silences between them.



                    The gaps between the words are what counts, whether you choose to encode in Morse or otherwise. The cadence of speech includes the gaps as well as the sounds and a bit of careful timing will allow you to pass the message.



                    The major downside of anything related to Morse code is that you have a very low information density. There's going to be a lot of screaming for you to get your message across.






                    share|improve this answer












                    $endgroup$



















                      16

















                      $begingroup$

                      You can scream if you like, because it's not the sounds that matter, it's the silences between them.



                      The gaps between the words are what counts, whether you choose to encode in Morse or otherwise. The cadence of speech includes the gaps as well as the sounds and a bit of careful timing will allow you to pass the message.



                      The major downside of anything related to Morse code is that you have a very low information density. There's going to be a lot of screaming for you to get your message across.






                      share|improve this answer












                      $endgroup$

















                        16















                        16











                        16







                        $begingroup$

                        You can scream if you like, because it's not the sounds that matter, it's the silences between them.



                        The gaps between the words are what counts, whether you choose to encode in Morse or otherwise. The cadence of speech includes the gaps as well as the sounds and a bit of careful timing will allow you to pass the message.



                        The major downside of anything related to Morse code is that you have a very low information density. There's going to be a lot of screaming for you to get your message across.






                        share|improve this answer












                        $endgroup$



                        You can scream if you like, because it's not the sounds that matter, it's the silences between them.



                        The gaps between the words are what counts, whether you choose to encode in Morse or otherwise. The cadence of speech includes the gaps as well as the sounds and a bit of careful timing will allow you to pass the message.



                        The major downside of anything related to Morse code is that you have a very low information density. There's going to be a lot of screaming for you to get your message across.







                        share|improve this answer















                        share|improve this answer




                        share|improve this answer








                        edited Sep 11 at 7:37

























                        answered Sep 11 at 7:09









                        SeparatrixSeparatrix

                        98.2k34 gold badges223 silver badges376 bronze badges




                        98.2k34 gold badges223 silver badges376 bronze badges
























                            12

















                            $begingroup$

                            Breath and word count



                            The simplest form of encoding I can come up with is this:



                            Wether Bob speaks or sings, pay attention to when he breathes. Count the number of words between each breath. It will be a number between one and eight. (If not, it is meaningless noice, a filler)



                            By combining two such numbers the scheme allows for 64 characters, more than enough for A-Z, numbers and space.



                            Granted, this will be very hard to encode/decode on the fly, but given some minimal preparation Bob can easily disguise any message in speech or song.






                            share|improve this answer










                            $endgroup$



















                              12

















                              $begingroup$

                              Breath and word count



                              The simplest form of encoding I can come up with is this:



                              Wether Bob speaks or sings, pay attention to when he breathes. Count the number of words between each breath. It will be a number between one and eight. (If not, it is meaningless noice, a filler)



                              By combining two such numbers the scheme allows for 64 characters, more than enough for A-Z, numbers and space.



                              Granted, this will be very hard to encode/decode on the fly, but given some minimal preparation Bob can easily disguise any message in speech or song.






                              share|improve this answer










                              $endgroup$

















                                12















                                12











                                12







                                $begingroup$

                                Breath and word count



                                The simplest form of encoding I can come up with is this:



                                Wether Bob speaks or sings, pay attention to when he breathes. Count the number of words between each breath. It will be a number between one and eight. (If not, it is meaningless noice, a filler)



                                By combining two such numbers the scheme allows for 64 characters, more than enough for A-Z, numbers and space.



                                Granted, this will be very hard to encode/decode on the fly, but given some minimal preparation Bob can easily disguise any message in speech or song.






                                share|improve this answer










                                $endgroup$



                                Breath and word count



                                The simplest form of encoding I can come up with is this:



                                Wether Bob speaks or sings, pay attention to when he breathes. Count the number of words between each breath. It will be a number between one and eight. (If not, it is meaningless noice, a filler)



                                By combining two such numbers the scheme allows for 64 characters, more than enough for A-Z, numbers and space.



                                Granted, this will be very hard to encode/decode on the fly, but given some minimal preparation Bob can easily disguise any message in speech or song.







                                share|improve this answer













                                share|improve this answer




                                share|improve this answer










                                answered Sep 11 at 5:49









                                GuranGuran

                                4,4981 gold badge13 silver badges28 bronze badges




                                4,4981 gold badge13 silver badges28 bronze badges
























                                    10

















                                    $begingroup$

                                    You could hide the message in the word order where grammar allows it (obviously that works the better the more the used language allows to reorder words).



                                    For example, consider the sentence:




                                    Today I'll have pizza for lunch.




                                    You can move "today" and "for lunch" to many different positions:




                                    I'll have pizza today for lunch.

                                    I'll have pizza for lunch today.

                                    For lunch I'll have pizza today.

                                    For lunch today, I'll have pizza.

                                    Today for lunch, I'll have pizza.




                                    So you can encode a digit between 1 and 6 in the word order. Note that the hidden message is independent of the obvious message; the very same digit can be hidden in sentences like




                                    Yesterday I watched Doctor Who after work.

                                    Sometimes I go swimming in the morning.




                                    Now clearly there are sentences with different number of possible word orders. But it should be possible to come up with a code that works with arbitrary sentences (except that sentences with fixed word order won't be able to give any information).



                                    A possible coding strategy could be as follows:



                                    For each sentence, determine the number of possible word orders; let's call it the sentence capacity. For example, the example sentence above would have a capacity of 6. Then take as many sentences that the product of their capacities is at least 27 (enough to encode 26 letters and a space). These sentences give a code group.



                                    Next, assign each sentence of the code group a number by lexicographically ordering the possible sentences and numbering them starting at zero. The example sentence is last in the order, therefore it would get the number 5.



                                    Then, calculate the value of the code group by multiplying the number of each sentence by the capacities of all following sentences, and then adding it all together.



                                    If the resulting value is zero, it is a space, if it is between 1 and 26, it describes a letter, and if it is larger, then the encoder made an error.



                                    For example, consider the following text:




                                    I'll have Pizza for lunch today. I bought it this morning. For dessert I plan to eat strawberries or cherries.




                                    The first sentence has a capacity of 6, the second has a capacity of 2, the third has a capacity of 4 ("for dessert" can be put at the beginning or end, and also the order of strawberries and cherries can be flipped without changing the meaning).



                                    The product of the capacities is 6×2×4=48, clearly larger than 27, but the first two sentences only give 12, so the code group consists of those three sentences.



                                    The first sentence has two other possible orders preceding it in lexicographical order (the two variants starting with "for lunch"), so it gets the value 2. The second sentence is first in the list of possible word orders, so it gets the value 0. And the third sentence has only the one with strawberries and cherries switched preceding it lexicographically, thus it gets the value 1.



                                    Thus the value of the code group is 2×2×4 + 0×4 + 1 = 17, which corresponds to a Q.






                                    share|improve this answer










                                    $endgroup$



















                                      10

















                                      $begingroup$

                                      You could hide the message in the word order where grammar allows it (obviously that works the better the more the used language allows to reorder words).



                                      For example, consider the sentence:




                                      Today I'll have pizza for lunch.




                                      You can move "today" and "for lunch" to many different positions:




                                      I'll have pizza today for lunch.

                                      I'll have pizza for lunch today.

                                      For lunch I'll have pizza today.

                                      For lunch today, I'll have pizza.

                                      Today for lunch, I'll have pizza.




                                      So you can encode a digit between 1 and 6 in the word order. Note that the hidden message is independent of the obvious message; the very same digit can be hidden in sentences like




                                      Yesterday I watched Doctor Who after work.

                                      Sometimes I go swimming in the morning.




                                      Now clearly there are sentences with different number of possible word orders. But it should be possible to come up with a code that works with arbitrary sentences (except that sentences with fixed word order won't be able to give any information).



                                      A possible coding strategy could be as follows:



                                      For each sentence, determine the number of possible word orders; let's call it the sentence capacity. For example, the example sentence above would have a capacity of 6. Then take as many sentences that the product of their capacities is at least 27 (enough to encode 26 letters and a space). These sentences give a code group.



                                      Next, assign each sentence of the code group a number by lexicographically ordering the possible sentences and numbering them starting at zero. The example sentence is last in the order, therefore it would get the number 5.



                                      Then, calculate the value of the code group by multiplying the number of each sentence by the capacities of all following sentences, and then adding it all together.



                                      If the resulting value is zero, it is a space, if it is between 1 and 26, it describes a letter, and if it is larger, then the encoder made an error.



                                      For example, consider the following text:




                                      I'll have Pizza for lunch today. I bought it this morning. For dessert I plan to eat strawberries or cherries.




                                      The first sentence has a capacity of 6, the second has a capacity of 2, the third has a capacity of 4 ("for dessert" can be put at the beginning or end, and also the order of strawberries and cherries can be flipped without changing the meaning).



                                      The product of the capacities is 6×2×4=48, clearly larger than 27, but the first two sentences only give 12, so the code group consists of those three sentences.



                                      The first sentence has two other possible orders preceding it in lexicographical order (the two variants starting with "for lunch"), so it gets the value 2. The second sentence is first in the list of possible word orders, so it gets the value 0. And the third sentence has only the one with strawberries and cherries switched preceding it lexicographically, thus it gets the value 1.



                                      Thus the value of the code group is 2×2×4 + 0×4 + 1 = 17, which corresponds to a Q.






                                      share|improve this answer










                                      $endgroup$

















                                        10















                                        10











                                        10







                                        $begingroup$

                                        You could hide the message in the word order where grammar allows it (obviously that works the better the more the used language allows to reorder words).



                                        For example, consider the sentence:




                                        Today I'll have pizza for lunch.




                                        You can move "today" and "for lunch" to many different positions:




                                        I'll have pizza today for lunch.

                                        I'll have pizza for lunch today.

                                        For lunch I'll have pizza today.

                                        For lunch today, I'll have pizza.

                                        Today for lunch, I'll have pizza.




                                        So you can encode a digit between 1 and 6 in the word order. Note that the hidden message is independent of the obvious message; the very same digit can be hidden in sentences like




                                        Yesterday I watched Doctor Who after work.

                                        Sometimes I go swimming in the morning.




                                        Now clearly there are sentences with different number of possible word orders. But it should be possible to come up with a code that works with arbitrary sentences (except that sentences with fixed word order won't be able to give any information).



                                        A possible coding strategy could be as follows:



                                        For each sentence, determine the number of possible word orders; let's call it the sentence capacity. For example, the example sentence above would have a capacity of 6. Then take as many sentences that the product of their capacities is at least 27 (enough to encode 26 letters and a space). These sentences give a code group.



                                        Next, assign each sentence of the code group a number by lexicographically ordering the possible sentences and numbering them starting at zero. The example sentence is last in the order, therefore it would get the number 5.



                                        Then, calculate the value of the code group by multiplying the number of each sentence by the capacities of all following sentences, and then adding it all together.



                                        If the resulting value is zero, it is a space, if it is between 1 and 26, it describes a letter, and if it is larger, then the encoder made an error.



                                        For example, consider the following text:




                                        I'll have Pizza for lunch today. I bought it this morning. For dessert I plan to eat strawberries or cherries.




                                        The first sentence has a capacity of 6, the second has a capacity of 2, the third has a capacity of 4 ("for dessert" can be put at the beginning or end, and also the order of strawberries and cherries can be flipped without changing the meaning).



                                        The product of the capacities is 6×2×4=48, clearly larger than 27, but the first two sentences only give 12, so the code group consists of those three sentences.



                                        The first sentence has two other possible orders preceding it in lexicographical order (the two variants starting with "for lunch"), so it gets the value 2. The second sentence is first in the list of possible word orders, so it gets the value 0. And the third sentence has only the one with strawberries and cherries switched preceding it lexicographically, thus it gets the value 1.



                                        Thus the value of the code group is 2×2×4 + 0×4 + 1 = 17, which corresponds to a Q.






                                        share|improve this answer










                                        $endgroup$



                                        You could hide the message in the word order where grammar allows it (obviously that works the better the more the used language allows to reorder words).



                                        For example, consider the sentence:




                                        Today I'll have pizza for lunch.




                                        You can move "today" and "for lunch" to many different positions:




                                        I'll have pizza today for lunch.

                                        I'll have pizza for lunch today.

                                        For lunch I'll have pizza today.

                                        For lunch today, I'll have pizza.

                                        Today for lunch, I'll have pizza.




                                        So you can encode a digit between 1 and 6 in the word order. Note that the hidden message is independent of the obvious message; the very same digit can be hidden in sentences like




                                        Yesterday I watched Doctor Who after work.

                                        Sometimes I go swimming in the morning.




                                        Now clearly there are sentences with different number of possible word orders. But it should be possible to come up with a code that works with arbitrary sentences (except that sentences with fixed word order won't be able to give any information).



                                        A possible coding strategy could be as follows:



                                        For each sentence, determine the number of possible word orders; let's call it the sentence capacity. For example, the example sentence above would have a capacity of 6. Then take as many sentences that the product of their capacities is at least 27 (enough to encode 26 letters and a space). These sentences give a code group.



                                        Next, assign each sentence of the code group a number by lexicographically ordering the possible sentences and numbering them starting at zero. The example sentence is last in the order, therefore it would get the number 5.



                                        Then, calculate the value of the code group by multiplying the number of each sentence by the capacities of all following sentences, and then adding it all together.



                                        If the resulting value is zero, it is a space, if it is between 1 and 26, it describes a letter, and if it is larger, then the encoder made an error.



                                        For example, consider the following text:




                                        I'll have Pizza for lunch today. I bought it this morning. For dessert I plan to eat strawberries or cherries.




                                        The first sentence has a capacity of 6, the second has a capacity of 2, the third has a capacity of 4 ("for dessert" can be put at the beginning or end, and also the order of strawberries and cherries can be flipped without changing the meaning).



                                        The product of the capacities is 6×2×4=48, clearly larger than 27, but the first two sentences only give 12, so the code group consists of those three sentences.



                                        The first sentence has two other possible orders preceding it in lexicographical order (the two variants starting with "for lunch"), so it gets the value 2. The second sentence is first in the list of possible word orders, so it gets the value 0. And the third sentence has only the one with strawberries and cherries switched preceding it lexicographically, thus it gets the value 1.



                                        Thus the value of the code group is 2×2×4 + 0×4 + 1 = 17, which corresponds to a Q.







                                        share|improve this answer













                                        share|improve this answer




                                        share|improve this answer










                                        answered Sep 11 at 6:46









                                        celtschkceltschk

                                        29.2k12 gold badges79 silver badges145 bronze badges




                                        29.2k12 gold badges79 silver badges145 bronze badges
























                                            6

















                                            $begingroup$

                                            Steganography is the encoding and decoding of information hidden in plain sight within pictures, audio files, whatever. It's much easier if you allow hardware in the mix - for example, encoding a message in the noise below the audible level, or adjusting the frequency of each tone by just enough that the difference can be measured but not heard.



                                            But you want to be able to do this by just singing or talking. For singing, one possible encoding scheme for a very good singer would be vibrato. You could send numerically encoded messages by controlling how many 'beats' of vibrato you use for each phrase.



                                            But you want to be able to use talking as well, and the encoding can't be in the words according to your criteria. So that leaves things like pitch, duration, volume, and non-word sounds like breath intakes, duration between words, 'umm's and 'awws', etc. Skip volume, as it's too hard to determine absolute volume and you probably want it to work for varying levels of background noise,



                                            For more complexity add them together. For example, taking in a breath then saying, 'um, we should go' could mean something completely different than just saying 'we should go', which could be different than sighing then saying the same thing. It's not the 'we should go' that matters, it's the patterns of speech around the words.



                                            So, 'breath intake + um + rising tone at end of sentence' (as in a question) means one thing. 'breath exhale + 2ums in sentence + flat tone' means something else. Make up as many different combinations as you need to encode all the information.



                                            The nice thing about encoding your message in the 'metadata' of talking instead of words or syllable lengths or something tied to specific words is that you could make it work with any text. What matters is not the text itself, but how you say it or sing it. You could read the phone book this way and still get your message across.






                                            share|improve this answer










                                            $endgroup$



















                                              6

















                                              $begingroup$

                                              Steganography is the encoding and decoding of information hidden in plain sight within pictures, audio files, whatever. It's much easier if you allow hardware in the mix - for example, encoding a message in the noise below the audible level, or adjusting the frequency of each tone by just enough that the difference can be measured but not heard.



                                              But you want to be able to do this by just singing or talking. For singing, one possible encoding scheme for a very good singer would be vibrato. You could send numerically encoded messages by controlling how many 'beats' of vibrato you use for each phrase.



                                              But you want to be able to use talking as well, and the encoding can't be in the words according to your criteria. So that leaves things like pitch, duration, volume, and non-word sounds like breath intakes, duration between words, 'umm's and 'awws', etc. Skip volume, as it's too hard to determine absolute volume and you probably want it to work for varying levels of background noise,



                                              For more complexity add them together. For example, taking in a breath then saying, 'um, we should go' could mean something completely different than just saying 'we should go', which could be different than sighing then saying the same thing. It's not the 'we should go' that matters, it's the patterns of speech around the words.



                                              So, 'breath intake + um + rising tone at end of sentence' (as in a question) means one thing. 'breath exhale + 2ums in sentence + flat tone' means something else. Make up as many different combinations as you need to encode all the information.



                                              The nice thing about encoding your message in the 'metadata' of talking instead of words or syllable lengths or something tied to specific words is that you could make it work with any text. What matters is not the text itself, but how you say it or sing it. You could read the phone book this way and still get your message across.






                                              share|improve this answer










                                              $endgroup$

















                                                6















                                                6











                                                6







                                                $begingroup$

                                                Steganography is the encoding and decoding of information hidden in plain sight within pictures, audio files, whatever. It's much easier if you allow hardware in the mix - for example, encoding a message in the noise below the audible level, or adjusting the frequency of each tone by just enough that the difference can be measured but not heard.



                                                But you want to be able to do this by just singing or talking. For singing, one possible encoding scheme for a very good singer would be vibrato. You could send numerically encoded messages by controlling how many 'beats' of vibrato you use for each phrase.



                                                But you want to be able to use talking as well, and the encoding can't be in the words according to your criteria. So that leaves things like pitch, duration, volume, and non-word sounds like breath intakes, duration between words, 'umm's and 'awws', etc. Skip volume, as it's too hard to determine absolute volume and you probably want it to work for varying levels of background noise,



                                                For more complexity add them together. For example, taking in a breath then saying, 'um, we should go' could mean something completely different than just saying 'we should go', which could be different than sighing then saying the same thing. It's not the 'we should go' that matters, it's the patterns of speech around the words.



                                                So, 'breath intake + um + rising tone at end of sentence' (as in a question) means one thing. 'breath exhale + 2ums in sentence + flat tone' means something else. Make up as many different combinations as you need to encode all the information.



                                                The nice thing about encoding your message in the 'metadata' of talking instead of words or syllable lengths or something tied to specific words is that you could make it work with any text. What matters is not the text itself, but how you say it or sing it. You could read the phone book this way and still get your message across.






                                                share|improve this answer










                                                $endgroup$



                                                Steganography is the encoding and decoding of information hidden in plain sight within pictures, audio files, whatever. It's much easier if you allow hardware in the mix - for example, encoding a message in the noise below the audible level, or adjusting the frequency of each tone by just enough that the difference can be measured but not heard.



                                                But you want to be able to do this by just singing or talking. For singing, one possible encoding scheme for a very good singer would be vibrato. You could send numerically encoded messages by controlling how many 'beats' of vibrato you use for each phrase.



                                                But you want to be able to use talking as well, and the encoding can't be in the words according to your criteria. So that leaves things like pitch, duration, volume, and non-word sounds like breath intakes, duration between words, 'umm's and 'awws', etc. Skip volume, as it's too hard to determine absolute volume and you probably want it to work for varying levels of background noise,



                                                For more complexity add them together. For example, taking in a breath then saying, 'um, we should go' could mean something completely different than just saying 'we should go', which could be different than sighing then saying the same thing. It's not the 'we should go' that matters, it's the patterns of speech around the words.



                                                So, 'breath intake + um + rising tone at end of sentence' (as in a question) means one thing. 'breath exhale + 2ums in sentence + flat tone' means something else. Make up as many different combinations as you need to encode all the information.



                                                The nice thing about encoding your message in the 'metadata' of talking instead of words or syllable lengths or something tied to specific words is that you could make it work with any text. What matters is not the text itself, but how you say it or sing it. You could read the phone book this way and still get your message across.







                                                share|improve this answer













                                                share|improve this answer




                                                share|improve this answer










                                                answered Sep 11 at 2:06









                                                Dan HansonDan Hanson

                                                6171 silver badge5 bronze badges




                                                6171 silver badge5 bronze badges
























                                                    5

















                                                    $begingroup$

                                                    Something like Jeremiah Denton? As a POW, in a televised interview he blinked t-o-r-t-u-r-e in morse code.
                                                    https://youtu.be/rufnWLVQcKg






                                                    share|improve this answer










                                                    $endgroup$














                                                    • $begingroup$
                                                      Why I downvoted (no offense): While this is a interesting bit of information, it completly ignores all constraints and requirements
                                                      $endgroup$
                                                      – user6415
                                                      Sep 11 at 12:20















                                                    5

















                                                    $begingroup$

                                                    Something like Jeremiah Denton? As a POW, in a televised interview he blinked t-o-r-t-u-r-e in morse code.
                                                    https://youtu.be/rufnWLVQcKg






                                                    share|improve this answer










                                                    $endgroup$














                                                    • $begingroup$
                                                      Why I downvoted (no offense): While this is a interesting bit of information, it completly ignores all constraints and requirements
                                                      $endgroup$
                                                      – user6415
                                                      Sep 11 at 12:20













                                                    5















                                                    5











                                                    5







                                                    $begingroup$

                                                    Something like Jeremiah Denton? As a POW, in a televised interview he blinked t-o-r-t-u-r-e in morse code.
                                                    https://youtu.be/rufnWLVQcKg






                                                    share|improve this answer










                                                    $endgroup$



                                                    Something like Jeremiah Denton? As a POW, in a televised interview he blinked t-o-r-t-u-r-e in morse code.
                                                    https://youtu.be/rufnWLVQcKg







                                                    share|improve this answer













                                                    share|improve this answer




                                                    share|improve this answer










                                                    answered Sep 11 at 7:06









                                                    InnovineInnovine

                                                    5,17110 silver badges28 bronze badges




                                                    5,17110 silver badges28 bronze badges














                                                    • $begingroup$
                                                      Why I downvoted (no offense): While this is a interesting bit of information, it completly ignores all constraints and requirements
                                                      $endgroup$
                                                      – user6415
                                                      Sep 11 at 12:20
















                                                    • $begingroup$
                                                      Why I downvoted (no offense): While this is a interesting bit of information, it completly ignores all constraints and requirements
                                                      $endgroup$
                                                      – user6415
                                                      Sep 11 at 12:20















                                                    $begingroup$
                                                    Why I downvoted (no offense): While this is a interesting bit of information, it completly ignores all constraints and requirements
                                                    $endgroup$
                                                    – user6415
                                                    Sep 11 at 12:20




                                                    $begingroup$
                                                    Why I downvoted (no offense): While this is a interesting bit of information, it completly ignores all constraints and requirements
                                                    $endgroup$
                                                    – user6415
                                                    Sep 11 at 12:20











                                                    5

















                                                    $begingroup$

                                                    Summary



                                                    You could use amplitude or frequency modulation of the voice to transfer data between 5 WPM (realistic maximum) and 20 WPM (probably superhuman). You could go arbitrarily high with synthetics or cybernetic implants (up to around 5 billion WPM).



                                                    Essentially, it involves slightly (or greatly, if stealth isn't an issue) raising and lowering your volume or pitch with reasonably accurate timing. These changes are interpreted by your partner as binary data, which can encode text or other information using a variety of formats.



                                                    A 5-bit, 32-character ASCII-like code is probably the fastest option, with Morse code being a little slower, but less prone to errors.



                                                    You could also use AM and FM at the same time (this is called quadrature amplitude modulation, or QAM0), but that would be extremely difficult for normal people to pull off, and I haven't discussed it here.



                                                    Amplitude and Frequency Modulation (AM and FM)



                                                    The obvious aspects of the human voice that translate into basic radio transmission schemes are amplitude and frequency. By rapidly changing either property you can encode lower-frequency sound waves in the higher-frequency carrier wave.



                                                    Animation of a simple base signal modulated with amplitude and frequency modulation.
                                                    AM / FM Examples.1



                                                    AM / FM Sound Waves



                                                    Instead of translating to radio waves and back, you can simply alter the amplitude or frequency of sound at very high speeds (about twice the speed of the maximum frequency you're representing). This might be possible for a synthetic, but would likely be impossible for a normal human in real time.



                                                    However, you can always encode the signal in non-real time. I'm not finding any data on the fastest larynx variations humans are capable of, but I'd guess it's no greater than 10 Hz, and probably less than that. Various studies have shown large muscles can twitch in 100 to 300 ms2, which roughly translates to 3 to 10 Hz, anti-respectively.



                                                    One second of a 200 Hz signal would then take 40 or more seconds to encode. It would also likely be quite difficult to encode and decode.



                                                    On-Off and Frequency-Shift Keying



                                                    On-off keying3 is a simple way to modulate binary data using extreme amplitude shifts. The presence of noise represents logic high, while the absence of noise represents logic low. We can extend this by using two different, but present, amplitudes. Or using two different frequencies, known as frequency-shift keying4.



                                                    Image showing binary frequency-shift keying.
                                                    Binary Frequency-Shift Keying.5
                                                    Image showing amplitude-shift keying.
                                                    Binary Amplitude-Shift Keying.6



                                                    Encoding a Message in Binary



                                                    At this point, you have a simple, binary alphabet. You can use this alphabet to encode any kind of data you want. You can have fixed-length words that represent specific letters in a traditional alphabet (ASCII7 or Unicode8), variable-length words representing an intermediate set of symbols(Morse code9), binary data representing sound levels or image information, etc.



                                                    The biggest problem here is just the human factor. The more complex your information, the harder it is to feasibly encode and decode it. At some point, there's a physical limit. Your best case is likely to be something like ASCII, reduced to 5 bits, or 32 characters. At 2.5 Hz, each character takes two seconds.



                                                    Binary-encoded Morse code (BEMC) could also be used, but it takes about 6.2 bits per character (25% more).



                                                    (I wrote a simple C program10 to convert an input string to BEMC. To test normal-ish English, I picked a random Wikipedia article, got the article for "Lake Chub"11, stripped newlines from the text, and used the contents as input for the program. The program processes alpha and numeric characters, along with spaces. Input consisted of 4972 characters, of which 4739 were processed and converted to 29690 bits. On average, this encoding used 6.27 bits per character. Processing only alpha characters (to compare to 5-bit alpha-only encoding), 4696 were processed and converted to 29024 bits, which used 6.18 bits per character.)



                                                    The advantage of BEMC is that every dash and dot has both a high-to-low and a low-to-high transition for every character, so it's relatively easy to keep track of timing.



                                                    Technically, you have to mentally encode twice (once from alphabet to Morse, then again from Morse to binary), but in practice there's little distinction between BEMC and just using ternary (dash, dot, space) Morse code directly -- dashes are long periods of logic high with a short period of logic low, dots are short periods of logic high with a short period of logic low, and spaces are medium periods of logic low.



                                                    The average word length of typical writing is 4.8 characters12. Add 1 character for spaces between words for 5.8 characters per word. An average text message is 7 words long13, or about 40 characters. At 5 bits per character, a text message takes 200 bits, or 80 seconds at 2.5 Hz. 20 seconds at 10 Hz.



                                                    Alternately, this equates to 29 bits per word, 5.2 WPM at 2.5 Hz, or 21 WPM at 10 Hz.



                                                    Using this in Practice



                                                    Obviously, all of this is impractical for normal purposes. There's a perfectly good way to communicate with the human vocal apparatus: speech.



                                                    But if you want to get short messages across, you could do so. The trick is shifting frequencies or amplitudes enough for the other person to consistently decode the data without errors, but not so much other people hear the difference.



                                                    I doubt there's any way to prevent others from realizing your speech or singing is weird, but they wouldn't immediately know what you were doing. Further, they wouldn't necessarily know your exact encoding, though they could easily figure it out from a recording.



                                                    The biggest problem here is likely to be keeping reasonably consistent timing over minute-long durations, but I'd guess it's doable.



                                                    Synthetics or Cybernetic Implants



                                                    A synthetic person, or a person with cybernetic implants, could plausibly use these techniques to reach much higher data transfer rates. The maximum frequency achievable in air is about 5 GHz14, limiting us to about 2.5 Gb/s, which is 86 million words per second, 5 billion WPM, 12 million texts per second, or 81 nanoseconds per text.



                                                    But in all likelihood, you could do much better with non-acoustic data transfer methods if you had access to these levels of electronics.



                                                    References



                                                    0 An Electronic Notes article, What is QAM: quadrature amplitude modulation. https://www.electronics-notes.com/articles/radio/modulation/quadrature-amplitude-modulation-what-is-qam-basics.php
                                                    1 Taken from Wikipedia under the Creative Commons license. https://en.wikipedia.org/wiki/File:Amfm3-en-de.gif
                                                    2 Research study, Fast and slow twitch units in a human muscle from 1971. Found at the Journal of Neurology, Neurosurgery, and Psychiatry website. https://jnnp.bmj.com/content/jnnp/34/2/113.full.pdf
                                                    3 A Wikipedia article, On-Off keying. https://en.wikipedia.org/wiki/On%E2%80%93off_keying
                                                    4 A Wikipedia article, Frequency-shift keying. https://en.wikipedia.org/wiki/Frequency-shift_keying
                                                    5 Taken from Wikipedia under the Creative Commons license. https://commons.wikimedia.org/w/index.php?curid=635074
                                                    6 A modified version of (5), submitted under the original license.
                                                    7 Table of ASCII codes. http://www.asciitable.com/
                                                    8Unicode Consortium's overview of Unicode. https://home.unicode.org/basic-info/overview/
                                                    9 babou's very awesome cs.stackexchange answer to Is Morse Code binary, ternary or quinary?. https://cs.stackexchange.com/a/39922
                                                    10 My C program hosted at the OnlineGDB C compiler. https://onlinegdb.com/BkMqmOvUS
                                                    11 A Wikipedia article, Lake chub. https://en.wikipedia.org/wiki/Lake_chub
                                                    12 A Peter@Norvig.com article, English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU. http://norvig.com/mayzner.html
                                                    13 A Crushh article, K, Wrap It Up Mom. https://crushhapp.com/blog/k-wrap-it-up-mom
                                                    14 Ron Maimon's physics.stackexchange answer to Is there an upper frequency limit to ultrasound?. https://physics.stackexchange.com/a/23427/90152








                                                    share|improve this answer










                                                    $endgroup$














                                                    • $begingroup$
                                                      Wow thanks michael! Doing such great and thorough work when the proverbial cake (the answer point bonus :) ) is already eaten is something I can admire!
                                                      $endgroup$
                                                      – user6415
                                                      Sep 12 at 8:59















                                                    5

















                                                    $begingroup$

                                                    Summary



                                                    You could use amplitude or frequency modulation of the voice to transfer data between 5 WPM (realistic maximum) and 20 WPM (probably superhuman). You could go arbitrarily high with synthetics or cybernetic implants (up to around 5 billion WPM).



                                                    Essentially, it involves slightly (or greatly, if stealth isn't an issue) raising and lowering your volume or pitch with reasonably accurate timing. These changes are interpreted by your partner as binary data, which can encode text or other information using a variety of formats.



                                                    A 5-bit, 32-character ASCII-like code is probably the fastest option, with Morse code being a little slower, but less prone to errors.



                                                    You could also use AM and FM at the same time (this is called quadrature amplitude modulation, or QAM0), but that would be extremely difficult for normal people to pull off, and I haven't discussed it here.



                                                    Amplitude and Frequency Modulation (AM and FM)



                                                    The obvious aspects of the human voice that translate into basic radio transmission schemes are amplitude and frequency. By rapidly changing either property you can encode lower-frequency sound waves in the higher-frequency carrier wave.



                                                    Animation of a simple base signal modulated with amplitude and frequency modulation.
                                                    AM / FM Examples.1



                                                    AM / FM Sound Waves



                                                    Instead of translating to radio waves and back, you can simply alter the amplitude or frequency of sound at very high speeds (about twice the speed of the maximum frequency you're representing). This might be possible for a synthetic, but would likely be impossible for a normal human in real time.



                                                    However, you can always encode the signal in non-real time. I'm not finding any data on the fastest larynx variations humans are capable of, but I'd guess it's no greater than 10 Hz, and probably less than that. Various studies have shown large muscles can twitch in 100 to 300 ms2, which roughly translates to 3 to 10 Hz, anti-respectively.



                                                    One second of a 200 Hz signal would then take 40 or more seconds to encode. It would also likely be quite difficult to encode and decode.



                                                    On-Off and Frequency-Shift Keying



                                                    On-off keying3 is a simple way to modulate binary data using extreme amplitude shifts. The presence of noise represents logic high, while the absence of noise represents logic low. We can extend this by using two different, but present, amplitudes. Or using two different frequencies, known as frequency-shift keying4.



                                                    Image showing binary frequency-shift keying.
                                                    Binary Frequency-Shift Keying.5
                                                    Image showing amplitude-shift keying.
                                                    Binary Amplitude-Shift Keying.6



                                                    Encoding a Message in Binary



                                                    At this point, you have a simple, binary alphabet. You can use this alphabet to encode any kind of data you want. You can have fixed-length words that represent specific letters in a traditional alphabet (ASCII7 or Unicode8), variable-length words representing an intermediate set of symbols(Morse code9), binary data representing sound levels or image information, etc.



                                                    The biggest problem here is just the human factor. The more complex your information, the harder it is to feasibly encode and decode it. At some point, there's a physical limit. Your best case is likely to be something like ASCII, reduced to 5 bits, or 32 characters. At 2.5 Hz, each character takes two seconds.



                                                    Binary-encoded Morse code (BEMC) could also be used, but it takes about 6.2 bits per character (25% more).



                                                    (I wrote a simple C program10 to convert an input string to BEMC. To test normal-ish English, I picked a random Wikipedia article, got the article for "Lake Chub"11, stripped newlines from the text, and used the contents as input for the program. The program processes alpha and numeric characters, along with spaces. Input consisted of 4972 characters, of which 4739 were processed and converted to 29690 bits. On average, this encoding used 6.27 bits per character. Processing only alpha characters (to compare to 5-bit alpha-only encoding), 4696 were processed and converted to 29024 bits, which used 6.18 bits per character.)



                                                    The advantage of BEMC is that every dash and dot has both a high-to-low and a low-to-high transition for every character, so it's relatively easy to keep track of timing.



                                                    Technically, you have to mentally encode twice (once from alphabet to Morse, then again from Morse to binary), but in practice there's little distinction between BEMC and just using ternary (dash, dot, space) Morse code directly -- dashes are long periods of logic high with a short period of logic low, dots are short periods of logic high with a short period of logic low, and spaces are medium periods of logic low.



                                                    The average word length of typical writing is 4.8 characters12. Add 1 character for spaces between words for 5.8 characters per word. An average text message is 7 words long13, or about 40 characters. At 5 bits per character, a text message takes 200 bits, or 80 seconds at 2.5 Hz. 20 seconds at 10 Hz.



                                                    Alternately, this equates to 29 bits per word, 5.2 WPM at 2.5 Hz, or 21 WPM at 10 Hz.



                                                    Using this in Practice



                                                    Obviously, all of this is impractical for normal purposes. There's a perfectly good way to communicate with the human vocal apparatus: speech.



                                                    But if you want to get short messages across, you could do so. The trick is shifting frequencies or amplitudes enough for the other person to consistently decode the data without errors, but not so much other people hear the difference.



                                                    I doubt there's any way to prevent others from realizing your speech or singing is weird, but they wouldn't immediately know what you were doing. Further, they wouldn't necessarily know your exact encoding, though they could easily figure it out from a recording.



                                                    The biggest problem here is likely to be keeping reasonably consistent timing over minute-long durations, but I'd guess it's doable.



                                                    Synthetics or Cybernetic Implants



                                                    A synthetic person, or a person with cybernetic implants, could plausibly use these techniques to reach much higher data transfer rates. The maximum frequency achievable in air is about 5 GHz14, limiting us to about 2.5 Gb/s, which is 86 million words per second, 5 billion WPM, 12 million texts per second, or 81 nanoseconds per text.



                                                    But in all likelihood, you could do much better with non-acoustic data transfer methods if you had access to these levels of electronics.



                                                    References



                                                    0 An Electronic Notes article, What is QAM: quadrature amplitude modulation. https://www.electronics-notes.com/articles/radio/modulation/quadrature-amplitude-modulation-what-is-qam-basics.php
                                                    1 Taken from Wikipedia under the Creative Commons license. https://en.wikipedia.org/wiki/File:Amfm3-en-de.gif
                                                    2 Research study, Fast and slow twitch units in a human muscle from 1971. Found at the Journal of Neurology, Neurosurgery, and Psychiatry website. https://jnnp.bmj.com/content/jnnp/34/2/113.full.pdf
                                                    3 A Wikipedia article, On-Off keying. https://en.wikipedia.org/wiki/On%E2%80%93off_keying
                                                    4 A Wikipedia article, Frequency-shift keying. https://en.wikipedia.org/wiki/Frequency-shift_keying
                                                    5 Taken from Wikipedia under the Creative Commons license. https://commons.wikimedia.org/w/index.php?curid=635074
                                                    6 A modified version of (5), submitted under the original license.
                                                    7 Table of ASCII codes. http://www.asciitable.com/
                                                    8Unicode Consortium's overview of Unicode. https://home.unicode.org/basic-info/overview/
                                                    9 babou's very awesome cs.stackexchange answer to Is Morse Code binary, ternary or quinary?. https://cs.stackexchange.com/a/39922
                                                    10 My C program hosted at the OnlineGDB C compiler. https://onlinegdb.com/BkMqmOvUS
                                                    11 A Wikipedia article, Lake chub. https://en.wikipedia.org/wiki/Lake_chub
                                                    12 A Peter@Norvig.com article, English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU. http://norvig.com/mayzner.html
                                                    13 A Crushh article, K, Wrap It Up Mom. https://crushhapp.com/blog/k-wrap-it-up-mom
                                                    14 Ron Maimon's physics.stackexchange answer to Is there an upper frequency limit to ultrasound?. https://physics.stackexchange.com/a/23427/90152








                                                    share|improve this answer










                                                    $endgroup$














                                                    • $begingroup$
                                                      Wow thanks michael! Doing such great and thorough work when the proverbial cake (the answer point bonus :) ) is already eaten is something I can admire!
                                                      $endgroup$
                                                      – user6415
                                                      Sep 12 at 8:59













                                                    5















                                                    5











                                                    5







                                                    $begingroup$

                                                    Summary



                                                    You could use amplitude or frequency modulation of the voice to transfer data between 5 WPM (realistic maximum) and 20 WPM (probably superhuman). You could go arbitrarily high with synthetics or cybernetic implants (up to around 5 billion WPM).



                                                    Essentially, it involves slightly (or greatly, if stealth isn't an issue) raising and lowering your volume or pitch with reasonably accurate timing. These changes are interpreted by your partner as binary data, which can encode text or other information using a variety of formats.



                                                    A 5-bit, 32-character ASCII-like code is probably the fastest option, with Morse code being a little slower, but less prone to errors.



                                                    You could also use AM and FM at the same time (this is called quadrature amplitude modulation, or QAM0), but that would be extremely difficult for normal people to pull off, and I haven't discussed it here.



                                                    Amplitude and Frequency Modulation (AM and FM)



                                                    The obvious aspects of the human voice that translate into basic radio transmission schemes are amplitude and frequency. By rapidly changing either property you can encode lower-frequency sound waves in the higher-frequency carrier wave.



                                                    Animation of a simple base signal modulated with amplitude and frequency modulation.
                                                    AM / FM Examples.1



                                                    AM / FM Sound Waves



                                                    Instead of translating to radio waves and back, you can simply alter the amplitude or frequency of sound at very high speeds (about twice the speed of the maximum frequency you're representing). This might be possible for a synthetic, but would likely be impossible for a normal human in real time.



                                                    However, you can always encode the signal in non-real time. I'm not finding any data on the fastest larynx variations humans are capable of, but I'd guess it's no greater than 10 Hz, and probably less than that. Various studies have shown large muscles can twitch in 100 to 300 ms2, which roughly translates to 3 to 10 Hz, anti-respectively.



                                                    One second of a 200 Hz signal would then take 40 or more seconds to encode. It would also likely be quite difficult to encode and decode.



                                                    On-Off and Frequency-Shift Keying



                                                    On-off keying3 is a simple way to modulate binary data using extreme amplitude shifts. The presence of noise represents logic high, while the absence of noise represents logic low. We can extend this by using two different, but present, amplitudes. Or using two different frequencies, known as frequency-shift keying4.



                                                    Image showing binary frequency-shift keying.
                                                    Binary Frequency-Shift Keying.5
                                                    Image showing amplitude-shift keying.
                                                    Binary Amplitude-Shift Keying.6



                                                    Encoding a Message in Binary



                                                    At this point, you have a simple, binary alphabet. You can use this alphabet to encode any kind of data you want. You can have fixed-length words that represent specific letters in a traditional alphabet (ASCII7 or Unicode8), variable-length words representing an intermediate set of symbols(Morse code9), binary data representing sound levels or image information, etc.



                                                    The biggest problem here is just the human factor. The more complex your information, the harder it is to feasibly encode and decode it. At some point, there's a physical limit. Your best case is likely to be something like ASCII, reduced to 5 bits, or 32 characters. At 2.5 Hz, each character takes two seconds.



                                                    Binary-encoded Morse code (BEMC) could also be used, but it takes about 6.2 bits per character (25% more).



                                                    (I wrote a simple C program10 to convert an input string to BEMC. To test normal-ish English, I picked a random Wikipedia article, got the article for "Lake Chub"11, stripped newlines from the text, and used the contents as input for the program. The program processes alpha and numeric characters, along with spaces. Input consisted of 4972 characters, of which 4739 were processed and converted to 29690 bits. On average, this encoding used 6.27 bits per character. Processing only alpha characters (to compare to 5-bit alpha-only encoding), 4696 were processed and converted to 29024 bits, which used 6.18 bits per character.)



                                                    The advantage of BEMC is that every dash and dot has both a high-to-low and a low-to-high transition for every character, so it's relatively easy to keep track of timing.



                                                    Technically, you have to mentally encode twice (once from alphabet to Morse, then again from Morse to binary), but in practice there's little distinction between BEMC and just using ternary (dash, dot, space) Morse code directly -- dashes are long periods of logic high with a short period of logic low, dots are short periods of logic high with a short period of logic low, and spaces are medium periods of logic low.



                                                    The average word length of typical writing is 4.8 characters12. Add 1 character for spaces between words for 5.8 characters per word. An average text message is 7 words long13, or about 40 characters. At 5 bits per character, a text message takes 200 bits, or 80 seconds at 2.5 Hz. 20 seconds at 10 Hz.



                                                    Alternately, this equates to 29 bits per word, 5.2 WPM at 2.5 Hz, or 21 WPM at 10 Hz.



                                                    Using this in Practice



                                                    Obviously, all of this is impractical for normal purposes. There's a perfectly good way to communicate with the human vocal apparatus: speech.



                                                    But if you want to get short messages across, you could do so. The trick is shifting frequencies or amplitudes enough for the other person to consistently decode the data without errors, but not so much other people hear the difference.



                                                    I doubt there's any way to prevent others from realizing your speech or singing is weird, but they wouldn't immediately know what you were doing. Further, they wouldn't necessarily know your exact encoding, though they could easily figure it out from a recording.



                                                    The biggest problem here is likely to be keeping reasonably consistent timing over minute-long durations, but I'd guess it's doable.



                                                    Synthetics or Cybernetic Implants



                                                    A synthetic person, or a person with cybernetic implants, could plausibly use these techniques to reach much higher data transfer rates. The maximum frequency achievable in air is about 5 GHz14, limiting us to about 2.5 Gb/s, which is 86 million words per second, 5 billion WPM, 12 million texts per second, or 81 nanoseconds per text.



                                                    But in all likelihood, you could do much better with non-acoustic data transfer methods if you had access to these levels of electronics.



                                                    References



                                                    0 An Electronic Notes article, What is QAM: quadrature amplitude modulation. https://www.electronics-notes.com/articles/radio/modulation/quadrature-amplitude-modulation-what-is-qam-basics.php
                                                    1 Taken from Wikipedia under the Creative Commons license. https://en.wikipedia.org/wiki/File:Amfm3-en-de.gif
                                                    2 Research study, Fast and slow twitch units in a human muscle from 1971. Found at the Journal of Neurology, Neurosurgery, and Psychiatry website. https://jnnp.bmj.com/content/jnnp/34/2/113.full.pdf
                                                    3 A Wikipedia article, On-Off keying. https://en.wikipedia.org/wiki/On%E2%80%93off_keying
                                                    4 A Wikipedia article, Frequency-shift keying. https://en.wikipedia.org/wiki/Frequency-shift_keying
                                                    5 Taken from Wikipedia under the Creative Commons license. https://commons.wikimedia.org/w/index.php?curid=635074
                                                    6 A modified version of (5), submitted under the original license.
                                                    7 Table of ASCII codes. http://www.asciitable.com/
                                                    8Unicode Consortium's overview of Unicode. https://home.unicode.org/basic-info/overview/
                                                    9 babou's very awesome cs.stackexchange answer to Is Morse Code binary, ternary or quinary?. https://cs.stackexchange.com/a/39922
                                                    10 My C program hosted at the OnlineGDB C compiler. https://onlinegdb.com/BkMqmOvUS
                                                    11 A Wikipedia article, Lake chub. https://en.wikipedia.org/wiki/Lake_chub
                                                    12 A Peter@Norvig.com article, English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU. http://norvig.com/mayzner.html
                                                    13 A Crushh article, K, Wrap It Up Mom. https://crushhapp.com/blog/k-wrap-it-up-mom
                                                    14 Ron Maimon's physics.stackexchange answer to Is there an upper frequency limit to ultrasound?. https://physics.stackexchange.com/a/23427/90152








                                                    share|improve this answer










                                                    $endgroup$



                                                    Summary



                                                    You could use amplitude or frequency modulation of the voice to transfer data between 5 WPM (realistic maximum) and 20 WPM (probably superhuman). You could go arbitrarily high with synthetics or cybernetic implants (up to around 5 billion WPM).



                                                    Essentially, it involves slightly (or greatly, if stealth isn't an issue) raising and lowering your volume or pitch with reasonably accurate timing. These changes are interpreted by your partner as binary data, which can encode text or other information using a variety of formats.



                                                    A 5-bit, 32-character ASCII-like code is probably the fastest option, with Morse code being a little slower, but less prone to errors.



                                                    You could also use AM and FM at the same time (this is called quadrature amplitude modulation, or QAM0), but that would be extremely difficult for normal people to pull off, and I haven't discussed it here.



                                                    Amplitude and Frequency Modulation (AM and FM)



                                                    The obvious aspects of the human voice that translate into basic radio transmission schemes are amplitude and frequency. By rapidly changing either property you can encode lower-frequency sound waves in the higher-frequency carrier wave.



                                                    Animation of a simple base signal modulated with amplitude and frequency modulation.
                                                    AM / FM Examples.1



                                                    AM / FM Sound Waves



                                                    Instead of translating to radio waves and back, you can simply alter the amplitude or frequency of sound at very high speeds (about twice the speed of the maximum frequency you're representing). This might be possible for a synthetic, but would likely be impossible for a normal human in real time.



                                                    However, you can always encode the signal in non-real time. I'm not finding any data on the fastest larynx variations humans are capable of, but I'd guess it's no greater than 10 Hz, and probably less than that. Various studies have shown large muscles can twitch in 100 to 300 ms2, which roughly translates to 3 to 10 Hz, anti-respectively.



                                                    One second of a 200 Hz signal would then take 40 or more seconds to encode. It would also likely be quite difficult to encode and decode.



                                                    On-Off and Frequency-Shift Keying



                                                    On-off keying3 is a simple way to modulate binary data using extreme amplitude shifts. The presence of noise represents logic high, while the absence of noise represents logic low. We can extend this by using two different, but present, amplitudes. Or using two different frequencies, known as frequency-shift keying4.



                                                    Image showing binary frequency-shift keying.
                                                    Binary Frequency-Shift Keying.5
                                                    Image showing amplitude-shift keying.
                                                    Binary Amplitude-Shift Keying.6



                                                    Encoding a Message in Binary



                                                    At this point, you have a simple, binary alphabet. You can use this alphabet to encode any kind of data you want. You can have fixed-length words that represent specific letters in a traditional alphabet (ASCII7 or Unicode8), variable-length words representing an intermediate set of symbols(Morse code9), binary data representing sound levels or image information, etc.



                                                    The biggest problem here is just the human factor. The more complex your information, the harder it is to feasibly encode and decode it. At some point, there's a physical limit. Your best case is likely to be something like ASCII, reduced to 5 bits, or 32 characters. At 2.5 Hz, each character takes two seconds.



                                                    Binary-encoded Morse code (BEMC) could also be used, but it takes about 6.2 bits per character (25% more).



                                                    (I wrote a simple C program10 to convert an input string to BEMC. To test normal-ish English, I picked a random Wikipedia article, got the article for "Lake Chub"11, stripped newlines from the text, and used the contents as input for the program. The program processes alpha and numeric characters, along with spaces. Input consisted of 4972 characters, of which 4739 were processed and converted to 29690 bits. On average, this encoding used 6.27 bits per character. Processing only alpha characters (to compare to 5-bit alpha-only encoding), 4696 were processed and converted to 29024 bits, which used 6.18 bits per character.)



                                                    The advantage of BEMC is that every dash and dot has both a high-to-low and a low-to-high transition for every character, so it's relatively easy to keep track of timing.



                                                    Technically, you have to mentally encode twice (once from alphabet to Morse, then again from Morse to binary), but in practice there's little distinction between BEMC and just using ternary (dash, dot, space) Morse code directly -- dashes are long periods of logic high with a short period of logic low, dots are short periods of logic high with a short period of logic low, and spaces are medium periods of logic low.



                                                    The average word length of typical writing is 4.8 characters12. Add 1 character for spaces between words for 5.8 characters per word. An average text message is 7 words long13, or about 40 characters. At 5 bits per character, a text message takes 200 bits, or 80 seconds at 2.5 Hz. 20 seconds at 10 Hz.



                                                    Alternately, this equates to 29 bits per word, 5.2 WPM at 2.5 Hz, or 21 WPM at 10 Hz.



                                                    Using this in Practice



                                                    Obviously, all of this is impractical for normal purposes. There's a perfectly good way to communicate with the human vocal apparatus: speech.



                                                    But if you want to get short messages across, you could do so. The trick is shifting frequencies or amplitudes enough for the other person to consistently decode the data without errors, but not so much other people hear the difference.



                                                    I doubt there's any way to prevent others from realizing your speech or singing is weird, but they wouldn't immediately know what you were doing. Further, they wouldn't necessarily know your exact encoding, though they could easily figure it out from a recording.



                                                    The biggest problem here is likely to be keeping reasonably consistent timing over minute-long durations, but I'd guess it's doable.



                                                    Synthetics or Cybernetic Implants



                                                    A synthetic person, or a person with cybernetic implants, could plausibly use these techniques to reach much higher data transfer rates. The maximum frequency achievable in air is about 5 GHz14, limiting us to about 2.5 Gb/s, which is 86 million words per second, 5 billion WPM, 12 million texts per second, or 81 nanoseconds per text.



                                                    But in all likelihood, you could do much better with non-acoustic data transfer methods if you had access to these levels of electronics.



                                                    References



                                                    0 An Electronic Notes article, What is QAM: quadrature amplitude modulation. https://www.electronics-notes.com/articles/radio/modulation/quadrature-amplitude-modulation-what-is-qam-basics.php
                                                    1 Taken from Wikipedia under the Creative Commons license. https://en.wikipedia.org/wiki/File:Amfm3-en-de.gif
                                                    2 Research study, Fast and slow twitch units in a human muscle from 1971. Found at the Journal of Neurology, Neurosurgery, and Psychiatry website. https://jnnp.bmj.com/content/jnnp/34/2/113.full.pdf
                                                    3 A Wikipedia article, On-Off keying. https://en.wikipedia.org/wiki/On%E2%80%93off_keying
                                                    4 A Wikipedia article, Frequency-shift keying. https://en.wikipedia.org/wiki/Frequency-shift_keying
                                                    5 Taken from Wikipedia under the Creative Commons license. https://commons.wikimedia.org/w/index.php?curid=635074
                                                    6 A modified version of (5), submitted under the original license.
                                                    7 Table of ASCII codes. http://www.asciitable.com/
                                                    8Unicode Consortium's overview of Unicode. https://home.unicode.org/basic-info/overview/
                                                    9 babou's very awesome cs.stackexchange answer to Is Morse Code binary, ternary or quinary?. https://cs.stackexchange.com/a/39922
                                                    10 My C program hosted at the OnlineGDB C compiler. https://onlinegdb.com/BkMqmOvUS
                                                    11 A Wikipedia article, Lake chub. https://en.wikipedia.org/wiki/Lake_chub
                                                    12 A Peter@Norvig.com article, English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU. http://norvig.com/mayzner.html
                                                    13 A Crushh article, K, Wrap It Up Mom. https://crushhapp.com/blog/k-wrap-it-up-mom
                                                    14 Ron Maimon's physics.stackexchange answer to Is there an upper frequency limit to ultrasound?. https://physics.stackexchange.com/a/23427/90152









                                                    share|improve this answer













                                                    share|improve this answer




                                                    share|improve this answer










                                                    answered Sep 12 at 8:40









                                                    MichaelSMichaelS

                                                    5,87114 silver badges28 bronze badges




                                                    5,87114 silver badges28 bronze badges














                                                    • $begingroup$
                                                      Wow thanks michael! Doing such great and thorough work when the proverbial cake (the answer point bonus :) ) is already eaten is something I can admire!
                                                      $endgroup$
                                                      – user6415
                                                      Sep 12 at 8:59
















                                                    • $begingroup$
                                                      Wow thanks michael! Doing such great and thorough work when the proverbial cake (the answer point bonus :) ) is already eaten is something I can admire!
                                                      $endgroup$
                                                      – user6415
                                                      Sep 12 at 8:59















                                                    $begingroup$
                                                    Wow thanks michael! Doing such great and thorough work when the proverbial cake (the answer point bonus :) ) is already eaten is something I can admire!
                                                    $endgroup$
                                                    – user6415
                                                    Sep 12 at 8:59




                                                    $begingroup$
                                                    Wow thanks michael! Doing such great and thorough work when the proverbial cake (the answer point bonus :) ) is already eaten is something I can admire!
                                                    $endgroup$
                                                    – user6415
                                                    Sep 12 at 8:59











                                                    4

















                                                    $begingroup$

                                                    My first thought was that you can use principles of steganography here. but you've rejected the simpler patterns in the first bullet (there should not be any link to the hidden message within the words of Bobs utterance).



                                                    So, next option. Most people speak from the mouth, not nasally. You can try to have Bob speak regular words nasally. Choose any two type of words here - example monosyllable and disyllable. Based on this, you can now convert these sounds into a morse code for english character.



                                                    EDIT:



                                                    I realise now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.






                                                    share|improve this answer












                                                    $endgroup$














                                                    • $begingroup$
                                                      Sorry, I did not understand how the nasal voice would matter in this context?
                                                      $endgroup$
                                                      – user6415
                                                      Sep 10 at 22:36










                                                    • $begingroup$
                                                      @openend I think I had misunderstood your question the first time to mean using an acoustic property in addition to regular speech. SO, I wanted to suggest using both nasal + regular voice for communication, with nasal being used to promote the scheme for morse. BUT after re-reading your question, I understand now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.
                                                      $endgroup$
                                                      – mu 無
                                                      Sep 10 at 22:43
















                                                    4

















                                                    $begingroup$

                                                    My first thought was that you can use principles of steganography here. but you've rejected the simpler patterns in the first bullet (there should not be any link to the hidden message within the words of Bobs utterance).



                                                    So, next option. Most people speak from the mouth, not nasally. You can try to have Bob speak regular words nasally. Choose any two type of words here - example monosyllable and disyllable. Based on this, you can now convert these sounds into a morse code for english character.



                                                    EDIT:



                                                    I realise now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.






                                                    share|improve this answer












                                                    $endgroup$














                                                    • $begingroup$
                                                      Sorry, I did not understand how the nasal voice would matter in this context?
                                                      $endgroup$
                                                      – user6415
                                                      Sep 10 at 22:36










                                                    • $begingroup$
                                                      @openend I think I had misunderstood your question the first time to mean using an acoustic property in addition to regular speech. SO, I wanted to suggest using both nasal + regular voice for communication, with nasal being used to promote the scheme for morse. BUT after re-reading your question, I understand now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.
                                                      $endgroup$
                                                      – mu 無
                                                      Sep 10 at 22:43














                                                    4















                                                    4











                                                    4







                                                    $begingroup$

                                                    My first thought was that you can use principles of steganography here. but you've rejected the simpler patterns in the first bullet (there should not be any link to the hidden message within the words of Bobs utterance).



                                                    So, next option. Most people speak from the mouth, not nasally. You can try to have Bob speak regular words nasally. Choose any two type of words here - example monosyllable and disyllable. Based on this, you can now convert these sounds into a morse code for english character.



                                                    EDIT:



                                                    I realise now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.






                                                    share|improve this answer












                                                    $endgroup$



                                                    My first thought was that you can use principles of steganography here. but you've rejected the simpler patterns in the first bullet (there should not be any link to the hidden message within the words of Bobs utterance).



                                                    So, next option. Most people speak from the mouth, not nasally. You can try to have Bob speak regular words nasally. Choose any two type of words here - example monosyllable and disyllable. Based on this, you can now convert these sounds into a morse code for english character.



                                                    EDIT:



                                                    I realise now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.







                                                    share|improve this answer















                                                    share|improve this answer




                                                    share|improve this answer








                                                    edited Sep 10 at 22:56

























                                                    answered Sep 10 at 22:31









                                                    mu 無mu 無

                                                    9453 silver badges7 bronze badges




                                                    9453 silver badges7 bronze badges














                                                    • $begingroup$
                                                      Sorry, I did not understand how the nasal voice would matter in this context?
                                                      $endgroup$
                                                      – user6415
                                                      Sep 10 at 22:36










                                                    • $begingroup$
                                                      @openend I think I had misunderstood your question the first time to mean using an acoustic property in addition to regular speech. SO, I wanted to suggest using both nasal + regular voice for communication, with nasal being used to promote the scheme for morse. BUT after re-reading your question, I understand now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.
                                                      $endgroup$
                                                      – mu 無
                                                      Sep 10 at 22:43

















                                                    • $begingroup$
                                                      Sorry, I did not understand how the nasal voice would matter in this context?
                                                      $endgroup$
                                                      – user6415
                                                      Sep 10 at 22:36










                                                    • $begingroup$
                                                      @openend I think I had misunderstood your question the first time to mean using an acoustic property in addition to regular speech. SO, I wanted to suggest using both nasal + regular voice for communication, with nasal being used to promote the scheme for morse. BUT after re-reading your question, I understand now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.
                                                      $endgroup$
                                                      – mu 無
                                                      Sep 10 at 22:43
















                                                    $begingroup$
                                                    Sorry, I did not understand how the nasal voice would matter in this context?
                                                    $endgroup$
                                                    – user6415
                                                    Sep 10 at 22:36




                                                    $begingroup$
                                                    Sorry, I did not understand how the nasal voice would matter in this context?
                                                    $endgroup$
                                                    – user6415
                                                    Sep 10 at 22:36












                                                    $begingroup$
                                                    @openend I think I had misunderstood your question the first time to mean using an acoustic property in addition to regular speech. SO, I wanted to suggest using both nasal + regular voice for communication, with nasal being used to promote the scheme for morse. BUT after re-reading your question, I understand now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.
                                                    $endgroup$
                                                    – mu 無
                                                    Sep 10 at 22:43





                                                    $begingroup$
                                                    @openend I think I had misunderstood your question the first time to mean using an acoustic property in addition to regular speech. SO, I wanted to suggest using both nasal + regular voice for communication, with nasal being used to promote the scheme for morse. BUT after re-reading your question, I understand now that you are only looking for a way to encode the information in a uniform manner, in which case, even regular speech using the right monosyllable and disyllable words will suffice.
                                                    $endgroup$
                                                    – mu 無
                                                    Sep 10 at 22:43












                                                    4

















                                                    $begingroup$

                                                    Many have covered encoding messages into speaking or writing, but I didn't see any covering singing.



                                                    There are many languages like Mandarin or Cantonese that use tonality to change the meaning of a word. Using tonality in a language that doesn't naturally ascribe meaning to it, especially during song, is a great way to hide messages.



                                                    For example, "forward" is a neutral tone, "turn right" is a rising tone, and "turn left" is a falling tone. A dipping tone (neutral, lower, back to neutral) could indicate down, while the opposite of a dipping tone could indicate up. Multiple words sung in a particular tone means "that direction for as many units as there are words. This schema allows encoding direction and magnitude into a song.



                                                    The phrase "when the bird flies high" sung in a rising tone indicates "turn right, then go forward 5 units". There can also be a pre-existing vocabulary for other important things, like mentioning a dog means "guards", describing how beautiful something is means "look out for something with description". For more examples of this, you can read letters sent from WW2 POWs that have hidden messages encoded into rather innocuous statements.



                                                    There could be a physical signal or even another tonal encoding to indicate "these are instructions", such as a certain chord or part of a song, like the bridge leading into the final chorus.




                                                    The time has finally come - neutral, walk 5 feet forward



                                                    for us to strike - rising tone, turn right, walk 4 feet



                                                    our enemies down - dipping tone, 3 feet below where you are




                                                    The combination of key words, tonality, and word count can be used to discretely convey hidden messages in song without tipping off casual listeners that there is a hidden message.






                                                    share|improve this answer










                                                    $endgroup$














                                                    • $begingroup$
                                                      Cool answer, but what if rising is right, dipping is below, and neutral is forward, how do I turn left or around, tell someone to look up? Or is every left turn sung as three rights, rising tone, in a row with 4 words each? Still not sure on up... Singing was my first thought as well. "Follow the drinking gourd" and other underground railroad references are prevalent here, even if they don't exactly match the requirements (bolded meaning of words clause)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 12 at 22:23











                                                    • $begingroup$
                                                      You're asking questions I feel are already addressed by my answer. I already identify how to turn left and how to encode meanings to phrases. If you're proposing a language that doesn't have a "turn left" direction, then you're going to have to combine other existing instructions to create one.
                                                      $endgroup$
                                                      – JRodge01
                                                      Sep 13 at 11:25










                                                    • $begingroup$
                                                      You're right, I skimmed too much and thought I had the full picture/focused too much on the implementation of this case. I would still question the slight difference between a falling and dipping tone - I think it would make this near impossible to use in a real life application for reliable directions. Unless there are very very very clear delineations for the end of a line in the song(cus you can't see them)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:36










                                                    • $begingroup$
                                                      but thanks for answering/my apologies on questioning what I hadn't read thoroughly
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:37















                                                    4

















                                                    $begingroup$

                                                    Many have covered encoding messages into speaking or writing, but I didn't see any covering singing.



                                                    There are many languages like Mandarin or Cantonese that use tonality to change the meaning of a word. Using tonality in a language that doesn't naturally ascribe meaning to it, especially during song, is a great way to hide messages.



                                                    For example, "forward" is a neutral tone, "turn right" is a rising tone, and "turn left" is a falling tone. A dipping tone (neutral, lower, back to neutral) could indicate down, while the opposite of a dipping tone could indicate up. Multiple words sung in a particular tone means "that direction for as many units as there are words. This schema allows encoding direction and magnitude into a song.



                                                    The phrase "when the bird flies high" sung in a rising tone indicates "turn right, then go forward 5 units". There can also be a pre-existing vocabulary for other important things, like mentioning a dog means "guards", describing how beautiful something is means "look out for something with description". For more examples of this, you can read letters sent from WW2 POWs that have hidden messages encoded into rather innocuous statements.



                                                    There could be a physical signal or even another tonal encoding to indicate "these are instructions", such as a certain chord or part of a song, like the bridge leading into the final chorus.




                                                    The time has finally come - neutral, walk 5 feet forward



                                                    for us to strike - rising tone, turn right, walk 4 feet



                                                    our enemies down - dipping tone, 3 feet below where you are




                                                    The combination of key words, tonality, and word count can be used to discretely convey hidden messages in song without tipping off casual listeners that there is a hidden message.






                                                    share|improve this answer










                                                    $endgroup$














                                                    • $begingroup$
                                                      Cool answer, but what if rising is right, dipping is below, and neutral is forward, how do I turn left or around, tell someone to look up? Or is every left turn sung as three rights, rising tone, in a row with 4 words each? Still not sure on up... Singing was my first thought as well. "Follow the drinking gourd" and other underground railroad references are prevalent here, even if they don't exactly match the requirements (bolded meaning of words clause)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 12 at 22:23











                                                    • $begingroup$
                                                      You're asking questions I feel are already addressed by my answer. I already identify how to turn left and how to encode meanings to phrases. If you're proposing a language that doesn't have a "turn left" direction, then you're going to have to combine other existing instructions to create one.
                                                      $endgroup$
                                                      – JRodge01
                                                      Sep 13 at 11:25










                                                    • $begingroup$
                                                      You're right, I skimmed too much and thought I had the full picture/focused too much on the implementation of this case. I would still question the slight difference between a falling and dipping tone - I think it would make this near impossible to use in a real life application for reliable directions. Unless there are very very very clear delineations for the end of a line in the song(cus you can't see them)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:36










                                                    • $begingroup$
                                                      but thanks for answering/my apologies on questioning what I hadn't read thoroughly
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:37













                                                    4















                                                    4











                                                    4







                                                    $begingroup$

                                                    Many have covered encoding messages into speaking or writing, but I didn't see any covering singing.



                                                    There are many languages like Mandarin or Cantonese that use tonality to change the meaning of a word. Using tonality in a language that doesn't naturally ascribe meaning to it, especially during song, is a great way to hide messages.



                                                    For example, "forward" is a neutral tone, "turn right" is a rising tone, and "turn left" is a falling tone. A dipping tone (neutral, lower, back to neutral) could indicate down, while the opposite of a dipping tone could indicate up. Multiple words sung in a particular tone means "that direction for as many units as there are words. This schema allows encoding direction and magnitude into a song.



                                                    The phrase "when the bird flies high" sung in a rising tone indicates "turn right, then go forward 5 units". There can also be a pre-existing vocabulary for other important things, like mentioning a dog means "guards", describing how beautiful something is means "look out for something with description". For more examples of this, you can read letters sent from WW2 POWs that have hidden messages encoded into rather innocuous statements.



                                                    There could be a physical signal or even another tonal encoding to indicate "these are instructions", such as a certain chord or part of a song, like the bridge leading into the final chorus.




                                                    The time has finally come - neutral, walk 5 feet forward



                                                    for us to strike - rising tone, turn right, walk 4 feet



                                                    our enemies down - dipping tone, 3 feet below where you are




                                                    The combination of key words, tonality, and word count can be used to discretely convey hidden messages in song without tipping off casual listeners that there is a hidden message.






                                                    share|improve this answer










                                                    $endgroup$



                                                    Many have covered encoding messages into speaking or writing, but I didn't see any covering singing.



                                                    There are many languages like Mandarin or Cantonese that use tonality to change the meaning of a word. Using tonality in a language that doesn't naturally ascribe meaning to it, especially during song, is a great way to hide messages.



                                                    For example, "forward" is a neutral tone, "turn right" is a rising tone, and "turn left" is a falling tone. A dipping tone (neutral, lower, back to neutral) could indicate down, while the opposite of a dipping tone could indicate up. Multiple words sung in a particular tone means "that direction for as many units as there are words. This schema allows encoding direction and magnitude into a song.



                                                    The phrase "when the bird flies high" sung in a rising tone indicates "turn right, then go forward 5 units". There can also be a pre-existing vocabulary for other important things, like mentioning a dog means "guards", describing how beautiful something is means "look out for something with description". For more examples of this, you can read letters sent from WW2 POWs that have hidden messages encoded into rather innocuous statements.



                                                    There could be a physical signal or even another tonal encoding to indicate "these are instructions", such as a certain chord or part of a song, like the bridge leading into the final chorus.




                                                    The time has finally come - neutral, walk 5 feet forward



                                                    for us to strike - rising tone, turn right, walk 4 feet



                                                    our enemies down - dipping tone, 3 feet below where you are




                                                    The combination of key words, tonality, and word count can be used to discretely convey hidden messages in song without tipping off casual listeners that there is a hidden message.







                                                    share|improve this answer













                                                    share|improve this answer




                                                    share|improve this answer










                                                    answered Sep 11 at 13:24









                                                    JRodge01JRodge01

                                                    1911 bronze badge




                                                    1911 bronze badge














                                                    • $begingroup$
                                                      Cool answer, but what if rising is right, dipping is below, and neutral is forward, how do I turn left or around, tell someone to look up? Or is every left turn sung as three rights, rising tone, in a row with 4 words each? Still not sure on up... Singing was my first thought as well. "Follow the drinking gourd" and other underground railroad references are prevalent here, even if they don't exactly match the requirements (bolded meaning of words clause)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 12 at 22:23











                                                    • $begingroup$
                                                      You're asking questions I feel are already addressed by my answer. I already identify how to turn left and how to encode meanings to phrases. If you're proposing a language that doesn't have a "turn left" direction, then you're going to have to combine other existing instructions to create one.
                                                      $endgroup$
                                                      – JRodge01
                                                      Sep 13 at 11:25










                                                    • $begingroup$
                                                      You're right, I skimmed too much and thought I had the full picture/focused too much on the implementation of this case. I would still question the slight difference between a falling and dipping tone - I think it would make this near impossible to use in a real life application for reliable directions. Unless there are very very very clear delineations for the end of a line in the song(cus you can't see them)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:36










                                                    • $begingroup$
                                                      but thanks for answering/my apologies on questioning what I hadn't read thoroughly
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:37
















                                                    • $begingroup$
                                                      Cool answer, but what if rising is right, dipping is below, and neutral is forward, how do I turn left or around, tell someone to look up? Or is every left turn sung as three rights, rising tone, in a row with 4 words each? Still not sure on up... Singing was my first thought as well. "Follow the drinking gourd" and other underground railroad references are prevalent here, even if they don't exactly match the requirements (bolded meaning of words clause)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 12 at 22:23











                                                    • $begingroup$
                                                      You're asking questions I feel are already addressed by my answer. I already identify how to turn left and how to encode meanings to phrases. If you're proposing a language that doesn't have a "turn left" direction, then you're going to have to combine other existing instructions to create one.
                                                      $endgroup$
                                                      – JRodge01
                                                      Sep 13 at 11:25










                                                    • $begingroup$
                                                      You're right, I skimmed too much and thought I had the full picture/focused too much on the implementation of this case. I would still question the slight difference between a falling and dipping tone - I think it would make this near impossible to use in a real life application for reliable directions. Unless there are very very very clear delineations for the end of a line in the song(cus you can't see them)
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:36










                                                    • $begingroup$
                                                      but thanks for answering/my apologies on questioning what I hadn't read thoroughly
                                                      $endgroup$
                                                      – TCooper
                                                      Sep 13 at 16:37















                                                    $begingroup$
                                                    Cool answer, but what if rising is right, dipping is below, and neutral is forward, how do I turn left or around, tell someone to look up? Or is every left turn sung as three rights, rising tone, in a row with 4 words each? Still not sure on up... Singing was my first thought as well. "Follow the drinking gourd" and other underground railroad references are prevalent here, even if they don't exactly match the requirements (bolded meaning of words clause)
                                                    $endgroup$
                                                    – TCooper
                                                    Sep 12 at 22:23





                                                    $begingroup$
                                                    Cool answer, but what if rising is right, dipping is below, and neutral is forward, how do I turn left or around, tell someone to look up? Or is every left turn sung as three rights, rising tone, in a row with 4 words each? Still not sure on up... Singing was my first thought as well. "Follow the drinking gourd" and other underground railroad references are prevalent here, even if they don't exactly match the requirements (bolded meaning of words clause)
                                                    $endgroup$
                                                    – TCooper
                                                    Sep 12 at 22:23













                                                    $begingroup$
                                                    You're asking questions I feel are already addressed by my answer. I already identify how to turn left and how to encode meanings to phrases. If you're proposing a language that doesn't have a "turn left" direction, then you're going to have to combine other existing instructions to create one.
                                                    $endgroup$
                                                    – JRodge01
                                                    Sep 13 at 11:25




                                                    $begingroup$
                                                    You're asking questions I feel are already addressed by my answer. I already identify how to turn left and how to encode meanings to phrases. If you're proposing a language that doesn't have a "turn left" direction, then you're going to have to combine other existing instructions to create one.
                                                    $endgroup$
                                                    – JRodge01
                                                    Sep 13 at 11:25












                                                    $begingroup$
                                                    You're right, I skimmed too much and thought I had the full picture/focused too much on the implementation of this case. I would still question the slight difference between a falling and dipping tone - I think it would make this near impossible to use in a real life application for reliable directions. Unless there are very very very clear delineations for the end of a line in the song(cus you can't see them)
                                                    $endgroup$
                                                    – TCooper
                                                    Sep 13 at 16:36




                                                    $begingroup$
                                                    You're right, I skimmed too much and thought I had the full picture/focused too much on the implementation of this case. I would still question the slight difference between a falling and dipping tone - I think it would make this near impossible to use in a real life application for reliable directions. Unless there are very very very clear delineations for the end of a line in the song(cus you can't see them)
                                                    $endgroup$
                                                    – TCooper
                                                    Sep 13 at 16:36












                                                    $begingroup$
                                                    but thanks for answering/my apologies on questioning what I hadn't read thoroughly
                                                    $endgroup$
                                                    – TCooper
                                                    Sep 13 at 16:37




                                                    $begingroup$
                                                    but thanks for answering/my apologies on questioning what I hadn't read thoroughly
                                                    $endgroup$
                                                    – TCooper
                                                    Sep 13 at 16:37











                                                    3

















                                                    $begingroup$

                                                    Morse is what probably comes to mind first, but, as pointed out, it is incredibly low density.



                                                    If you can have the pitch of the voice change by 16 distinguishable levels, you can use it to encode hex codes, which are reasonably denser way of encoding something than Morse.



                                                    Since you probably only need to encode 26 letters + 10 numbers, it gives us 36 hex numbers, which fits neatly into two hex codes:



                                                    Pitch. Hex
                                                    Pitch1 - 0
                                                    Pitch2 - 1
                                                    ...
                                                    Pitch10 - A
                                                    Pitch16 - F


                                                    Letters A...Z - pitch combos from 00-1A



                                                    Numbers 0-9 - pitch combos from 1B-24



                                                    Actually there is so much surplus, that you might wish to either reduce alphabet to 16 values (to use single hex digit for encode) or use a higher digit number encoding, if you can expect to distinguish between 36 voice pitches). For example you could reduce 0 to O, 1 to I, use U for V, I for J and Y, get rid of Q etc. Some of this was used in early typewriters, for example. It all depends on how smart you can assume the intended recipient to be so that he can decode:



                                                    Walk5ftforwardturn9odegreesclockwisewalkanother4ftanddig3ftintoground.



                                                    Additionally you could have pitch ordering different so that lowest and highest pitches encode less used values, to reduce attention to pitch changes.



                                                    So now you have pitch modulated hex encoding, which could naturally be tied to syllables. Every 2 syllables encode one letter of your alphabet (including numbers).



                                                    Message is unimportant as are usually spaces between text.






                                                    share|improve this answer












                                                    $endgroup$



















                                                      3

















                                                      $begingroup$

                                                      Morse is what probably comes to mind first, but, as pointed out, it is incredibly low density.



                                                      If you can have the pitch of the voice change by 16 distinguishable levels, you can use it to encode hex codes, which are reasonably denser way of encoding something than Morse.



                                                      Since you probably only need to encode 26 letters + 10 numbers, it gives us 36 hex numbers, which fits neatly into two hex codes:



                                                      Pitch. Hex
                                                      Pitch1 - 0
                                                      Pitch2 - 1
                                                      ...
                                                      Pitch10 - A
                                                      Pitch16 - F


                                                      Letters A...Z - pitch combos from 00-1A



                                                      Numbers 0-9 - pitch combos from 1B-24



                                                      Actually there is so much surplus, that you might wish to either reduce alphabet to 16 values (to use single hex digit for encode) or use a higher digit number encoding, if you can expect to distinguish between 36 voice pitches). For example you could reduce 0 to O, 1 to I, use U for V, I for J and Y, get rid of Q etc. Some of this was used in early typewriters, for example. It all depends on how smart you can assume the intended recipient to be so that he can decode:



                                                      Walk5ftforwardturn9odegreesclockwisewalkanother4ftanddig3ftintoground.



                                                      Additionally you could have pitch ordering different so that lowest and highest pitches encode less used values, to reduce attention to pitch changes.



                                                      So now you have pitch modulated hex encoding, which could naturally be tied to syllables. Every 2 syllables encode one letter of your alphabet (including numbers).



                                                      Message is unimportant as are usually spaces between text.






                                                      share|improve this answer












                                                      $endgroup$

















                                                        3















                                                        3











                                                        3







                                                        $begingroup$

                                                        Morse is what probably comes to mind first, but, as pointed out, it is incredibly low density.



                                                        If you can have the pitch of the voice change by 16 distinguishable levels, you can use it to encode hex codes, which are reasonably denser way of encoding something than Morse.



                                                        Since you probably only need to encode 26 letters + 10 numbers, it gives us 36 hex numbers, which fits neatly into two hex codes:



                                                        Pitch. Hex
                                                        Pitch1 - 0
                                                        Pitch2 - 1
                                                        ...
                                                        Pitch10 - A
                                                        Pitch16 - F


                                                        Letters A...Z - pitch combos from 00-1A



                                                        Numbers 0-9 - pitch combos from 1B-24



                                                        Actually there is so much surplus, that you might wish to either reduce alphabet to 16 values (to use single hex digit for encode) or use a higher digit number encoding, if you can expect to distinguish between 36 voice pitches). For example you could reduce 0 to O, 1 to I, use U for V, I for J and Y, get rid of Q etc. Some of this was used in early typewriters, for example. It all depends on how smart you can assume the intended recipient to be so that he can decode:



                                                        Walk5ftforwardturn9odegreesclockwisewalkanother4ftanddig3ftintoground.



                                                        Additionally you could have pitch ordering different so that lowest and highest pitches encode less used values, to reduce attention to pitch changes.



                                                        So now you have pitch modulated hex encoding, which could naturally be tied to syllables. Every 2 syllables encode one letter of your alphabet (including numbers).



                                                        Message is unimportant as are usually spaces between text.






                                                        share|improve this answer












                                                        $endgroup$



                                                        Morse is what probably comes to mind first, but, as pointed out, it is incredibly low density.



                                                        If you can have the pitch of the voice change by 16 distinguishable levels, you can use it to encode hex codes, which are reasonably denser way of encoding something than Morse.



                                                        Since you probably only need to encode 26 letters + 10 numbers, it gives us 36 hex numbers, which fits neatly into two hex codes:



                                                        Pitch. Hex
                                                        Pitch1 - 0
                                                        Pitch2 - 1
                                                        ...
                                                        Pitch10 - A
                                                        Pitch16 - F


                                                        Letters A...Z - pitch combos from 00-1A



                                                        Numbers 0-9 - pitch combos from 1B-24



                                                        Actually there is so much surplus, that you might wish to either reduce alphabet to 16 values (to use single hex digit for encode) or use a higher digit number encoding, if you can expect to distinguish between 36 voice pitches). For example you could reduce 0 to O, 1 to I, use U for V, I for J and Y, get rid of Q etc. Some of this was used in early typewriters, for example. It all depends on how smart you can assume the intended recipient to be so that he can decode:



                                                        Walk5ftforwardturn9odegreesclockwisewalkanother4ftanddig3ftintoground.



                                                        Additionally you could have pitch ordering different so that lowest and highest pitches encode less used values, to reduce attention to pitch changes.



                                                        So now you have pitch modulated hex encoding, which could naturally be tied to syllables. Every 2 syllables encode one letter of your alphabet (including numbers).



                                                        Message is unimportant as are usually spaces between text.







                                                        share|improve this answer















                                                        share|improve this answer




                                                        share|improve this answer








                                                        edited Sep 11 at 19:07









                                                        Yakk

                                                        10.8k1 gold badge15 silver badges42 bronze badges




                                                        10.8k1 gold badge15 silver badges42 bronze badges










                                                        answered Sep 11 at 8:58









                                                        GnudiffGnudiff

                                                        9563 silver badges9 bronze badges




                                                        9563 silver badges9 bronze badges
























                                                            2

















                                                            $begingroup$

                                                            Singing: Encoded in nonsense



                                                            There are a bunch of answers that cover spoken word. Many of them while slightly noticeable in normal conversation/speech would be extremely obvious when sung. There's no reason to not have two (or three if you have something else planned for screaming) different means of encoding your message.



                                                            Many songs have nonsense lyrics such as La La La or Na Na Na. For example Police's De Do Do Do, De Da Da Da has the chorus:



                                                            De do do do de da da da Is all I want to say to you 
                                                            De do do do de da da da Their innocence will pull me through
                                                            De do do do de da da da Is all I want to say to you
                                                            De do do do de da da da They're meaningless and all that's true


                                                            Which could easily be altered to encode your message, either using Morse code and varying between DO and DA or with some other scheme. And in fact you'll probably want to try and find some other scheme, because Morse code uses between 1 & 4 bits per letter, and the Police song for example only has 32 bits per chorus (96 with three repeats), meaning you probably only have around 40-50 letters for your message. That said you could probably
                                                            find a song with more nonsense syllables in it (maybe Goldfrapp's Oh La La @ ~110).



                                                            All that said, you will probably want to shorten the message some. Just removing the spaces, gives a 258 character message once converted to Morse code (190 if you discount spaces). You could shorten that considerably by not using proper full English. For example "Go N 5 f W 4 f Dig 3 f" is only 58 characters once converted Morse code which should easily fit in the Police song.



                                                            Remember that brevity is fairly important when encoding a message within another unless you're planning an hour long speech or a full concert. Almost any kind of encoding is going to have a slower transmission speed then the medium using your requirements.






                                                            share|improve this answer










                                                            $endgroup$



















                                                              2

















                                                              $begingroup$

                                                              Singing: Encoded in nonsense



                                                              There are a bunch of answers that cover spoken word. Many of them while slightly noticeable in normal conversation/speech would be extremely obvious when sung. There's no reason to not have two (or three if you have something else planned for screaming) different means of encoding your message.



                                                              Many songs have nonsense lyrics such as La La La or Na Na Na. For example Police's De Do Do Do, De Da Da Da has the chorus:



                                                              De do do do de da da da Is all I want to say to you 
                                                              De do do do de da da da Their innocence will pull me through
                                                              De do do do de da da da Is all I want to say to you
                                                              De do do do de da da da They're meaningless and all that's true


                                                              Which could easily be altered to encode your message, either using Morse code and varying between DO and DA or with some other scheme. And in fact you'll probably want to try and find some other scheme, because Morse code uses between 1 & 4 bits per letter, and the Police song for example only has 32 bits per chorus (96 with three repeats), meaning you probably only have around 40-50 letters for your message. That said you could probably
                                                              find a song with more nonsense syllables in it (maybe Goldfrapp's Oh La La @ ~110).



                                                              All that said, you will probably want to shorten the message some. Just removing the spaces, gives a 258 character message once converted to Morse code (190 if you discount spaces). You could shorten that considerably by not using proper full English. For example "Go N 5 f W 4 f Dig 3 f" is only 58 characters once converted Morse code which should easily fit in the Police song.



                                                              Remember that brevity is fairly important when encoding a message within another unless you're planning an hour long speech or a full concert. Almost any kind of encoding is going to have a slower transmission speed then the medium using your requirements.






                                                              share|improve this answer










                                                              $endgroup$

















                                                                2















                                                                2











                                                                2







                                                                $begingroup$

                                                                Singing: Encoded in nonsense



                                                                There are a bunch of answers that cover spoken word. Many of them while slightly noticeable in normal conversation/speech would be extremely obvious when sung. There's no reason to not have two (or three if you have something else planned for screaming) different means of encoding your message.



                                                                Many songs have nonsense lyrics such as La La La or Na Na Na. For example Police's De Do Do Do, De Da Da Da has the chorus:



                                                                De do do do de da da da Is all I want to say to you 
                                                                De do do do de da da da Their innocence will pull me through
                                                                De do do do de da da da Is all I want to say to you
                                                                De do do do de da da da They're meaningless and all that's true


                                                                Which could easily be altered to encode your message, either using Morse code and varying between DO and DA or with some other scheme. And in fact you'll probably want to try and find some other scheme, because Morse code uses between 1 & 4 bits per letter, and the Police song for example only has 32 bits per chorus (96 with three repeats), meaning you probably only have around 40-50 letters for your message. That said you could probably
                                                                find a song with more nonsense syllables in it (maybe Goldfrapp's Oh La La @ ~110).



                                                                All that said, you will probably want to shorten the message some. Just removing the spaces, gives a 258 character message once converted to Morse code (190 if you discount spaces). You could shorten that considerably by not using proper full English. For example "Go N 5 f W 4 f Dig 3 f" is only 58 characters once converted Morse code which should easily fit in the Police song.



                                                                Remember that brevity is fairly important when encoding a message within another unless you're planning an hour long speech or a full concert. Almost any kind of encoding is going to have a slower transmission speed then the medium using your requirements.






                                                                share|improve this answer










                                                                $endgroup$



                                                                Singing: Encoded in nonsense



                                                                There are a bunch of answers that cover spoken word. Many of them while slightly noticeable in normal conversation/speech would be extremely obvious when sung. There's no reason to not have two (or three if you have something else planned for screaming) different means of encoding your message.



                                                                Many songs have nonsense lyrics such as La La La or Na Na Na. For example Police's De Do Do Do, De Da Da Da has the chorus:



                                                                De do do do de da da da Is all I want to say to you 
                                                                De do do do de da da da Their innocence will pull me through
                                                                De do do do de da da da Is all I want to say to you
                                                                De do do do de da da da They're meaningless and all that's true


                                                                Which could easily be altered to encode your message, either using Morse code and varying between DO and DA or with some other scheme. And in fact you'll probably want to try and find some other scheme, because Morse code uses between 1 & 4 bits per letter, and the Police song for example only has 32 bits per chorus (96 with three repeats), meaning you probably only have around 40-50 letters for your message. That said you could probably
                                                                find a song with more nonsense syllables in it (maybe Goldfrapp's Oh La La @ ~110).



                                                                All that said, you will probably want to shorten the message some. Just removing the spaces, gives a 258 character message once converted to Morse code (190 if you discount spaces). You could shorten that considerably by not using proper full English. For example "Go N 5 f W 4 f Dig 3 f" is only 58 characters once converted Morse code which should easily fit in the Police song.



                                                                Remember that brevity is fairly important when encoding a message within another unless you're planning an hour long speech or a full concert. Almost any kind of encoding is going to have a slower transmission speed then the medium using your requirements.







                                                                share|improve this answer













                                                                share|improve this answer




                                                                share|improve this answer










                                                                answered Sep 11 at 15:03









                                                                aslumaslum

                                                                5,63515 silver badges28 bronze badges




                                                                5,63515 silver badges28 bronze badges
























                                                                    2

















                                                                    $begingroup$

                                                                    Steganography has already been brought up, but not the best method for it. Modern Steganography usually involves encoding binary into the least significant values of a media file which can be extracted by comparing a secret original file to the modified one to extract the differences. This creates such small variances that they are indistinguishable from normal noise, while also being 100% error proof.



                                                                    How this applies to an audio file:



                                                                    An audio file decompresses to a continuous string of values that can be represented by numbers. Let's say for this example that you are using an 8-bit depth. Your first clip might look like [122,143,201,203,198,152,100,84,...] and your second clip looks like [122,144,202,203,199,153,101,84,...]. To the human ear, these two tracks are indistinguishable, but by subtracting one from the other, you get the following binary [0,1,1,0,1,1,1,0] which is the letter "n" in ASCII.



                                                                    Most modern audio files have a 44,100Hz sample rate with a 16-bit depth meaning that you can pack your entire example message into an audio file by adding a 0.15% fluctuation to 0.02 seconds of the audio file. (good luck hearing that!)



                                                                    Normally, this type of steganography is impossible to decode without the source file, meaning you can not use it ad hoc thereby violating your second bullet point, but audio files have the unique quality of often being saved in stereo. It is very common for mono-track audio to be transmitted in stereo where both tracks are just the same track repeated. But in this case, you make one stereo track the source, and the other the encoded message so you can just extract one from the other. This is less secure than using a secret source file in that someone looking for an encoded message may be able to find it; so, you will need to make sure the message is encrypted before you encode it.



                                                                    In this manner you can send a long and detailed message in any audio file (even just some random screaming). As long as the recipient has the right software and decryption key, they can read it.






                                                                    share|improve this answer












                                                                    $endgroup$



















                                                                      2

















                                                                      $begingroup$

                                                                      Steganography has already been brought up, but not the best method for it. Modern Steganography usually involves encoding binary into the least significant values of a media file which can be extracted by comparing a secret original file to the modified one to extract the differences. This creates such small variances that they are indistinguishable from normal noise, while also being 100% error proof.



                                                                      How this applies to an audio file:



                                                                      An audio file decompresses to a continuous string of values that can be represented by numbers. Let's say for this example that you are using an 8-bit depth. Your first clip might look like [122,143,201,203,198,152,100,84,...] and your second clip looks like [122,144,202,203,199,153,101,84,...]. To the human ear, these two tracks are indistinguishable, but by subtracting one from the other, you get the following binary [0,1,1,0,1,1,1,0] which is the letter "n" in ASCII.



                                                                      Most modern audio files have a 44,100Hz sample rate with a 16-bit depth meaning that you can pack your entire example message into an audio file by adding a 0.15% fluctuation to 0.02 seconds of the audio file. (good luck hearing that!)



                                                                      Normally, this type of steganography is impossible to decode without the source file, meaning you can not use it ad hoc thereby violating your second bullet point, but audio files have the unique quality of often being saved in stereo. It is very common for mono-track audio to be transmitted in stereo where both tracks are just the same track repeated. But in this case, you make one stereo track the source, and the other the encoded message so you can just extract one from the other. This is less secure than using a secret source file in that someone looking for an encoded message may be able to find it; so, you will need to make sure the message is encrypted before you encode it.



                                                                      In this manner you can send a long and detailed message in any audio file (even just some random screaming). As long as the recipient has the right software and decryption key, they can read it.






                                                                      share|improve this answer












                                                                      $endgroup$

















                                                                        2















                                                                        2











                                                                        2







                                                                        $begingroup$

                                                                        Steganography has already been brought up, but not the best method for it. Modern Steganography usually involves encoding binary into the least significant values of a media file which can be extracted by comparing a secret original file to the modified one to extract the differences. This creates such small variances that they are indistinguishable from normal noise, while also being 100% error proof.



                                                                        How this applies to an audio file:



                                                                        An audio file decompresses to a continuous string of values that can be represented by numbers. Let's say for this example that you are using an 8-bit depth. Your first clip might look like [122,143,201,203,198,152,100,84,...] and your second clip looks like [122,144,202,203,199,153,101,84,...]. To the human ear, these two tracks are indistinguishable, but by subtracting one from the other, you get the following binary [0,1,1,0,1,1,1,0] which is the letter "n" in ASCII.



                                                                        Most modern audio files have a 44,100Hz sample rate with a 16-bit depth meaning that you can pack your entire example message into an audio file by adding a 0.15% fluctuation to 0.02 seconds of the audio file. (good luck hearing that!)



                                                                        Normally, this type of steganography is impossible to decode without the source file, meaning you can not use it ad hoc thereby violating your second bullet point, but audio files have the unique quality of often being saved in stereo. It is very common for mono-track audio to be transmitted in stereo where both tracks are just the same track repeated. But in this case, you make one stereo track the source, and the other the encoded message so you can just extract one from the other. This is less secure than using a secret source file in that someone looking for an encoded message may be able to find it; so, you will need to make sure the message is encrypted before you encode it.



                                                                        In this manner you can send a long and detailed message in any audio file (even just some random screaming). As long as the recipient has the right software and decryption key, they can read it.






                                                                        share|improve this answer












                                                                        $endgroup$



                                                                        Steganography has already been brought up, but not the best method for it. Modern Steganography usually involves encoding binary into the least significant values of a media file which can be extracted by comparing a secret original file to the modified one to extract the differences. This creates such small variances that they are indistinguishable from normal noise, while also being 100% error proof.



                                                                        How this applies to an audio file:



                                                                        An audio file decompresses to a continuous string of values that can be represented by numbers. Let's say for this example that you are using an 8-bit depth. Your first clip might look like [122,143,201,203,198,152,100,84,...] and your second clip looks like [122,144,202,203,199,153,101,84,...]. To the human ear, these two tracks are indistinguishable, but by subtracting one from the other, you get the following binary [0,1,1,0,1,1,1,0] which is the letter "n" in ASCII.



                                                                        Most modern audio files have a 44,100Hz sample rate with a 16-bit depth meaning that you can pack your entire example message into an audio file by adding a 0.15% fluctuation to 0.02 seconds of the audio file. (good luck hearing that!)



                                                                        Normally, this type of steganography is impossible to decode without the source file, meaning you can not use it ad hoc thereby violating your second bullet point, but audio files have the unique quality of often being saved in stereo. It is very common for mono-track audio to be transmitted in stereo where both tracks are just the same track repeated. But in this case, you make one stereo track the source, and the other the encoded message so you can just extract one from the other. This is less secure than using a secret source file in that someone looking for an encoded message may be able to find it; so, you will need to make sure the message is encrypted before you encode it.



                                                                        In this manner you can send a long and detailed message in any audio file (even just some random screaming). As long as the recipient has the right software and decryption key, they can read it.







                                                                        share|improve this answer















                                                                        share|improve this answer




                                                                        share|improve this answer








                                                                        edited Sep 11 at 17:15

























                                                                        answered Sep 11 at 16:06









                                                                        NosajimikiNosajimiki

                                                                        12.2k1 gold badge17 silver badges54 bronze badges




                                                                        12.2k1 gold badge17 silver badges54 bronze badges
























                                                                            2

















                                                                            $begingroup$

                                                                            I'm not sure exactly what you're trying to rule out with your first bullet. Are you just ruling out simply jumbling the words or adding a bunch of null words, i.e. you don't want someone to be able to see the clear words in the song? Or do you mean that the spoken words cannot be used in any way?



                                                                            Because there are lots of ways to encode a message in a cover text. Sure, if you say "take every other word", someone who is expecting a message of this sort could see suspicious words in the song. I recall a code that was really used to pass a message to a captured spy once (I forget the circumstances, have to look it up) where the rule was, "take the 3rd letter after every punctuation mark". Punctuation wouldn't work well for a verbal message, of course, but you could have a rule like "take every fifth letter" or "take the second letter after every 'e'" or some such.



                                                                            You could apply some formula to the text. Like take the number of letters in each word and run these through some sort of formula. Or one syllable words count as a dot and multi-syllable words as a dash and use Morse code. Etc etc.



                                                                            Use a true code as opposed to a cypher. Make up a list of code words in advance. Like if he's going to sing a love song, say that "love" means "clockwise" and "beautiful" means "counter-clockwise" and "sunset" means "feet" and "dress" means "dig" and so on. Then he translates the message into the code words and fills in words that are not code words to make a coherent song.



                                                                            If you mean that the words cannot be used in any way to convey the message:



                                                                            You say both people can be assumed to be extremely perceptive. So okay, let's say the song covers 2 octaves. That's 16 notes. Any combination of 2 notes has 256 possibilities. Let 26 of those combinations map to the letters of the alphabet, maybe a few more map to other symbols you need, like spaces and essential punctuation. Then translate the message into these notes. A catch to this is that a random collection of notes wouldn't sound like a real song and might be impossible to sing. But if you're only using 30 or so of 256 note-pairs, you could use the other 226 freely as fillers to try to "smooth out" the tune. Or you could say that only every 4th note counts or some such. I haven't tried this and I don't know how practical it would be.



                                                                            Similarly, code the message in the lengths of the notes. Say use binary: A quarter note is 1, a half note is 2, a quarter note is 4, an eigth note is 8, a sixteenth note is 16. A rest marks the end of a group of notes making up a number. Then A=1, B=2, C=3, etc. Again, coding a message like this would make a highly discordant tune, you'd have to be able to add lots of nulls to smooth it out.






                                                                            share|improve this answer










                                                                            $endgroup$



















                                                                              2

















                                                                              $begingroup$

                                                                              I'm not sure exactly what you're trying to rule out with your first bullet. Are you just ruling out simply jumbling the words or adding a bunch of null words, i.e. you don't want someone to be able to see the clear words in the song? Or do you mean that the spoken words cannot be used in any way?



                                                                              Because there are lots of ways to encode a message in a cover text. Sure, if you say "take every other word", someone who is expecting a message of this sort could see suspicious words in the song. I recall a code that was really used to pass a message to a captured spy once (I forget the circumstances, have to look it up) where the rule was, "take the 3rd letter after every punctuation mark". Punctuation wouldn't work well for a verbal message, of course, but you could have a rule like "take every fifth letter" or "take the second letter after every 'e'" or some such.



                                                                              You could apply some formula to the text. Like take the number of letters in each word and run these through some sort of formula. Or one syllable words count as a dot and multi-syllable words as a dash and use Morse code. Etc etc.



                                                                              Use a true code as opposed to a cypher. Make up a list of code words in advance. Like if he's going to sing a love song, say that "love" means "clockwise" and "beautiful" means "counter-clockwise" and "sunset" means "feet" and "dress" means "dig" and so on. Then he translates the message into the code words and fills in words that are not code words to make a coherent song.



                                                                              If you mean that the words cannot be used in any way to convey the message:



                                                                              You say both people can be assumed to be extremely perceptive. So okay, let's say the song covers 2 octaves. That's 16 notes. Any combination of 2 notes has 256 possibilities. Let 26 of those combinations map to the letters of the alphabet, maybe a few more map to other symbols you need, like spaces and essential punctuation. Then translate the message into these notes. A catch to this is that a random collection of notes wouldn't sound like a real song and might be impossible to sing. But if you're only using 30 or so of 256 note-pairs, you could use the other 226 freely as fillers to try to "smooth out" the tune. Or you could say that only every 4th note counts or some such. I haven't tried this and I don't know how practical it would be.



                                                                              Similarly, code the message in the lengths of the notes. Say use binary: A quarter note is 1, a half note is 2, a quarter note is 4, an eigth note is 8, a sixteenth note is 16. A rest marks the end of a group of notes making up a number. Then A=1, B=2, C=3, etc. Again, coding a message like this would make a highly discordant tune, you'd have to be able to add lots of nulls to smooth it out.






                                                                              share|improve this answer










                                                                              $endgroup$

















                                                                                2















                                                                                2











                                                                                2







                                                                                $begingroup$

                                                                                I'm not sure exactly what you're trying to rule out with your first bullet. Are you just ruling out simply jumbling the words or adding a bunch of null words, i.e. you don't want someone to be able to see the clear words in the song? Or do you mean that the spoken words cannot be used in any way?



                                                                                Because there are lots of ways to encode a message in a cover text. Sure, if you say "take every other word", someone who is expecting a message of this sort could see suspicious words in the song. I recall a code that was really used to pass a message to a captured spy once (I forget the circumstances, have to look it up) where the rule was, "take the 3rd letter after every punctuation mark". Punctuation wouldn't work well for a verbal message, of course, but you could have a rule like "take every fifth letter" or "take the second letter after every 'e'" or some such.



                                                                                You could apply some formula to the text. Like take the number of letters in each word and run these through some sort of formula. Or one syllable words count as a dot and multi-syllable words as a dash and use Morse code. Etc etc.



                                                                                Use a true code as opposed to a cypher. Make up a list of code words in advance. Like if he's going to sing a love song, say that "love" means "clockwise" and "beautiful" means "counter-clockwise" and "sunset" means "feet" and "dress" means "dig" and so on. Then he translates the message into the code words and fills in words that are not code words to make a coherent song.



                                                                                If you mean that the words cannot be used in any way to convey the message:



                                                                                You say both people can be assumed to be extremely perceptive. So okay, let's say the song covers 2 octaves. That's 16 notes. Any combination of 2 notes has 256 possibilities. Let 26 of those combinations map to the letters of the alphabet, maybe a few more map to other symbols you need, like spaces and essential punctuation. Then translate the message into these notes. A catch to this is that a random collection of notes wouldn't sound like a real song and might be impossible to sing. But if you're only using 30 or so of 256 note-pairs, you could use the other 226 freely as fillers to try to "smooth out" the tune. Or you could say that only every 4th note counts or some such. I haven't tried this and I don't know how practical it would be.



                                                                                Similarly, code the message in the lengths of the notes. Say use binary: A quarter note is 1, a half note is 2, a quarter note is 4, an eigth note is 8, a sixteenth note is 16. A rest marks the end of a group of notes making up a number. Then A=1, B=2, C=3, etc. Again, coding a message like this would make a highly discordant tune, you'd have to be able to add lots of nulls to smooth it out.






                                                                                share|improve this answer










                                                                                $endgroup$



                                                                                I'm not sure exactly what you're trying to rule out with your first bullet. Are you just ruling out simply jumbling the words or adding a bunch of null words, i.e. you don't want someone to be able to see the clear words in the song? Or do you mean that the spoken words cannot be used in any way?



                                                                                Because there are lots of ways to encode a message in a cover text. Sure, if you say "take every other word", someone who is expecting a message of this sort could see suspicious words in the song. I recall a code that was really used to pass a message to a captured spy once (I forget the circumstances, have to look it up) where the rule was, "take the 3rd letter after every punctuation mark". Punctuation wouldn't work well for a verbal message, of course, but you could have a rule like "take every fifth letter" or "take the second letter after every 'e'" or some such.



                                                                                You could apply some formula to the text. Like take the number of letters in each word and run these through some sort of formula. Or one syllable words count as a dot and multi-syllable words as a dash and use Morse code. Etc etc.



                                                                                Use a true code as opposed to a cypher. Make up a list of code words in advance. Like if he's going to sing a love song, say that "love" means "clockwise" and "beautiful" means "counter-clockwise" and "sunset" means "feet" and "dress" means "dig" and so on. Then he translates the message into the code words and fills in words that are not code words to make a coherent song.



                                                                                If you mean that the words cannot be used in any way to convey the message:



                                                                                You say both people can be assumed to be extremely perceptive. So okay, let's say the song covers 2 octaves. That's 16 notes. Any combination of 2 notes has 256 possibilities. Let 26 of those combinations map to the letters of the alphabet, maybe a few more map to other symbols you need, like spaces and essential punctuation. Then translate the message into these notes. A catch to this is that a random collection of notes wouldn't sound like a real song and might be impossible to sing. But if you're only using 30 or so of 256 note-pairs, you could use the other 226 freely as fillers to try to "smooth out" the tune. Or you could say that only every 4th note counts or some such. I haven't tried this and I don't know how practical it would be.



                                                                                Similarly, code the message in the lengths of the notes. Say use binary: A quarter note is 1, a half note is 2, a quarter note is 4, an eigth note is 8, a sixteenth note is 16. A rest marks the end of a group of notes making up a number. Then A=1, B=2, C=3, etc. Again, coding a message like this would make a highly discordant tune, you'd have to be able to add lots of nulls to smooth it out.







                                                                                share|improve this answer













                                                                                share|improve this answer




                                                                                share|improve this answer










                                                                                answered Sep 11 at 21:04









                                                                                JayJay

                                                                                11.6k1 gold badge22 silver badges37 bronze badges




                                                                                11.6k1 gold badge22 silver badges37 bronze badges
























                                                                                    1

















                                                                                    $begingroup$

                                                                                    Take any well known tune ("Ode to Joy" came to mind to me for its simplicity).
                                                                                    Sing the tune with nonsense words - or perhaps alternate incorrect instructions - it is not the words that matter.



                                                                                    Treat the notes as binary - A 0 will be "in tune" and a 1 will be "off key".



                                                                                    Has the prerequisite that the encoder has reasonable singing ability (otherwise the decryption may come out all 1s).



                                                                                    An unfortunate side effect is that anyone familiar with the decryption method may read a lot of unintended nonsense from passing a karaoke bar






                                                                                    share|improve this answer










                                                                                    $endgroup$



















                                                                                      1

















                                                                                      $begingroup$

                                                                                      Take any well known tune ("Ode to Joy" came to mind to me for its simplicity).
                                                                                      Sing the tune with nonsense words - or perhaps alternate incorrect instructions - it is not the words that matter.



                                                                                      Treat the notes as binary - A 0 will be "in tune" and a 1 will be "off key".



                                                                                      Has the prerequisite that the encoder has reasonable singing ability (otherwise the decryption may come out all 1s).



                                                                                      An unfortunate side effect is that anyone familiar with the decryption method may read a lot of unintended nonsense from passing a karaoke bar






                                                                                      share|improve this answer










                                                                                      $endgroup$

















                                                                                        1















                                                                                        1











                                                                                        1







                                                                                        $begingroup$

                                                                                        Take any well known tune ("Ode to Joy" came to mind to me for its simplicity).
                                                                                        Sing the tune with nonsense words - or perhaps alternate incorrect instructions - it is not the words that matter.



                                                                                        Treat the notes as binary - A 0 will be "in tune" and a 1 will be "off key".



                                                                                        Has the prerequisite that the encoder has reasonable singing ability (otherwise the decryption may come out all 1s).



                                                                                        An unfortunate side effect is that anyone familiar with the decryption method may read a lot of unintended nonsense from passing a karaoke bar






                                                                                        share|improve this answer










                                                                                        $endgroup$



                                                                                        Take any well known tune ("Ode to Joy" came to mind to me for its simplicity).
                                                                                        Sing the tune with nonsense words - or perhaps alternate incorrect instructions - it is not the words that matter.



                                                                                        Treat the notes as binary - A 0 will be "in tune" and a 1 will be "off key".



                                                                                        Has the prerequisite that the encoder has reasonable singing ability (otherwise the decryption may come out all 1s).



                                                                                        An unfortunate side effect is that anyone familiar with the decryption method may read a lot of unintended nonsense from passing a karaoke bar







                                                                                        share|improve this answer













                                                                                        share|improve this answer




                                                                                        share|improve this answer










                                                                                        answered Sep 12 at 10:19









                                                                                        Collett89Collett89

                                                                                        1113 bronze badges




                                                                                        1113 bronze badges































                                                                                            draft saved

                                                                                            draft discarded















































                                                                                            Thanks for contributing an answer to Worldbuilding Stack Exchange!


                                                                                            • Please be sure to answer the question. Provide details and share your research!

                                                                                            But avoid


                                                                                            • Asking for help, clarification, or responding to other answers.

                                                                                            • Making statements based on opinion; back them up with references or personal experience.

                                                                                            Use MathJax to format equations. MathJax reference.


                                                                                            To learn more, see our tips on writing great answers.




                                                                                            draft saved


                                                                                            draft discarded














                                                                                            StackExchange.ready(
                                                                                            function ()
                                                                                            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fworldbuilding.stackexchange.com%2fquestions%2f154862%2fis-there-a-reliable-way-to-hide-convey-a-message-in-vocal-expressions-speech-s%23new-answer', 'question_page');

                                                                                            );

                                                                                            Post as a guest















                                                                                            Required, but never shown





















































                                                                                            Required, but never shown














                                                                                            Required, but never shown












                                                                                            Required, but never shown







                                                                                            Required, but never shown

































                                                                                            Required, but never shown














                                                                                            Required, but never shown












                                                                                            Required, but never shown







                                                                                            Required, but never shown









                                                                                            Popular posts from this blog

                                                                                            Tamil (spriik) Luke uk diar | Nawigatjuun

                                                                                            Align equal signs while including text over equalitiesAMS align: left aligned text/math plus multicolumn alignmentMultiple alignmentsAligning equations in multiple placesNumbering and aligning an equation with multiple columnsHow to align one equation with another multline equationUsing \ in environments inside the begintabularxNumber equations and preserving alignment of equal signsHow can I align equations to the left and to the right?Double equation alignment problem within align enviromentAligned within align: Why are they right-aligned?

                                                                                            Training a classifier when some of the features are unknownWhy does Gradient Boosting regression predict negative values when there are no negative y-values in my training set?How to improve an existing (trained) classifier?What is effect when I set up some self defined predisctor variables?Why Matlab neural network classification returns decimal values on prediction dataset?Fitting and transforming text data in training, testing, and validation setsHow to quantify the performance of the classifier (multi-class SVM) using the test data?How do I control for some patients providing multiple samples in my training data?Training and Test setTraining a convolutional neural network for image denoising in MatlabShouldn't an autoencoder with #(neurons in hidden layer) = #(neurons in input layer) be “perfect”?