I'm looking to make a program to encrypt a string using a Vigenère cipher. So far I have been successful, apart from special characters (e.g. spaces, full stops, commas).
I have come to this solution, which outputs the special characters correctly. However, everything after the first special character in the string becomes gibberish. The output characters are still letters of the alphabet, but they don't match the cipher. I cannot work out why this is happening. I've tried several totally different methods, and all of them lead to this same error. This is the neatest method I've come up with so far, but it still doesn't work (for this example you can assume that the text and the key are the same length).
for (i = 0, l = [], k = [], output = ""; i < text.length; i++) {
    l[i] = text.charCodeAt(i) - 97;
    k[i] = key.charCodeAt(i) - 97;
    if ((l[i] > -1) && (l[i] < 26)) { // if the letter offset is between 0 and 25
        ans = parseInt(encryptLetter(l[i], k[i]));
        output += String.fromCharCode(97 + ans);
    }
    if ((l[i] < 0) || (l[i] > 25)) { // if the letter offset is not between 0 and 25
        output += String.fromCharCode(97 + l[i]);
    }
}
function encryptLetter(l, k) {
    en = l + k;
    if (en > 25) { // if the encrypted offset is greater than 25, wrap around
        en -= 26;
    }
    return en;
}
If you need, you can test the encryption here. Any help would be greatly appreciated.
Edit:
I have noticed that every four special characters, there is a block of regular characters that is correct to the cipher. I have no idea why. It completely baffles me.
For anyone wondering, I fixed it. I noticed that after every special character, the key would shift back one letter. For example, if the key was apples, then after the first special character, the key would become pplesa. After the second special character, the key would become plesap. To counter this, I just added p -= 1; at the end of the if statement for special characters. This fixed the problem. Thank you to everyone who helped.
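One way to avoid the key drift described above is to keep a separate key index that only advances when a letter is actually encrypted. The following is a minimal sketch, not the original poster's code: the names vigenereEncrypt and keyIndex are made up for illustration, and it assumes the key consists only of lowercase letters.

```javascript
// Sketch: Vigenère encryption of lowercase letters; everything else passes through.
// Assumption: `key` contains only lowercase letters a-z.
function vigenereEncrypt(text, key) {
  let output = "";
  let keyIndex = 0; // advances only when a letter is encrypted
  for (let i = 0; i < text.length; i++) {
    const l = text.charCodeAt(i) - 97;
    if (l >= 0 && l < 26) {
      // Wrap around the key so it may be shorter than the text
      const k = key.charCodeAt(keyIndex % key.length) - 97;
      output += String.fromCharCode(97 + (l + k) % 26);
      keyIndex++;
    } else {
      output += text[i]; // spaces, punctuation, etc. are copied unchanged
    }
  }
  return output;
}

console.log(vigenereEncrypt("hello, world", "key")); // rijvs, uyvjn
```

Because the key index is independent of the position in the text, special characters no longer consume a key letter.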
You can check the character type using its ASCII value. Since you treat anything other than a lowercase letter as special, you can mark a character as special if its ASCII value is not in the range 97-122.
You can store the special characters from your original string in some sort of hashmap, with the character as the key and a linked list as the value. The linked list stores the indices of that character, so you know where it was in the original string.
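A rough sketch of that idea, using a Map whose values are plain arrays of indices rather than linked lists (arrays are the more common choice in JavaScript). The function name indexSpecialCharacters is hypothetical.

```javascript
// Sketch: record where every non-lowercase-letter character occurs.
function indexSpecialCharacters(text) {
  const positions = new Map(); // character -> array of indices
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 97 || code > 122) { // not in a-z
      if (!positions.has(text[i])) positions.set(text[i], []);
      positions.get(text[i]).push(i);
    }
  }
  return positions;
}

const positions = indexSpecialCharacters("hi, you. ok");
console.log(positions.get(" ")); // [ 3, 8 ]
console.log(positions.get(",")); // [ 2 ]
```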
I am trying to create a button that displays the text from the "TextCollector" input as numbers separated by commas and ignores any symbol that is not in the alphabet. Update: I also need it to ignore whether a letter is capitalized.
Example:
a = 1
b = 2
c = 3
and so on...
So if I typed "cat's" into the input, the heading at the bottom would display "3,1,20,19".
Here's what I've tried so far:
<form action="">
<input type="text" id="TextCollector" name="TextCollector" placeholder="Type in something">
<br>
<input type="button" value="Submit" onclick="ShowMe()">
</form>
<h1 id="numbers"></h1>
<script>
function ShowMe() {
    var text = document.getElementById("TextCollector").value;
    var textnum = text.charCodeAt(0) - 97;
    document.getElementById("numbers").innerHTML = textnum;
}
</script>
But the code I tried only half works: it displays the first letter as a number and ignores the rest. Also, with my code "a" is "0", but I need "a" to be "1".
Can someone help me? I hope I made myself clear...
First .replace all non-alphabetical characters with the empty string, then you can turn the resulting string into an array, .map each character to its character code, and join it by commas:
function ShowMe() {
    const replaced = document.getElementById("TextCollector").value.replace(/[^a-z]/gi, '').toLowerCase();
    document.getElementById("numbers").textContent = [...replaced]
        .map(char => char.charCodeAt(0) - 96)
        .join(', ');
}
<form action="">
<input type="text" id="TextCollector" name="TextCollector" placeholder="Type in something">
<br>
<input type="button" value="Submit" onclick="ShowMe()">
</form>
<h1 id="numbers"></h1>
Allow me to introduce a few concepts which I highly recommend you learn instead of copy-pasting any code given to you. I'm not going to go into great detail because, honestly, all you have to know is how to ask the right question in the Google search bar to get your answer. At the end of the post I'll also talk about how you can develop a strategy for solving problems such as this one.
Loops
You use a loop in programming when you want to repeat a set of instructions multiple times. There are multiple ways to write a loop; the two most popular are the for loop and the while loop. Less common alternatives include the do...while loop and recursion.
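As a small illustration (the variable names here are invented for the example), the same repetition written once with a for loop and once with a while loop:

```javascript
const word = "cat";

// for loop: counter, condition and increment all live in the loop header
const codesFor = [];
for (let i = 0; i < word.length; i++) {
  codesFor.push(word.charCodeAt(i));
}

// while loop: the counter is managed by hand
const codesWhile = [];
let j = 0;
while (j < word.length) {
  codesWhile.push(word.charCodeAt(j));
  j++;
}

console.log(codesFor);   // [ 99, 97, 116 ]
console.log(codesWhile); // [ 99, 97, 116 ]
```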
Types
JavaScript is weakly typed, which makes for a lot of weird and unexpected behavior if you try, say, to add a boolean value to a number. Its primitive types include Number, String and Boolean. Unlike many other languages, JavaScript has a single Number type (a double-precision float) that covers integers, decimals and negative values, so you don't have to choose between integer, double and float. Boolean is either true or false. There is no separate Char type: a character is just a string of length one, and each character maps to a numeric code, which charCodeAt returns. The code for "a" is 97, which is why your code outputs a "0" instead of a "1" when you run this line:
text.charCodeAt(0) - 97;
Operators
You should know what +, -, *, / do from grade school math class. You should also know > (greater than), < (less than), >= (greater than or equal), <= (less than or equal), == (equals), != (not equal), && (intersection, also known as AND), || (union, also known as OR). There are also some quality-of-life operators such as ++ (increment value by 1), -- (decrement value by 1), += (increase target value by given amount), -= (decrease target value by given amount), and so on.
THE ANSWER
Tie all this knowledge together, and we arrive at a solution.
var text = document.getElementById("TextCollector").value;
var i;
var textnum;
//create empty string variable to append char values to
var myString = '';
for (i = 0; i < text.length; i++)
{
    //Convert char to number ("a" has char code 97, so subtracting 96 gives 1)
    textnum = text.charCodeAt(i) - 96;
    //concatenate textnum to string, and add a comma at the end
    myString = myString + textnum + ",";
}
//remove the last unwanted comma by applying "slice" method to the string
myString = myString.slice(0, myString.length - 1);
//display the whole string, not just the last number
document.getElementById("numbers").innerHTML = myString;
I encourage you to ask about anything you do not understand in the solution above.
Edit: The strategy for solving any problem is to break it down into sub-problems. Your goal is to turn a bunch of characters from a string into numbers. Ask yourself these questions:
How can I turn characters into numbers?
Well, from the looks of your code, you already know how. Next question.
How can I turn all characters into numbers, not just the starting character?
Ok, take a closer look at the method you applied to your "text" value
charCodeAt(0)
That zero is the index into the string. A string behaves like an array of characters, and if you understand how arrays work, it should be no surprise that index 0 only returns the first character.
Ok so, how can I apply this charCodeAt() method to ALL my characters?
This is a little tricky because if you don't know the concept of loops in programming (or recursion), you are not adequately equipped to solve these problems. There are many free online resources for learning basic programming concepts such as loops. I recommend this site here: https://www.w3schools.com/
Ok I can turn multiple characters to numbers. How do I glue them together into a single string?
This is something you can google. That's what I did. Hint: how to add chars to the end of a string
How do I get rid of the last comma in my string?
Google: How do I remove last char in a string?
Sources:
https://en.wikipedia.org/wiki/Control_flow#Loops
https://en.wikipedia.org/wiki/Primitive_data_type
https://www.w3schools.com/js/js_loop_for.asp
https://www.w3schools.com/js/js_string_methods.asp
https://codehandbook.org/remove-character-from-string/
I have to find blank spaces in a string (this includes enters, tabs and spaces) using JavaScript. I have this code to find spaces:
function countThis() {
    var string = document.getElementById("textt").value;
    var spaceCount = (string.split(" ").length - 1);
    document.getElementById("countRedundants").value = spaceCount;
}
This works fine, and gives me the total number of spaces.
The problem is, I want it to count only once if spaces/enters/tabs are next to each other. I can't solve this and would appreciate some help or a pointer in the right direction.
Thanks, Gustav
You can use regular expressions in your split:
var spaceCount = (string.split(/\s+/gi).length - 1);
Use regex in order to achieve this.
For instance, you could check how many matches of one or more tabs, spaces or newlines exist, and use their count.
The regex rule is: [\t\s\n]+ - meaning that one or more consecutive tabs, spaces or newlines match the rule as a single chunk. (\s on its own already covers tabs and newlines, so \s+ is equivalent.)
For JavaScript:
var test = "Test Test Test\nTest\nTest\n\n";
var spacesCount = test.split(/[\t\s\n]+/g).length - 1;
console.log(spacesCount);
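The "count the matches" idea can also be written directly with String.prototype.match instead of split. This is only a sketch with a made-up function name (countWhitespaceRuns); it returns the same number as the split version, but states the intent more literally.

```javascript
// Sketch: count runs of consecutive whitespace by counting regex matches.
function countWhitespaceRuns(str) {
  const runs = str.match(/\s+/g); // null when there is no whitespace at all
  return runs ? runs.length : 0;
}

console.log(countWhitespaceRuns("Test Test Test\nTest\nTest\n\n")); // 5
console.log(countWhitespaceRuns("nospace"));                        // 0
```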
Regex is a pretty efficient way of doing this. Alternatively, you would have to iterate over the string manually and try to match the cases where one or more spaces, tabs, or newlines occur in a row.
Consider that what you are attempting to do is what a compiler does in order to recognize specific character sequences as specific elements, called tokens. This practice is called lexical analysis, or tokenization. Since regex exists, there is no need to perform this check manually, unless you want to do something very advanced or specific.
Here is an ugly solution without using any regex (written in Python rather than JavaScript). Performance-wise it needs only a single pass over the string, but it could be made more Pythonic.
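def countThis(s):
    count = 0
    i = 0
    while i < len(s):
        # skip over the current run of non-whitespace characters
        while i < len(s) and not s[i].isspace():
            i += 1
        if i < len(s):
            # found the start of a whitespace run: count it once
            count += 1
            i += 1
        # skip the rest of the whitespace run
        while i < len(s) and s[i].isspace():
            i += 1
    return count

print(countThis("str"))
print(countThis(" str toto"))
print(countThis("Hello, world!"))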
Stéphane Ammar's solution is probably the easiest on the eyes, but if you want something more performant:
function countGaps(str) {
    let gaps = 0;
    const isWhitespace = ch => ' \t\n\r\v'.indexOf(ch) > -1;
    for (let i = 0; i < str.length; i++)
        if (isWhitespace(str[i]) && !isWhitespace(str[i - 1]))
            ++gaps;
    return gaps;
}
I'd like to remove all invalid UTF-8 characters from a string in JavaScript. I've tried with this JavaScript:
strTest = strTest.replace(/([\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})|./g, "$1");
It seems that the UTF-8 validation regex described here (link removed) is more complete and I adapted it in the same way like:
strTest = strTest.replace(/([\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF][\x80-\xBF]{2}|[\xF1-\xF3][\x80-\xBF]{3}|\xF4[\x80-\x8F][\x80-\xBF]{2})|./g, "$1");
Both of these pieces of code seem to be allowing valid UTF-8 through, but are filtering out hardly any of the bad UTF-8 characters from my test data: UTF-8 decoder capability and stress test. Either the bad characters come through unchanged or seem to have some of their bytes removed, creating a new, invalid character.
I'm not very familiar with the UTF-8 standard or with multibyte in JavaScript so I'm not sure if I'm failing to represent proper UTF-8 in the regex or if I'm applying that regex improperly in JavaScript.
Edit: added global flag to my regex per Tomalak's comment - however this still isn't working for me. I'm abandoning doing this on the client side per bobince's comment.
I use this simple and sturdy approach:
function cleanString(input) {
    var output = "";
    for (var i = 0; i < input.length; i++) {
        if (input.charCodeAt(i) <= 127) {
            output += input.charAt(i);
        }
    }
    return output;
}
Basically all you really want are the ASCII chars 0-127, so just rebuild the string char by char. If it's a good char, keep it; if not, ditch it. Pretty robust, and if sanitization is your goal, it's fast enough (in fact it's really fast).
JavaScript strings are natively Unicode. They hold character sequences* not byte sequences, so it is impossible for one to contain an invalid byte sequence.
(Technically, they actually contain UTF-16 code unit sequences, which is not quite the same thing, but this probably isn't anything you need to worry about right now.)
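To make the code-unit point concrete, here is a short illustration (the variable name is arbitrary). A character outside the Basic Multilingual Plane occupies two UTF-16 code units, so length and charCodeAt see the two halves of a surrogate pair rather than one character:

```javascript
const emoji = "\u{1F600}"; // grinning face, U+1F600, outside the BMP

console.log(emoji.length);                      // 2 (UTF-16 code units)
console.log([...emoji].length);                 // 1 (code points)
console.log(emoji.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(emoji.codePointAt(0).toString(16)); // 1f600
```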
You can, if you need to for some reason, create a string holding characters used as placeholders for bytes. ie. using the character U+0080 ('\x80') to stand for the byte 0x80. This is what you would get if you encoded characters to bytes using UTF-8, then decoded them back to characters using ISO-8859-1 by mistake. There is a special JavaScript idiom for this:
var bytelike= unescape(encodeURIComponent(characters));
and to get back from UTF-8 pseudobytes to characters again:
var characters= decodeURIComponent(escape(bytelike));
(This is, notably, pretty much the only time the escape/unescape functions should ever be used. Their existence in any other program is almost always a bug.)
decodeURIComponent(escape(bytes)), since it behaves like a UTF-8 decoder, will raise an error if the sequence of code units fed into it would not be acceptable as UTF-8 bytes.
It is very rare for you to need to work on byte strings like this in JavaScript. Better to keep working natively in Unicode on the client side. The browser will take care of UTF-8-encoding the string on the wire (in a form submission or XMLHttpRequest).
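If you do end up holding real bytes (for example a Uint8Array read from a file or a network response), the standard TextEncoder/TextDecoder API, available in current browsers and Node.js, can check validity directly. This is a different technique from the escape/unescape idiom above, shown only as a sketch; isValidUtf8 is a hypothetical helper name.

```javascript
// Sketch: validate a byte sequence as UTF-8 with the Encoding API.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw on malformed input
    // instead of substituting U+FFFD
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

const good = new TextEncoder().encode("\u00e9"); // "é" -> bytes C3 A9
const bad = new Uint8Array([0xc3, 0x28]);        // C3 must be followed by 80-BF

console.log(isValidUtf8(good)); // true
console.log(isValidUtf8(bad));  // false
```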
Languages like Spanish and French have accented characters such as "é", whose codes are in the range 160-255; see https://www.ascii.cl/htmlcodes.htm
function cleanString(input) {
    var output = "";
    for (var i = 0; i < input.length; i++) {
        if (input.charCodeAt(i) <= 127 || (input.charCodeAt(i) >= 160 && input.charCodeAt(i) <= 255)) {
            output += input.charAt(i);
        }
    }
    return output;
}
Simple mistake, big effect:
strTest = strTest.replace(/your regex here/g, "$1");
// ----------------------------------------^
without the "global" flag, the replace occurs for the first match only.
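A tiny demonstration of the difference, with an arbitrary sample string:

```javascript
const input = "a-b-c";

const firstOnly = input.replace(/-/, "");   // no g flag: only the first match is replaced
const everyMatch = input.replace(/-/g, ""); // g flag: every match is replaced

console.log(firstOnly);  // ab-c
console.log(everyMatch); // abc
```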
Side note: To remove any character that does not fulfill some kind of complex condition, like falling into a set of certain Unicode character ranges, you can use negative lookahead:
var re = /(?![\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF7][\x80-\xBF]{3})./g;
strTest = strTest.replace(re, "")
where re reads as
(?! # negative look-ahead: a position *not followed by*:
[…] # any allowed character range from above
) # end lookahead
. # match this character (only if previous condition is met!)
If you're trying to remove the "invalid character" - � - from javascript strings then you can get rid of them like this:
myString = myString.replace(/\uFFFD/g, '')
I ran into this problem with a really weird result from the Date Taken data of a digital image. My scenario is admittedly unique: I was using Windows Script Host (WSH) and the Shell.Application ActiveX object, which lets you get the namespace object of a folder and call the GetDetailsOf function to return EXIF data after it has been parsed by the OS.
var app = new ActiveXObject("Shell.Application");
var info = app.Namespace("c:\\"); // the backslash must be escaped inside a string literal
var date = info.GetDetailsOf(info.ParseName("testimg.jpg"), 12);
In Windows Vista and 7, the result looked like this:
?8/?27/?2011 ??11:45 PM
So my approach was as follows:
var chars = date.split(''); //split into characters
var clean = "";
for (var i = 0; i < chars.length; i++) {
    if (chars[i].charCodeAt(0) < 255) clean += chars[i];
}
The result of course is a string that excludes those question mark characters.
I know you went with a different solution altogether, but I thought I'd post my solution in case anyone else is having troubles with this and cannot use a server side language approach.
I used @Ali's solution to not only clean my string, but also replace the invalid chars with their HTML numeric character references:
function cleanString(input) {
    var output = "";
    for (var i = 0; i < input.length; i++) {
        if (input.charCodeAt(i) <= 127) {
            output += input.charAt(i);
        } else {
            output += "&#" + input.charCodeAt(i) + ";";
        }
    }
    return output;
}
I have put together some of the solutions proposed above to be error-safe:
var removeNonUtf8 = (characters) => {
    try {
        // ignore invalid char ranges
        var bytelike = unescape(encodeURIComponent(characters));
        characters = decodeURIComponent(escape(bytelike));
    } catch (error) { }
    // remove �
    characters = characters.replace(/\uFFFD/g, '');
    return characters;
};
I'm trying to set up a field to prepopulate with a unique set of characters, so that I can automatically generate test accounts. Because of the way the system is set up, the name field must be unique, and must not include numerical characters.
I put together this Selenium code, and it works 99% of the way, but leaves extra garbage characters at the end of the good output.
javascript{stringtime='';
nowtime=new Date().getTime().toString();
for ( var i in nowtime )
{ stringtime+=String.fromCharCode(parseInt(nowtime[i])+65 ); };
'test' + stringtime + '\0'}
Result:
testBCEBBJCBFBBAI + a bunch of characters that won't copy into here. They look like 4 zeros in a box.
Thanks in advance for the help.
Apart from the '\0' character at the end, which shows up as a ?, I think Selenium's JavaScript engine is having trouble processing the for (var i in nowtime) loop.
Try it like this:
javascript{
    stringtime = '';
    nowtime = new Date().getTime().toString();
    for (var i = 0; i < nowtime.length; i++) {
        stringtime += String.fromCharCode(parseInt(nowtime[i]) + 65);
    }
    stringtime;
}
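Outside Selenium, the digit-to-letter mapping can be tried on its own. The sketch below uses a hypothetical helper name (digitsToLetters) and a fixed digit string, so the output is reproducible; in the real script the digits come from new Date().getTime().

```javascript
// Sketch: map each decimal digit to a letter (0 -> A, 1 -> B, ..., 9 -> J).
function digitsToLetters(digits) {
  let letters = "";
  for (let i = 0; i < digits.length; i++) {
    letters += String.fromCharCode(Number(digits[i]) + 65);
  }
  return letters;
}

console.log("test" + digitsToLetters("1241192151108")); // testBCEBBJCBFBBAI
```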
Those characters are ones outside standard ASCII that your font can't reproduce. The numbers in the box signify which character it is. If it's 4 zeros, it's that \0 char you are putting on at the end. I don't know the language, but it doesn't look like you need that.
Also your random number generator is a bit flawed. Have a look here:
http://www.mediacollege.com/internet/javascript/number/random.html
To be more precise, I need to know whether (and if possible, how) I can find out whether a given string has double-byte characters or not. Basically, I need to open a pop-up to display a given text which can contain double-byte characters, like Chinese or Japanese. In this case, we need a different window size than we would use for English or ASCII text.
Does anyone have a clue?
I used mikesamuel's answer on this one. However, I noticed (perhaps because of how this form handles escaping) that there should be only one backslash before the u, e.g. \u and not \\u, to make this work correctly.
function containsNonLatinCodepoints(s) {
    return /[^\u0000-\u00ff]/.test(s);
}
Works for me :)
JavaScript holds text internally as a sequence of 16-bit UTF-16 code units (historically described as UCS-2), which can directly encode a fairly extensive subset of Unicode.
But that's not really germane to your question. One solution might be to loop through the string and examine the character codes at each position:
function isDoubleByte(str) {
    for (var i = 0, n = str.length; i < n; i++) {
        if (str.charCodeAt(i) > 255) { return true; }
    }
    return false;
}
This might not be as fast as you would like.
I have benchmarked the two functions in the top answers and thought I would share the results. Here is the test code I used:
const text1 = `The Chinese Wikipedia was established along with 12 other Wikipedias in May 2001. 中文維基百科的副標題是「海納百川,有容乃大」,這是中国的清朝政治家林则徐(1785年-1850年)於1839年為`;
const regex = /[^\u0000-\u00ff]/; // Small performance gain from pre-compiling the regex
function containsNonLatinCodepoints(s) {
    return regex.test(s);
}
function isDoubleByte(str) {
    for (var i = 0, n = str.length; i < n; i++) {
        if (str.charCodeAt(i) > 255) { return true; }
    }
    return false;
}
function benchmark(fn, str) {
    let startTime = new Date();
    for (let i = 0; i < 10000000; i++) {
        fn(str);
    }
    let endTime = new Date();
    return endTime.getTime() - startTime.getTime();
}
console.info('isDoubleByte => ' + benchmark(isDoubleByte, text1));
console.info('containsNonLatinCodepoints => ' + benchmark(containsNonLatinCodepoints, text1));
When running this I got:
isDoubleByte => 2421
containsNonLatinCodepoints => 868
So for this particular string the regex solution is about 3 times faster.
However note that for a string where the first character is unicode, isDoubleByte() returns right away and so is much faster than the regex (which still has the overhead of the regular expression).
For instance for the string 中国, I got these results:
isDoubleByte => 51
containsNonLatinCodepoints => 288
To get the best of both worlds, it's probably better to combine both:
var regex = /[^\u0000-\u00ff]/; // Small performance gain from pre-compiling the regex
function containsDoubleByte(str) {
    if (!str.length) return false;
    if (str.charCodeAt(0) > 255) return true;
    return regex.test(str);
}
In that case, if the first character is Chinese (which is likely if the whole text is Chinese), the function will be fast and return right away. If not, it will run the regex, which is still faster than checking each character individually.
Actually, all of the characters are Unicode, at least from the Javascript engine's perspective.
Unfortunately, the mere presence of characters in a particular Unicode range won't be enough to determine that you need more space. A number of characters with Unicode code points well above the ASCII range take up roughly the same amount of space as ASCII characters. Typographic quotes, characters with diacritics, certain punctuation symbols, and various currency symbols are outside of the low ASCII range and are allocated in quite disparate places on the Unicode Basic Multilingual Plane.
Generally, projects that I've worked on elect to provide extra space for all languages, or sometimes use javascript to determine whether a window with auto-scrollbar css attributes actually has content with a height which would trigger a scrollbar or not.
If detecting the presence of, or count of, CJK characters will be adequate to determine you need a bit of extra space, you could construct a regex using the following ranges:
[\u3300-\u9fff\uf900-\ufaff], and use that to extract a count of the number of characters that match. (This is a little excessively coarse, and misses all the non-BMP cases, probably excludes some other relevant ranges, and most likely includes some irrelevant characters, but it's a starting point).
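For instance, a count based on those ranges could look like the sketch below. The names cjkPattern and countCjk are invented for the example, and the same caveats about the coarseness of the ranges apply.

```javascript
// Sketch: rough count of CJK characters using the ranges given above.
const cjkPattern = /[\u3300-\u9fff\uf900-\ufaff]/g;

function countCjk(s) {
  const matches = s.match(cjkPattern); // null when nothing matches
  return matches ? matches.length : 0;
}

console.log(countCjk("Hello 中国")); // 2
console.log(countCjk("Hello"));      // 0
```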
Again, you're only going to be able to manage a rough heuristic without something along the lines of a full text rendering engine, because what you really want is something like GDI's MeasureString (or any other text rendering engine's equivalent). It's been a while since I've done so, but I think the closest HTML/DOM equivalent is setting a width on a div and requesting the height (cut and paste reuse, so apologies if this contains errors):
o = document.getElementById("test");
document.defaultView.getComputedStyle(o, "").getPropertyValue("height");
Here is a benchmark test: http://jsben.ch/NKjKd
This is much faster:
function containsNonLatinCodepoints(s) {
    return /[^\u0000-\u00ff]/.test(s);
}
than this:
function isDoubleByte(str) {
    for (var i = 0, n = str.length; i < n; i++) {
        if (str.charCodeAt(i) > 255) { return true; }
    }
    return false;
}
Why not let the window resize itself based on the runtime height/width?
Run something like this in your pop-up:
window.resizeTo(document.body.clientWidth, document.body.clientHeight);