I am generating XML using Javascript. It works fine if there are no special characters in the XML. Otherwise, it will generate this message: "invalid xml".
I tried to replace some special characters, like:
xmlData=xmlData.replaceAll(">",">");
xmlData=xmlData.replaceAll("&","&");
//but it doesn't work.
For example:
<category label='ARR Builders & Developers'>
Thanks.
Consider generating the XML using DOM methods. For example:
var c = document.createElement("category");
c.setAttribute("label", "ARR Builders & Developers");
var s = new XMLSerializer().serializeToString(c);
s; // => "<category label=\"ARR Builder & Developers\"></category>"
This strategy should avoid the XML entity escaping problems you mention but might have some cross-browser issues.
This will do the replacement in JavaScript:
xml = xml.replace(/</g, "<");
xml = xml.replace(/>/g, ">");
This uses regular expression literals to replace all less than and greater than symbols with their escaped equivalent.
JavaScript comes with a powerful replace() method for string objects.
In general - and basic - terms, it works this way:
var myString = yourString.replace([regular expression or simple string], [replacement string]);
The first argument to .replace() method is the portion of the original string that you wish to replace. It can be represented by either a plain string object (even literal) or a regular expression.
The regular expression is obviously the most powerful way to select a substring.
The second argument is the string object (even literal) that you want to provide as a replacement.
In your case, the replacement operation should look as follows:
xmlData=xmlData.replace(/&/g,"&");
xmlData=xmlData.replace(/>/g,">");
//this time it should work.
Notice the first replacement operation is the ampersand, as if you should try to replace it later you would screw up pre-existing well-quoted entities for sure, just as ">".
In addition, pay attention to the regex 'g' flag, as with it the replacement will take place all throughout your text, not only on the first match.
I used regular expressions, but for simple replacements like these also plain strings would be a perfect fit.
You can find a complete reference for String.replace() here.
Related
I need to concatenate untrusted* data into a javascript string, but I need it to work for all types of strings (single quoted, double quoted, or backtick quoted)
And ideally, I need it to work for multiple string types at once
I could use string replace, but this is usually a bad idea.
I was using JSON.stringify, but this only escapes double quotes, not single or backtick.
Other answers that I've found deal with escaping only a single type of quote at a time (and never backticks).
An example of what I need:
untrustedData = 'a String with \'single quotes\', \"double quotes\" and \`backticks\` in it';
const someJS = `console.log(\`the thing is "${escapingFunctionHere(untrustedString)}"\`)`
someJS will be passed to new Function
* N.B. In my context "untrusted" here doesn't mean potentially malicous, but it does need to cope with quotes, escapes and the like.
I am building javascript code dynamically, the constructed code will not be in any way web-facing. In fact its likely that I am the only one who will use this tool directly or indirectly.
I am happy to accept the minimal associated risks
NOTE TO OTHERS: Be sure you understand the risks before doing this kind of thing.
For those interested, I am writing a parser creator. Given an input ebnf grammar file, it will output a JS class that can be used to parse things.
I really do need to output code here.
If all you need to do is escape single quotes ', double quotes " and backticks `, then using replace to prepend a backslash \ should be enough:
untrustedData.replace(/['"`]/g, '\\$&')
const untrustedData = 'I \'am\' "a `string`"';
console.log(untrustedData);
const escapedData = untrustedData.replace(/['"`]/g, '\\$&');
console.log(escapedData);
I'm not the best at regular expressions and need some help.
I have these kind of strings: data-some-thing="5 10 red". Word 'data-some' is constant and 'thing' changes. 'thing' also may contain dashes. The values in double quotes contain only alphanumeric symbols or spaces.
Is it possible to get 'thing' and values in double quotes using only regex? If yes then what expression should I use? I tried using lookarounds but didn't have much success.
You could use:
var result = data.match(/data-some-(.*?)="(.*?)"/);
The result array will have three elements:
0: the complete match (not of your interest)
1: the variable part before the equal sign
2: the value between quotes.
Demo:
var data = 'data-some-thing="5 10 red"';
var result = data.match(/data-some-(.*?)="(.*?)"/);
document.write(result[1] + '<br>' + result[2]);
Disclaimer:
Please note that if you are doing this in the context of larger HTML parsing (it is not mentioned in the question), you should not use regular expressions. Instead you should load the HTML string into a DOM, and use DOM methods to find the attribute name and value pairs you are interested in.
For node.js you can use the npm modules jsdom and htmlparser to do this.
I'm trying to use this great RegEx presented here for grabbing a video id from any youtube type url:
parse youtube video id using preg_match
// getting our youtube url from an input field.
var yt_url = $('#yt_url').val();
var regexp = new RegExp('%(?:youtube(?:-nocookie)?\\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\\.be/)([^"&?/ ]{11})%','i');
var videoId = yt_url.match( regexp ) ;
console.log('vid: '+videoId);
My console is always giving me a null videoId though. Am I incorrectly escaping something in my regexp var? I added the a second backslash to escape the single backslashes already.
Scratching my head?
% are delimiters for the PHP you got the link from, Javascript does not expect delimiters when using new RegExp(). Also, it looks like \\. should probably be replaced with \. Try:
var regexp = new RegExp('(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})','i');
Also, you can create a regular expression literally by using Javascript's /.../ delimiters, but then you'll need to escape all of your /s:
var regexp = /(?:youtube(?:-nocookie)?\.com\/(?:[^/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\\.be\/)([^"&?\/ ]{11})/i;
Documentation
Update:
A quick update to address the comment on efficiency for literal expressions (/ab+c/) vs. constructors (new RegExp("ab+c")). The documentation says:
Regular expression literals provide compilation of the regular expression when the script is loaded. When the regular expression will remain constant, use this for better performance.
And:
Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
Since your expression will always be static, I would say creating it literally (the second example) would be slightly faster since it is compiled when loaded (however, don't confuse this into thinking it won't be creating a RegExp object). This small difference is confirmed with a quick benchmark test.
I developed a javascript function to clean a range of Unicode characters. For example, "ñeóñú a1.txt" => "neonu a1.txt". For this, I used a regular expression:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
name = name.replace (patternA,'');
But it does not work properly in IE. If my research is correct, IE does not detect Unicode in the same way. I'm trying to make an equivalent function using the library XRegExp (http://xregexp.com/), which is compatible with all browsers, but I don't know how to write the Unicode pattern so XRegExp works in IE.
One of the failed attemps:
XRegExp.replace(name,'\\u0300-\\u036F','');
How can I build this pattern?
The value provided as the XRegExp.replace method's second argument should be a regular expression object, not a string. The regex can be built by the XRegExp or the native RegExp constructor. Thus, the following two lines are equivalent:
name = name.replace(/[\u0300-\u036F]/g, '');
// Is equivalent to:
name = XRegExp.replace(name, /[\u0300-\u036F]/g, '');
The following line you wrote, however, is not valid:
var = new RegExp patternA ("[\\u0300-\\u036F]", "g");
Instead, it should be:
var patternA = new RegExp ("[\\u0300-\\u036F]", "g");
I don't know if that is the source of your problem, but perhaps. For the record, IE's Unicode support is as good or better than other browsers.
XRegExp can let you identify your block by name, rather than using magic numbers. XRegExp('[\\u0300-\\u036F]') and XRegExp('\\p{InCombiningDiacriticalMarks}') are exactly equivalent. However, the marks in that block are a small subset of all combining marks. You might actually want to match something like XRegExp('\\p{M}'). However, note that simply removing marks like you're doing is not a safe way to remove diacritics. Generally, what you're trying to do is a bad idea and should be avoided, since it will often lead to wrong or unintelligible results.
I have a JSON object as follows:
var jsonObject = {"regex":"<span class=\"Value\">\\$(.+?)<\\/span>"};
My target is to use this regular expression to scrape a value from a html document.
var match = html.match(new RegExp(jsonObject.regex, 'i'));
This however returns an error. The problem seems to be that the escape sequences in the regex string are lost in the string jsonObject.regex
A call to jsonObject.regex returns
< span class="Value">\$(.+?)<\ /span>
(The escape sequences like \" and \\ are lost)
I could replace the respective characters using javascript, but it seems the inefficient thing to do since I already have the correct format in the json object.
Any clues or workarounds are appreciated. Thanks!
You are doing two things wrong here.
First and foremost, you are trying to build a program that uses arbitrary regular expressions on HTML. Don't do that. You have a DOM at your disposal on the client side, you should use one of the selector engines available. Examples include the browser built-in document.querySelectorAll(), Sizzle (which is also part of jQuery), NWMatcher, or an XPath-based selector engine like XPath.js.
Then, you obviously do not use a JSON serializer to build your JSON string on the server side, or things like messed-up escaping would not happen on the client side.
Lastly, what you have in your first code sample is not JSON. It's a JavaScript object literal. JSON is always a string:
'{"regex":"<span class=\"Value\">\\$(.+?)<\\/span>"}'
Selecting what you seem to want in jQuery would become as simple as
var value = $("span.value").text();
But as I said, you are not bound to use jQuery, there are lighter-weight alternatives if HTML-scraping is your main goal.