How to do multiple regular expressions, each time refining the results? - javascript

Why can't I output my regex to a variable, and then run regex on it a second time?
I'm writing a greasemonkey javascript that grabs some raw data, runs some regex on it, then runs some more regex on it to refine the results:
// I tried this on :: http://stackoverflow.com/
var tagsraw = (document.getElementById("subheader").innerHTML);
alert(tagsraw);
Getting the raw data (above code) works
var trimone = tagsraw.match(/title\W\W\w+\s\w+\s\w+\s\w+\s\w+/g);
alert(trimone);
running regex once works (above code); but running (code below) doesn't??
var trimtwo = trimone.match(/\s\w+\s\w+\s\w+\s\w+/g);
alert(trimtwo);
Can someone advise me as to what is wrong with my code/approach?

The first match works because innerHTML returns a string.
However, match returns an array (or null when nothing matches), so treat it as one:
for (var i=0; i<trimone.length; i++)
{
var trimtwo = trimone[i].match(/\s\w+\s\w+\s\w+\s\w+/g);
alert(trimtwo);
}
Edit:
Try this code instead though, I think this is a bit closer to what you want to achieve:
var trimone = tagsraw.match(/title\s*=\s*".*"/g);
alert(trimone);
for (var i=0; i<trimone.length; i++)
{
alert(trimone[i]);
}
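To illustrate the point about match returning an array, here is a minimal sketch. The helper name refineMatches and the sample markup are made up for illustration; they are not from the question. It runs a second pattern over every result of the first and guards against match returning null:

```javascript
// Hypothetical helper: apply `second` to every string matched by `first`.
// Both patterns are expected to carry the g flag.
function refineMatches(text, first, second) {
  var firstPass = text.match(first) || []; // match() returns null when nothing matches
  var refined = [];
  for (var i = 0; i < firstPass.length; i++) {
    var hits = firstPass[i].match(second) || [];
    refined = refined.concat(hits);
  }
  return refined;
}

// Sample markup invented for illustration.
var html = '<a title="show questions tagged javascript">x</a>' +
           '<a title="show questions tagged regex">y</a>';
var tags = refineMatches(html, /title="[^"]*"/g, /\w+(?=")/g);
console.log(tags); // the last word of each title attribute
```

Collecting everything into one flat array is just one possible design; keeping one sub-array per first-pass match would work equally well if you need to know where each result came from.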

You could do something like this:
var str = "<title> foo bar baz quux blah</title>",
re = [
/title\W\W\w+\s\w+\s\w+\s\w+\s\w+/g,
/\s\w+\s\w+\s\w+\s\w+/g
],
tmp = [str];
for (var i=0, n=re.length; i<n; ++i) {
tmp = tmp.map(function(val) {
return val.match(re[i])[0];
});
}
alert(tmp);

.match should be returning an array, not a string.

Your case is better suited to using .exec. You could even chain the two if you don't care about the intermediate result:
/\s\w+\s\w+\s\w+\s\w+/g.exec(/title\W\W\w+\s\w+\s\w+\s\w+\s\w+/g.exec(tagsraw));

The problem is that match() returns an array and there is no built-in function to perform a regular expression on an array.
So instead you should be able to do this with the exec function of the RegExp object. It returns an array whose first element is the matched string (or null if nothing matches). You can grab the matched string from the first regexp and use it for the second.
So it'd be something like this:
var patt1 = new RegExp(/title\W\W\w+\s\w+\s\w+\s\w+\s\w+/g); // note the capitalization: RegExp
var trimone = patt1.exec(tagsraw);
if (trimone != null) // might be null if no match is found
{
alert(trimone[0]); // element 0 holds the matched string
var patt2 = new RegExp(/\s\w+\s\w+\s\w+\s\w+/g);
var trimtwo = patt2.exec(trimone[0]);
alert(trimtwo);
}
Note that exec returns null if no match is found so be sure to handle that in your code like I do above.
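If you have more than two patterns, the same exec-and-check idea can be wrapped in a loop. This is only a sketch: the helper name refine and the sample input are assumptions, and the patterns deliberately omit the g flag so that exec always starts from the beginning of the string.

```javascript
// Hypothetical helper: feed the first match of each pattern into the next one.
// Returns null as soon as any pattern fails to match.
function refine(input, patterns) {
  var current = input;
  for (var i = 0; i < patterns.length; i++) {
    var result = patterns[i].exec(current); // array of match data, or null
    if (result === null) {
      return null;
    }
    current = result[0]; // result[0] is the full matched text
  }
  return current;
}

var sample = '<a title="show all tags here now">'; // invented sample input
var out = refine(sample, [
  /title\W\W\w+\s\w+\s\w+\s\w+\s\w+/,
  /\s\w+\s\w+\s\w+\s\w+/
]);
console.log(out);
```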

Related

replace string in json

I am new to JavaScript and have this code, which replaces string A with B. But if there are multiple occurrences of A, it only replaces the first one, while the rest remain as A. Note that stringify is called twice.
"success": function(json) {
var old = JSON.stringify(json).replace('"新交易"', '"待审核"');
var newdata = JSON.parse(old);
var old = JSON.stringify(newdata).replace('"批准"', '"已充值"');
var newdata = JSON.parse(old);
fnCallback(newdata);
}
This has little to do with JSON. As documented:
To perform a global search and replace, include the g switch in the regular expression.
So change this:
replace('"新交易"', '"待审核"')
... into this:
replace(/"新交易"/g, '"待审核"')
To replace every word in your context use Regular Expressions. So check this example to see how it works:
var someText = '"新交易""新交易""新交易""新交易""新交易""新交易""新交易""新交易"';
var someText2 = '"批准""批准""批准""批准""批准""批准""批准""批准""批准""批准"';
var old = someText.replace(/"新交易"/g, '"replaced"');
var stuff = someText2.replace(/"批准"/g, '"已充值"');
https://jsfiddle.net/n1otvpy1/
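If the search text comes from a variable rather than a regex literal, one way to get the same global behavior is to escape it and build the regex at runtime. The helper name replaceAllLiteral and the shape of the sample object are assumptions made for this sketch:

```javascript
// Hypothetical helper: replace every occurrence of a literal string.
function replaceAllLiteral(text, search, replacement) {
  // Escape regex metacharacters so `search` is matched literally.
  var escaped = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // A function replacement avoids special handling of "$" sequences in `replacement`.
  return text.replace(new RegExp(escaped, 'g'), function () { return replacement; });
}

// Sample data invented for illustration.
var json = { rows: [{ status: '新交易' }, { status: '新交易' }, { status: '批准' }] };
var text = JSON.stringify(json); // stringify once, replace twice, parse once
text = replaceAllLiteral(text, '"新交易"', '"待审核"');
text = replaceAllLiteral(text, '"批准"', '"已充值"');
var updated = JSON.parse(text);
console.log(updated.rows.map(function (r) { return r.status; }));
```

Doing both replacements on the same string also avoids the second stringify/parse round trip from the question.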

Replacing custom characters during string.split()

I have a fairly large javascript/html application that updates frequently and receives a lot of data. It's running very quickly and smoothly but I need to now introduce a function that will have to process any incoming data for special chars, and I fear it will be a lot of extra processing time (and jsperf is kinda dead at the moment).
I will make a request to get a .json file via AJAX and then simply use the data as is. But now I will need to look out for strings with #2C (hex comma) because all of the incoming data is comma-separated values.
in File.json
{
names: "Bob, Billy",
likes : "meat,potatoes"
}
Now I need
{
names: "Bob, Billy",
likes : "meat#2Cbeer#2Cwine,potatoes"
}
where #2C (hex for comma) is a comma within the string.
I have this code which works fine
var str = "a,b,c#2Cd";
var arr = str.split(',');
function escapeCommas(arr) {
for (var i = 0; i < arr.length; i++) {
if (arr[i].indexOf("#2C") !== -1) {
var s = arr[i].replace("#2C", ',');
arr[i] = s;
}
}
return arr;
}
console.log(escapeCommas(arr));
http://jsfiddle.net/5hogf5me/1/
I have a lot of functions that process the JSON data often as
var name = str.split(',')[i];
I am wondering how I could extend or re-write .split to automatically replace #2C with a comma.
Thanks for any advice.
Edit: I think this is better:
var j = {
names: "Bob, Billy",
likes : "meat#2Cpotatoes"
};
var result = j.likes.replace(/#2C/g, ',');
// j.likes.replace(/#2C/ig, ','); - if you want case insensitive
// and simply reverse parameters if you want
console.log(result);
This was my initial approach:
var j = {
names: "Bob, Billy",
likes : "meat,potatoes"
}
var result = j.likes.split(",").join("#2C")
console.log(result);
// meat#2Cpotatoes
Or if you have it the reverse:
var j = {
names: "Bob, Billy",
likes : "meat#2Cpotatoes"
}
var result = j.likes.split("#2C").join(",")
console.log(result);
// meat,potatoes
[Updated to reflect feedback] - try at http://jsfiddle.net
var str = 'a,b,c#2Cd,e#2Cf#2Cg';
alert(str.split(',').join('|')); // Original
String.prototype.native_split = String.prototype.split;
String.prototype.split = function (separator, limit) {
if ((separator===',')&&(!limit)) return this.replace(/,/g,'\0').replace(/#2C/gi,',').native_split('\0');
return this.native_split(separator, limit);
}
alert(str.split(',').join('|')); // Enhanced to un-escape "#2C" and "#2c"
String.prototype.split = String.prototype.native_split;
alert(str.split(',').join('|')); // Original restored
A couple of minor tangential notes about your function "escapeCommas": this function is really doing a logical "un-escape", so the function name might be reconsidered. Also, unless it is your intention to replace only the first occurrence of "#2C" in each item, you should use the "g" (global) flag; otherwise an item "c#2Cd#2Ce" would come out as "c,d#2Ce".
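If overriding String.prototype.split feels too invasive, a standalone function gives the same result without touching built-ins. This is a sketch only, and the name splitEscaped is made up here:

```javascript
// Hypothetical helper: split on real commas, then turn "#2C"/"#2c" back into commas.
function splitEscaped(str) {
  return str.split(',').map(function (field) {
    return field.replace(/#2C/gi, ','); // g: every occurrence, i: either case
  });
}

var fields = splitEscaped('a,b,c#2Cd,e#2Cf#2cg');
console.log(fields);
```

Existing calls such as str.split(',')[i] would then become splitEscaped(str)[i], which is more typing but leaves every other caller of split unaffected.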

javascript regular expression match only returning last match

I have a small node application that takes some input, applies a regular expression to extract some info and should return an array of matches. All of it is pretty straightforward, but the behavior I am seeing is not what I expected. My understanding was that if I have input with multiple lines that match this regex, then each line would be an element in the array that the match returns. Unfortunately it looks like the array only contains the match groups for the last line. Is there a way to rewrite this, without iterating through the input twice, so that I can populate a nested array with the matched data per line? It would be great to return the match groups as elements, but I need to do this for each line. The end goal is to turn all this into formatted JSON for a downstream application.
Thanks for taking a look...
Now the CODE
Also available for experimentation here in a cloud 9 ide.
var util = require('util');
var re = /(processed)(.*?)\((.*?)\)(.*?)([0-9]\.[0-9]+[0-9])/g;
var data;
var returnData = [];
var Parser = function(input) {
util.log("Instantiating Parser");
this.data = input;
};
Parser.prototype.parse = function(callback) {
util.log("In the parser");
this.returnData = re.exec(this.data);
callback(this.returnData);
}
exports.Parser = Parser;
And a test file:
var Parser = require("./parser.js").Parser;
var util = require('util');
var fs = require('fs');
var data = "worker[0] processed packet (0x2000000, 1200358, t) in 0.000021 seconds\n" +
"worker[0] processed packet (0x2000000, 400115, b) in 0.000030 seconds\n"+
" (0) Registration Stats: (1387305947, 0x3d00000a, 17024, 2504, 0, 400109, 400116, b)\n"+
"worker[0] processed packet (0x1000000, 400116, b) in 0.000045 seconds\n"+
"worker[0] processed packet (0x1000000, 1200369, t) in 0.000024 seconds\n";
util.log("creating new parser");
var Parser = new Parser(data);
util.log("calling parse");
Parser.parse(function(data) {
for (var i=0; i < data.length; i++)
util.log(data[i]);
});
Here is the debuggex for the regular expression.
re.exec only returns one match each time it is executed. If you want an array of all the matches, you need to do something like this:
var matchedData = [];
var match;
while (match = re.exec(this.data)) {
matchedData.push(match);
}
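Applied to the log format from the question, the loop could look roughly like this. The function name parseProcessed and the field names packet and seconds are assumptions made for this sketch; the regex is the one from the question.

```javascript
// Hypothetical helper that collects one entry per matching line.
function parseProcessed(input) {
  // Created inside the function so every call starts with lastIndex = 0.
  var re = /(processed)(.*?)\((.*?)\)(.*?)([0-9]\.[0-9]+[0-9])/g;
  var rows = [];
  var match;
  while ((match = re.exec(input)) !== null) {
    rows.push({
      packet: match[3].split(/,\s*/), // contents of the parentheses
      seconds: parseFloat(match[5])   // elapsed time
    });
  }
  return rows;
}

var log =
  "worker[0] processed packet (0x2000000, 1200358, t) in 0.000021 seconds\n" +
  " (0) Registration Stats: (1387305947, 0x3d00000a, 17024, 2504, 0, 400109, 400116, b)\n" +
  "worker[0] processed packet (0x1000000, 400116, b) in 0.000045 seconds\n";
var rows = parseProcessed(log);
console.log(JSON.stringify(rows));
```

The g flag is what makes this work: without it, exec would return the first match on every call and the loop would never end.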

How to extract values from a string in javascript?

I need some help with extracting values from a cookie using javascript.
The string in a cookie looks something like this:
string = 'id=1||price=500||name=Item name||shipping=0||quantity=2++id=2||price=1500||name=Some other name||shipping=10||quantity=2'
By using string.split() and string.replace() and some ugly-looking code I've somehow managed to get the values I need (price, name, shipping, quantity). But the problem is that sometimes not all of the strings in the cookie are the same. Sometimes the string in a cookie will look something like this:
string = 'id=c1||color=red||size=XL||price=500||name=Item name||shipping=0||quantity=2++id=c1||price=500||name=Item name||shipping=0||quantity=2'
with some items having color and size as parameters and sometimes only one of those.
Is there some more efficient way to explain to my computer that I want the part of the string after 'price=' to be a variable named 'price', etc.?
I hope I'm making sense; I've tried to be as precise as I could.
Anyway, thank you for any help
EDIT: I just wanted to say thanks to all the great people of StackOverflow for such wonderful ideas. Because of all of your great suggestions I'm going out to get drunk tonight. Thank you all :)
Let's write a parser!
function parse(input)
{
function parseSingle(input)
{
var parts = input.split('||'),
part,
record = {};
for (var i=0; i<parts.length; i++)
{
part = parts[i].split('=');
record[part[0]] = part[1];
}
return record;
}
var parts = input.split('++'),
records = [];
for (var i=0; i<parts.length; i++)
{
records.push(parseSingle(parts[i]));
}
return records;
}
Usage:
var string = 'id=1||price=500||name=Item name||shipping=0||quantity=2++id=2||price=1500||name=Some other name||shipping=10||quantity=2';
var parsed = parse(string);
/* parsed is:
[{id: "1", price: "500", name: "Item name", shipping: "0", quantity: "2"},
{id: "2", price: "1500", name: "Some other name", shipping: "10", quantity: "2"}]
*/
You can achieve this using regular expressions. For example, the regex /price=([0-9]+)/ will match price=XXX where XXX is one or more digits. As this part of the regex is surrounded by parentheses, it explicitly captures the numeric part for you.
var string = 'id=1||price=500||name=Item name||shipping=0||quantity=2++id=2||price=1500||name=Some other name||shipping=10||quantity=2'
var priceRegex = /price=([0-9]+)/
var match = string.match(priceRegex);
console.log(match[1]); // writes 500 to the console log
Try that:
var string = 'id=1||price=500||name=Item name||shipping=0||quantity=2++id=2||price=1500||name=Some other name||shipping=10||quantity=2';
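var obj = {}; // a plain object is enough for key/value pairs
var arr = string.split('++')[0].split('||'); // items are separated by '++'; parse the first one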
for(var x=0; x<arr.length;x++){
var temp = arr[x].split('=');
obj[temp[0]] = temp[1]
}
alert(obj['id']); // alert 1
First, split your string into two (or more) parts by ++ separator:
var strings = myString.split('++');
then for each of the strings you want an object, right? So you need to have an array and fill it like that:
var objects = [];
for (var i = 0; i < strings.length; ++i) {
var properties = strings[i].split('||');
var obj = {};
for (var j = 0; j < properties.length; ++j) {
var prop = properties[j].split('=');
obj[prop[0]] = prop[1]; //here you add property to your object, no matter what its name is
}
objects.push(obj);
}
thus you have an array of all objects constructed from your string. Naturally, in real life I'd add some checks that strings indeed satisfy the format etc. But the idea is clear, I hope.
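As a rough idea of what such checks might look like, here is a sketch that skips malformed pairs and splits on the first "=" only. The function name parseItems and the sample string are assumptions for illustration:

```javascript
// Hypothetical parser with basic format checks.
function parseItems(input) {
  var items = [];
  var chunks = input.split('++');
  for (var i = 0; i < chunks.length; i++) {
    var item = {};
    var pairs = chunks[i].split('||');
    for (var j = 0; j < pairs.length; j++) {
      var eq = pairs[j].indexOf('=');
      if (eq <= 0) {
        continue; // skip pairs with no "=" or an empty key
      }
      // Split on the first "=" only, so values may themselves contain "=".
      item[pairs[j].slice(0, eq)] = pairs[j].slice(eq + 1);
    }
    items.push(item);
  }
  return items;
}

var parsedItems = parseItems('id=c1||color=red||price=500||name=a=b||junk++id=2||price=1500');
console.log(parsedItems);
```

Optional fields such as color or size simply end up undefined on items that lack them, so callers can test for them with a plain property check.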
If you can replace the || with &, you could try to parse it as if it were a query string.
A personal note - JSON-formatted data would've been easier to work with.
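The query-string idea could be sketched like this, assuming an environment that provides URLSearchParams and Object.fromEntries (modern browsers and recent Node.js versions). The function name parseAsQueryString is made up for the example:

```javascript
// Hypothetical helper: treat each item as a query string after swapping "||" for "&".
// Assumes URLSearchParams and Object.fromEntries are available.
function parseAsQueryString(input) {
  return input.split('++').map(function (item) {
    var query = item.split('||').join('&');
    return Object.fromEntries(new URLSearchParams(query));
  });
}

var cookie = 'id=1||price=500||name=Item name||shipping=0||quantity=2++id=2||price=1500';
var parsedCookie = parseAsQueryString(cookie);
console.log(parsedCookie);
```

One caveat with this approach: values containing "&", "+" or "%" would be interpreted as query-string syntax, so it only fits if the cookie values never contain those characters.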
I would attach the data to a javascript object.
var settingsObj = {};
var components = thatString.split('||');
for(var j = 0; j < components.length; j++)
{
var keyValue = components[j].split('=');
settingsObj[keyValue[0]] = keyValue[1];
}
// Now the key value pairs have been set, you can simply request them
var id = settingsObj.id; // 1 or c1
var name = settingsObj.name; // Item Name, etc
You're already using .split() to break the string down by ||, so just take that a step further: split each of those sections by = and treat everything on the left as the field name and everything on the right as the value.
This should get the first match in the string:
string.match(/price=(\d{1,})/)[1]
Note this will only match the first price= in the string, not the second one.
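To pick up every price rather than just the first, the same pattern can be given the g flag and run in an exec loop. A small sketch, with the helper name allPrices and the sample string invented for illustration:

```javascript
// Hypothetical helper: collect every price in the string, not just the first.
function allPrices(input) {
  var re = /price=(\d+)/g; // g flag lets exec advance through the string
  var prices = [];
  var match;
  while ((match = re.exec(input)) !== null) {
    prices.push(Number(match[1])); // match[1] is the captured digits
  }
  return prices;
}

var prices = allPrices('id=1||price=500||name=Item name++id=2||price=1500||name=Other');
console.log(prices);
```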
If you can use jQuery, it wraps working with cookies and lets you access them like:
Reading a cookie:
var comments = $.cookie('comments');
Writing a cookie:
$.cookie('comments', 'expanded');
This post by someone else has a decent example:
http://www.vagrantradio.com/2009/10/getting-and-setting-cookies-with-jquery.html
If you can't use jQuery, you need to do standard string parsing like you currently are (perhaps regular expressions instead of the string splitting / replacing might trim down your code) or find some other javascript library that you can use.
If you like eye candy in your code, you can use a regexp-based "search and don't replace" trick by John Resig:
var extract = function(string) {
var o = {};
string.replace(/(.*?)=(.*?)(?:\|\||$)/g, function(all, key, value) {
o[key] = value;
});
return o;
};
Then
var objects = string.split('++'),
i = objects.length;
for (;i--;) {
objects[i] = extract(objects[i]);
}
You could do something like this, where you eval the strings when you split them.
<html>
<head>
<script type="text/javascript">
var string = 'id=c1||color=red||size=XL||price=500||name=Item name||shipping=0||quantity=2++id=c1||price=500||name=Item name||shipping=0||quantity=2'
var mySplitResult = string.split("||");
for(i = 0; i < mySplitResult.length; i++){
document.write("<br /> Element " + i + " = " + mySplitResult[i]);
var assignment = mySplitResult[i].split("=");
eval(assignment[0] + "=" + "\""+assignment[1]+"\"");
}
document.write("Price : " + price);
</script>
</head>
<body>
</body>
</html>
var str = 'id=c1||color=red||size=XL||price=500||name=Item name||shipping=0||quantity=2++id=c1||price=500||name=Item name||shipping=0||quantity=2'
var items = str.split("++");
for (var i=0; i<items.length; i++) {
var data = items[i].split("||");
for (var j=0; j<data.length; j++) {
var stuff = data[j].split("=");
var n = stuff[0];
var v = stuff[1];
eval("var "+n+"='"+v+"'");
}
alert(id);
}
EDIT: As per JamieC's suggestion, you can eliminate eval("var "+n+"='"+v+"'"); and replace it with the (somewhat) safer window[n] = v; -- but you still have the simple problem that this will overwrite existing variables, not to mention you can't tell if the variable color was set on this iteration or if this one skipped it and the last one set it. Creating an empty object before the loop and populating it inside the loop (like every other answer suggests) is a better approach in almost every way.
JSON.parse('[{' + string.replace(/\+\+/g, '},{').replace(/(\w*)=([\w\s]*)/g, '"$1":"$2"').replace(/\|\|/g, ',') + '}]')
Convert the string to JSON format, then parse it.

how to load two dimensional array in javascript

I have a string containing many lines of the following format:
<word1><101>
<word2><102>
<word3><103>
I know how to load each line into an array cell using this:
var arrayOfStuff = stringOfStuff.split("\n");
But the above makes one array cell per line, I need a two-dimensional array.
Is there a way to do that using similar logic to the above without having to re-read and re-process the array. I know how to do it in two phases, but would rather do it all in one step.
Thanks in advance,
Cliff
It sounds like you're hoping for something like Python's list comprehension (e.g. [line.split(" ") for line in lines.split("\n")]), but Javascript has no such feature. The very simplest way to get the same result in Javascript is to use a loop:
var lines = lines.split("\n");
for (var i = 0; i < lines.length; i++) {
// Simplest case: split each line on a delimiter.
// lines[i] = lines[i].split(" ");
// Or alternatively, something more complex using regexes:
var match = /<([^>]+)><([^>]+)>/.exec(lines[i]);
if (match) lines[i] = [match[1], match[2]]; // lines that do not match are left untouched
}
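If Array.prototype.map is available in your environment, the loop can also be written as a single expression, which comes fairly close to the "one step" the question asks for. This is a sketch; the function name toPairs is an assumption:

```javascript
// Hypothetical helper: turn "<word><number>" lines into [word, number] pairs.
function toPairs(text) {
  return text.split('\n')
    .map(function (line) { return /<([^>]+)><([^>]+)>/.exec(line); })
    .filter(function (match) { return match !== null; }) // drop lines that do not fit the format
    .map(function (match) { return [match[1], match[2]]; });
}

var table = toPairs('<word1><101>\n<word2><102>\n\n<word3><103>');
console.log(table);
```

Under the hood this still visits each line once per step, so it is tidier to read rather than fundamentally cheaper than the explicit loop.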
Not really. There are no native javascript functions that return a two-dimensional array.
If you wanted to parse a CSV for example, you can do
var parsedStuff = [];
stringOfStuff.replace(/\r\n/g, '\n') // Normalize newlines
// Parse lines and dump them in parsedStuff.
.replace(/^.*$/gm, function (_) { parsedStuff.push(_ ? _.split(/,/g) : []); })
Running
stringOfStuff = 'foo,bar\n\nbaz,boo,boo'
var parsedStuff = [];
stringOfStuff.replace(/\r\n/g, '\n')
.replace(/^.*$/gm, function (_) { parsedStuff.push(_ ? _.split(/,/g) : []); })
JSON.stringify(parsedStuff);
outputs
[["foo","bar"],[],["baz","boo","boo"]]
You can adjust the /,/ to suit whatever field separator you use.
