Parsing Phrases with a Pipe Character Using JavaScript

I've been working on my Safari extension for saving content to Instapaper, and I'm trying to improve its title parsing for bookmarks. For example, an article that I recently saved has a title that looks like this:
Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch
I want to use the JavaScript in my Safari extension to remove all of the text after the pipe character so that I can make the final bookmark look neater once it is saved to Instapaper.
I've attempted the title parsing successfully in a couple of similar cases using blocks of code that look like this:
if(safari.application.activeBrowserWindow.activeTab.title.search(' - ') != -1) {
console.log(safari.application.activeBrowserWindow.activeTab.title);
console.log(safari.application.activeBrowserWindow.activeTab.title.search(' - '));
var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search(' - '));
console.log(parsedTitle);
};
However, I got thrown for a loop once I tried doing the same thing with the pipe character, since JavaScript uses it as a special character. I've tried several bits of code to solve this problem. The most recent looks like this (attempting to use a regular expression and escape the pipe character):
if(safari.application.activeBrowserWindow.activeTab.title.search('/\|') != -1) {
console.log(safari.application.activeBrowserWindow.activeTab.title);
console.log(safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
var parsedTitle = safari.application.activeBrowserWindow.activeTab.title.substring(0, safari.application.activeBrowserWindow.activeTab.title.search('/\|'));
console.log(parsedTitle);
};
If anybody could give me a tip that works for this, your help would be greatly appreciated!

Your regex is malformed. It should be:
safari.application.activeBrowserWindow.activeTab.title.search(/\|/)
Note the lack of quotes; I'm using a regex literal here. Also, regex literals need to be delimited by / characters.
Instead of searching and then taking a substring, you can simply do a replace with the following regex:
str = str.replace(/\|.*$/, "");
This removes the | character and everything after it, if a pipe is present.
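As a minimal sketch of the replace approach, the helper below (stripSiteSuffix is a hypothetical name, not part of the Safari extension API) also swallows the whitespace in front of the pipe, which the regex above would leave behind as a trailing space. It assumes the title is a single line.

```javascript
// Hypothetical helper: drop " | Site Name" style suffixes from a page title.
function stripSiteSuffix(title) {
  // \s*  optional whitespace before the pipe
  // \|   a literal pipe (escaped, because | means alternation in a regex)
  // .*$  everything from the pipe to the end of the string
  return title.replace(/\s*\|.*$/, '');
}

var sample = 'Report: Bing Users Disproportionately Affected By Malware Redirects | TechCrunch';
console.log(stripSiteSuffix(sample));
```

A title without a pipe is returned unchanged, so no separate search() check is needed before calling it.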

Related

JavaScript: replacing double backslashes with a single backslash

I have been browsing lots of solutions, but for some reason haven't got anything to work.
I need to replace the following string: "i:0#.w|dev\\tauri;" with "i:0#.w|dev\tauri;"
I have tried the following JS code to do the replacement:
s.replace(/\\\\/g, "\\$1");
s.replace(/\\\\/g, "\\");
But I have had no result. Yet the following replaced my \\ with ":
s.replace(/\\/g, "\"");
To be honest, I am really confused by the logic here: it seems that \\\\ should be needed to match a double backslash, yet \\ seems to work for two backslashes.
I need this to check whether the current SharePoint user (i:0#.w|dev\tauri) is on the list.
Update:
Okay, after I used console.log();, I discovered something interesting.
In code: var CurrentUser = "i:0#.w|dev\tauri"; and console.log() prints: i:0#.w|dev auri...
The C# code is the following:
SPWeb theSite = SPControl.GetContextWeb(Context);
SPUser theUser = theSite.CurrentUser;
return theUser.LoginName;

Backslashes in JavaScript string literals need to be escaped, so if you write a string literal with two backslashes, JavaScript interprets it as just one. In the string you are using for the comparison, you have \t, which is a tab character, when what you probably want is \\t. My guess is that wherever you are getting the current SharePoint user from, it is being properly escaped, but your compare list isn't.
Edit:
Or maybe the other way around. If you're using .NET 4+, JavaScriptStringEncode might be helpful. If you're still having problems, it might help to show us how you are doing the comparison.
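To make the escaping rules concrete, here is a small sketch using the login name from the question. The variable names and the collapseDoubleBackslashes helper are made up for illustration. Note also that replace() returns a new string rather than modifying the original, so the result has to be assigned or returned.

```javascript
// In source code, "\\" is the escape sequence for ONE backslash character.
var oneBackslash = "i:0#.w|dev\\tauri";     // contains: i:0#.w|dev\tauri
// "\t" is the escape sequence for a tab, so the backslash and the t are gone.
var withTab = "i:0#.w|dev\tauri";           // contains: i:0#.w|dev<TAB>auri
// Four backslashes in source produce two real backslashes in the string.
var twoBackslashes = "i:0#.w|dev\\\\tauri"; // contains: i:0#.w|dev\\tauri

// Hypothetical helper: turn every pair of real backslashes into one.
// The regex /\\\\/ matches two backslash characters; "\\" is one backslash.
function collapseDoubleBackslashes(s) {
  return s.replace(/\\\\/g, "\\");
}

console.log(oneBackslash.length, withTab.length, twoBackslashes.length);
console.log(collapseDoubleBackslashes(twoBackslashes) === oneBackslash);
```

If the value coming from the server already contains a single backslash, no replacement is needed at all; only the literal used for comparison has to be written with \\.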

Google Docs API - Text.replaceText regex issues

I am trying to do something really really basic.
It's just a search and replace using this function, which uses some proprietary Regex I never used before.
https://developers.google.com/apps-script/reference/document/text#replaceText(String,String)
What I am trying to accomplish is simple, run through the whole document and replace placeholders with text.
The string to match is in this format:
#replace this please#
By using this pattern:
(\W|^)#replace this please#(\W|$)
copied from the Google Examples found here (https://support.google.com/a/answer/1371417?hl=en)
It works absolutely fine, with one exception that bugs me.
If I have 2 or more placeholders on the same line, it won't match any of them.
So if I have something like this:
#replace me please# and some normal text here #replace me too#
Neither of the two will be matched.
I am assuming my expression doesn't take this into account, but the documentation is very hard to find for their implementation of regular expressions.
Can anybody help please?

With a line like the one from your question in the document, you may try using the following regex replacement function:
function googleDocsApi27827395() {
var body = DocumentApp.getActiveDocument().getBody();
body.replaceText("(\\W|^)#replace this please#(\\W|$)", "");
}
In the result, the \\W groups also match the non-word character adjacent to each placeholder (the one just before it and the one just after it), so those characters are removed as well. If you do not need that behavior, remove the (\\W|^) and (\\W|$).
In case you have 3 different strings in between #...#s, you can use alternations to build the regex:
body.replaceText("#(replace this please|replace me (please|too))#", "");
The line "#replace me please# and #replace this please# some normal text here #replace me too#" will turn into "and some normal text here" (plus the leftover spaces around the removed placeholders).
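The alternation pattern can be tried outside of Google Docs with plain JavaScript. This is only an illustration: Apps Script's replaceText takes the pattern as a string and uses its own regex engine, so behavior there may differ, and stripPlaceholders is a hypothetical helper name.

```javascript
// Same alternation as above, written as a JavaScript regex literal with the g flag
// so that every placeholder on the line is replaced, not just the first.
var placeholderPattern = /#(replace this please|replace me (please|too))#/g;

// Hypothetical helper: remove every known placeholder from one line of text.
function stripPlaceholders(line) {
  return line.replace(placeholderPattern, '');
}

var line = '#replace me please# and #replace this please# some normal text here #replace me too#';
// The spaces that surrounded the placeholders stay in the result.
console.log(JSON.stringify(stripPlaceholders(line)));
```

Because the pattern no longer consumes the characters next to each placeholder, several placeholders on one line do not interfere with each other.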

str replace all in Javascript

I am trying to process some URLs with JavaScript, where part of each URL needs to be replaced. I have a textarea with some URLs, for example:
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=1
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=2
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=3
http://mywebsite.com/preview.aspx?mode=desktop&url=http://mywebsite.com/post.aspx?id=44&content=3
Now what I am trying to do is replace http://mywebsite.com/preview.aspx?mode=desktop&url= with nothing (an empty string).
I have tried using str.replace(), but it replaces only the first occurrence of that URL.
I have also tried the global flag g. The call I used is:
str_replace(\http://mywebsite.com/preview.aspx?mode=desktop&url=/g,'');
But it's not working. Can anyone tell me how I can do that?
I want the output of the textarea to look like:
http://mywebsite.com/post.aspx?id=44&content=1
http://mywebsite.com/post.aspx?id=44&content=2
http://mywebsite.com/post.aspx?id=44&content=3
http://mywebsite.com/post.aspx?id=44&content=4

I believe that your biggest issue is that your regex syntax is incorrect. Try this:
Imagine that var s is equal to the value of your textarea.
s = s.replace(/http:\/\/mywebsite\.com\/preview\.aspx\?mode=desktop&url=/g, '');
The issue you were having was improper delimiters and unescaped reserved symbols.
Though JavaScript has some regex idiosyncrasies of its own, the issues here were related to basic regex. You might find these resources useful:
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
http://regexpal.com
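The escaping problem described above can also be sidestepped by building the regex from the literal text. The sketch below is one way to do that; escapeRegExp and removeAll are hypothetical helper names, not built-in functions.

```javascript
// Hypothetical helper: backslash-escape every character that is special in a regex,
// so the text is matched literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove every occurrence of a literal substring.
function removeAll(text, literal) {
  // The 'g' flag makes replace() act on all matches instead of only the first.
  return text.replace(new RegExp(escapeRegExp(literal), 'g'), '');
}

var prefix = 'http://mywebsite.com/preview.aspx?mode=desktop&url=';
var input = prefix + 'http://mywebsite.com/post.aspx?id=44&content=1\n' +
            prefix + 'http://mywebsite.com/post.aspx?id=44&content=2';
console.log(removeAll(input, prefix));
```

For a fixed literal like this one, text.split(literal).join('') gives the same result without any regex.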

Try this:
var textarea = document.getElementById('textareaidhere');
textarea.value = textarea.value.replace(/http:\/\/mywebsite\.com\/preview\.aspx\?mode=desktop&url=/g, '');

Confused with Regex JS pattern

OK, I have the following data in my div:
<div id="mydiv">
<!--
what is your present
<code>alert("this is my present");</code>
where?
<code>alert("here at my left hand");</code>
oh thank you! i love you!! hehe
<code>alert("welcome my honey ^^");</code>
-->
</div>
What I need to do is get all the scripts inside the <code> blocks, and separately the text between them, without removing the HTML comment in the div. It's homework given by my professor, and I can't modify that div block.
I need to use regular expressions for this, and this is what I did:
var block = $.trim($("div#mydiv").html()).replace("<!--","").replace("-->","");
var htmlRegex = new RegExp(""); //I don't know what to do here
var codeRegex = new RegExp("^<code(*n)</code>$","igm");
var code = codeRegex.exec(block);
var html = "";
It really doesn't work. Please don't just give the exact answer; please teach me. Thank you.
I need to have the following blocks in the variable code:
alert("this is my present");
alert("here at my left hand");
alert("welcome my honey ^^");
And these are the blocks I need in the variable html:
what is your present
where?
oh thank you! i love you!! hehe
My question is: what regex pattern gives the results above?

Parsing HTML with a regular expression is not something you should do.
I'm sure your professor thinks this is really clever, assumes there's no way to get at the comment through the DOM API, and can wave a banner around and justify some minor corner case where using regex to parse HTML is okay.
Well, no, it isn't. If you have complex code in there, what happens? Your regex breaks, and perhaps becomes a security exploit if this is ever in production.
So, here:
http://jsfiddle.net/zfp6D/
Walk the DOM, get the nodeType 8 (comment) text value out of the node.
Invoke the HTML parser (that thing browsers use to parse HTML instead of regex; why you wouldn't use the HTML parser to parse HTML is totally beyond me. It's like saying "Yeah, I could nail in this nail with a hammer, but I think I'm going to just stomp on the nail with my foot until it goes in").
Find all the CODE elements in the newly parsed HTML.
Log them to console, or whatever you want to do with them.

First of all, you should be aware that because HTML is not a regular language, you cannot do generic parsing using regular expressions that will work for all valid inputs (generic nesting in particular cannot be expressed with regular expressions). Many parsers do use regular expressions to match individual tokens, but other algorithms need to be built around them.
However, for a fixed input such as this, it's just a case of working through the structure you have (though it's still often easier to use different parsing methods than just regular expressions).
First, let's get all the code:
var code = '', match = [];
var regex = new RegExp("<code>(.*?)</code>", "g");
while (match = regex.exec(content)) {
code += match[1] + "\n";
}
I assume content contains the content of the div that you've already extracted. Here the "g" flag says this is for "global" matching, so we can reuse the regex to find every match. The brackets indicate a capturing group, . means any character, * means repeated 0 or more times, and ? means "non-greedy" (see what happens without it to see what it does).
Now we can do a similar thing to get all the other bits, but this time the regex is slightly more complicated:
new RegExp("(<!--|</code>)(.*?)(-->|<code>)", "g")
Here | means "or". So this matches all the bits that start with either "start comment" or "end code" and end with "end comment" or "start code". Note also that we now have 3 sets of brackets, so the part we want to extract is match[2] (the second set).
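Putting the two patterns together might look like the sketch below. It makes a few assumptions beyond the answer: splitCommentBlock is a hypothetical helper name, the input is the div's inner HTML as a string, and [\s\S] is used in place of . because . does not match line breaks in JavaScript and the text in the question spans several lines. The outer groups are non-capturing here, so the wanted text is match[1] rather than match[2].

```javascript
// Hypothetical helper: split the comment block into code snippets and plain text.
function splitCommentBlock(content) {
  // Text between <code> and </code>; [\s\S]*? is a non-greedy "any character, including newlines".
  var codeRegex = /<code>([\s\S]*?)<\/code>/g;
  // Text that starts after "<!--" or "</code>" and ends before "-->" or "<code>".
  var textRegex = /(?:<!--|<\/code>)([\s\S]*?)(?:-->|<code>)/g;
  var code = [], html = [], match;

  while ((match = codeRegex.exec(content)) !== null) {
    code.push(match[1]);
  }
  while ((match = textRegex.exec(content)) !== null) {
    var text = match[1].trim();
    if (text) html.push(text); // skip whitespace-only gaps, e.g. before "-->"
  }
  return { code: code, html: html };
}

var sample = '<!--\nwhat is your present\n<code>alert("this is my present");</code>\n' +
             'where?\n<code>alert("here at my left hand");</code>\n-->';
console.log(splitCommentBlock(sample));
```

This only works because the input has a fixed, known shape; nested or malformed markup would need a real parser, as noted above.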

You're doing a lot of unnecessary stuff. .html() gives you the inner contents as a string. You should be able to use a regex to grab exactly what you need from there. Also, try to stick with regex literals (e.g. /^regexstring$/). With new RegExp you have to escape the backslashes a second time because the pattern is a string, which gets really messy. You generally only want to use new RegExp when you need to put a string var into a regex.
The match function of strings accepts regEx and returns a collection of every match when you add the global flag (e.g. /^regexstring$/g <-- note the 'g'). I would do something like this:
var block = $('#mydiv').html(), //you can set multiple vars in one statement w/commas
matches = block.match(/<code>[^<]*<\/code>/g);
//[^<]* <-- 0 or more characters that aren't '<' - google 'negative character class'
matches = matches.join('_') //lazy way of avoiding a loop - join into a string with a safe character
.replace(/<\/*code>/g,'') //\/* 0 or more forward slashes
.split('_');//return the matches string back to array
//Now do what you want with matches. Eval (ew) or append in a script tag (ew).
//You have no control over the 'ew'. I just prefer data to scripts in strings
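As an alternative to the join/replace/split trick, which depends on the separator never appearing in the code itself, the tags can be stripped per match. This is a sketch with a hypothetical helper name (extractCodeBodies); it takes the HTML string as an argument so it does not depend on jQuery.

```javascript
// Hypothetical helper: return the text inside every <code>...</code> pair.
function extractCodeBodies(block) {
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = block.match(/<code>[^<]*<\/code>/g) || [];
  return matches.map(function (tagged) {
    // <\/?code> matches both the opening and the closing tag.
    return tagged.replace(/<\/?code>/g, '');
  });
}

console.log(extractCodeBodies('<code>alert("a_b");</code> text <code>alert("c");</code>'));
```

As with the original pattern, [^<]* means a snippet containing a < character would not be matched.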

Regex extraction of one letter inside html chunk

Hi, I need to extract ONE letter from a string.
The string I have is a big block of HTML, but the part I need to search in is this text:
Vahvistustunnus&nbsp;M&nbsp;:
And I need to get the M inside the nbsp's
So, who is the quickest regex-guru out there? :)

OK, according to this page in the Molybdenum API docs, the result will be all of the groups concatenated together. Given that you just want the char between the two &nbsp;'s, it's not good enough to match the whole thing and then pull out the group. Instead you'll need to do something like this:
(?<=Vahvistustunnus&nbsp;)[a-zA-Z](?=&nbsp;)
Warning
This might not work for you because lookbehinds (?<=pattern) are not available in all regex flavors. Specifically, I think that because Molybdenum is a Firefox extension, it's likely using the ECMAScript (JavaScript) regex flavor, and JavaScript engines that predate ES2018 don't support lookbehinds.
If that's the case, then I'm going to have to ask someone else to answer your question, as my (amateur) regex ninja skills don't go much further than that. If you were using the regex in JavaScript code, there are ways around this limitation, but based on your description, it sounds like you have to solve this problem with nothing but a raw regex?

Looks like it uses JavaScript and if so
var str = "Vahvistustunnus M :";
var patt = "Vahvistustunnus ([A-Z]) :";
var result = str.match(patt)[1];
should work.
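A slightly more defensive sketch of the same capture-group idea is shown below. It assumes the separators may appear either as the literal entity text &nbsp; or as already-decoded whitespace (in JavaScript, \s also matches the non-breaking space U+00A0), and the function name is hypothetical.

```javascript
// Hypothetical helper: pull the single letter out of "Vahvistustunnus <letter> :".
function extractConfirmationLetter(html) {
  // (?:&nbsp;|\s)+ accepts the entity text or real whitespace, including U+00A0.
  var match = /Vahvistustunnus(?:&nbsp;|\s)+([A-Za-z])(?:&nbsp;|\s)+:/.exec(html);
  // exec() returns null when the text is not found, so guard before indexing.
  return match ? match[1] : null;
}

console.log(extractConfirmationLetter('<td>Vahvistustunnus&nbsp;M&nbsp;:</td>'));
```

Returning null instead of indexing the result directly avoids a TypeError when the HTML block does not contain the expected text.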
