python: urlParse fault? parser Python 3 - javascript

I have the following code (gdaten[n][2] gives a URL; n is the index):
try:
    p=urlparse(gdaten[n][2])
    while p.scheme == "javascript" or p.scheme == "mailto":
        p=urlparse(gdaten[n][2])
        print(p," was skipped (", gdaten[n][2],")")
        n += 1
    print ("check:", gdaten[n][2])
    f = urllib.request.urlopen(gdaten[n][2])
    htmlcode = str(f.read())
    parser = MyHTMLParser(strict=False)
    parser.feed(htmlcode)
except urllib.error.URLError:
    #do some stuff
except IndexError:
    #do some stuff
except ValueError:
    #do some stuff
Now I have the following error:
urllib.error.URLError: <urlopen error unknown url type: javascript>
in line 8. How is that possible? I thought the while loop skipped all links with the javascript scheme. Why does the except not work? Where is my mistake?
MyHTMLParser appends the links found on the website to gdaten like this: [[stuff, stuff, link], [stuff, stuff, link]]

This is an off-by-one error.
In other words, n and p are out of sync.
To fix this, add one to n before setting p.
Why wasn't this working?
Assuming n is set to zero at the start (could start at 42, it doesn't matter), let's say gdaten is laid out like so:
gdaten[0][2] = "javascript://blah.js"
gdaten[1][2] = "http://hello.com"
gdaten[2][2] = "javascript://moo.js"
Upon checking the first while condition, p.scheme is 'javascript' so we enter the loop. p gets set to urlparse("javascript://blah.js") again and n is increased to 1. Since we're checking urlparse("javascript://blah.js") again, we continue into the loop again.
p now gets set to urlparse("http://hello.com") and n gets set to 2.
Since urlparse("http://hello.com") passes the check, the while loop ends.
Meanwhile, since n is two, the url that gets opened is gdaten[2][2] which is "javascript://moo.js"
Code fix
try:
    p = urlparse(gdaten[n][2])
    # Skip entries whose scheme cannot be opened (or is missing)
    while p.scheme == "javascript" or p.scheme == "mailto" or not p.scheme:
        print(p, " was skipped (", gdaten[n][2], ")")
        # Skipping to the next value
        n += 1
        p = urlparse(gdaten[n][2])
    print("check:", gdaten[n][2])
    f = urllib.request.urlopen(gdaten[n][2])
    htmlcode = str(f.read())
    ...

Related

Parse URL which contain string of two URL

I have a Node app that receives the following URL in a header. I need to parse it and change the 3000 to 4000. How can I do that, given that req.headers.location effectively contains "two" URLs?
"http://to-d6faorp:51001/oauth/auth?response_type=code&redirect_uri=http%3AF%2Fmo-d6fa3.ao.tzp.corp%3A3000%2Flogin%2Fcallback&client_id=x2.node"
The issue is that I cannot just use replace, since the value is dynamic (it is 3000 now, but later it can be any value).
If the part of the URL you need to change is always inside the redirect_uri parameter, then you just need to find the index of the second %3A that comes after it.
JavaScript's indexOf takes a second parameter, the start position. So first call indexOf for the 'redirect_uri=' string, then pass that position to the next indexOf call to find the first '%3A', and pass that result to a third call to find the '%3A' that comes just before your '3000'. Once you have the positions of those tokens, you can build a new string from substrings: the first substring runs up to the end of the second %3A, and the second substring starts at the index of the %2F that follows the port.
Basically, you will be building your string by cutting up the string like so:
"http://to-d6faorp:51001/oauth/auth?response_type=code&redirect_uri=http%3AF%2Fmo-d6fa3.ao.tzp.corp%3A"
"%2Flogin%2Fcallback&client_id=x2.node"
... and appending in whatever port number you are trying to put in.
Hope this helps.
This code should get you what you want:
var strURL = "http://to-d6faorp:51001/oauth/auth?response_type=code&redirect_uri=http%3AF%2Fmo-d6fa3.ao.tzp.corp%3A3000%2Flogin%2Fcallback&client_id=x2.node";
var strNewURL = strURL.substring(0,strURL.indexOf("%3A", strURL.indexOf("%3A", strURL.indexOf("redirect_uri") + 1) + 1) + 3) + "4000" + strURL.substring(strURL.indexOf("%2F",strURL.indexOf("%3A", strURL.indexOf("%3A", strURL.indexOf("redirect_uri") + 1) + 1) + 3));
Split the returned string into its parameters:
var parts = req.headers.location.split("&");
Then split each part into field name and value:
var subparts = [];
for (var i = 0; i < parts.length; i++)
    subparts[i] = parts[i].split("=");
Then check which field name equals redirect_uri:
var ret = -1;
for (var i = 0; i < subparts.length; i++)
    if (subparts[i][0] == "redirect_uri")
        ret = i;
if (ret == -1) {
    // didn't find redirect_uri; handle this error somehow
}
Now you know which subpart contains the redirect_uri.
Because I don't know which rules your redirect_uri follows, I can't tell you how to extract the value; that's your task. But the problem is now isolated to subparts[ret][1], the string that holds the redirect_uri value.
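As an alternative to index arithmetic and manual splitting, the nested URL can be handled with the standard URL and URLSearchParams objects, which are available as globals in current Node versions. The following is only a sketch. The function name setRedirectPort is made up for illustration, and it assumes that redirect_uri decodes to a well-formed absolute URL. The sample in the question contains %3AF%2F, which is assumed here to be a typo for %3A%2F%2F.

```javascript
// Sketch: change the port inside the nested redirect_uri parameter.
// setRedirectPort is a hypothetical helper name, not part of any library.
// Assumption: redirect_uri decodes to a well-formed absolute URL.
function setRedirectPort(location, newPort) {
  const outer = new URL(location);
  const redirect = outer.searchParams.get("redirect_uri"); // already percent-decoded
  if (redirect === null) {
    throw new Error("redirect_uri parameter not found");
  }
  const inner = new URL(redirect);
  inner.port = String(newPort); // works whatever the old port was
  outer.searchParams.set("redirect_uri", inner.toString()); // re-encoded on serialization
  return outer.toString();
}

// Sample input with the assumed typo corrected (%3A%2F%2F instead of %3AF%2F)
const input = "http://to-d6faorp:51001/oauth/auth?response_type=code" +
  "&redirect_uri=http%3A%2F%2Fmo-d6fa3.ao.tzp.corp%3A3000%2Flogin%2Fcallback" +
  "&client_id=x2.node";
const updated = setRedirectPort(input, 4000);
console.log(updated);
```

Because the port is set on a parsed URL object, the code does not depend on the old value being 3000 or on the position of any %3A token.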

regex jquery to replace number with link

I have a page of database results where users occasionally type in a reference to another post. (The database is a day event tracker for a scheduling office.)
The reference to the other post is always the post's ID (format of 001234). We use these to match events with dockets and other paperwork from truck drivers. It is always a 6-digit number on its own.
<div class="eventsWrapper">
Data from DB is output here using PHP in a foreach loop.
Presents data in similar fashion to facebook for example.
</div>
What I need to do, once the data in the above DIV is loaded, is go through it and replace every whole 6-digit number (not part of a longer number) with the number as a hyperlink.
It is important that it only matches numbers with a space on either side:
EG: Ref 001122 <- like this - not like this -> ignore AB001122
Once I have the hyperlink tag I can make the reference number clickable to take users directly to that post.
I am not that good with regex but think it is something like:
\b(![0-9])?\d{6}\b
I have no idea how to search the DIV and then replace that regex with the hyperlink. Appreciate any help.
(?:^| )\d{6}(?= |$)
You can use this and replace with <space><whateveryouwant>. See the demo:
https://regex101.com/r/bW3aR1/7
\b won't work here, because A1 does not form the word boundary you want.
Something like this? Make an array of the individual posts and loop through. If there is only ever one ID in a post, you can do without the second loop.
var str = ['Ref 001122 <- like this - not like this -> ignore AB001122', 'Ref 001123 <- like this - not like this -> ignore AB001122', 'Ref 001124 <- like this - not like this -> ignore AB001122'];
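var regex = /\b\d{6}\b/g;
for (var j = 0; j < str.length; j++) {
    // match() returns null when nothing is found, so fall back to an empty array
    var urls = str[j].match(regex) || [];
    var newString = str[j];
    for (var i = 0; i < urls.length; i++) {
        var url = urls[i];
        // replace in newString, not str[j], so earlier replacements are kept
        newString = newString.replace(url, '<a href="' + url + '">' + url + '</a>');
    }
    $('#output').append(newString);
}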
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<div id="output"></div>
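The whitespace-delimited match described above can also be done in a single replace call with a callback. This is a minimal sketch, not a drop-in solution. buildHref and the "/posts/" path are hypothetical, since the question does not say what the link target looks like, and linkifyIds is an illustrative name.

```javascript
// Sketch: wrap every standalone 6-digit number in a link.
// buildHref is a hypothetical helper; the real URL scheme is not given in the question.
function buildHref(id) {
  return "/posts/" + id;
}

function linkifyIds(text) {
  // (^|\s) captures the leading whitespace so it can be put back;
  // (?=\s|$) checks the trailing side without consuming it,
  // so two IDs separated by a single space are both matched.
  return text.replace(/(^|\s)(\d{6})(?=\s|$)/g, function (match, lead, id) {
    return lead + '<a href="' + buildHref(id) + '">' + id + "</a>";
  });
}

console.log(linkifyIds("Ref 001122 <- like this - not like this -> ignore AB001122"));
```

With jQuery, this could be applied to the wrapper with $('.eventsWrapper').html(function (i, old) { return linkifyIds(old); }). Note that this operates on the HTML string, so a 6-digit number inside an attribute value that is surrounded by whitespace would be matched as well.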

deObfuscating in Python using transformed JS function

I needed to convert the following function to python to deobfuscate a text extracted while web scraping:
function obfuscateText(coded, key) {
// Email obfuscator script 2.1 by Tim Williams, University of Arizona
// Random encryption key feature by Andrew Moulden, Site Engineering Ltd
// This code is freeware provided these four comment lines remain intact
// A wizard to generate this code is at http://www.jottings.com/obfuscator/
shift = coded.length
link = ""
for (i = 0; i < coded.length; i++) {
if (key.indexOf(coded.charAt(i)) == -1) {
ltr = coded.charAt(i)
link += (ltr)
}
else {
ltr = (key.indexOf(coded.charAt(i)) - shift + key.length) % key.length
link += (key.charAt(ltr))
}
}
document.write("<a href='mailto:" + link + "'>" + link + "</a>")
}
Here is my converted Python equivalent:
def obfuscateText(coded,key):
    shift = len(coded)
    link = ""
    for i in range(0,len(coded)):
        inkey=key.index(coded[i]) if coded[i] in key else None
        if ( not inkey):
            ltr = coded[i]
            link += ltr
        else:
            ltr = (key.index(coded[i]) - shift + len(key)) % len(key)
            link += key[ltr]
    return link
print obfuscateText("uw#287u##Guw#287Xw8Iwu!#W7L#", "WXYVZabUcdTefgShiRjklQmnoPpqOrstNuvMwxyLz01K23J456I789H.#G!#$F%&E'*+D-/=C?^B_`{A|}~")
actionattraction$comcastWnet
but I am getting slightly incorrect output: instead of actionattraction#comcast.net I get the result above. Also, the above code often gives random characters for the same HTML page.
The target HTML page has an obfuscateText function in JS with the coded and key arguments. I extract the function signature into obsfunc and execute it on the fly:
email=eval(obsfunc)
This stores the email in the variable above. It works most of the time but fails occasionally. I strongly suspect the problem is with the arguments supplied to the Python function: since they contain special characters, they may need escaping or conversion. I tried passing raw strings and conversions such as repr(), but the problem persisted.
Some examples for actionattraction#comcast.net, computed both wrongly and correctly by the same Python function (the first line of each pair is the output):
#ation#ttr#ationVaoma#st!nct
obfuscateText("KMd%Y#Kdd8KMd%Y#IMY!MKcdJ#*d", "utvsrwqxpyonzm0l1k2ji3h4g5fe6d7c8b9aZ.Y#X!WV#U$T%S&RQ'P*O+NM-L/K=J?IH^G_F`ED{C|B}A~")
}ction}ttr}ction#comc}st.net
obfuscateText("}ARGML}RRP}ARGMLjAMKA}QRiLCR", "}|{`_^?=/-+*'&%$#!#.9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA~")
actionattraction#comcast.net
obfuscateText("DEWLRQDWWUDEWLRQoERPEDVWnQHW", "%&$#!#.'9876*54321+0zyxw-vutsr/qponm=lkjih?gfed^cbaZY_XWVUT`SRQPO{NMLKJ|IHGFE}DCBA~")
I've rewritten the deobfuscator:
def deobfuscate_text(coded, key):
    offset = (len(key) - len(coded)) % len(key)
    shifted_key = key[offset:] + key[:offset]
    lookup = dict(zip(key, shifted_key))
    return "".join(lookup.get(ch, ch) for ch in coded)
and tested it as
tests = [
    ("KMd%Y#Kdd8KMd%Y#IMY!MKcdJ#*d", "utvsrwqxpyonzm0l1k2ji3h4g5fe6d7c8b9aZ.Y#X!WV#U$T%S&RQ'P*O+NM-L/K=J?IH^G_F`ED{C|B}A~"),
    ("}ARGML}RRP}ARGMLjAMKA}QRiLCR", "}|{`_^?=/-+*'&%$#!#.9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA~"),
    ("DEWLRQDWWUDEWLRQoERPEDVWnQHW", "%&$#!#.'9876*54321+0zyxw-vutsr/qponm=lkjih?gfed^cbaZY_XWVUT`SRQPO{NMLKJ|IHGFE}DCBA~"),
    ("ZUhq4uh#e4Om.04O", "ksYSozqUyFOx9uKvQa2P4lEBhMRGC8g6jZXiDwV5eJcAp7rIHL31bnTWmN0dft")
]
for coded, key in tests:
    print(deobfuscate_text(coded, key))
which gives
actionattraction#comcast.net
actionattraction#comcast.net
actionattraction#comcast.net
anybody#home.com
Note that all three key strings contain &amp;; replacing it with & fixes the problem. Presumably the JavaScript was mistakenly HTML-escaped at some point. Python can unescape HTML entities like so:
# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)
# Python 3.4+ (HTMLParser.unescape was removed in 3.9):
import html
unescaped = html.unescape(my_string)
First of all, index doesn't return None; it raises a ValueError when the character is missing. In your case, W appears instead of a dot because the index returned is 0, so not inkey (which is also wrong) mistakenly concludes that the character is not present in the key.
Second, the presence of &amp; suggests that you may indeed have to find and decode HTML entities.
Finally, I'd recommend rewriting it like this:
def deobfuscate(coded, key):
    len0 = len(coded)
    len1 = len(key)
    link = ''
    for ch in coded:
        try:
            ch = key[(key.index(ch) - len0 + len1) % len1]
        except ValueError:
            pass  # character is not in the key: keep it unchanged
        link += ch
    return link

Can someone help edit Javascript string with both a for loop and the splice command?

I'm writing a script that collects some information about the website you visit. I have copied the small portion of my code that I'm struggling with. This part is supposed to check whether the visited website uses the www prefix and remove it; another part of the code, which I haven't pasted, then stores the domain name in the variable website.
var website = location.hostname;
document.getElementById("displayBefore").innerHTML = website; //test to see the variable
if (website[0] == 'w' && website[1] == 'w' && website[2] == 'w' && website[3] == '.') {
document.getElementById("displayTrue1").innerHTML = "true"; //test to see if the conditional was met
for (i = 4; i < website.length; i++) {
website[i - 4] = website[i]; //this is not rewriting anything
document.getElementById("displayPos0").innerHTML = website[i]; //test to see if the for loop has run
}
document.getElementById("displayDuring").innerHTML = website; //test to see the variable
website.splice(0, 4); //this is breaking everything after it
document.getElementById("displayAfter").innerHTML = website; //test to see the variable
}
Here is what is actually displayed by those tests when I open the page in a browser:
WebsiteBeforeFix: www.example.com
True1: true
website[i]: m
WebsiteDuringFix: www.example.com
WebsiteAfterFix:
The two parts of the code that aren't working are the following:
website[i - 4] = website[i];
This is supposed to pretty much shift the letters over 4 spaces to the left(eliminating "www.").
website.splice(0,4);
This is actually causing nothing after it to display at all in any of the code that does work. Can anyone tell me what I may be doing wrong?
splice is an array method, not for strings (they're immutable). Make the variable an array to manipulate it using the split method, and join it back together using the join method:
var websiteStr = location.hostname;
var website = websiteStr.split('');
console.log("displayBefore: " + website.join(''));
if (websiteStr.indexOf("www.") === 0) {
console.log("true");
/*for (var i = 4; i < website.length; i++) {
website[i - 4] = website[i];
console.log("displayPos0: " + website[i]);
}*/
console.log("displayDuring: " + website.join(''));
website.splice(0, 4);
console.log("displayAfter: " + website.join(''));
}
Instead of manipulating HTML, you can use console.log to do basic logging at particular points, which will show up in your browser's console. Anyway, it seems that your for loop doesn't do what you want it to -- splice already removes the "www." prefix.
You can also change this:
if (website[0] == 'w' && website[1] == 'w' && website[2] == 'w' && website[3] == '.') {
to this:
if (websiteStr.indexOf("www.") === 0) {
which performs the same thing much more concisely.
With the fixed code, it now displays:
displayBefore: www.google.com
true
displayDuring: www.google.com
displayAfter: google.com
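Since strings are immutable, assigning to website[i] has no effect, but the array round trip is not strictly needed either: String.prototype.slice returns a new string without the prefix. A minimal sketch, where stripWww is an illustrative name rather than anything from the original script:

```javascript
// Sketch: remove a leading "www." by building a new string.
// stripWww is a hypothetical helper name.
function stripWww(hostname) {
  // indexOf(...) === 0 means the string starts with "www."
  return hostname.indexOf("www.") === 0 ? hostname.slice(4) : hostname;
}

console.log(stripWww("www.example.com"));
console.log(stripWww("example.com"));
```

In the page itself this would be used as var website = stripWww(location.hostname);.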

Why script will sometimes launch ExtendScript Toolkit & just stall

Any ideas why the script will sometimes launch ExtendScript Toolkit and just stall when it is called? I think it may be when there is a lot of text to go through, but I'm not sure that is the case every time. See the picture below the script.
If it stops it stops on the line: var new_string = this_text_frame.contents.replace(search_string, replace_string);
// Version 3
function myReplace(search_string, replace_string) {
var active_doc = app.activeDocument;
var text_frames = active_doc.textFrames;
if (text_frames.length > 0)
{
for (var i = 0 ; i < text_frames.length; i++)
{
var this_text_frame = text_frames[i];
var new_string = this_text_frame.contents.replace(search_string, replace_string);
if (new_string != this_text_frame.contents)
{
this_text_frame.contents = new_string;
}
}
}
}
myReplace(/^PRG.*/i, "");
myReplace(/.*EBOM.*/i, "");
myReplace(/^PH.*0000.*/i, "");
myReplace(/^PH.*00\/.*/i, "");
// N or W & 6 #'s & -S_ EX. N123456-S_ REPLACE with: N123456-S??? (THIS NEEDS TO BE ABOVE _ REPLACED BY SPACE)
myReplace(/([NW]\d{6}-S)_/i, "$1??? ");
myReplace(/_/gi, " ");
// 6 #'s & - or no - & 7 #'s & 1 to 3 #'s & - EX: 123456-1234567/123- REPLACE with: -123456-
myReplace(/(\d{6})-?\d{7}\/\d\d?\d?-/i, "-$1-");
myReplace(/(\d{6})-?\d{7}-\/\d\d?\d?-/i, "-$1-");
myReplace(/([NW]\d{6}-S)-INS-\d\d\/\d\d?-/i, "$1??? ");
myReplace(/-INS-\d\d\/\d\d?-/i, "* ");
// - That is only followed by one more - & Not having PIA & - & 2 to 3 #'s & / & 1 to 3 #'s & - EX: -7NPSJ_RH-001/9- REPLACE with * & Space
myReplace(/-[^-]*-\d\d\d?\/\d\d?\d?-/i, "* ");
myReplace(/ ?ASSEMBLY/gi, " ASY");
myReplace(/ ASS?Y+$| ASS?Y - | ASS?Y -| ASS?Y | ASS?Y- | ASS?Y-/gi, " ASY - ");
myReplace(/(MCA-|DS-?C1-?)/i, "-");
myReplace(/^DS-|^DI-|^PH-|MCA|^PAF-|^PAF|^FDR-|^FDR/i, "");
myReplace(/VIEW ([a-z])/i, "TTEMPP $1");
myReplace(/ ?\(?V?I?EW\)| ?\(?VIE[W)]?|^W\)| ?\(VI+$|^ ?\(VI| ?\(V+$|^ ?\(V| ?\(+$|^ ?\)/i, "");
myReplace(/TTEMPP ([a-z])/i, "VIEW $1");
myReplace(/([NW]\d{6}-S)-/i, "$1??? ");
myReplace(/([NW]\d{6}-S)\/.-/i, "$1??? ");
// Needs to be in this order
myReplace(/ AND /i, "&");
myReplace(/WASHER/i, "WSHR");
myReplace(/BOLT/i, "BLT");
myReplace(/STUD/i, "STU");
myReplace(/([SCREW|SC|NUT|BLT|STU])&WSHR/i, "$1 & WSHR");
myReplace(/\?\?\? SCREW &/i, "??? SC &");
myReplace(/\?\?\? SC [^&]/i, "??? SCREW ");
myReplace(/(\?\?\? SC & WSHR).*/i, "$1");
myReplace(/(\?\?\? SCREW).*/i, "$1");
myReplace(/(\?\?\? NUT & WSHR).*/i, "$1");
myReplace(/\?\?\? NUT [^&].*/i, "??? NUT");
myReplace(/(\?\?\? BLT & WSHR).*/i, "$1");
myReplace(/\?\?\? BLT [^&].*/i, "??? BLT");
myReplace(/(\?\?\? STU & WSHR).*/i, "$1");
myReplace(/\?\?\? STU [^&].*/i, "??? STU");
myReplace(/--/gi, "-");
if ( app.documents.length > 0 && app.activeDocument.textFrames.length > 0 ) {
// Set the value of the word to look for
searchWord1 = "*";
//searchWord2 = "The";
// Iterate through all words in the document
// the words that match searchWord
for ( i = 0; i < app.activeDocument.textFrames.length; i++ ) {
textArt = activeDocument.textFrames[i];
for ( j = 0; j < textArt.characters.length; j++) {
word = textArt.characters[j];
if ( word.contents == searchWord1 ) {
word.verticalScale = 120;
word.horizontalScale = 140;
word.baselineShift = -3;
}
}
}
}
[img]http://i.imgur.com/9IRy9.jpg[/img]
This JavaScript is called from an AppleScript:
set Apps_Folder to (path to applications folder as text)
set Scripts_Path to "Adobe Illustrator CS5:Presets.localized:en_US:Scripts:"
set JS_FileName to "Text Find & Replace.jsx"
--
try
set JS_File to Apps_Folder & Scripts_Path & JS_FileName as alias
tell application "Adobe Illustrator"
do javascript JS_File show debugger on runtime error
end tell
on error
display dialog "Script file '" & JS_FileName & "' NOT found?" giving up after 2
end try
Did you search for your error code?
1346458189 ('MRAP')
It is shown at the bottom of the ESTK window. Have a look here: http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/illustrator/scripting/cs6/Readme.txt
It's not "MRAP", it's "PARM", but the number fits.
"An Illustrator error occurred: 1346458189 ('PARM')" alert (1459349)
Affects: JavaScript
Problem:
This alert may appear when carelessly written scripts are repeatedly
run in Illustrator from the ExtendScript Toolkit.
Each script run is executed within the same persistent ExtendScript
engine within Illustrator. The net effect is that the state of
the ExtendScript engine is cumulative across all scripts that
ran previously.
The following issues with script code can cause this problem:
Reading uninitialized variables.
Global namespace conflicts, as when two globals from different
scripts have the same name.
Your script assigns to some variables that are never declared with var, which makes them globals:
searchWord1 = "*";
textArt = activeDocument.textFrames[i];
word = textArt.characters[j];
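One way to keep these names out of the persistent engine's global scope is to declare every variable with var inside a function. The sketch below restates the loop from the question in that form. The function name scaleAsterisks is made up, and the property names (textFrames, characters, contents, verticalScale, horizontalScale, baselineShift) are copied from the question's script rather than checked against the Illustrator scripting reference.

```javascript
// Sketch: same loop as in the question, but every variable is declared with
// var inside a function, so nothing leaks into the shared global scope.
// scaleAsterisks is a hypothetical name; property names come from the question.
function scaleAsterisks(doc) {
  var searchWord = "*";
  var changed = 0;
  for (var i = 0; i < doc.textFrames.length; i++) {
    var textArt = doc.textFrames[i];
    for (var j = 0; j < textArt.characters.length; j++) {
      var ch = textArt.characters[j];
      if (ch.contents === searchWord) {
        ch.verticalScale = 120;
        ch.horizontalScale = 140;
        ch.baselineShift = -3;
        changed++;
      }
    }
  }
  return changed; // number of characters that were restyled
}

// In Illustrator this would be called as (assumption: a document is open):
// if (app.documents.length > 0) { scaleAsterisks(app.activeDocument); }
```

Passing the document in as a parameter also means the function never reads a global such as activeDocument implicitly.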
It's not "MRAP", it's "PARM", but the number fits.
Actually, on macOS "MRAP" is the string returned; "PARM" is for Windows.
My experience with this error:
I run a 2,000-line JavaScript file.
It has to check 700 folders, each containing between 1 and 15 different .ai files.
On Mac OS X 10.7 I got this error 2 times for 15 folders, never on the same file (CS6).
On Win8 I got this error 1 time for 5 folders, never on the same file (CC 2014).
On Win7 I got this error 1 time for 100 folders, never on the same file (CC 2014 or CS6).
Finally, I ran it on a freshly installed Win7 and got no error; the script ran for 10 hours without interruption (CC 2014 or CS6).
While I'm sure @fabianmoronzirfas has the technically correct and most likely answer, my recent experience with error 1346458189 is that it appears to be Illustrator's equivalent of Microsoft's infamous "Unknown error." That is, it appears to be the catch-all error for anything Adobe didn't write a more informative error trap for.
For me, this unhelpful error was the result of trying to set the artboard too small (below 1 point). Clearly, Illustrator doesn't do enough bounds checking. For others, as far as I could tell from searching the net, it has a smattering of causes, possibly including memory and other errors within Illustrator's script processor, which would be one way to account for its randomness in some scenarios. Most likely, however, I suspect it is usually something solvable with more robust code.
