Any ideas why, sometimes, when the script is called it launches ExtendScript Toolkit and just stalls? I think it may happen when there is a lot of text to go through, but I'm not sure that is the case every time. See the picture below the script.
When it stops, it stops on the line: var new_string = this_text_frame.contents.replace(search_string, replace_string);
// Version 3
function myReplace(search_string, replace_string) {
var active_doc = app.activeDocument;
var text_frames = active_doc.textFrames;
if (text_frames.length > 0)
{
for (var i = 0 ; i < text_frames.length; i++)
{
var this_text_frame = text_frames[i];
var new_string = this_text_frame.contents.replace(search_string, replace_string);
if (new_string != this_text_frame.contents)
{
this_text_frame.contents = new_string;
}
}
}
}
myReplace(/^PRG.*/i, "");
myReplace(/.*EBOM.*/i, "");
myReplace(/^PH.*0000.*/i, "");
myReplace(/^PH.*00\/.*/i, "");
// N or W & 6 #'s & -S_ EX. N123456-S_ REPLACE with: N123456-S??? (THIS NEEDS TO BE ABOVE _ REPLACED BY SPACE)
myReplace(/([NW]\d{6}-S)_/i, "$1??? ");
myReplace(/_/gi, " ");
// 6 #'s & - or no - & 7 #'s & 1 to 3 #'s & - EX: 123456-1234567/123- REPLACE with: -123456-
myReplace(/(\d{6})-?\d{7}\/\d\d?\d?-/i, "-$1-");
myReplace(/(\d{6})-?\d{7}-\/\d\d?\d?-/i, "-$1-");
myReplace(/([NW]\d{6}-S)-INS-\d\d\/\d\d?-/i, "$1??? ");
myReplace(/-INS-\d\d\/\d\d?-/i, "* ");
// - That is only followed by one more - & Not having PIA & - & 2 to 3 #'s & / & 1 to 3 #'s & - EX: -7NPSJ_RH-001/9- REPLACE with * & Space
myReplace(/-[^-]*-\d\d\d?\/\d\d?\d?-/i, "* ");
myReplace(/ ?ASSEMBLY/gi, " ASY");
myReplace(/ ASS?Y+$| ASS?Y - | ASS?Y -| ASS?Y | ASS?Y- | ASS?Y-/gi, " ASY - ");
myReplace(/(MCA-|DS-?C1-?)/i, "-");
myReplace(/^DS-|^DI-|^PH-|MCA|^PAF-|^PAF|^FDR-|^FDR/i, "");
myReplace(/VIEW ([a-z])/i, "TTEMPP $1");
myReplace(/ ?\(?V?I?EW\)| ?\(?VIE[W)]?|^W\)| ?\(VI+$|^ ?\(VI| ?\(V+$|^ ?\(V| ?\(+$|^ ?\)/i, "");
myReplace(/TTEMPP ([a-z])/i, "VIEW $1");
myReplace(/([NW]\d{6}-S)-/i, "$1??? ");
myReplace(/([NW]\d{6}-S)\/.-/i, "$1??? ");
// Needs to be in this order
myReplace(/ AND /i, "&");
myReplace(/WASHER/i, "WSHR");
myReplace(/BOLT/i, "BLT");
myReplace(/STUD/i, "STU");
myReplace(/([SCREW|SC|NUT|BLT|STU])&WSHR/i, "$1 & WSHR");
myReplace(/\?\?\? SCREW &/i, "??? SC &");
myReplace(/\?\?\? SC [^&]/i, "??? SCREW ");
myReplace(/(\?\?\? SC & WSHR).*/i, "$1");
myReplace(/(\?\?\? SCREW).*/i, "$1");
myReplace(/(\?\?\? NUT & WSHR).*/i, "$1");
myReplace(/\?\?\? NUT [^&].*/i, "??? NUT");
myReplace(/(\?\?\? BLT & WSHR).*/i, "$1");
myReplace(/\?\?\? BLT [^&].*/i, "??? BLT");
myReplace(/(\?\?\? STU & WSHR).*/i, "$1");
myReplace(/\?\?\? STU [^&].*/i, "??? STU");
myReplace(/--/gi, "-");
if ( app.documents.length > 0 && app.activeDocument.textFrames.length > 0 ) {
// Set the value of the word to look for
searchWord1 = "*";
//searchWord2 = "The";
// Iterate through all words in the document
// the words that match searchWord
for ( i = 0; i < app.activeDocument.textFrames.length; i++ ) {
textArt = activeDocument.textFrames[i];
for ( j = 0; j < textArt.characters.length; j++) {
word = textArt.characters[j];
if ( word.contents == searchWord1 ) {
word.verticalScale = 120;
word.horizontalScale = 140;
word.baselineShift = -3;
}
}
}
}
[img]http://i.imgur.com/9IRy9.jpg[/img]
This JavaScript is called from an AppleScript:
set Apps_Folder to (path to applications folder as text)
set Scripts_Path to "Adobe Illustrator CS5:Presets.localized:en_US:Scripts:"
set JS_FileName to "Text Find & Replace.jsx"
--
try
set JS_File to Apps_Folder & Scripts_Path & JS_FileName as alias
tell application "Adobe Illustrator"
do javascript JS_File show debugger on runtime error
end tell
on error
display dialog "Script file '" & JS_FileName & "' NOT found?" giving up after 2
end try
Did you search for your error code?
1346458189 ('MRAP')
It is shown at the bottom of the ESTK. Have a look here: http://wwwimages.adobe.com/www.adobe.com/content/dam/Adobe/en/devnet/pdf/illustrator/scripting/cs6/Readme.txt
It's not "MRAP", it's "PARM", but the number fits.
"An Illustrator error occurred: 1346458189 ('PARM')" alert (1459349)
Affects: JavaScript
Problem:
This alert may appear when carelessly written scripts are repeatedly
run in Illustrator from the ExtendScript Toolkit.
Each script run is executed within the same persistent ExtendScript
engine within Illustrator. The net effect is that the state of
the ExtendScript engine is cumulative across all scripts that
ran previously.
The following issues with script code can cause this problem:
Reading uninitialized variables.
Global namespace conflicts, as when two globals from different
scripts have the same name.
Your script contains some variables that are assigned without ever being declared with var, so they end up as globals:
searchWord1 = "*";
textArt = activeDocument.textFrames[i];
word = textArt.characters[j];
Regarding the remark that it's not "MRAP" but "PARM": actually, on macOS "MRAP" is the string that is returned; "PARM" is what you get on Windows.
My experience with this error:
I run a 2,000-line JavaScript.
It has to check 700 folders, each containing between 1 and 15 different .ai files.
On Mac OS X 10.7, I got this error 2 times for 15 folders, never on the same file. (CS6)
On Win8, I got this error 1 time for 5 folders, never on the same file. (CC 2014)
On Win7, I got this error 1 time for 100 folders, never on the same file. (CC 2014 or CS6)
Finally, I ran it on a freshly installed Win7 and got no error; the script ran for 10 hours without interruption. (CC 2014 or CS6)
While I'm sure #fabianmoronzirfas has the technically correct and most likely answer, my recent experience with error 1346458189 is that it appears to be Illustrator's equivalent of Microsoft's infamous "Unknown error." That is, it appears to be the catch-all error for anything Adobe didn't write a more informative error trap for.
For me, this unhelpful error was the result of trying to set the artboard too small (below 1 point). Clearly, Illustrator doesn't do enough bounds checking. For others, as far as I could tell from searching the net, it comes from a smattering of causes, possibly including memory and other errors within Illustrator's script processor, which would be one way to account for its randomness in some scenarios. Most likely, however, I suspect it is usually something solvable with more durable code.
Currently it is not possible in Confluence to have the headings of the document structure numbered automatically. I am aware that there are (paid) third-party plugins available.
How can I achieve continuously numbered headings?
TL;DR
Create a bookmark for the following JavaScript and click it in edit mode in Confluence to renumber your headings.
javascript:(function()%7Bfunction%20addIndex()%20%7Bvar%20indices%20%3D%20%5B%5D%3BjQuery(%22.ak-editor-content-area%20.ProseMirror%22).find(%22h1%2Ch2%2Ch3%2Ch4%2Ch5%2Ch6%22).each(function(i%2Ce)%20%7Bvar%20hIndex%20%3D%20parseInt(this.nodeName.substring(1))%20-%201%3Bif%20(indices.length%20-%201%20%3E%20hIndex)%20%7Bindices%3D%20indices.slice(0%2C%20hIndex%20%2B%201%20)%3B%7Dif%20(indices%5BhIndex%5D%20%3D%3D%20undefined)%20%7Bindices%5BhIndex%5D%20%3D%200%3B%7Dindices%5BhIndex%5D%2B%2B%3BjQuery(this).html(indices.join(%22.%22)%2B%22.%20%22%20%2B%20removeNo(jQuery(this).html()))%3B%7D)%3B%7Dfunction%20removeNo(str)%20%7Blet%20newstr%20%3D%20str.trim()%3Bnewstr%20%3D%20newstr.replace(%2F%5B%5Cu00A0%5Cu1680%E2%80%8B%5Cu180e%5Cu2000-%5Cu2009%5Cu200a%E2%80%8B%5Cu200b%E2%80%8B%5Cu202f%5Cu205f%E2%80%8B%5Cu3000%5D%2Fg%2C'%20')%3Bif(IsNumeric(newstr.substring(0%2Cnewstr.indexOf('%20'))))%7Breturn%20newstr.substring(newstr.indexOf('%20')%2B1).trim()%3B%7Dreturn%20newstr%3B%7Dfunction%20IsNumeric(num)%20%7Bnum%20%3D%20num.split('.').join(%22%22)%3Breturn%20(num%20%3E%3D0%20%7C%7C%20num%20%3C%200)%3B%7DaddIndex()%7D)()
Result
How to use
After changes to the structure have been made, clicking the bookmarked javascript renumbers the document.
Limitations are that it only provides n.n.n. numbering, but for many cases that's sufficient. The script can also be customized as required.
Background, explanation and disclosure
I tried this Tampermonkey script, which apparently resulted from this post, but it didn't work as is. So I took its source code, stripped out the integration code and the old-version compatibility, and made some minor adjustments to get this:
function addIndex() {
var indices = [];
jQuery(".ak-editor-content-area .ProseMirror").find("h1,h2,h3,h4,h5,h6").each(function(i,e) {
var hIndex = parseInt(this.nodeName.substring(1)) - 1;
if (indices.length - 1 > hIndex) {
indices= indices.slice(0, hIndex + 1 );
}
if (indices[hIndex] == undefined) {
indices[hIndex] = 0;
}
indices[hIndex]++;
jQuery(this).html(indices.join(".")+". " + removeNo(jQuery(this).html()));
});
}
function removeNo(str) {
let newstr = str.trim();
newstr = newstr.replace(/[\u00A0\u1680​\u180e\u2000-\u2009\u200a​\u200b​\u202f\u205f​\u3000]/g,' ');
if(IsNumeric(newstr.substring(0,newstr.indexOf(' ')))){
return newstr.substring(newstr.indexOf(' ')+1).trim();
}
return newstr;
}
function IsNumeric(num) {
num = num.split('.').join("");
return (num >=0 || num < 0);
}
addIndex();
(I'm not a JavaScript developer, I'm sure it can be written nicer/better)
Then I used bookmarklet to convert it into the javascript bookmark at the top, which can be clicked to trigger the functionality.
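The counter bookkeeping inside addIndex() does not depend on the DOM, so it can be looked at on its own. Below is a rough sketch of that logic ported to Python; the function name number_headings and its input (a plain list of heading levels, 1 for h1, 2 for h2, and so on) are made up for this illustration and are not part of the bookmarklet.

```python
def number_headings(levels):
    """Return 'n.n.n.' prefixes for a sequence of heading levels (1 = h1)."""
    indices = []   # one running counter per heading depth
    prefixes = []
    for level in levels:
        h_index = level - 1
        # Moving back up to a shallower heading discards the deeper counters.
        if len(indices) - 1 > h_index:
            indices = indices[:h_index + 1]
        # Assumption of this sketch: a skipped level is padded with 0.
        # (The JavaScript leaves a hole there, which joins as an empty string.)
        while len(indices) <= h_index:
            indices.append(0)
        indices[h_index] += 1
        prefixes.append(".".join(str(n) for n in indices) + ".")
    return prefixes


# h1, h2, h2, h1, h2, h3
print(number_headings([1, 2, 2, 1, 2, 3]))
```

A different numbering scheme (for example without the trailing dot) would only need a change to the line that builds the prefix.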
So, I have never programmed in JavaScript and never did anything with Google Script before either. I have a fairly good understanding of Visual Basic and macros in Excel and Word. I'm trying to make a fairly basic program: plow through a list of variables in a spreadsheet, make a new sheet for each value, and insert a formula in cell (1,1) of this new sheet.
Debug accepts my program, no issues - however, nothing at all is happening when I run the program:
function kraft() {
var rightHere =
SpreadsheetApp.getActiveSpreadsheet().getActiveSheet().getRange("A1:A131");
var loopy;
var goshDarn = "";
for (loopy = 1; loopy < 132; loopy++) {
celly = rightHere.getCell(loopy,1);
vaerdi = celly.getValue();
fed = celly.getTextStyle();
console.log(vaerdi & " - " & fed);
if (vaerdi != "" && fed.isBold == false) {
SpreadsheetApp.getActiveSpreadsheet().insertSheet(vaerdi);
var thisOne = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(vaerdi);
thisOne.deleteRows(500,500);
thisOne.deleteColumns(5, 23);
thisOne.getRange(1,1).setFormula("=ArrayFormula(FILTER('Individuelle varer'!A16:D30015,'Individuelle varer'!A16:A30015=" & Char(34) & vaerdi & Char(34) & ")))");
}
}
}
activeSheet could be called by name, and so could activeSpreadsheet, I guess. Range A1:A131 has a ton of variables; sometimes there are empty lines and new headers (new headers are bold). Basically I want around 120 new sheets to appear in my spreadsheet, named after the lines here. But nothing happens. I tried to throw in a log thingy, but I cannot read those values anywhere.
I must be missing the most total basic thing of how to get script connected to a spreadsheet, I assume...
EDIT: I have tried to update code according to tips from here and other places, and it still does a wonderful nothing, but now looks like this:
function kraft() {
var rightHere = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet().getRange("A1:A131");
var loopy;
var goshDarn = "";
for (loopy = 1; loopy < 132; loopy++) {
celly = rightHere.getCell(loopy,1);
vaerdi = celly.getValue();
fed = celly.getFontWeight();
console.log(vaerdi & " - " & fed);
if (vaerdi != "" && fed.isBold == false) {
SpreadsheetApp.getActiveSpreadsheet().insertSheet(vaerdi);
var thisOne = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(vaerdi);
thisOne.deleteRows(500,500);
thisOne.deleteColumns(5, 23);
thisOne.getRange(1,1).setFormula("=ArrayFormula(FILTER('Individuelle varer'!A16:D30015,'Individuelle varer'!A16:A30015=" + "\"" + vaerdi + "\"" + ")))");
}
}
}
EDIT2: Thanks to exactly the advice I needed, the problem is now solved, with this code:
function kraft() {
var rightHere = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet().getRange("A1:A131");
var loopy;
for (loopy = 1; loopy < 132; loopy++) {
celly = rightHere.getCell(loopy,1);
vaerdi = celly.getValue();
fed = celly.getFontWeight()
console.log(vaerdi & " - " & fed);
if (vaerdi != "" && fed != "bold") {
SpreadsheetApp.getActiveSpreadsheet().insertSheet(vaerdi);
var thisOne = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(vaerdi);
thisOne.deleteRows(500,499);
thisOne.deleteColumns(5, 20);
thisOne.getRange(1,1).setFormula("=ArrayFormula(FILTER('Individuelle varer'!A16:D30015;'Individuelle varer'!A16:A30015=" + "\"" + vaerdi + "\"" + "))");
}
}
}
There are multiple issues with your script, but the main one is that you never actually call the isBold() function in your 'if' statement.
if (value && format.isBold() == false) {
//do something
}
Because you omitted the parentheses in 'fed.isBold', the expression never evaluates to 'true'. 'isBold' without the parentheses is the function itself (an object), not its return value, so comparing it to 'false' always yields 'false'.
There are other issues that prevent the script from running properly:
Not using the 'var' keyword to declare variables and polluting the global scope. As a result, all variables you declare within your 'for' loop are not private to your function. Instead, they are attached to the global object and are accessible outside the function. https://prntscr.com/kjd8s5
Not using the built-in debugger. Running the function is not debugging. You should set the breakpoints and click the debug button to execute your function step-by-step and examine all values as it's being executed.
Deleting non-existent columns. When you create the new sheet, you call deleteColumns(). There are 26 columns in total. The 1st parameter is the starting column, while the 2nd one specifies how many columns must be deleted. Starting from column 5 and telling the script to remove 23 columns would reach column 27, one past the last column, so it throws an exception. Always refer to the documentation to avoid such errors.
console.log doesn't exist within the context of the Script Editor. You are NOT executing the scripts inside your browser, so Browser object model is not available. Use Logger.log(). Again, this is detailed in the documentation.
Your formula is not formatted properly.
JS is a dynamically typed language that's not easy to get used to. If you don't do at least some research prior to writing code, you'll be in for a lot of pain.
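The column arithmetic behind the deleteColumns point can be checked by hand. The following is only a small Python sketch of that calculation; last_deleted_column is a hypothetical helper, and the total of 26 columns is taken from the answer above rather than queried from a real sheet.

```python
def last_deleted_column(start, how_many):
    """Index of the last column touched when deleting how_many columns from start."""
    return start + how_many - 1


TOTAL_COLUMNS = 26  # column count of a new sheet, as stated in the answer

# (5, 23) is the call from the question, (5, 20) the one from the final code.
for start, how_many in [(5, 23), (5, 20)]:
    last = last_deleted_column(start, how_many)
    verdict = "ok" if last <= TOTAL_COLUMNS else "out of range"
    print(start, how_many, "-> last column", last, verdict)
```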
I have a long string that needs to be sliced into separate chunks inside an array, with a predefined length limit for the chunks. Some rules apply:
If the limit cuts a word, then the word is separated for the next chunk.
Slices must be trimmed (no spaces at the beginning or end of the array item).
Special punctuation .,!? should stay with the word, and not be sent to the next chunk.
Original text: I am totally unappreciated in my time. You can run this whole park from this room with minimal staff for up to 3 days. You think that kind of automation is easy? Or cheap? You know anybody who can network 8 connection machines and debug 2 million lines of code for what I bid for this job? Because if he can I'd like to see him try.
Result with current code ["I am totally", " unappreciated in my time", ". You can run this whole", " park from this room with", " minimal staff for up to ", "3 days. You think that", " kind of automation is ea", "sy? Or cheap? You know", " anybody who can network ", "8 connection machines", " and debug 2 million line", "s of code for what I bid", " for this job? Because if", " he can I'd like to see h", "im try."]
...it should actually be:
["I am totally", "unappreciated in my time.", "You can run this whole", "park from this room with", "minimal staff for up to 3", "days. You think that kind", "of automation is easy?", "Or cheap? You know anybody", "who can network 8", "connection machines and", "debug 2 million lines of", "code for what I bid for", "this job? Because if he", "can I'd like to see him", "try."]
As you can see, I'm still having trouble with rules 2 and 3.
This is my current code (you can check the working demo in jsfiddle):
function text_split(string, limit, pos, lines) {
//variables
if(!pos) pos = 0;
if(!lines) lines = [];
var length = string.val().length;
var length_current;
//cut string
var split = string.val().substr(pos, limit);
if(/^\S/.test(string.val().substr(pos, limit))) {
//check if it is cutting a word
split = split.replace(/\s+\S*$/, "");
}
//current string length
length_current = split.length;
//current position
pos_current = length_current + pos;
//what to do
if(pos_current < length) {
lines.push(split);
return text_split(string, limit, pos_current, lines);
} else {
console.log(lines);
return lines;
}
}
$(function(){
$('#button').click(function(){
text_split($('#textarea'), 25);
});
});
The html form for the demo:
<textarea id="textarea" rows="10" cols="80">I am totally unappreciated in my time. You can run this whole park from this room with minimal staff for up to 3 days. You think that kind of automation is easy? Or cheap? You know anybody who can network 8 connection machines and debug 2 million lines of code for what I bid for this job? Because if he can I'd like to see him try.</textarea>
<button id="button">demo</button>
For example, with a maximum of 25 characters, you can use this pattern:
/\S[\s\S]{0,23}\S(?=\s|$)/g
code example:
var text = " I am totally unappreciated in my time. You can run this whole park from this room with minimal staff for up to 3 days. You think that kind of automation is easy? Or cheap? You know anybody who can network 8 connection machines and debug 2 million lines of code for what I bid for this job? Because if he can I'd like to see him try.";
var myRe = /\S[\s\S]{0,23}\S(?=\s|$)/g;
var m;
var result = new Array();
while ((m = myRe.exec(text)) !== null) {
result.push(m[0]);
}
console.log(result);
Note: if you need to choose the max size dynamically, you must use the alternative syntax to define your RegExp object:
var n = 25;
var myRe = new RegExp("\\S[\\s\\S]{0," + (n-2) + "}\\S(?=\\s|$)", "g");
pattern details:
\S             # a non-space character (it is necessarily preceded by a space
               # or the start of the string, since the previous match
               # ends before a space)
[\s\S]{0,23}   # between 0 and 23 characters of any kind
\S(?=\s|$)     # a non-space character followed by a space or the end of the string
Note that (?=\s|$) can be replaced with (?!\S).
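For comparison, the same pattern can be used with Python's re module. This is a minimal sketch, not a replacement for the JavaScript above: split_text is a name invented here, it assumes a limit of at least 2, and, like the pattern itself, it makes no attempt to handle a single word that is longer than the limit.

```python
import re


def split_text(text, limit):
    """Split text into trimmed chunks of at most `limit` characters,
    breaking only at whitespace (assumes limit >= 2)."""
    # Same pattern as above, built from strings so the limit can vary.
    pattern = re.compile(r"\S[\s\S]{0," + str(limit - 2) + r"}\S(?=\s|$)")
    return pattern.findall(text)


sample = ("I am totally unappreciated in my time. You can run this whole "
          "park from this room with minimal staff for up to 3 days.")
for chunk in split_text(sample, 25):
    print(repr(chunk))
```

Because every match starts and ends on a non-space character, the chunks come out already trimmed, and punctuation stays attached to the word it follows.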
I needed to convert the following function to Python to deobfuscate text extracted while web scraping:
function obfuscateText(coded, key) {
// Email obfuscator script 2.1 by Tim Williams, University of Arizona
// Random encryption key feature by Andrew Moulden, Site Engineering Ltd
// This code is freeware provided these four comment lines remain intact
// A wizard to generate this code is at http://www.jottings.com/obfuscator/
shift = coded.length
link = ""
for (i = 0; i < coded.length; i++) {
if (key.indexOf(coded.charAt(i)) == -1) {
ltr = coded.charAt(i)
link += (ltr)
}
else {
ltr = (key.indexOf(coded.charAt(i)) - shift + key.length) % key.length
link += (key.charAt(ltr))
}
}
document.write("<a href='mailto:" + link + "'>" + link + "</a>")
}
Here is my converted Python equivalent:
def obfuscateText(coded,key):
    shift = len(coded)
    link = ""
    for i in range(0,len(coded)):
        inkey=key.index(coded[i]) if coded[i] in key else None
        if ( not inkey):
            ltr = coded[i]
            link += ltr
        else:
            ltr = (key.index(coded[i]) - shift + len(key)) % len(key)
            link += key[ltr]
    return link
print obfuscateText("uw#287u##Guw#287Xw8Iwu!#W7L#", "WXYVZabUcdTefgShiRjklQmnoPpqOrstNuvMwxyLz01K23J456I789H.#G!#$F%&E'*+D-/=C?^B_`{A|}~")
actionattraction$comcastWnet
but I am getting slightly incorrect output: instead of actionattraction#comcast.net I get the result above. Also, the above code often gives random characters for the same HTML page.
The target HTML page has an obfuscateText function in JS with the coded and key arguments. I extract the function call into obsfunc and execute it on the fly:
email=eval(obsfunc)
This stores the email in the variable above. It works most of the time but fails at certain times. I strongly feel that the problem is with the arguments supplied to the Python function: they may need escaping or conversion, as they contain special characters. I tried passing raw arguments and different castings like repr(), but the problem persisted.
Some examples for actionattraction#comcast.net, wrongly and correctly computed using the same Python function (the first line of each pair is the computed email):
#ation#ttr#ationVaoma#st!nct
obfuscateText("KMd%Y#Kdd8KMd%Y#IMY!MKcdJ#*d", "utvsrwqxpyonzm0l1k2ji3h4g5fe6d7c8b9aZ.Y#X!WV#U$T%S&RQ'P*O+NM-L/K=J?IH^G_F`ED{C|B}A~")
}ction}ttr}ction#comc}st.net
obfuscateText("}ARGML}RRP}ARGMLjAMKA}QRiLCR", "}|{`_^?=/-+*'&%$#!#.9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA~")
actionattraction#comcast.net
obfuscateText("DEWLRQDWWUDEWLRQoERPEDVWnQHW", "%&$#!#.'9876*54321+0zyxw-vutsr/qponm=lkjih?gfed^cbaZY_XWVUT`SRQPO{NMLKJ|IHGFE}DCBA~")
I've rewritten the deobfuscator:
def deobfuscate_text(coded, key):
    offset = (len(key) - len(coded)) % len(key)
    shifted_key = key[offset:] + key[:offset]
    lookup = dict(zip(key, shifted_key))
    return "".join(lookup.get(ch, ch) for ch in coded)
and tested it as
tests = [
    ("KMd%Y#Kdd8KMd%Y#IMY!MKcdJ#*d", "utvsrwqxpyonzm0l1k2ji3h4g5fe6d7c8b9aZ.Y#X!WV#U$T%S&RQ'P*O+NM-L/K=J?IH^G_F`ED{C|B}A~"),
    ("}ARGML}RRP}ARGMLjAMKA}QRiLCR", "}|{`_^?=/-+*'&%$#!#.9876543210zyxwvutsrqponmlkjihgfedcbaZYXWVUTSRQPONMLKJIHGFEDCBA~"),
    ("DEWLRQDWWUDEWLRQoERPEDVWnQHW", "%&$#!#.'9876*54321+0zyxw-vutsr/qponm=lkjih?gfed^cbaZY_XWVUT`SRQPO{NMLKJ|IHGFE}DCBA~"),
    ("ZUhq4uh#e4Om.04O", "ksYSozqUyFOx9uKvQa2P4lEBhMRGC8g6jZXiDwV5eJcAp7rIHL31bnTWmN0dft")
]
for coded,key in tests:
    print(deobfuscate_text(coded, key))
which gives
actionattraction#comcast.net
actionattraction#comcast.net
actionattraction#comcast.net
anybody#home.com
Note that all three key strings in your examples contain &amp;; replacing it with & fixes the problem. Presumably at some point the JavaScript was mistakenly HTML-escaped. Python has a module which will decode HTML special characters like so:
# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)
# Python 3.4+ (HTMLParser.unescape() was removed in Python 3.9):
import html
unescaped = html.unescape(my_string)
First of all, index doesn't return None when the item is missing; it raises an exception (ValueError). In your case, W appears instead of a dot because the index returned is 0, so the 'not inkey' test (which is wrong in itself) mistakenly concludes that the character is not present in the key.
Second, the presence of &amp; suggests that you may indeed have to find and decode HTML entities.
Finally, I'd recommend rewriting it like this:
def deobfuscate(code, key):
    len0 = len(code)
    len1 = len(key)
    link = ''
    for ch in code:
        try:
            ch = key[(key.index(ch) - len0 + len1) % len1]
        except ValueError:
            pass  # character is not in the key: keep it unchanged
        link += ch
    return link
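The index-0 pitfall described above is easy to reproduce in isolation. The snippet below is only an illustrative sketch: the four-letter key and the helper name deobfuscate_find are invented for the example, and str.find (which returns -1 for a missing character) is swapped in as one possible alternative to catching ValueError.

```python
key = "WXYZ"  # toy key whose first character sits at index 0

inkey = key.index("W") if "W" in key else None
print(inkey)          # the index is 0 ...
print(not inkey)      # ... so this test treats "W" as missing
print(inkey is None)  # the comparison that was actually intended


def deobfuscate_find(coded, key):
    """Variant of the decoder that tests the position against -1."""
    shift = len(coded)
    out = []
    for ch in coded:
        pos = key.find(ch)  # -1 when ch is not in the key
        if pos == -1:
            out.append(ch)
        else:
            out.append(key[(pos - shift) % len(key)])
    return "".join(out)


print(deobfuscate_find("ab!", "abcde"))
```

Python's % operator already returns a non-negative result for a positive modulus, so adding len(key) before taking the remainder is not needed here.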
I have the following code (gdaten[n][2] gives a URL, n is the index):
try:
    p=urlparse(gdaten[n][2])
    while p.scheme == "javascript" or p.scheme == "mailto":
        p=urlparse(gdaten[n][2])
        print(p," was skipped (", gdaten[n][2],")")
        n += 1
    print ("check:", gdaten[n][2])
    f = urllib.request.urlopen(gdaten[n][2])
    htmlcode = str(f.read())
    parser = MyHTMLParser(strict=False)
    parser.feed(htmlcode)
except urllib.error.URLError:
    #do some stuff
except IndexError:
    #do some stuff
except ValueError:
    #do some stuff
Now I have the following error:
urllib.error.URLError: <urlopen error unknown url type: javascript>
in line 8. How is that possible? I thought with the while-loop I skip all those links with the scheme javascript? Why does the except not work? Where's my fault?
MyHTMLParser appends the links found on the website to gdaten like this: [[stuff, stuff, link], [stuff, stuff, link]]
This is an off by one error.
In other words, n and p are out of sync.
To fix this, add one to n before setting p.
Why wasn't this working?
Assuming n is set to zero at the start (could start at 42, it doesn't matter), let's say gdaten is laid out like so:
gdaten[0][2] = "javascript://blah.js"
gdaten[1][2] = "http://hello.com"
gdaten[2][2] = "javascript://moo.js"
Upon checking the first while condition, p.scheme is 'javascript' so we enter the loop. p gets set to urlparse("javascript://blah.js") again and n is increased to 1. Since we're checking urlparse("javascript://blah.js") again, we continue into the loop again.
p now gets set to urlparse("http://hello.com") and n gets set to 2.
Since urlparse("http://hello.com") passes the check, the while loop ends.
Meanwhile, since n is two, the url that gets opened is gdaten[2][2] which is "javascript://moo.js"
Code fix
try:
    p=urlparse(gdaten[n][2])
    while p.scheme == "javascript" or p.scheme == "mailto" or not p.scheme:
        print(p," was skipped (", gdaten[n][2],")")
        # Skipping to the next value
        n += 1
        p=urlparse(gdaten[n][2])
    print ("check:", gdaten[n][2])
    f = urllib.request.urlopen(gdaten[n][2])
    htmlcode = str(f.read())
    ...
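One way to keep n and the parsed URL from drifting apart is to parse and test the same entry in a single place. The following is a self-contained sketch of that idea using the three example entries from the walkthrough; next_fetchable_index and SKIPPED_SCHEMES are names made up here, and nothing is actually downloaded.

```python
from urllib.parse import urlparse

SKIPPED_SCHEMES = ("javascript", "mailto")


def next_fetchable_index(gdaten, n):
    """Return the first index >= n whose URL has a scheme worth fetching,
    or None when the list runs out."""
    while n < len(gdaten):
        scheme = urlparse(gdaten[n][2]).scheme
        # The entry being tested is always the one at index n.
        if scheme and scheme not in SKIPPED_SCHEMES:
            return n
        n += 1
    return None


gdaten = [
    ["stuff", "stuff", "javascript://blah.js"],
    ["stuff", "stuff", "http://hello.com"],
    ["stuff", "stuff", "javascript://moo.js"],
]
print(next_fetchable_index(gdaten, 0))
print(next_fetchable_index(gdaten, 2))
```

Returning None at the end of the list also avoids relying on IndexError to stop the scan.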