Hello, I am trying to retrieve a node from a document with XPath via the ID of the node. From looking at the code I expected this to be a simple matter of:
//iframe[@id=key];
But it won't work. I've tried every combination of syntax I can think of with no luck; the printout is "undefined". This is my code:
function getNode(htmlURL, filename, xpath, callback) {
  var key = filename + xpath;
  var frame = document.getElementById(key);
  frame.addEventListener("load", function (evt) {
    var value = frame.contentDocument.evaluate(xpath, frame.contentDocument, null, 9, null);
    if (value != null) {
      callback(value);
    }
  }, false);
  frame.contentWindow.location.href = htmlURL;
}
function init(htmlURL, filename, xpath) {
  getNode(htmlURL, filename, xpath, function (node) { console.debug(node.outerHTML); });
}
I am trying to evaluate the XPath inside the iframe, so I thought I'd have to point the XPath at the iframe first.
If you want to find an element inside an iframe, then you need to use the evaluate method on the document in the iframe, e.g.
frame.contentDocument.evaluate(xpath, frame.contentDocument, null,9,null)
But if you set the src to load an HTML document, don't expect to be able to access nodes in that document right away from the code that sets the src. Instead, set up an onload event handler to wait until the document has been loaded. In the event handler you can then access the nodes in the loaded document.
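As a minimal sketch of that pattern (the helper name evaluateWhenLoaded is made up for illustration), the load handler can be attached before the URL is assigned. Note that with result type 9 (XPathResult.FIRST_ORDERED_NODE_TYPE) evaluate returns an XPathResult wrapper, so the matched node has to be read from its singleNodeValue property:

```javascript
// Hypothetical helper: waits for the iframe to load, then runs the XPath
// against the iframe's own document and passes the first match to `callback`.
function evaluateWhenLoaded(frame, htmlURL, xpath, callback) {
  frame.addEventListener("load", function () {
    var doc = frame.contentDocument;
    // 9 is XPathResult.FIRST_ORDERED_NODE_TYPE
    var result = doc.evaluate(xpath, doc, null, 9, null);
    // The matched node (or null) lives in singleNodeValue, not on the result itself.
    callback(result.singleNodeValue);
  }, false);
  // Start loading only after the handler is attached.
  frame.src = htmlURL;
}
```

In a page this might be called as evaluateWhenLoaded(document.getElementById(key), htmlURL, xpath, function (node) { console.debug(node && node.outerHTML); });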
You are getting undefined because your XPath syntax translates to a code comment in JavaScript: the leading // starts a line comment.
What you probably want is:
var finalxpath = "//iframe[@id=key]";
Note the quote characters, which make your XPath a string literal value.
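Inside a string literal, key is no longer a JavaScript variable, so its value has to be spliced into the expression. A small sketch of that, where the helper name iframeXPathForId and the example id are made up for illustration, and the id is assumed to contain no double quote character:

```javascript
// Hypothetical helper: builds an XPath string that selects the iframe
// whose id attribute equals the given value.
function iframeXPathForId(id) {
  // Assumption: ids with a double quote are rejected rather than escaped.
  if (id.indexOf('"') !== -1) {
    throw new Error("id must not contain a double quote");
  }
  // XPath addresses attributes with @, and the value is quoted inside the expression.
  return '//iframe[@id="' + id + '"]';
}

// Example id, shaped like the question's filename + xpath key.
var finalxpath = iframeXPathForId("report.html/html/body");
```

The resulting string can then be passed to document.evaluate as its first argument.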
I'm trying to use Puppeteer to get the content of a property of an element, edit it, and run the edited version.
For example:
There is this element:
What I need is to get the onclick content, remove the _blank parameter and run the rest of the function... Any ideas?
Maybe not the most powerful solution out there, but if you only need to do this on this specific tag, you could set the onclick attribute with JavaScript within page.evaluate like this:
await page.evaluate(() => {
  document
    .querySelector(".btn.btn-info")
    .setAttribute(
      "onclick",
      document
        .querySelector(".btn.btn-info")
        .onclick.toString()
        .split("\n")[1]
        .replace(",'_blank'", "")
    );
});
await page.click(".btn.btn-info");
what's going on here?
we run plain JavaScript within the page's scope with page.evaluate
we select the tag with document.querySelector
and set its onclick attribute (not its property!)
getting the node value of onclick as string:
mojarra.jsfcljs(document.getElementById('j_idt58'),{'j_idt58:j_idt201:0:j_idt203':'j_idt58:j_idt201:0:j_idt203'},'_blank');return false
using the 2nd line of the function (as we don't need the 1st 'function onclick(event) {' line when we reassign it as the attribute)
and replacing ,'_blank' parameter from the original function (string).
the result will be:
mojarra.jsfcljs(document.getElementById('j_idt58'),{'j_idt58:j_idt201:0:j_idt203':'j_idt58:j_idt201:0:j_idt203'});return false
finally clicking the button with page.click executes the new function
alternatively, you can use attributes.onclick.nodeValue if you are not comfortable with toString().split("\n")[1] above:
document.querySelector(".btn.btn-info").attributes.onclick.nodeValue.replace(",'_blank'", "")
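The string edit itself can be tried outside the browser. A minimal sketch, with stripBlankTarget as a made-up helper name and the handler source copied from the example above:

```javascript
// Hypothetical helper: drops the ,'_blank' argument from an inline
// onclick handler's source text and returns the edited source.
function stripBlankTarget(onclickSource) {
  return onclickSource.replace(",'_blank'", "");
}

var original =
  "mojarra.jsfcljs(document.getElementById('j_idt58')," +
  "{'j_idt58:j_idt201:0:j_idt203':'j_idt58:j_idt201:0:j_idt203'},'_blank');return false";

var edited = stripBlankTarget(original);
```

To use it with Puppeteer, the function would have to be defined inside the page.evaluate callback, because that callback runs in the browser context rather than in Node.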
I'm working on a Chrome extension that uses jQuery to parse the source of a page for specific things. For example, I'm looking through Wikipedia to get the categories.
I get the source of the page via
chrome.tabs.executeScript(tabId, {
code: "chrome.extension.sendMessage({action: 'getContentText', source: document.body.innerHTML, location: window.location});"
}, function() {
if (chrome.extension.lastError)
console.log(chrome.extension.lastError.message);
});
I am then listening for this message (successfully) and then use jquery to parse the source key of the object, like so
if (request.action == "getContentText")
{
//console.log(request.source);
$('#mw-normal-catlinks > ul > li > a', request.source).each(function()
{
console.log("category", $(this).html());
});
}
This works as expected and logs the innerHTML of every category link. However, the jQuery selector also tries to load the images contained in request.source, which results in errors such as
GET chrome-extension://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/Padlock-silver.svg/20px-Padlock-silver.svg.png net::ERR_FAILED
These are valid links, however they are being called (unneeded) from my extension with the chrome-extension:// prefix (which is invalid). I'm not sure why jquery would try to evaluate/request images from within source using a selector
I guess this is happening because Wikipedia uses protocol-relative URLs for its images (simply // instead of https:// or http://), so each URL is resolved against the scheme of the page that loads it, which inside your extension is chrome-extension://. The requests are being made by jQuery and you can see here how to fix this issue (in future, please make sure to search SO more thoroughly).
A huge thank you to @timonwimmer for helping me in the chat. We both happened to find different solutions at the same time.
My solution was to use a regex to remove any occurrences of the images, via
var source = request.source.replace(/.*?\.wikimedia\.org\/.*?/g, "");
His was an answer on stack overflow already, that was derived from another answer. If you are interested this answer works perfectly
If you give jQuery a string with a complete element declaration it actually generates a new DOM element, similar to calling document.createElement(tagName) and setting all of the attributes.
For instance: var $newEl = $("<p>test</p>") or in your case img tag elements with $("<img/>"). That would get parsed and created as a new DOM HTML element and wrapped by jQuery so you can query it.
Since you are passing a complete and valid HTML string, it is parsing it into an actual DOM first. This is because jQuery uses the built in underlying document.querySelector methods and they act on the DOM not on strings -- think of the DOM as a database with indexes for id and class and attributes for querying. For instance, MongoDB cannot perform queries on a raw JSON string, it needs to first process the JSON into BSON and index it all and the queries are performed on that.
Your problem is less with jQuery and more so with how elements are created and what happens when attributes change for those elements. For instance, when the img elements are created with document.createElement('img') and then the src attribute is set with imgElement.src = "link to image" this automatically triggers the load for the image at location src.
You can test this out for yourself by running this in your JavaScript Developer Console:
var img = document.createElement('img');
img.src = "broken-link";
Notice that this will likely show an error in your console stating that the image cannot be found.
So, to ensure that the image's src is not resolved, you want to either
1) apply jQuery on an existing DOM (document.body, etc), or
2) let it parse and evaluate the string into a DOM and clean the string before hand (remove the img tags using Regex or something). Take a look at https://stackoverflow.com/a/11230103/2578205 for removing HTML tags from string.
Hope it works out!
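A rough sketch of the second option, cleaning the string before jQuery parses it. The helper name stripImgTags is made up, and the regex is a simplification that assumes no attribute value contains a > character:

```javascript
// Hypothetical helper: removes every <img ...> tag from an HTML string so
// that parsing the string into DOM elements cannot trigger image requests.
function stripImgTags(html) {
  return html.replace(/<img\b[^>]*>/gi, "");
}

var source =
  '<div id="mw-normal-catlinks"><ul><li><a href="/wiki/X">X</a></li></ul>' +
  '<img src="//upload.wikimedia.org/a.png" alt="a"></div>';

var cleaned = stripImgTags(source);
```

The cleaned string could then be used as the context, for example $('#mw-normal-catlinks > ul > li > a', stripImgTags(request.source)).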
Here is the JS code:
var wrap = document.createElement("div");
wrap.innerHTML = '<script type="text/javascript" src="'+scriptUrl+'"></script>';
var wrapscript = wrap.childNodes[0];
document.body.appendChild(wrapscript)
The body did insert the script element, but the JS resource wasn't loaded, there isn't even an http request.
Could someone explain why this is happening?
The problem is with Zeptojs's $ method
$('<script type="text/javascript" src="'+scriptUrl+'"></script>').appendTo($("body"))
It works like the code above, and causes the bug.
This one was trivial.
As stated in spec (8.4 Parsing HTML fragments and 8.2.3.5 Other parsing state flags,) quote:
when using innerHTML the browser will
Create a new Document node, and mark it as being an HTML document.
If there is a context element, and the Document of the context element is in quirks mode, then let the Document be in quirks mode.
Otherwise, if there is a context element, and the Document of the
context element is in limited-quirks mode, then let the Document be in
limited-quirks mode. Otherwise, leave the Document in no-quirks mode.
Create a new HTML parser, and associate it with the just created Document node.
...
and when parsing a <script> inside
The scripting flag is set to "enabled" if scripting was enabled for
the Document with which the parser is associated when the parser was
created, and "disabled" otherwise.
The scripting flag can be enabled even when the parser was originally
created for the HTML fragment parsing algorithm, even though script
elements don't execute in that case.
So it won't be executed as long as you inject it with innerHTML.
And using innerHTML permanently prevents the created <script> element from being executed.
As stated in spec (4.3.1 The script element,) quote:
Changing the src, type, charset, async, and defer attributes dynamically has no direct effect; these attribute are only used at specific times described below.
The upshot of "described below" is that the src attribute is only processed when the <script> is inserted into a document (no matter which one, including the temporary one created when using innerHTML).
So, if you want to inject a script into the document and have it executed, you have to create it with script = document.createElement('script').
Set its attributes like src and type, possibly the contents inside (by using script.appendChild(document.createTextNode(content))), then append it to the document.body.
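The steps above can be sketched as a small helper. The name injectScript and its options object are made up for illustration, and the document is passed in as a parameter so the function does not depend on a global:

```javascript
// Hypothetical helper: creates a script element explicitly, sets either its
// src or its inline content, and appends it to the body of the given document.
function injectScript(doc, options) {
  var script = doc.createElement("script");
  script.type = "text/javascript";
  if (options.src) {
    // External script: fetched and run once the element is in the document.
    script.src = options.src;
  } else if (options.content) {
    // Inline script: the code is added as a text node.
    script.appendChild(doc.createTextNode(options.content));
  }
  doc.body.appendChild(script);
  return script;
}
```

In a page this might be called as injectScript(document, { src: scriptUrl }).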
You can try this instead:
var wrap = document.createElement('div');
var scr = document.createElement('script');
scr.src = scriptUrl;
scr.type = 'text/javascript';
wrap.appendChild(scr);
document.body.appendChild(wrap);
By creating the script element explicitly, you're telling the browser that it is an executable script rather than text that was parsed from innerHTML.
A possible solution, when you don't have control over the insertion mechanism and you are forced to use innerHTML with script beacons, is to rebuild DOM Nodes from the "ghost" ones.
This is a recurring problem in the ad-tech industry, in which many automated systems duplicate arbitrary HTML code (aka adservers ^^).
This works fine in Chrome:
var s = wrap.getElementsByTagName('script');
for (var i = 0; i < s.length ; i++) {
var node=s[i], parent=node.parentElement, d = document.createElement('script');
d.async=node.async;
d.src=node.src;
parent.insertBefore(d,node);
parent.removeChild(node);
}
(you can test it in JSFiddle)
I'm using jQuery to create new DOM elements from text.
Example:
jQuery('<div><img src="/some_image.gif"></img></div>');
When this statement is executed, it causes the browser to request the file 'some_image.gif' from the server.
Is there a way to execute this statement so that the resulting jQuery object can be used for DOM traversal, but without causing the browser to hit the server with requests for images and other referenced content?
Example:
var jquery_elememnts = jQuery('<div><img class="a_class" src="/some_image.gif"></img></div>');
var img_class = jquery_elememnts.find('img').attr('class');
The only idea I have now is to use a regex to remove all of the 'src' attributes from image elements, and anything else that will trigger browser requests, before using jQuery to evaluate the HTML.
How can jQuery be used to evaluate HTML without triggering the browser to make requests to the server for referenced content inside the evaluated HTML?
Thanks!
If you go the regex route, maybe a simple one like
htmlString.replace(/[ ]src=/g, " data-src=");
will do the job? (The g flag makes it rewrite every src attribute, not just the first one.)
So instead of looking for:
jquery_elememnts.find('img').attr('src');
you will have to look for:
jquery_elememnts.find('img').data('src');
That's not possible afaik. When the browser loads an HTML fragment, such as one containing an img, which references another resource via src, it will try to fetch it.
However, if you only need to get the class attribute from the img, you can use $.parseXML() to obtain an XMLDocument, and then process it to get the required attribute. This way, the HTML fragment will never be loaded, and thus the image will not be fetched:
var jquery_elememnts = $.parseXML('<div><img class="a_class" src="http://4.bp.blogspot.com/-B6PMBqTyqpk/UC7syX2eRLI/AAAAAAAABL0/SEoLWoxgApo/s1600/google10.png"></img></div>');
var img = jquery_elememnts.getElementsByTagName("img")[0];
var img_class = img.getAttribute("class");
You can use parseXML in jQuery.
var elements = jQuery.parseXML('<div><img class="a_class" src="/some_image.gif"></img></div>');
var img_class = $(elements).find('img').attr('class');
alert(img_class);
The string must be well-formed XML. This is a workaround I used for a similar purpose, but I don't know whether it will solve your issue.
var jquery_elememnts = (new DOMParser()).parseFromString('<div><img class="a_class" src="/some_image.gif"></img></div>', 'text/html');
var img_class = jQuery(jquery_elememnts).find('img').attr('class');
Does anyone know how to access the DOM of the HTML returned by CKEditor's getData function, using JavaScript?
For instance:
function Delete_ftv_from_text(mfv_id)
{
var content=CKEDITOR.instances.editor1.getData();
content.getelementbyid(mfv_id);
}
So rather than calling document.getElementById, I want to get the element out of the text from CKEditor, in this case the div element with the id that is passed into my function.
The code above of course doesn't function.
You can access the DOM directly from the containing page:
var el = CKEDITOR.instances['editor1'].document.$.getElementById(id);
// do what you want with el