jQuery on external text requesting images - javascript

I'm working on a Chrome extension that uses jQuery to parse the source of a page for specific things. For example, I'm looking through Wikipedia to get the categories.
I get the source of the page via
chrome.tabs.executeScript(tabId, {
code: "chrome.extension.sendMessage({action: 'getContentText', source: document.body.innerHTML, location: window.location});"
}, function() {
if (chrome.extension.lastError)
console.log(chrome.extension.lastError.message);
});
I then listen for this message (successfully) and use jQuery to parse the source key of the object, like so:
if (request.action == "getContentText")
{
//console.log(request.source);
$('#mw-normal-catlinks > ul > li > a', request.source).each(function()
{
console.log("category", $(this).html());
});
}
This works as expected and logs the innerHTML of every category link. The problem is that, as a side effect of that jQuery selector, the browser tries to load the images contained in request.source. This results in errors such as
GET chrome-extension://upload.wikimedia.org/wikipedia/commons/thumb/f/fc/Padlock-silver.svg/20px-Padlock-silver.svg.png net::ERR_FAILED
These are valid links, but they are being requested (needlessly) from my extension with the chrome-extension:// prefix, which makes them invalid. I'm not sure why jQuery would try to evaluate or request images from within the source when all I'm doing is running a selector.

I guess this is happening because Wikipedia uses protocol-relative URLs for its images (just // instead of https:// or http://), so they are resolved against the scheme of whatever page loads them; inside your extension that scheme is chrome-extension://. The requests are triggered when jQuery parses the source into DOM elements, and you can see here how to fix this issue (in future, please make sure to search SO more thoroughly).
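To see why the request ends up with a chrome-extension:// prefix, the resolution rule can be reproduced outside the browser. The sketch below uses the standard URL constructor to resolve the image path from the error above against two base URLs. The extension ID abcdefghijklmnop and the page names are made up for illustration.

```javascript
// Protocol-relative URL taken from the failing request in the question.
const src = "//upload.wikimedia.org/wikipedia/commons/thumb/f/fc/Padlock-silver.svg/20px-Padlock-silver.svg.png";

// Hypothetical base URLs: the extension ID and page names are assumptions.
const extensionBase = "chrome-extension://abcdefghijklmnop/background.html";
const wikipediaBase = "https://en.wikipedia.org/wiki/Main_Page";

// A URL starting with "//" keeps its own host but inherits the scheme of the base.
const fromExtension = new URL(src, extensionBase).href;
const fromWikipedia = new URL(src, wikipediaBase).href;

console.log(fromExtension);
console.log(fromWikipedia);
```

Resolved against the extension page, the path picks up the chrome-extension scheme, which matches the failed request in the question.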

A huge thank you to @timonwimmer for helping me in the chat. We both happened to find different solutions at the same time.
My solution was to use a regex to remove any occurrences of the images, via
var source = request.source.replace(/.*?\.wikimedia\.org\/.*?/g, "");
His was an existing answer on Stack Overflow that was derived from another answer. If you are interested, this answer works perfectly:

If you give jQuery a string with a complete element declaration it actually generates a new DOM element, similar to calling document.createElement(tagName) and setting all of the attributes.
For instance: var $newEl = $("<p>test</p>") or in your case img tag elements with $("<img/>"). That would get parsed and created as a new DOM HTML element and wrapped by jQuery so you can query it.
Since you are passing a complete and valid HTML string, it is parsing it into an actual DOM first. This is because jQuery uses the built-in document.querySelector methods under the hood, and they act on the DOM, not on strings. Think of the DOM as a database with indexes on id, class, and attributes for querying. For instance, MongoDB cannot perform queries on a raw JSON string; it first needs to process the JSON into BSON and index it, and the queries are performed on that.
Your problem is less with jQuery and more with how elements are created and what happens when attributes change on those elements. For instance, when an img element is created with document.createElement('img') and its src attribute is then set with imgElement.src = "link to image", this automatically triggers the load of the image at that src.
You can test this out for yourself by running this in your JavaScript Developer Console:
var img = document.createElement('img');
img.src = "broken-link";
Notice that this will likely show an error in your console stating that the image cannot be found.
So, to make sure the image's src is not resolved, you want to either
1) apply jQuery to an existing DOM (document.body, etc.), or
2) clean the string beforehand (remove the img tags using a regex or something similar) and only then let jQuery parse and evaluate it into a DOM. Take a look at https://stackoverflow.com/a/11230103/2578205 for removing HTML tags from a string.
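Option 2 could look something like the following. This is only a rough sketch: stripImages is a hypothetical helper, and a regular expression is not a full HTML parser, so unusual markup (for example a > inside an attribute value) can slip through.

```javascript
// Hypothetical helper: removes <img> tags from an HTML string so that
// parsing the string later does not create image elements with a src.
function stripImages(html) {
  return html
    .replace(/<img\b[^>]*>/gi, "") // opening or self-closing <img ...> tags
    .replace(/<\/img\s*>/gi, "");  // stray </img> closing tags, if any
}

// Made-up sample markup, loosely shaped like the Wikipedia category box.
const sample =
  '<div id="mw-normal-catlinks"><ul><li><a href="/wiki/Category:Example">Example</a>' +
  '<img src="//upload.wikimedia.org/example.png"></li></ul></div>';

console.log(stripImages(sample));
```

In the extension, the cleaned string would then be used as the context, e.g. $('#mw-normal-catlinks > ul > li > a', stripImages(request.source)).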
Hope it works out!

Related

can a PHP file injected by JQuery load() into a DIV find out anything about this DIV?

Suppose I go
$( '#' + subform_id ) .load( "subform.php" );
where '#' + subform_id is the ID of a DIV
... is there any way the PHP in subform.php can find out, within its PHP code, the identity of the DIV? (e.g. using its own JS code <script> section)
Or otherwise refer to it by some mechanism without knowing its ID? (e.g. to use JQ's append())
Obviously I could pass the subform_id as a param of the data object (2nd param of load()). But I'm just wondering...
later
followed up on what I thought Victor2748 was suggesting... but in fact it was the ID of a <SCRIPT> block in the injected file which I used to gain access to the existing JS DOM.
Victor2748: if you read this, I'm not sure how you could know the "id of the parent container of your subform.php page" without somehow passing this id as a param in the load() function's data object...
even later
Every comment in this thread says something intelligent! In fact, concerning the question of specifying that this is a PHP file, I'm still trying to get my head around something: obviously it is possible to access the DOM when JS runs in the client. But if your PHP code needs to know the name of the DIV into which it's being loaded I believe you do indeed have to pass this through _POST or _GET. I think there are many reasons why injected PHP code might need this sort of info, e.g. so it can contain code which at some point injects more PHP into the same DIV...
Although... clearly that injection code will inevitably use a JS/JQ script, so maybe that would be the appropriate time to find out what you need about where you are in the DOM.
In JavaScript, you can use this.parentNode to get the parent container, and use this.parentNode.id to get the parent div's id.
Here is an example how your loaded block can get itself as an object/node:
var loadedBlock = document.getElementById("nameOfYourDownloadedParentContainer")
Then you can use loadedBlock.parentNode to get its parent element, then you can get any parameter from it, to identify the element/div.
Update:
First you need to get the node of the current executing <script> tag:
var arrScripts = document.getElementsByTagName('script');
var currentScriptTag = arrScripts[arrScripts.length - 1];
Then, to get the parent of the script tag, use: currentScriptTag.parentNode
(I did not test it yet, please tell me if it helped)
I think so... if you have a script tag in subform.php and the file contains an element with the ID mydiv (for example, the "Submit form" markup), you should be able to do:
var subformId = $('#mydiv').parent().attr('id');
It would work because the script tag executes when the PHP file is included. Put the script tag at the end of subform.php to be sure.

How to identify a hidden file element in selenium webdriver

Team,
I am trying to automate a file upload, but WebDriver doesn't recognize the file element. Here is the situation:
The file element is in a modal box (the XPath of the modal box is //*[@id='modalBoxBody']/div[1]). The type and name of the file element are file and url respectively.
When I look at the HTML, there are two elements with the same attributes. One of them is visible and the other is invisible, but they belong to different hierarchies, so I am using the hierarchy in which the element is visible.
Following is my code. I have tried all the solutions I could find on Stack Overflow, but nothing worked. The commented-out sections were also tried and failed.
wbdv.findElement(By.xpath("//*[@id='left-container']/div[4]/ul/li/ul/li[2]/a")).click();
wbdv.switchTo().activeElement();
System.out.println(wbdv.findElement(By.xpath("//*[@id='modalBoxBody']/div[1]")).isDisplayed()); // This returns true
List<WebElement> we = wbdv.findElement(By.xpath("//*[@id='modalBoxBody']/div[1]")).findElement(By.className("modalBoxBodyContent")).findElements(By.name("url")); // There is only one element named url in this hierarchy
System.out.println(we.isEmpty()); // This returns false, meaning it found the element named url
// ((JavascriptExecutor) wbdv).executeScript("document.getElementsByName('url')[0].style.display='block';"); // This didn't work
for (WebElement ele : we) {
String js = "arguments[0].style.height='auto'; arguments[0].style.visibility='visible';";
((JavascriptExecutor) wbdv).executeScript(js, ele);
System.out.println(ele.isDisplayed()); // This returns FALSE
System.out.println(ele.isEnabled()); // This returns TRUE
System.out.println(ele.isSelected()); // This returns FALSE
ele.click(); // This throws org.openqa.selenium.ElementNotVisibleException
}
Now, if you look at the three checks above, the element is NOT displayed and NOT selected, but it IS enabled. Since it is not displayed, Selenium cannot interact with it. The JavaScript to make it visible did not help either.
Could anyone please help me solve this? It ate my entire day today.
In your last example, it looks to me like you have the right idea with setting the style.visibility property. Another thing I would recommend trying is the ExpectedConditions.visibilityOfElementLocated method. Usually I use presenceOfElementLocated, but since you are dealing with the CSS visibility property, I think visibilityOfElementLocated is the way to go. What might be happening is that you need a wait condition on the element you are trying to get hold of, and the ExpectedConditions method should give you that. I see that you have tried a few things, but you haven't mentioned using a wait condition. No guarantees, but you should try it:
WebDriverWait wait = new WebDriverWait(driver, 60);
wait.until(ExpectedConditions.visibilityOfElementLocated(
By.xpath(".//whatever")));

Cleaning up location.pathname to load files and select elements with href attributes

I have a script where essentially I'm trying to find the location of a .php file using javascript/jquery (with location.pathname). So, my problem is basically that if the user inputs something weird like:
url.com/ or url.com//// or url.com////index.php//// or url.com////index.php.////, then I need a way of dealing with this so I can obtain /index.php so I can select that file and load some content from it (using ajax), as well as selecting an element that has href = "/index.php" so I can make it an active link.
There's also the additional problem of something like this:
url.com/projects/index.php, url.com////projects//index.php for which I'd like to have an output of /projects/index.php to properly select the file once again.
Is there a standard way of doing this? I'd like to avoid a regex or string-replace approach because I'm not sure it can handle all cases, although if that is the proper way to go about this then I'll go ahead and implement it. The browser uses a parser to determine what file to load, so a solution that uses something similar (maybe something built in) would be great. I tried searching for jQuery URL parsers or cleanups, but I'm not sure what term I should be looking for, so my searches came up short.
EDIT: Just as some background, I'm implementing this basically: http://css-tricks.com/rethinking-dynamic-page-replacing-content/, but I need a way to tweak it so it can find files in sub directories as well.
EDIT2: Here's an example of what I mean:
EDIT3: Here's the ajax call I'm using, which fires on popstate:
var file = location.pathname;
$("#content").load(file + " #content", function() {
$("#menu ul a").removeClass("current");
$("#menu ul a[href='" + file + "']").addClass("current");
});
When I do this for weird browser entries, the load and href fail obviously, since the href attribute is set to href = "/index.php" in my website. The load function also fails for weird inputs, even though the page can be loaded.
Replace multiple forward slashes with a single forward slash, then remove the domain name if necessary. Example:
$(document).ready(function() {
var userStr = 'url.com////projects//index.php';
userStr = userStr.replace(/\/{2,}/g, '/').replace(/url\.com/, '');
// userStr is now '/projects/index.php'
});
This obviously won't work for every combination of URL that your user might provide, but then I doubt you'll find a regular expression that handles every possibility either. If a user sends you "url.com////index.php.////" then give them a 404 in return.
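As a rough sketch of the same idea applied to location.pathname rather than a full URL: the helper below collapses repeated slashes, drops trailing slashes, and falls back to a default file for the site root. The name normalizePath and the /index.php default are assumptions based on the question, and inputs such as index.php. are deliberately left alone.

```javascript
// Hypothetical helper: cleans up a pathname such as "////projects//index.php".
// The default file name is an assumption taken from the question's examples.
function normalizePath(pathname, defaultFile = "/index.php") {
  const collapsed = pathname.replace(/\/{2,}/g, "/"); // "////a//b" -> "/a/b"
  const trimmed = collapsed.replace(/\/+$/, "");      // drop trailing slashes
  return trimmed === "" ? defaultFile : trimmed;      // site root -> default file
}

const examples = ["/", "////", "////index.php////", "////projects//index.php"];
for (const path of examples) {
  console.log(path, "->", normalizePath(path));
}
```

The result could then be used in place of the raw location.pathname for both the load() call and the href selector.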

using JQuery to fetch an html document and parse it into a DOM tree

So essentially I'm trying to build my own version of GitHub's tree slider. The relevant Javascript/JQuery code is:
// handles clicking a link to move through the tree
$('#slider a').click(function() {
history.pushState({ path: this.path }, '', this.href) // change the URL in the browser using HTML5 history module
$.get(this.href, function(data) {
$('#slider').slideTo(data) // handle the page transition, preventing full page reloads
})
return false
})
// binds hitting the back button in the browser to prevent full page reloads
$(window).bind('popstate', function() {
$('#slider').slideTo(location.pathname)
})
Ok, hopefully that's understandable. Now here's my interpretation of what's going on here, followed by my problem/issue:
The callback function for the GET request when navigating through the tree is the slideTo method, and an HTML string is passed in as an argument to that function. I'm assuming that slideTo is a function defined elsewhere in the script or in a custom library, as I can't find it in the JQuery documentation. So, for my purposes, I'm trying to build my own version of this function. But the argument passed into this function, "data", is just the string of HTML returned from the GET request. However, this isn't just a snippet of HTML that I can append to a div in the document, because if I perform the same GET request (e.g. by typing the url into a web browser) I would expect to see a whole webpage and not just a piece of one.
So, within this callback function that I am defining, I would need to parse the "data" argument into a DOM so that I can extract the relevant nodes and then perform the animated transition. However, this doesn't make sense to me. It generally seems like a Bad Idea. It doesn't make sense that the client would have to parse a whole string of HTML just to access part of the DOM. GitHub claims this method is faster than a full page reload. But if my interpretation is correct, the client still has to parse a full string of HTML whether navigating through the tree by clicking (and running the callback function) or by doing full page loads such as by typing the new URL in the browser. So I'm stuck with either parsing the returned HTML string into a DOM, or ideally only fetching part of an HTML document.
Is there a way to simply load the fetched document into a Javascript or JQuery DOM object so I can easily manipulate it? or even better, is there a way to fetch only an element with an arbitrary id without doing some crazy server-side stuff (which I already tried but ended up being too spaghetti code and difficult to maintain)?
I've also already tried simply parsing the data argument into a JQuery object, but that involved a roundabout solution that only seems to work half the time, using javascript methods to strip the HTML of unwanted things, like doctype declarations and head tags:
var d = document.createElement('html');
d.innerHTML = data;
var body = d.getElementsByTagName("body")[0].innerHTML;
var newDOM = $(body);
// finally I have a JQuery DOM context that I can use,
// but for some reason it doesn't always seem to work quite right
How would you approach this problem? When I write this code myself and try to make it work on my own, I feel like no matter what I do, I'm doing something horribly inefficient and hacky.
Is there a way to easily return a JQuery DOM object with a GET request? or better, just return part of a document fetched with a GET request?
Just wrap it; jQuery will parse it.
$(data) // in your callback
Imagine you want to parse a <p> tag in your normal HTML web page. You probably would use something like:
var p = $('<p>');
Right? So you have to use the same approach to parse an entire HTML document and then, navigate through the DOM tree to get the specific elements you want. Therefore, you just need to say:
$.get(this.href, function(data) {
var html = $(data);
// (...) Navigating through the DOM tree
$('#slider').slideTo( HTMLportion );
});
Notice that this also works for XML documents, so if you need to download an XML document from the server via AJAX, parse its contents, and display them on the client side, the method is exactly the same.
I hope it helps you :)
P.S.: Don't forget to put semicolons at the end of each JavaScript statement. If you leave them out, the engine will probably still work, but it is safer to always write them.

Prevent "jQuery( html )" from triggering the browser to request images and other referenced content

Using jQuery to create new DOM elements from text.
Example:
jQuery('<div><img src="/some_image.gif"></img></div>');
When this statement is executed, it causes the browser to request the file 'some_image.gif' from the server.
Is there a way to execute this statement so that the resulting jQuery object can be used for DOM traversal, but without actually causing the browser to hit the server with requests for images and other referenced content?
Example:
var jquery_elememnts = jQuery('<div><img class="a_class" src="/some_image.gif"></img></div>');
var img_class = jquery_elememnts.find('img').attr('class');
The only idea I have now is to use a regex to remove all of the 'src' attributes from image elements, and anything else that will trigger browser requests, before using jQuery to evaluate the HTML.
How can jQuery be used to evaluate HTML without triggering the browser to make requests to the server for referenced content inside the evaluated HTML?
Thanks!
If you go the regexp way, maybe a simple one like
htmlString.replace(/[ ]src=/g, " data-src=");
will do the job?
so instead of looking for :
jquery_elememnts.find('img').attr('src');
you will have to look for :
jquery_elememnts.find('img').data('src');
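A slightly more targeted variant of this idea, as a sketch: rename src only inside img tags, and do it for every image in the string. The helper name deferImageSources is made up here, and the usual caveat about running regular expressions over HTML applies.

```javascript
// Hypothetical helper: renames src to data-src inside <img> tags only,
// so parsing the string later does not trigger image requests.
function deferImageSources(html) {
  return html.replace(/<img\b[^>]*>/gi, (tag) =>
    // Only a src attribute preceded by whitespace is renamed,
    // so an existing data-src attribute is left untouched.
    tag.replace(/\ssrc\s*=/gi, " data-src=")
  );
}

const html = '<div><img class="a_class" src="/some_image.gif"></img></div>';
console.log(deferImageSources(html));
```

After parsing the rewritten string with jQuery, the original path is still available through .data('src') as described above.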
That's not possible afaik. When the browser loads an HTML fragment, such as one containing an img, which references another resource via src, it will try to fetch it.
However, if you only need to get the class attribute from the img, you can use $.parseXML() to obtain an XMLDocument, and then process it to get the required attribute. This way, the HTML fragment will never be loaded, and thus the image will not be fetched:
var jquery_elememnts = $.parseXML('<div><img class="a_class" src="http://4.bp.blogspot.com/-B6PMBqTyqpk/UC7syX2eRLI/AAAAAAAABL0/SEoLWoxgApo/s1600/google10.png"></img></div>');
var img = jquery_elememnts.getElementsByTagName("img")[0];
var img_class = img.getAttribute("class");
You can use parseXML in jQuery.
var elements = jQuery.parseXML('<div><img class="a_class" src="/some_image.gif"></img></div>');
var img_class = $(elements).find('img').attr('class');
alert(img_class);
The string must be well-formed XML. This is a workaround I used for a similar purpose, but I don't know whether it will solve your issue.
var jquery_elememnts = (new DOMParser()).parseFromString('<div><img class="a_class" src="/some_image.gif"></img></div>', 'text/html');
var img_class = jQuery(jquery_elememnts).find('img').attr('class');
