Scraping webpage with multiple HTMLs and bodies - javascript

I want to click an element in a list on a website through a VBA macro. However, the website is built in a way that nests a second set of "html", "head" and "body" tags inside the top-level ones, possibly because it loads the list of clickable items only after I choose "open items" from a dropdown.
I think this is the reason I cannot find a way to reference the button in my code, as it sits in the second html. For example, IE.Document.GetElementsByTag("html") reports 0 even though there are two html tags on this page. The furthest I can get with my scraping is right before the second html tag. Also, when I try Debug.Print IE.Document.GetElementsByTagName("body")(0).innerText, everything after the second html tag is omitted.
I am not posting any more code because the problematic part is visible only under "Inspect element" in IE, not when I view "Source" in IE. Therefore I am just hoping for more general suggestions on how to get to an element in such embedded HTML document.
Thanks, Bartek
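One general direction, offered as a sketch rather than a confirmed fix: a second html element inside a page usually means that part lives in an iframe or frame, which has its own document. That is an assumption here, since the page itself is not shown. The example below is browser JavaScript, not VBA; the function names and the selector in the usage comment are made up for illustration. The same idea should carry over to the macro: find the frame element first, then search inside its own document instead of the top-level one.

```javascript
// Sketch: gather the top-level document plus every document nested in
// <iframe>/<frame> elements. Assumes the second <html> lives in a frame.
function collectDocuments(doc, found = []) {
  found.push(doc);
  for (const frame of doc.querySelectorAll('iframe, frame')) {
    // contentDocument is null when the frame is cross-origin or not loaded.
    const inner = frame.contentDocument;
    if (inner) collectDocuments(inner, found);
  }
  return found;
}

// Hypothetical helper: first element matching `selector` in any document.
function findInAnyDocument(rootDoc, selector) {
  for (const doc of collectDocuments(rootDoc)) {
    const hit = doc.querySelector(selector);
    if (hit) return hit;
  }
  return null;
}

// Browser usage (the selector is invented for this example):
// const item = findInAnyDocument(document, 'li.open-item');
// if (item) item.click();
```

If the list is loaded only after the dropdown choice, the search has to run after the frame has finished loading, otherwise the inner document may still be empty.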

Related

Inject HTML to be displayed on top of page at runtime

I am looking for a way to inject HTML into an already loaded page in response to a user clicking the page.
Precisely, what I need to do is:
1. Capture a click
2. Generate an HTML string to be displayed
3. Inject that HTML string into the page, to be displayed on top of the current markup.
Of these, 1 is done and 2 is partially complete, so now I need to display the HTML string on top of the current markup (like a pop-up box). The injected markup should disappear when the markup behind it is clicked. How can I do this?
This is being developed as a feature of an Angular2 web app, so I'd like to achieve this using only typescript, HTML and CSS for styling, rather than use an existing library.
A few pointers to make things a little clearer:
Generated HTML will be interactive, so it may be clicked on, not just a simple popup or alert.
Generated HTML should be removed from the DOM when an area outside of that HTML is clicked.
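A minimal sketch of the inject-and-dismiss step, using plain DOM calls rather than anything Angular-specific. The function name showOverlay, the inline styles and the idea of passing the document in as a parameter are all assumptions made for illustration. It puts the generated HTML in a panel on top of a full-page backdrop and removes both when the backdrop, meaning anything outside the panel, is clicked.

```javascript
// Sketch: show `html` in a panel above the page; clicking outside the panel
// removes it. `doc` is a parameter so the function can run without a browser.
function showOverlay(doc, html) {
  const backdrop = doc.createElement('div');
  Object.assign(backdrop.style, {
    position: 'fixed', top: '0', left: '0', right: '0', bottom: '0',
    background: 'rgba(0, 0, 0, 0.3)', zIndex: '1000',
  });

  const panel = doc.createElement('div');
  Object.assign(panel.style, {
    position: 'absolute', top: '20%', left: '50%',
    transform: 'translateX(-50%)', background: '#fff', padding: '1em',
  });
  panel.innerHTML = html; // only safe if the string is trusted or sanitised

  const close = () => backdrop.remove();
  backdrop.addEventListener('click', (event) => {
    // Clicks inside the panel bubble up with a different target; ignore them
    // so the injected markup stays interactive.
    if (event.target === backdrop) close();
  });

  backdrop.appendChild(panel);
  doc.body.appendChild(backdrop);
  return close; // lets the caller dismiss the overlay programmatically
}

// Browser usage:
// showOverlay(document, '<button>Click me</button>');
```

Because the panel is a child of the backdrop, removing the backdrop also removes the injected markup from the DOM.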

How to dynamically click on all list items without displaying the results of their default behavior but invisibly loading their contents?

I want to dynamically click on a set of unordered list items without displaying the results of the clicks. The intent is to pre-load dynamic content such as images and text that clicking on the list items will normally load and display; I want to preload the content as much as possible before the user begins to click on list items. (I've inherited code from another developer and I'm having to work within the constraints of his routine, so please bear with me.)
Each list item has an ID, like:
id="w273"
id="w175"
id="w123"
These would be my references. The list items are generated dynamically, and each contains an HREF to content that will be displayed in a hashed area of the page (the content consists of server-loaded images and text extracted from a SQL database using a query).
Normally, clicking on a list item changes the content in that area of the page, but it takes time to load. Once it's loaded, though, it can be redisplayed without reloading, of course, and so revisiting it is instantaneous...it's the initial visit that takes time.
I'd therefore like to pre-load all of that content by dynamically clicking on each of the list items in succession without displaying the resulting content, all done in the background, leaving the default content (which is automatically retrieved using the first ID in the click-list) in place. (I mention this detail to explain that the page loads initially with the first list item's content displayed, and that behavior should remain unchanged.)
How could I accomplish this with Javascript or JQuery?
MORE INFO
Okay, here's the skinny. The content is produced by an outer, containing PHP script that houses content from an inner PHP script. The outer script creates an image carousel whose thumbnails reference the inner script as hashes. At any given point, the URL will take this form:
g.php?g_id=45#a.php?a_id=238
The outer script is the "g.php" script. It references a thumbnail image in the carousel identified by the "a.php" script whose GET value is the key to loading the inner content on the page.
The individual thumbnails in the carousel are HREF'd like this:
<li><img src="thumb_image.jpg" /></li>
So clicking on this one would revise the previous URL to:
g.php?g_id=45#a.php?a_id=467
Notice, though, that the content generated by the "g.php" script doesn't change. Only the inner "a.php" content switches, as a hash change, when its corresponding thumbnail in the carousel is clicked. It's a surprisingly effective solution, with a few caveats.
The main caveat is that nothing is preloaded except the content referenced by the first link (which corresponds to the first thumbnail in the carousel), and that behavior is hard coded into the routine and is fine.
I simply want to dynamically click each link in the list to load all of the content, and to do it in the background after the page has loaded with the first link's content exposed (which is its default behavior, and, as I've said, which is fine). And it must be done invisibly.
It also doesn't matter in what order it happens, because the user might immediately advance the carousel and click on the 14th element in it rather than the 2nd element. So, I don't want to preload the content in batches of 10 or similar increments, waiting for the user to interact with the carousel to load more content; that makes no sense, given the design of the carousel and how it is intended to be used in any non-sequential manner.
I simply need to loop through all of the list item links and load them invisibly, in whatever sequence they happen to load, given the asynchronous nature of AJAX. More than likely, the user will click on one of the links that has been preloaded by the preloading routine, but if the user jumps ahead and selects something that's still in the process of preloading, that's not a problem; by the time the user has examined that content, the rest of the content will have been preloaded.
So, that's more info. I hope this provides a better backdrop for understanding what I'm up against. Without completely rewriting the entire routine, the best bet seems to be to accept its own mechanism and accommodate it by looping through an AJAX/JQuery routine that dynamically clicks and preloads all the data in the background once the page has displayed its initial content. And I do have access to the IDs of the links in the unordered list; other identifying information could easily be added to it.
Text is not an issue. What could be an issue is the async loading of many large images, which might not start loading in the desired order.
Rather than loading your images somewhere hidden inside the document, it would be better to get from the server a JSON holding all the needed data.
You don't need to emulate clicks on your list items one by one. You simply need to get, e.g., the first 10 images and, as the user advances, load more and more (the idea here is to avoid loading stuff that the user might never explore/see/use), but it all depends on the User Interface you have.
JSON example:
[
  {
    "id": "w125",
    "image": "path/to/image1.jpg",
    "content": "HTML or whatever"
  },
  {
    "id": "w275",
    "image": "path/to/image2.jpg",
    "content": "HTML or whatever is the content"
  }
]
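Given a list shaped like the JSON example, the images can be warmed up in the background without touching the visible page. This is only a sketch: the function name preloadImages, the makeImage parameter and the URL in the usage comment are assumptions, not part of the existing routine. Whether a later click actually reuses the downloaded file depends on the server's cache headers.

```javascript
// Sketch: start downloading every image listed in the JSON payload so the
// browser cache is warm before the user clicks. `makeImage` defaults to the
// browser's Image constructor; it is a parameter only so the function can be
// exercised outside a browser.
function preloadImages(items, makeImage = () => new Image()) {
  return items.map((item) => new Promise((resolve) => {
    const img = makeImage();
    // Resolve either way so one broken image does not stall the rest.
    img.onload = () => resolve({ id: item.id, ok: true });
    img.onerror = () => resolve({ id: item.id, ok: false });
    img.src = item.image; // assigning src is what starts the download
  }));
}

// Browser usage ('preload.json' is a hypothetical endpoint):
// fetch('preload.json')
//   .then((response) => response.json())
//   .then((items) => Promise.all(preloadImages(items)))
//   .then((results) => console.log('preloaded', results));
```

All downloads start at once and finish in whatever order the network allows. To load only the first few items up front, pass a slice of the array instead of the whole list.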

url changing anchor links in polymer

I have 2 Polymer pages sharing one navigation menu structure. Only one menu item links to page two, the others are linking to id's on page one. How would I write the menu links from page two to id's on page one with JavaScript?
(Hash fragments in links are not resolved relative to URLs inside custom Polymer elements, so neither same-page anchor links like p1 nor URL-changing links work.)
This seems to be a general problem with custom elements, and there is not much documentation on how to fix it on the Polymer project page.
I have got help with same page id linking, but unfortunately am still really struggling with JavaScript, and have no idea how I would go about referencing a target element on a previous page by DOM methods, and attaching the necessary event-listeners and scroll functions.
This JS Bin from Frankie Fu is showing how to do this with same page links, but I would need to do it with url changing links.
My pages are therapie-jetzt.de/index.html and therapie-jetzt.de/Aktuelles.html (page two). The menu items are all linking to id's on page one, except for "Aktuelles", which points to page two. So what I need to do is point all other menu items in the menu on Aktuelles.html to the corresponding paragraph id's in the first page, index.html.
I guess I would have to start with getting and storing the previous page, say with document.referrer, put it in a variable, and query my id's, and then go on from that? But that wouldn't work, as I can only query nodes in the current dom/window object, right?
The easiest way would probably be to not load a second page at all and just hide the other content away when the menu item Aktuelles is tapped...
Lex
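One possible approach to the cross-page case, sketched under assumptions: the menu links on Aktuelles.html point to index.html plus a hash (for example index.html#some-id, where the id is a placeholder), and page one reads that hash after loading and scrolls to the matching element itself. The function name scrollToHash is hypothetical, and `root` stands for whatever node actually contains the targets, either the document or a shadow root. Whether the elements have rendered by the time this runs depends on the Polymer setup.

```javascript
// Sketch: scroll to the element whose id matches a URL hash.
// `root` is the document or a shadow root; both provide getElementById.
function scrollToHash(hash, root) {
  const id = decodeURIComponent(hash.replace(/^#/, ''));
  if (!id) return false;          // no hash in the URL
  const target = root.getElementById(id);
  if (!target) return false;      // nothing with that id inside this root
  target.scrollIntoView();
  return true;
}

// Browser usage on page one, once the content has rendered:
// window.addEventListener('load', () => scrollToHash(location.hash, document));
```

This avoids having to query the previous page at all: the hash in the URL carries the target id across the page change, and only the current document is searched.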

What is the html container or element on the mymsn pages?

If you log in to mymsn, you are able to customize the content and layout of your webpage. What I want to know is what kind of container are they using? Do they use an html element, or is it javascript or something else?
There are a bunch of boxes with a title, menu option, and minimize and delete buttons. Inside the boxes are unordered list links to topics of that particular subject.
Since I don't have 10 reputation I can't post an image of what it looks like.
If you look at the source code, you can see that these are just HTML elements such as div, used as containers for all the news that is added dynamically via JavaScript.
You can check this by right-clicking any element in the page and selecting "Inspect element"; you will see the markup along with the scripts that run the page and handle interactions.

Where is form response stored?

Suppose I have a radio button collection.
While I am selecting/deselecting certain options, the tick mark moves accordingly.
However, if I look at the HTML code, none of the options actually shows "checked=true" in the HTML.
1) So where is the information about my choices really stored? In DOM objects?
Also, when I change the checked attribute of a DOM object using JS, I don't see a change in the HTML source of the page. I want this information to be present in the HTML because I will be exporting the pages.
2) How do I use JavaScript to include "checked=true" in the HTML itself?
The HTML source is only the "blueprint" for the current page. The code gets loaded and parsed into DOM nodes, somewhere in the browser's memory. When the page is displayed, things take place in the browser's memory.
Therefore, dynamic changes will not automatically reflect to the default "view source" view in the browser.
Firebug's source view can show dynamic changes to the DOM - it translates them back into HTML "live". You won't see form values there, though.
As @Daniel points out, the "view selection source" function in Firefox also shows a "live" view of the selected area.
When you right click and select "view source", you will get the source that was sent to you by the server. Firefox has a "view selection source" that allows you to view the "updated" source, and the Firebug extension allows you to view this as well. Chrome has got "Developer Tools" built-in that allows you to inspect the HTML as well.
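For the second question, one way to get the current choices into exported markup is to copy each input's live checked state into its checked attribute before serialising the DOM. The sketch below assumes the export is done by reading outerHTML; the function name syncCheckedAttributes is made up. It relies on the difference between the checked property, which holds the current state, and the checked attribute, which is what appears in serialised HTML.

```javascript
// Sketch: copy the live `checked` state of each radio/checkbox into its
// `checked` attribute so that serialising the DOM includes the choice.
function syncCheckedAttributes(root) {
  const inputs = root.querySelectorAll(
    'input[type="radio"], input[type="checkbox"]'
  );
  for (const input of inputs) {
    if (input.checked) {
      input.setAttribute('checked', 'checked');
    } else {
      input.removeAttribute('checked');
    }
  }
}

// Browser usage: sync first, then read the markup to export.
// syncCheckedAttributes(document);
// const html = document.documentElement.outerHTML;
```

The browser's "view source" will still show what the server originally sent; only the serialised DOM reflects the added attributes.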
