Using injected JavaScript to copy text from a web page - javascript

As part of a job I'm doing on a web site I have to copy a few thousand lines of text from several pages of the old site and paste them into the HTML for the new site. The long and painstaking way of going to the old page and copying the many lines of text and then going to my editor and pasting it there line by line is getting really old. I thought of using injected JavaScript to do this but I'm not quite sure where to start. Thanks in advance for any help.
Here are links to a page of the old site and a page of the new site. As you can see in the tables on each page it would take a ton of time to copy it all manually.
Old site: http://temp.delridgelegalformscom.officelive.com/macorporation1.aspx
New Site: http://ezwebsites.us/delridge/macorporation1.html

In order to do this type of work, you need two things: a way of injecting or executing your script on that page, and a good working knowledge of the Document Object Model for the target site.
I highly recommend using the Firefox plugin Firebug, or an equivalent tool in your browser of choice. Firebug lets you execute commands from a JavaScript console, which will help. Hopefully the old site does not have a bunch of <FONT>, <OBJECT> or <IFRAME> tags, which would make this even more tedious.
Using a library like Prototype or jQuery will also help with selecting the parts of the page you need. You can submit results using jQuery like this:
$(function() {
  // grab the HTML of the element you want to copy
  // (note: html() is a method, not a property)
  var snippet = $('#content-id').html();
  // send it to your own server for collection
  $.post('http://myserver/page', {content: snippet});
});
A problem you will very likely run into is the "same-origin policy" browsers enforce for JavaScript: scripts can generally only make requests back to the origin they were loaded from. So if your JavaScript was loaded from http://myserver as in this example, you would be OK.
Perhaps another route you can take is to use a scripting language like Ruby, Python, or (if you really have patience) VBA. The script can automate the list of pages to scrape and the target location for the information. It can just as easily package the result up as a request to the new server, if that's how pages get updated. This way you don't have to worry about injecting the JavaScript and hoping all works without problems.
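A minimal sketch of that idea in Node.js (the answer names Ruby and Python, but the same approach works in JavaScript outside the browser). It assumes Node 18+ for the built-in fetch() and the cheerio package for parsing; all URLs and selectors are placeholders:
// scrape.js - sketch only: pull a snippet from each old page and POST it to the new server
const cheerio = require('cheerio'); // npm install cheerio (assumption: used for parsing)

(async () => {
  const pages = ['http://oldsite/page1.aspx', 'http://oldsite/page2.aspx']; // placeholders

  for (const url of pages) {
    const html = await (await fetch(url)).text();
    const $ = cheerio.load(html);
    const snippet = $('#content-id').html(); // same selector idea as the jQuery example above

    // URLSearchParams sets the form-encoded Content-Type for us
    await fetch('http://myserver/page', {
      method: 'POST',
      body: new URLSearchParams({ content: snippet }),
    });
  }
})();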

I think you need Greasemonkey: http://www.greasespot.net/

Related

How to save a website made with javascript to a file

A little info:
When 'inspected' (Google Chrome), the website displays the information I need (namely, a simple link to a .pdf).
When I cURL the website, only a part of it gets saved. This, coupled with the fact that there are functions and <script> tags, leads me to believe that JavaScript is the culprit (I'm honestly not 100% sure, as I'm pretty new at this).
I need to pull this link periodically, and it changes each time.
The question:
Is there a way for me, in bash, to run this javascript and save the new HTML code it generates to a file?
Not trivially.
Typically, for that approach, you need to:
Construct a DOM from the HTML
Execute the JavaScript in the context of that DOM while resolving URLs relative to the URL you fetched the HTML from
There are tools which can help with this, such as Puppeteer, PhantomJS, and Selenium, but they generally lend themselves to being driven with beefier programming languages than bash.
As an alternative, you can look at reverse engineering the page. It gets the data from somewhere. You can probably work out the URLs (the Network tab of a browser's developer tools is helpful there) and access them directly.
If you want to download a web page that generates itself with JavaScript, you'll need to execute that JavaScript in order to load the page. To achieve this you can use a library that does it for you, such as Puppeteer with Node.js. There are plenty of other libraries, but that's the most popular.
If you're wondering why this happens: web developers often use frameworks like React, Vue, or Angular (to name the most popular), which build the page with JavaScript that common HTTP request libraries never execute.
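To connect this back to the bash question: a minimal Puppeteer sketch (npm install puppeteer; the URL and output file are placeholders) that loads the page, lets its JavaScript run, and saves the resulting HTML. It can then be run from bash, e.g. periodically from cron, with node save.js:
// save.js - sketch only
const fs = require('fs');
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // wait until network activity settles so the page's scripts have finished rendering
  await page.goto('https://example.com/page-with-pdf-link', { waitUntil: 'networkidle0' });
  fs.writeFileSync('page.html', await page.content());
  await browser.close();
})();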

How to architect a HUD in a Google Chrome extension?

I am trying to make a Google Chrome extension using content script.
My goal is to have a display at the top of the page (which is already working on my own pages) that can interact with the page.
I need things which are very complicated to put together in an extension, due to security policies:
Using require.js in the extension (that works for now, using this GitHub repo)
Using a templating engine to describe my display: I need to add a lot of content to the page, and I don't think writing HTML in JavaScript would be a good workflow.
For my current version I use Jade with my server, but this is not possible with an extension. I think I need to use something like Angular.js or Backbone.js, but I can't make them work in the content script.
I need a lot of communication between my extension and the page: for example, I need to detect mouse moves almost constantly.
I need communication with my server using socket.io.
Every bit of functionality of my extension has been developed and tried in a standalone web page, but now I need to integrate it into a real extension and I am really stuck.
So given these requirements, I am wondering what the right approach would be for building this: putting it all in an iframe (would the server-side communication work? And how would I communicate with the page?), a way to make a templating engine work nicely in there, or a solution I didn't think of?
Try this:
Develop the HUD part as a standalone page that the content script will include in an iframe. You should be able to use Angular.js etc. with this, but you will need local copies of as much as possible and you'll need appropriate entries in the manifest.json to get it working in the extension. See/create other questions for the details.
Have your content script inject the code to monitor mouse-moves, etc. into the target page. Have this code digest and summarize the data, so it's not spamming the system. Maybe message the summaries to the HUD page and/or content script five or six times a second.
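A sketch of those two steps in a content script (assumptions: Manifest V3, hud.html listed under web_accessible_resources in manifest.json; ids and numbers are placeholders):
// content-script.js - sketch only
// 1. include the standalone HUD page in an iframe pinned to the top of the page
var hud = document.createElement('iframe');
hud.src = chrome.runtime.getURL('hud.html'); // requires web_accessible_resources in manifest.json
hud.style.cssText = 'position:fixed;top:0;left:0;width:100%;height:80px;z-index:2147483647;border:0;';
document.documentElement.appendChild(hud);

// 2. digest mouse-moves and message a summary to the HUD ~5 times a second
var lastSent = 0;
document.addEventListener('mousemove', function (e) {
  var now = Date.now();
  if (now - lastSent > 200) {
    lastSent = now;
    hud.contentWindow.postMessage({ x: e.clientX, y: e.clientY }, '*');
  }
});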
After that, it should just be a matter of getting the pieces working, one at a time. Break it down to specific problems and ask a question on one specific problem at a time (If you can't find the answers in previous questions).
I'm pretty sure what you appear to want is do-able, but the details are too broad for a single Stack Overflow question.

Modular approach to client-side applications

This is a follow-up to my previous question.
Suppose there is a single web page with a login form and sign-up link. When a user clicks on the link a new sign-up form is displayed. Suppose also I create separate HTML, CSS, and JavaScript files for both forms for modularity.
Now the web page should contain some JavaScript code to load the login form, when the page is loaded, and load the sign-up form upon click on the link.
Does this approach make sense? Are there any frameworks/libraries which implement this approach? How would you suggest implementing it?
I think the idea has some issues. First, you should know that there are some old-fashioned ways to load a completely separate page inside the main document. Using an "iframe" tag is one of the most popular (and insecure) ways to do such a thing. Showing popups with "window.open" is another way to display a new window and load a specific URL completely separately. BUT...
There are many reasons I would suggest you not do it in any of the mentioned ways. You can simply use a library like jQuery to load the other HTML into the current page without loading new resources that would cause performance issues for you. I believe you should search for "jQuery $.get" and you will see how easy it is.
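For instance, for the sign-up link from the question, a sketch might look like this (the URL and element ids are placeholders):
$('#signup-link').click(function (e) {
  e.preventDefault(); // keep the link usable as a plain URL for noscript users
  $.get('signup-form.html', function (html) {
    $('#form-container').html(html); // swap the login form for the sign-up form
  });
});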
Hope it helps.
Cheers
Yes, that makes sense to me. I really like this approach, as I think breaking an app into smaller chunks will make development and maintenance much easier.
Basically you need to load the CSS and JS files by appending a link and a script tag, respectively, to the head section of the HTML. For loading the HTML part of the module you can simply use the jQuery.get() method, as suggested in the other answer.
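A sketch of such a module loader (the folder layout, ids, and function name are placeholders):
// sketch only: load one module's css, js, and html
function loadModule(name, container) {
  var css = document.createElement('link'); // stylesheet goes into <head>
  css.rel = 'stylesheet';
  css.href = name + '/' + name + '.css';
  document.head.appendChild(css);

  var js = document.createElement('script'); // script tag likewise
  js.src = name + '/' + name + '.js';
  document.head.appendChild(js);

  $(container).load(name + '/' + name + '.html'); // html fragment via jQuery
}

loadModule('signup', '#main');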
I have tried to implement it, and I recently released my work on this: a small code base. In my approach each module has its own folder with its JS, HTML, and CSS files, and optionally a server-side file too, like a PHP or ASPX file, that will be called by JavaScript to query the server.
Here is the project page on GitHub, called Yuva.
Take a look and let me know if this makes sense to you.

Chrome, Firefox, or Opera preload changes

Is there any way to "edit" a "server-side" JavaScript file in one of the mentioned browsers, saving the edits on the client side and having them replace the server-side scripts?
Basically I want to edit the JavaScript on the server. Obviously I can't save it on the server, so it needs to be saved on the client side (my computer) and the browser needs to load my scripts instead.
It shouldn't be hard to do at all, but I've not been able to find any way to accomplish this.
Edit:
I want to modify the JavaScript from a site I do not own or have write access to. E.g.:
The HTML page uses some JavaScript file on the server. I want to modify this JavaScript file (the actual file).
I can download and save the JavaScript file, BUT the HTML page will always use the one on the server, because that is what is in the script tag. I need to modify the script tag of the HTML page to point to the local JavaScript file BEFORE the HTML page's scripts are executed (else the JavaScript from the server will be used).
Here, for example, is a script tag from SE:
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
It uses a non-local JavaScript file. I need to replace this line with my own before any JavaScript is executed. It would look like:
<script type="text/javascript" src="file://C:/temp/myjquery.min.js"></script>
or whatever. (This way, I can modify the jQuery file and have the page execute my version instead of the one on the server.)
I could, of course, download the HTML file and modify it, BUT then PHP code may not work, among other things. (For example, relative links will be broken.)
This is usually very easy in Opera: just view source, edit what you want, and use the special "Tools > Advanced > Reload from cache" command instead of a normal reload. Voila, you'll be running the site with your modified scripts.
(There are some exceptions, related to the specific no-caching techniques some sites use, so it won't work 100% for all files, but it certainly should work for anything served from googleapis.com.)
I think what you're looking for is something like LiveReload
It allows you to edit css files and have the browser apply the changes without refreshing the browser.
The Windows version is in alpha right now, but the Mac version works quite well for CSS.
I don't know if it does JavaScript, but I think it might.
You could also try the Chrome DevTools, which are built into Chrome and let you edit JavaScript and CSS in place.
No problem: you want to use bookmarklets for this. Indeed it is easy; just remember to use an anonymous self-executing function: javascript:(function(){ //commands })();
In the good old days one could even place this JavaScript directly into the address bar, but nowadays some browser builders (like Firefox, which we coders used to trust) are being 'good boys' and listening to Facebook's 'demands' to kill this normal, standard functionality. But alas.
Of course you could also create a bookmarklet to fix Firefox's insanity, again reclaiming power for the user :)
Every time you visit the site, you click your bookmarklet. Done.
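For this question the bookmarklet could look something like the sketch below. One caveat (my note, not the answer's): the site's original script has already executed by the time you click, so this is best suited to loading patched code on top; and file:// URLs are usually blocked on http(s) pages, so serving your copy from a local web server is assumed here:
javascript:(function () {
  // load a local, modified copy of the library over the page's version
  var s = document.createElement('script');
  s.src = 'http://localhost:8000/myjquery.min.js'; // placeholder local server
  document.head.appendChild(s);
})();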
One can even make it 'memory resident' for as long as you are on the same page (if you really want to). Naturally, power sits with the user/visitor AS IT SHOULD BE, not with the webmaster (who has already publicly shared whatever info).
You might also look into Greasemonkey on Firefox and comparable solutions.
Good luck
Build a string on the server side to write all your javascript code on the server side.

Web Development Best Practices - How to Support Javascript Disabled

What is the best thing to do when a user doesn't have JavaScript enabled? What is the best way to deliver content to that kind of user? What is the best way to keep a site readable by search engines?
I can think of two ways to achieve this, but do not know what is better (or if a 3rd option is better):
Rely on the meta-refresh tag to redirect users to a non-JavaScript version of the site. Wrap the meta-refresh tag in a noscript tag so it will be ignored by those with JavaScript.
Rely on an iframe tag located within the body tag to deliver a non-JavaScript version of the site. Wrap the iframe tag in a noscript tag so it will be ignored by those with JavaScript.
I would also appreciate high-profile examples of the correct or incorrect way to do this.
Edit:
Here is an example of what I have done in the past to address this: http://photocontest.highpoint.edu/
I want to make sure there aren't better ways to do this.
You are talking about graceful degradation: designing and building the site to work with JavaScript, then making it still work with JavaScript turned off. The easiest thing to do is include an HTML noscript tag somewhere near the top of your page that gives a message saying that the site REQUIRES JavaScript or things won't work right. SO is a perfect example of this: most of the buttons at the top of the screen run via JavaScript; turn it off and you get a nice red banner, and the drop-down JS effects are gone.
I prefer progressive enhancement development. Get the site working in its entirety without JavaScript/Flash/CSS3/whatever, THEN enhance it bit by bit (still including the noscript tag) to improve the user experience. This ensures you have a fully working, readable website whether you're a disabled user with a screen reader or a search engine, whilst providing a good user experience for users with newer browsers.
Bottom line: for any dynamically generated content (for example, page elements generated via AJAX) there has to be a static page alternative where this content is available via a standard link. If you are using JavaScript for tabbed content, then show all the content in a way that is consistent with the rest of the webpage.
An example is http://www.bbc.co.uk/news/ Turn off javascript and you have a full page of written content, pictures, links etc. Turn on javascript and you get scrolling news stories, tabbed content, scrolling pictures and so on.
I'm going to be naughty and post links to wikipedia:
Progressive Enhancement
Graceful Degradation
You have another option: just load the same page but make it work for noscript users (progressive enhancement/graceful degradation).
A simple example:
You want to load content into a div with AJAX. Make an <a> tag linking to the full page with the new content (noscript behavior), then bind the <a> tag with jQuery to intercept clicks and load the content with AJAX (script behavior).
$('a.ajax').click(function () {
  var anchor = $(this);
  // fetch the linked page and pull its #content into ours
  $('#content').load(anchor.attr('href') + ' #content');
  return false; // stop the normal navigation for script users
});
I'm not entirely sure if Progressive Enhancement is considered to be best practice these days, but it's the approach I personally favour. In this case you write your server-side code so that it functions like a standard web 1.0 app (no JavaScript), providing at least enough functionality for the system to work without JavaScript. You then start layering JavaScript functionality on top of this to make the system more user friendly. If done properly you should end up with a web app that at least provides enough functionality to be useful for non-JavaScript users.
A related process is known as Graceful Degradation, which works in a similar way but starts with the assumption that a user has JavaScript enabled and builds in workarounds for cases where they don't. This has a drawback, however, in that if you overlook something you can leave a non-JavaScript user without a fallback.
Progressive Enhancement example for a search page: Build your search page so that it normally just returns a HTML page of search results, but also add a flag that can be set via GET that when set, it returns XML or JSON instead. On the search page, include a script that does an AJAX request to the search page with the flag appended onto the query string and then replaces the main content of the page with the result of the AJAX call. JavaScript users get the benefit of AJAX but those without JavaScript still get a workable search page.
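A sketch of that search-page enhancement (the flag name format=fragment is invented for illustration; the server is assumed to return just the results markup when it's set):
// progressive enhancement: without JavaScript the form still submits normally
$('#search-form').submit(function (e) {
  e.preventDefault();
  $.get('/search', $(this).serialize() + '&format=fragment', function (html) {
    $('#main-content').html(html); // replace the results with the AJAX response
  });
});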
http://en.wikipedia.org/wiki/Progressive_enhancement
If your application must have javascript to function then there's nothing you can do except show them a polite message in a noscript tag.
Otherwise, you should be thinking the other way around.
Build your site without JS
Give an awesome user experience and make it fully functional
Add JS and make the UX even more functional. Layer the JS on top.
So if the user doesn't have JS, your site still works as described in step two.
As for crawling: if your site depends on AJAX and a lot of JS to work, you can make Google aware of it: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
One quick tip that may help you: just install Lynx, a command-line web browser, and you'll immediately see how Google and other search engines (and blind people, too) see your site. This is very useful. And of course, in a command-line window there are no graphics and JavaScript is disabled.
If you're doing "serious" Ajax (e.g. client side-routing) the following technique could be useful:
Use URLs without GET/"?" parameters (it makes your life easier later on)
Use http://baseurl.com/#!/path/to/resource for client-side routing
Implement rendering of a non-script HTML version of your site (an "HTML snapshot", as Google calls it) at http://baseurl.com/path/to/resource
Wrap the whole content of your HTML snapshot in noscript tags and redirect via top.location.href to the full version of the site
Handle http://baseurl.com/?_escaped_fragment_=/path/to/resource - it should redirect via a 301 response to http://baseurl.com/path/to/resource (see the sketch after this list)
Use a-tags only for GET links; use forms for POST/PUT/DELETE links - unstyle the hell out of them if necessary
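A sketch of the _escaped_fragment_ redirect from the list above, written for Node/Express purely as an illustration (the answer doesn't name a server stack; port and paths are placeholders):
// sketch only: 301 /?_escaped_fragment_=/path/to/resource -> /path/to/resource
const express = require('express');
const app = express();

app.get('/', (req, res, next) => {
  const fragment = req.query._escaped_fragment_;
  if (fragment !== undefined) {
    return res.redirect(301, fragment || '/');
  }
  next(); // no fragment: serve the normal client-side-routed page
});

app.listen(3000);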
Some nice example code for links, which I found while researching how to write proper Ajax code:
Resource
This is of course a pretty complex solution, but it should enable both SEO (including non-search-engine crawlers) and accessibility. The problem is that you have to be able to render your page both server- AND client-side.
One solution could be to use a templating framework like Mustache, for which implementations exist on many platforms.
Use something like {{#pagelet}}/path/to/partial{{/pagelet}} for dynamic parts of your page - example: {{#pagelet}}/image/{{image_id}}/preview{{/pagelet}}
In your client-side rendering, pagelet would be implemented to be dynamically replaced with something loaded via Ajax (for example, render a temporary placeholder and fill it in when the Ajax call returns; see the sketch below).
In your server-side rendering, pagelet would just be rendered directly (if in doubt, just curl the pagelet and render it right away; or, if you can write the code asynchronously, do it just as you would client-side: write a temporary span into a buffer, start fetching all the pagelets, replace the temporary spans as the pagelets arrive, and flush the buffer once all pagelets have been rendered).
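A client-side sketch of the pagelet idea using mustache.js lambdas (the template source, element ids, and URLs are placeholders; mustache.js is assumed to be loaded on the page):
// assumption: the page template is stored in a <script> block on the page
var template = document.getElementById('page-template').innerHTML;

// view for Mustache.render(template, view); the template contains
// e.g. {{#pagelet}}/image/{{image_id}}/preview{{/pagelet}}
var view = {
  image_id: 42,
  pagelet: function () {
    return function (text, render) {
      var url = render(text); // expand {{image_id}} etc. inside the tag
      var id = 'pagelet-' + Math.random().toString(36).slice(2);
      // fetch the partial and swap it in when it arrives
      fetch(url).then(function (r) { return r.text(); }).then(function (html) {
        document.getElementById(id).outerHTML = html;
      });
      return '<span id="' + id + '">loading...</span>'; // temporary placeholder
    };
  }
};
document.getElementById('page').innerHTML = Mustache.render(template, view);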
That's the best general design I found so far. You can deep link into your app, it's search engine friendly and it should force you to build a page that gracefully degrades.
P.S.: One advantage of the techniques described above is that both the Ajax- and the "Web 1.0"-rendering of a page could profit from memcached-caching of whole pagelets.
I would prefer to code the page without JavaScript, and then, if JavaScript is enabled, redirect users to a similar page that uses it (the same concept as progressive enhancement):
redirecting with javascript
