I want to be able to extract a table from a website and put it into my own HTML page. For example, I want the information contained in the table with class 'tbbox' on this website: http://www.timeanddate.com/worldclock/astronomy.html?n=24 inserted into my own HTML page. I want to avoid executing any kind of server-side code like PHP. Perhaps JavaScript could be used for this? All the examples I have come across so far only cover extracting the information into a CSV or text file.
Sorry if this question seems a bit vague, but I know very little about how JavaScript runs on webpages and I am not a web developer. I am just trying to set up a dashboard for personal use that pulls astronomical information from various websites into a single page, which I can open to see everything at a glance.
Thanks for taking the time.
What you want doesn't actually require web scraping. The problem can easily be solved with an <iframe> on your page that loads the desired info from the other site. Here is a reference that might help you with that: http://www.w3.org/TR/html401/present/frames.html
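For illustration, a minimal sketch of that approach; the width, height and title are placeholder values, and note that some sites disallow framing via the X-Frame-Options header or a Content-Security-Policy, so check that the target page permits it:

    <!-- Embeds the remote page directly. Without scripting this frames
         the whole page, not just the .tbbox table. -->
    <iframe src="http://www.timeanddate.com/worldclock/astronomy.html?n=24"
            width="600" height="400" title="Astronomy data"></iframe>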
NOTE! Only display this information on your site if you are allowed to do so!
Related
I'm working on an HTML page for my department at work. Just HTML and CSS, nothing fancy. Now, we are trying to get data from another webpage displayed in the new one we are working on. I assume I would need to use JavaScript and a parser of some sort, but I'm not sure how to do this or what to search for.
The solution I assume exists is a function that is fed a link to the webpage we want to mine and returns (for example) the number of times a certain word is repeated in that webpage.
The best way to go about it is to use Node.js and install the cheerio (parser) and request (HTTP request) modules. There are many detailed tutorials showing how to do this (e.g. this one at DigitalOcean).
But if you don't want to set up Node.js and would rather work with a plain web setup, download the cheerio and request JS libraries, include them in your HTML page in a <script> tag, and then follow the example above. I hope it helps.
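To make the Node.js suggestion concrete, here is a minimal sketch of the word-counting function the question describes; the URL and the word being counted are placeholders, and it assumes cheerio and request have been installed with npm:

    // count-word.js - run with: node count-word.js
    const request = require('request');
    const cheerio = require('cheerio');

    request('https://example.com/page.html', (err, res, body) => {
      if (err) throw err;
      const $ = cheerio.load(body);      // parse the fetched HTML
      const text = $('body').text();     // the page's text content
      const matches = text.match(/\bgolf\b/gi) || [];
      console.log('"golf" appears ' + matches.length + ' times');
    });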
Say I have a news website with articles. I have a blank article page with everything BUT the headline, photos, and the article text itself, which I would ordinarily fill in manually. Instead of filling it in, say I already have the entire div class ripped from a web page. I want to import this content directly onto the page and publish it with minimal steps.
(I hope I'm giving you the picture. Imagine I have cars fully built aside from missing engines and I want the monkeys I've hired to steal engines to not leave the engines piling up outside, but instead to also bring them inside and install them into the cars and drive them to the car dealer.)
I will be web scraping something like a Wikipedia page on golf and putting that into my page. I don't want to have to copy, paste and click publish over and over. I want the web scraper, which I already know how to build, to go a step further and do a find-and-replace on a certain div class on my blank-page website INSTEAD of writing the data to a file on my computer's hard drive (though maybe writing to my hard drive with Python, then having JS or something read the HTML file from my hard drive and THEN write it to my web page would be one way to do it).
Are there programs that will do this? Do you know of modules that will do this through Python? Do you know of anything like this somebody wrote and put up on GitHub?
I'm not planning on ripping off news websites, but just to give a simpler example with one object... If I had the entire div class "content" from here...
http://www.zerohedge.com/news/2017-02-18/merkel-says-there-problem-euro-blames-mario-draghi
saved as an HTML file on my hard drive (which you could produce by clicking Inspect anywhere on the text of the main article, then right-clicking the element and choosing Copy > Copy outerHTML, and pasting it as HTML into your text editor; again, something I would have done with a web scraper), how could I get this pasted into a blank 'new article' page and published on my website with the push of a button, automatically? I'm fine with having to click a few buttons, but not with copying and pasting.
I'll be doing this (legally) with parts of web pages over and over and over again, and I'm sure it can be automated in some way. I've heard financial news websites have been writing articles from data, so something like what I need probably exists. I might be running the text I scrape through a basic neural net or feeding it to GANs. I think some interesting things can be made this way, in case you are curious what I'm up to.
If you're using Python to do this, the quickest way, I feel, would be to have the web crawler save its findings to either a JSON file or an SQL database that your website front-end shares access to (storing the HTML you pulled as a string of text).
If you go the JSON route, just send an AJAX request for the file and place the stored HTML into the element you're dumping the code into using innerHTML.
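A minimal sketch of the JSON route, assuming the crawler wrote its output to a file called article.json (a placeholder name) with the scraped HTML stored under a "content" key, and that the page has a placeholder div with id "article-body":

    // Fetch the crawler's output and inject the stored HTML into the page.
    fetch('article.json')
      .then((res) => res.json())
      .then((data) => {
        // Only inject HTML that comes from a source you trust.
        document.getElementById('article-body').innerHTML = data.content;
      })
      .catch((err) => console.error('Failed to load article:', err));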
If you go the SQL route, have a Python script alongside the website that you can send a POST request to; the script pulls the website data you want from the database, returns it to the browser as JSON, and then you do the same as above.
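From the browser's side the SQL route looks almost the same; here is a sketch that assumes a hypothetical /get-article endpoint served by the Python script, with a placeholder lookup key:

    // POST to the backend, which queries the database and replies with JSON.
    fetch('/get-article', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ slug: 'golf' }), // placeholder lookup key
    })
      .then((res) => res.json())
      .then((data) => {
        document.getElementById('article-body').innerHTML = data.content;
      });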
The benefit of going straight to JSON is not having to set up a connection to an SQL server and deal with the SQL-query-to-JSON conversion step. The benefit of the SQL database, however, is not having to worry about write conflicts in the JSON file if your crawler is working with multiple threads and you don't lock the file correctly.
I wonder if I can use a form from an external webpage on my own website. I am creating an application using the WAMP stack and I need some PHP/JavaScript/whatever script to do it.
Basically I'd have the exact same form on my website (or at least a similar one) as the one on the external webpage; the only goal of this form is to perform a search. The user would be able to run the search and see the results posted on my website as well, with all of this happening in a hidden way.
I really have no idea how to do this; I searched Stack Overflow and the web looking for a solution, but it really seems a little bit complicated.
Here's an image to illustrate more or less what I want:
Edit: I don't want any script or code! I just want to know what the best way to do this is. I will eventually come to a solution (I hope!). Thanks!
The concept is called web scraping; you can find more about it at https://en.wikipedia.org/wiki/Web_scraping
As an answer: you can write a simple web scraper using PHP cURL.
cURL can submit a remote form for you and get the results of the form submission; those results may then need to be processed by your script to be displayed in the form you need.
This Stack Overflow question and its answers may clear things up further for you:
How to submit form on an external website and get generated HTML?
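The linked answer uses PHP cURL; purely for illustration, here is the same idea sketched in JavaScript (Node.js 18+, where fetch is built in), with a placeholder URL and field name. The point is the shape of the technique: submit the external form's fields server-side, then parse the returned HTML before rendering the results on your own page.

    // A rough analogue of the PHP cURL approach, not the linked code itself.
    async function submitRemoteForm(query) {
      const res = await fetch('https://external.example.com/search', {
        method: 'POST',
        // URLSearchParams mimics a normal HTML form submission.
        body: new URLSearchParams({ q: query }),
      });
      return res.text(); // raw result HTML; extract what you need from it
    }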
I have a web platform that performs data processing related to the visits received by my clients' websites.
To capture the data I need, I have to insert, into each page to be monitored, a call to my JavaScript that collects the data into a database.
For now, the only thing that came to mind was to provide the customer with the JavaScript code to be inserted in the <head> section of each specific page, like this:
<script src="https://www.example.com/datascan/tk/SL-TK51124897-ME.js"></script>
Google Analytics, for example, performs a similar operation but in a different way, providing JavaScript code to be inserted at the end of the <body>.
Does anyone have direct experience in this area? Which solution would you suggest in terms of safety, performance and convenience? Is it possible to add the code on one page so that it handles all the pages in that domain, without repeating it in each of them?
Thanks in advance
If the pages are statically generated (each page is hardcoded in a different file), you have to include the line in each file between the <head> tags.
If the head is dynamically created (i.e. it works as a module which is included in each page), you can just put it in that module.
e.g. WordPress: https://github.com/WordPress/WordPress/blob/master/wp-content/themes/twentyeleven/header.php
If you provide more info about the structure of the site it will be easier to help.
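As for what the included script itself might do, here is a rough sketch of a typical collection snippet; the endpoint URL and the payload fields are assumptions for illustration, not the asker's actual script:

    // Gather basic page info and send it to the collection endpoint.
    (function () {
      var payload = {
        page: location.href,
        referrer: document.referrer,
        ts: Date.now(),
      };
      // sendBeacon doesn't block rendering and survives page unload.
      navigator.sendBeacon('https://www.example.com/datascan/collect',
                           JSON.stringify(payload));
    })();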
I have a horrible feeling this will be a really simple one that I've overlooked, but I can't seem to find an answer anywhere, so here goes.
I've got a messaging system as part of a web app written in PHP with a MySQL database. I have used the function described at http://buildinternet.com/2010/05/how-to-automatically-linkify-text-with-php-regular-expressions/ to automatically detect and linkify URLs within each message. The result is then stored in the database.
The problem is that when the user opens the message (done using jQuery, Ajax and PHP) the links are displayed including the HTML tags, rather than as hyperlinks. So for example, instead of:
Visit www.hereismylink.com for more information
I get
Visit <a href="http://www.hereismylink.com">www.hereismylink.com</a> for more information
Does anyone know why this is? I've tried wrapping the message in <pre> tags but it hasn't helped.
Any answers, using either jQuery or PHP, would be welcome.
Thanks!!
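One common cause worth checking, sketched here as a guess rather than a confirmed diagnosis: if the Ajax callback inserts the message with .text() (or textContent), jQuery escapes the markup and the anchor tags show up literally, while inserting with .html() renders them. The element ID and variable name below are placeholders:

    // `msg` is the linkified message string returned by the Ajax call.
    $('#message-body').html(msg);   // renders <a href="...">...</a> as a link
    // $('#message-body').text(msg); // would display the raw tags instead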