Rails 3 pulling data from another site

I have a client request on one of my projects: they want to be able to enter a URL and have the application pull in some information from the site whose URL they entered and save it in the database.
So the user enters http://www.example.com/2342342, and my controller visits that site, gets the content of the first <h1>Tag</h1> on the page, and saves it in the database. Is this possible? If so, how would I go about doing it? Would I use some Rails commands to do it, or something else, like jQuery?

Nokogiri is a great parser and, together with open-uri, can work directly from a URL.
So there are two steps:
1. Instantiate a Nokogiri document from the content fetched at the URL
2. Parse the HTML page to get what you expect
Find instructions here: http://nokogiri.org/tutorials/parsing_an_html_xml_document.html
Because you'll be working with another website, keep two pieces of advice in mind:
- wrap your queries so that you can rescue if the website is down
- consider using an AJAX request, because the fetch could take a long time

I would check out the Railscast here:
http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
It explains very well how to use Nokogiri to scrape content from other sites.

Related

How to convert JavaScript dynamic data into HTML and render?

We have developed a website that uses a JavaScript library to query a database and display the data in an HTML page. When you go to the website, you need to search for something in order to retrieve the data.
So by default the website doesn't display any data; it needs the user to perform an action.
The search result data is not visible in the HTML view source because it is rendered with JavaScript.
As a result, search engines have no visibility into what our website is for or what data it holds, so they cannot send more visitors our way.
Secondly, I wonder how search bots/engines crawl websites with non-static content and understand enough about them to direct users there.

From what I see in your question, you need to send requests to your server to query data from your database and show it to your client in real time. For that, I would recommend using WebSockets (for example, socket.io) or AJAX so that you can update your website seamlessly.

From what I have researched, many crawlers do not reliably execute JavaScript, so they may never see dynamic content. To deal with this, sites use a technique called dynamic rendering.
Dynamic rendering has to do with the server itself. It checks each request and if it determines it to be a bot, then it will send static HTML content to the bot. Otherwise, it will send normal dynamic content to the user.
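The bot check described above can be sketched roughly as follows. This is only an illustration: the list of user-agent fragments is an assumption and far from exhaustive, and `selectResponse`, `prerenderedHtml`, and `appShellHtml` are hypothetical names, not part of any library.

```javascript
// Illustrative list of crawler user-agent fragments (an assumption, not exhaustive).
const BOT_PATTERN = /googlebot|bingbot|duckduckbot|baiduspider|yandex/i;

// Returns true when the User-Agent header looks like a known crawler.
function isBot(userAgent) {
  return BOT_PATTERN.test(userAgent || "");
}

// Picks which variant of the page to send for a given request.
// `prerenderedHtml` and `appShellHtml` are hypothetical strings prepared elsewhere.
function selectResponse(userAgent, prerenderedHtml, appShellHtml) {
  return isBot(userAgent) ? prerenderedHtml : appShellHtml;
}
```

In a real server the user agent would come from the request's `User-Agent` header, and the static HTML would typically be produced ahead of time or by a prerendering service.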
Also, Google and other search engines make use of meta tags. With meta tags you can define a short description of the webpage, which will often be shown on the search results page.
As for the question in the title, you would need to send the search information to a server. From there, you would process the data server-side and send the results back to the client, where JavaScript would render them.
You should use AJAX for this.
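A minimal sketch of that round trip on the client side might look like this. The `/search` endpoint, the JSON array it is assumed to return, and the `results` element id are all hypothetical and would need to match your own server and page.

```javascript
// Escapes text so that result values cannot inject markup into the page.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turns an array of result strings into an HTML list.
function renderResults(results) {
  if (results.length === 0) return "<p>No results</p>";
  return "<ul>" + results.map((r) => "<li>" + escapeHtml(r) + "</li>").join("") + "</ul>";
}

// Sends the search term to the server and renders the JSON response.
// "/search" and the "results" element id are hypothetical.
async function search(term) {
  const response = await fetch("/search?q=" + encodeURIComponent(term));
  if (!response.ok) throw new Error("Search failed: " + response.status);
  const results = await response.json();
  document.getElementById("results").innerHTML = renderResults(results);
}
```

Keeping `renderResults` separate from the network call makes it possible to reuse the same markup on the server for the static version sent to crawlers.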
Resources:
https://ignitevisibility.com/dynamic-rendering-seo-details-need-know/
https://developer.mozilla.org/en-US/docs/Web/Guide/AJAX
https://developer.mozilla.org/en/docs/Web/HTML/Element/meta

How to search inside excel file using HTML/JavaScript form

I have an Excel file that contains around 3000 unique ids. I am creating an HTML form where the user will select a code from one of these ids. The idea is that they click on the code field, which takes them to a new web page where they can either search for a specific code or choose to show all codes and then manually select one.
I am assuming that I need to use JavaScript or something similar to connect the Excel file to the form, but I cannot find a specific answer to the question. I have seen a lot of websites, such as job application sites that have you search for your university, do something similar. But I am not sure where to begin. It would be great if someone could push me in the right direction.

JavaScript on the front end (in a web browser) does not have access to the filesystem. You should use Node.js (a JavaScript runtime), which runs on the server side and therefore has access to the filesystem.
Your specific issue can be solved with a simple HTTP GET or POST request to a small Node.js server, which will take the code as a query parameter and proceed according to your requirements.
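As a rough sketch of such a server, using only Node's built-in `http` module: the `/codes` path, the `q` parameter, and the function names are hypothetical, and the list of codes is assumed to have been loaded from the spreadsheet already (for example from a CSV export), since reading an .xlsx file directly would need an extra library.

```javascript
const http = require("http");

// Returns every code containing the query (case-insensitive);
// an empty query returns the full list ("show all").
function searchCodes(codes, query) {
  const needle = (query || "").trim().toLowerCase();
  if (needle === "") return codes.slice();
  return codes.filter((code) => code.toLowerCase().includes(needle));
}

// Builds an HTTP server answering GET /codes?q=... with a JSON array.
// `codes` is assumed to be an array of strings already loaded from the
// spreadsheet (for example from a CSV export).
function createCodeServer(codes) {
  return http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost");
    if (req.method !== "GET" || url.pathname !== "/codes") {
      res.writeHead(404, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "Not found" }));
      return;
    }
    const matches = searchCodes(codes, url.searchParams.get("q"));
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(matches));
  });
}

// Usage (not started automatically):
// createCodeServer(["AB-1001", "AB-1002", "CD-2001"]).listen(3000);
```

The form page can then call `/codes?q=ab` to search, or `/codes` with no query to show everything.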
I hope this helps.

Auto fill a form from a different website

We have a webshop and using the shop's API we're able to fetch orders. On the other hand we have our wholesaler but they don't have an API, so we can't post orders to their system. They do have an online ordering form. Is there any way to auto fill this form from our script on our server to their script on their server?
I was googling the topic but I came across scripts that fill forms from the same page, not a different page on a different server.
p.s. this question is not intended to create spamming bots ;)

If I understand your question correctly, I think you should use Selenium WebDriver: it lets you read information from a page and use it to fill in the desired form automatically. Selenium is available for Java, Python, Ruby, and other languages.

How does Asana handle URLs without a #

You may have seen app.asana.com.
If not, you should check it out; it is a very nicely designed web app.
But I can't figure out how they handle the whole URL management.
Backbone.js or Knockout.js handles the URL with the #, and everything after that is just generated.
But Asana doesn't have a hash and can still modify the URL. How are they doing this?

Looks like they're using HTML5 history.pushState(), so they don't have to refresh the page or use # (hashes) in the URL to go to a certain part of the web app.
Here's a good tutorial about history.pushState(): https://developer.mozilla.org/en/DOM/Manipulating_the_browser_history
This is what Google+ and Facebook use to change the URL without refreshing.
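A minimal sketch of the idea, not Asana's actual code: `createRouter` and `render` are hypothetical names, and the history object and event target are passed in as parameters only so the sketch can be tried outside a browser. In a page they default to `window.history` and `window`.

```javascript
// Creates a tiny router. `historyApi` and `eventTarget` default to the
// browser's `history` and `window`. `render` is a hypothetical callback
// that draws the view for a path.
function createRouter(render, historyApi, eventTarget) {
  const h = historyApi || window.history;
  const target = eventTarget || window;

  // Back/forward buttons fire "popstate"; redraw the view for the restored state.
  // Entries not created by pushState have a null state; falling back to "/"
  // is a simplification.
  target.addEventListener("popstate", (event) => {
    render(event.state ? event.state.path : "/");
  });

  return {
    // Changes the address bar without a page reload, then draws the view.
    navigate(path) {
      h.pushState({ path: path }, "", path);
      render(path);
    },
  };
}
```

In a browser, `createRouter(showView).navigate("/tasks/42")` would change the address bar to /tasks/42 without a reload, where `showView` stands for your own rendering function. The server still has to answer that URL directly for pasted links to work.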
I hope this helps.

HTML5 Push State: http://spoiledmilk.dk/blog/html5-changing-the-browser-url-without-refreshing-page
The big benefit here is that if you paste an Asana URL directly into the browser (or click on a link from an email), the server sees the full URL and can immediately send the appropriate task data to the client. We used to use url fragments, but we needed to do a second round trip after the application loaded to read the fragment in JavaScript and pass it to the server.

How to extract some information such as the meta and the topic using URL or the page and jQuery

Suppose I have a URL and want to show some information associated with it on my page, the same way Facebook or other websites such as LinkedIn do: you submit a URL and data about the website is retrieved and displayed. I am using jQuery and HTML for an application and want to know how to do this. My application has a few URLs retrieved from different sources, and I want to show some information about each instead of plain URLs. How is it possible to do this using jQuery?

You generally cannot access external URLs directly with AJAX calls because of the Same Origin Policy. What you'll have to do is submit a request to your own server and have some server-side code request the external URL and retrieve the information.
How that is best achieved depends on what serverside setup you're running.
.NET example
PHP example
(basically just google "Screen scraping" + your language of choice)
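If the server side happens to be Node.js, the scraping step could be sketched like this. It assumes Node 18 or later for the built-in `fetch`; `extractPageInfo` and `fetchPageInfo` are hypothetical names. Regular expressions are used only to keep the sketch free of dependencies, and a real HTML parser would be more robust.

```javascript
// Returns the first capture group of `pattern` in `html`, trimmed, or null.
function firstMatch(html, pattern) {
  const match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

// Pulls a few fields out of an HTML string. The patterns are brittle:
// for example, the description is found only when the name attribute
// comes before the content attribute.
function extractPageInfo(html) {
  return {
    title: firstMatch(html, /<title[^>]*>([\s\S]*?)<\/title>/i),
    description: firstMatch(
      html,
      /<meta[^>]*name=["']description["'][^>]*content=["']([^"']*)["']/i
    ),
    heading: firstMatch(html, /<h1[^>]*>([\s\S]*?)<\/h1>/i),
  };
}

// Runs on your own server, so the browser's Same Origin Policy does not apply.
async function fetchPageInfo(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error("Request failed: " + response.status);
  return extractPageInfo(await response.text());
}
```

Your page would then call an endpoint on your own server that wraps `fetchPageInfo` and returns the result as JSON for jQuery to display.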

You need to process the whole page to search for images or useful information.
