I am developing a PHP-based website. I would like to grab the page title, content, and thumbnail from any URL submitted by a user. I am not sure how I should proceed. If possible, I would like to avoid any third-party platform such as Embed. Could you please help?
Cheers.
First fetch the source of the submitted URL.
Then parse it with a regular expression.
To fetch the source:
file_get_contents();
To parse it with a regex:
preg_match();
Details of both functions:
http://php.net/manual/en/function.preg-match.php
http://www.w3schools.com/php/func_filesystem_file_get_contents.asp
I tried PHP Simple HTML DOM on this issue with no success.
I have now moved to DOMDocument and DOMXPath, which seems promising.
Here is my issue:
I am trying to scrape data from a page which is loaded via a web service request after the page initially shows. It is only milliseconds but because of this, normal scraping shows a template value as opposed to the actual data.
I have found the endpoint url using chrome developer network settings. So if I enter that url into the browser address bar the data displays nicely in JSON format. All Good.
My problem arises because any time the site is re-visited or the page refreshed, the suffix of the endpoint url is randomly-generated so I can't hard-code this url into my php file. For example the end of the url is "?=253648592" on first visit but on refresh it could be "?=375482910". The base of the url is static.
Without getting into headless browsers (I tried and MY head hurts!), is there a way to have XPath find this random URL when the page loads?
Sorry for being so long-winded but I wanted to explain as best I could.
It's probably much easier and faster to just use a regex if you only need one item or value from the HTML. I would like to give an example, but for that I would need a longer snippet of the HTML that contains the endpoint you want to fetch.
Is it possible to give a snippet of the HTML that contains the endpoint?
I have a URL stored in a variable. Using that URL, I would like to get a particular piece of content from the HTML page.
The URL is http://www.linkedin.com/profile/view?id=1112465
From this page I would like to get the current company data using JavaScript.
So please help me with this.
Assuming you don't work for LinkedIn, here's the simplest answer: you can't.
There are cross-origin limitations that prevent a page from fetching content from a domain other than its own. What does this mean? abc.com can't request content from xyz.com, at least not without special permission.
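To make the restriction concrete, here is a minimal sketch of the comparison a browser effectively makes: two URLs count as the same origin only when scheme, host, and port all match. The helper name isSameOrigin is made up for illustration. The "special permission" usually takes the form of CORS response headers (such as Access-Control-Allow-Origin) sent by the other site.

```javascript
// Hypothetical helper: true when both URLs share scheme, host, and port.
function isSameOrigin(urlA, urlB) {
  // URL.origin serializes scheme + host + port (default ports are omitted).
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Same site, different paths: allowed.
console.log(isSameOrigin("http://abc.com/page", "http://abc.com/api/data"));
// Different hosts: blocked unless the other site opts in.
console.log(isSameOrigin("http://abc.com/", "http://xyz.com/"));
```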
I am creating a website where each user will have their own unique page. Users can visit other users' pages via
http://website/user?user=<username>&session=<session>
Now I want to simplify above URL to
http://website/user/<username> (something like pinterest or facebook)
I thought I could use mod_rewrite. However, mod_rewrite runs on the server side, and I do not want to include any PHP code. This is how I get the data for a user:
I load the basic HTML template and then, depending on which user is requested, load that user's data asynchronously.
Can I achieve above in JS? If yes, how?
-Ajay
Unfortunately, you can't do exactly this.
A possible solution, though, is to place your HTML hub page at http://website/user/ and form user URLs like this: http://website/user/#username. JS can then get the user name simply with var username = location.href.split("#")[1].
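A slightly more defensive variant of the same idea, as a sketch: it reads the fragment through the standard URL class, decodes it, and returns null when no user name is present. The function name usernameFromUrl is an assumption made for this example.

```javascript
// Hypothetical helper: extract the user name from a URL such as
// http://website/user/#username. Returns null if there is no fragment.
function usernameFromUrl(url) {
  const fragment = new URL(url).hash; // "" or "#username"
  if (fragment.length <= 1) {
    return null;
  }
  // Drop the leading "#" and undo percent-encoding.
  return decodeURIComponent(fragment.slice(1));
}

console.log(usernameFromUrl("http://website/user/#ajay"));
```

In the page itself you would call usernameFromUrl(location.href) and then load that user's data asynchronously.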
By the way, you said that you are not using PHP. How do you parse URL arguments then?
I thought this would be easy but I guess I was wrong. I have a url;
http://www.example.com/aa/bb.html?uid=123
Using JavaScript, jQuery, and HTML, I am able to retrieve data from a JSON API with the uid in the sample URL above. However, I don't want that URL displayed like that in the address bar after the data has been parsed. Rather, I need it to display as:
http://www.example.com/aa/item-title
where item-title is the title of the data referenced by uid=123.
A mod_rewrite rule with PHP would have been ideal, but this project does not use server-side scripting.
Thanks in advance
If you change the URL, the browser tries to fetch data from the new URL. You can, however, change the part of the URL after the # mark without triggering a reload.
For example:
http://www.example.com/aa/bb.html?uid=123#old_part
to
http://www.example.com/aa/bb.html?uid=123#newpart
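As a rough sketch of that change, the standard URL class can swap the fragment while leaving the path and query untouched. The function name replaceFragment is made up for this example; in a live page, assigning to location.hash has the same effect on the address bar and does not reload the page.

```javascript
// Hypothetical helper: return the same URL with a different fragment.
function replaceFragment(url, fragment) {
  const parsed = new URL(url);
  parsed.hash = fragment; // the "#" is added automatically
  return parsed.toString();
}

console.log(
  replaceFragment("http://www.example.com/aa/bb.html?uid=123#old_part", "newpart")
);
```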
I can see only one solution to your problem, since you don't want to use mod_rewrite: redirect from the first page to the page you want to display, building the new URL from the given uid value.
First page: read the uid parameter and build the redirect URL from the title (and nothing else from the first page's URL).
Redirect to the built URL.
In the redirected page, do the rest of the page-specific work.
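The URL-building step above could look roughly like this. It is only a sketch: slugify and buildTitleUrl are hypothetical names, the slug rules (lowercase, hyphens for anything that is not a letter or digit) are an assumption, and the target page must actually exist on the server because no rewriting is involved.

```javascript
// Hypothetical helper: turn a title into a URL-friendly slug.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse other characters into hyphens
    .replace(/^-+|-+$/g, "");    // trim hyphens at both ends
}

// Hypothetical helper: resolve the slug against the current page URL,
// which replaces the last path segment and drops the query string.
function buildTitleUrl(currentUrl, title) {
  return new URL(slugify(title), currentUrl).toString();
}

console.log(
  buildTitleUrl("http://www.example.com/aa/bb.html?uid=123", "Item Title")
);
```

On the first page, the title would come from the JSON API response for the uid, and the redirect itself would be location.replace(buildTitleUrl(location.href, title)).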
I'm using the Google Feed API to get feeds, and the results contain a link field, which is great. However, a FeedBurner feed returns something to the effect of google.feedproxy.blahblahblah, and I need the actual URL of the HTML page. The google.feedproxy URL will redirect. Is there a way, using JavaScript, to get the URL that is being redirected to? Or even just a more elegant way of getting the HTML URL?
Any help is appreciated.
Grab the text in <feedburner:origLink>. It's available in most FeedBurner feeds.
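A minimal sketch of that extraction, assuming you already have the raw feed XML as a string (the function name extractOrigLinks is made up). It uses a regular expression for brevity and only decodes &amp;. A real XML parser, such as DOMParser in the browser, would be more robust.

```javascript
// Hypothetical helper: collect the text of every <feedburner:origLink>
// element found in a raw feed XML string.
function extractOrigLinks(feedXml) {
  const pattern = /<feedburner:origLink>([\s\S]*?)<\/feedburner:origLink>/g;
  const links = [];
  for (const match of feedXml.matchAll(pattern)) {
    // Trim whitespace and decode the most common XML entity in URLs.
    links.push(match[1].trim().replace(/&amp;/g, "&"));
  }
  return links;
}

// Made-up sample item for illustration only.
const sampleXml =
  "<item><link>http://feedproxy.google.com/~r/example/1</link>" +
  "<feedburner:origLink>http://example.com/post?a=1&amp;b=2</feedburner:origLink></item>";
console.log(extractOrigLinks(sampleXml));
```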
Make sure your Accept header is limited to the RSS and Atom MIME types. If by HTML URL, you mean the site for the feed, look for a <link rel="alternate"/> in the feed itself, as its href is the relevant site.