I would like to download some data from the national animal remedy department. Since it's a dynamic website, I chose Selenium to do it, but it still returns none from the website. Could anyone help? Thanks.
Website address: http://124.126.15.169:8081/cx/
The problem is that the URL above stays the same no matter which field I click in the tab on the left side.
I don't know how to construct a request URL to the server so that I can get the JSON file in the response data.
Could anyone point out how I can get the data?
Thanks a lot.
What I want to do is capture a user's video from the webcam in HTML, then send this video to Python, which will do the emotion analysis.
I already get the video using someone else's work (RTC), and it is a Blob in JavaScript. My problem is how to receive the video properly.
I tried using a WebSocket to send the Blob, and the problem is how to parse the received data back into a video in Python.
I tried to find some material but found nothing. Maybe I chose the wrong approach. I would be very grateful if anyone could give me some tips.
I have used PHP Simple HTML DOM with no success on this issue.
Now I have moved to DOMDocument and DOMXPath, and this does seem promising.
Here is my issue:
I am trying to scrape data from a page which is loaded via a web service request after the page initially shows. It is only milliseconds but because of this, normal scraping shows a template value as opposed to the actual data.
I have found the endpoint URL using the Network tab in Chrome's developer tools. If I enter that URL into the browser address bar, the data displays nicely in JSON format. All good.
My problem arises because any time the site is revisited or the page refreshed, the suffix of the endpoint URL is randomly generated, so I can't hard-code this URL into my PHP file. For example, the end of the URL is "?=253648592" on the first visit, but on refresh it could be "?=375482910". The base of the URL is static.
Without getting into headless browsers (I tried and MY head hurts!), is there a way to have XPath find this random URL when the page loads?
Sorry for being so long-winded but I wanted to explain as best I could.
It's probably much easier and faster to just use a regex if you only need one item/value from the HTML. I would like to give an example, but for that I would need a longer snippet of the HTML that contains the endpoint you want to fetch.
Is it possible to give a snippet of the HTML that contains the endpoint?
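Until the real HTML is available, here is a minimal sketch of the regex idea in Python. The base URL (https://example.com/api/data) and the sample markup are made up for illustration; the only thing taken from the question is that the base is static and the suffix looks like "?=" followed by digits. The same pattern should carry over to PHP's preg_match().

```python
import re


def find_endpoint(html, base_url):
    """Return the first URL in `html` that starts with `base_url` and ends
    with a '?=<digits>' suffix, or None if no such URL is present."""
    # re.escape() keeps the dots and slashes in the base URL literal.
    pattern = re.compile(re.escape(base_url) + r"\?=\d+")
    match = pattern.search(html)
    return match.group(0) if match else None


# Hypothetical page source: the real markup may embed the URL differently.
sample_html = '<script>var endpoint = "https://example.com/api/data?=253648592";</script>'
endpoint = find_endpoint(sample_html, "https://example.com/api/data")
print(endpoint)
```

This only works if the full endpoint URL appears somewhere in the initial page source (for example in an inline script). If it is assembled in JavaScript at run time, the regex has nothing to match.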
I am developing a PHP-based website. I would like to grab the web page title, content and thumbnail from any URL submitted by a user. I am not sure how I should proceed. If possible, I would like to avoid any third-party platform such as Embed. Could you please help?
Cheers.
You must get the source of the URL that was entered
and parse it with a regex.
To get the source code:
file_get_contents();
Then parse it with a regex:
preg_match();
Details of the functions:
http://php.net/manual/en/function.preg-match.php
http://www.w3schools.com/php/func_filesystem_file_get_contents.asp
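As a rough illustration of the fetch-then-parse flow, here is a sketch in Python rather than PHP. It swaps the regex for the standard library's HTML parser, which tends to cope better with attribute order and quoting. The assumption that the thumbnail lives in an og:image meta tag is mine, not something the question states, and many pages will not have one.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Meta keys we look at, mapped to the preview field they fill (an assumption).
META_FIELDS = {
    "og:title": "title",
    "og:description": "description",
    "description": "description",
    "og:image": "image",
}


class PreviewParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {"title": None, "description": None, "image": None}
        self._in_title = False
        self._title_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            field = META_FIELDS.get(key)
            # Keep the first value found for each field.
            if field and attrs.get("content") and self.fields[field] is None:
                self.fields[field] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_text.append(data)


def extract_preview(html):
    """Return a dict with 'title', 'description' and 'image' (None if absent)."""
    parser = PreviewParser()
    parser.feed(html)
    parser.close()
    preview = dict(parser.fields)
    if preview["title"] is None:
        # Fall back to the <title> element when there is no og:title.
        preview["title"] = "".join(parser._title_text).strip() or None
    return preview


def fetch_html(url):
    """Download a page; roughly what file_get_contents() does in PHP."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be along the lines of extract_preview(fetch_html(user_url)). Since the URL comes from a user, it is worth validating it before fetching.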
I am trying to hide the URL of my audio streams for my HTML5 player and was really struggling to think of a way to do so. Then I realised that SoundCloud must hide the URLs of their streams. So I went onto SoundCloud, opened up the console and played a track, but I couldn't see any obvious way that the URL is hidden. After this I took a look at the DOM tree to see if there was any kind of audio information in there, but I found nothing! There's not even an ID for the player/audio, so I'm very confused as to how SoundCloud have done it.
So far I have done the best I can with hiding the audio URL. I have put an ID in the DOM for the track, read that ID when the play button is clicked, and retrieved the URL for that ID from the database. The obvious problem is that anyone willing enough can just go to the console and get the URL from the network events.
I am not trying to break past SoundCloud's security to download tracks I shouldn't be. I'm just curious as to how they've hidden the URL. Now I'm also curious as to how each track is distinguished, as there's nothing in the DOM distinguishing them (not that I found on my brief look, anyway).
So, in short, does anyone have any ideas on how SoundCloud has achieved this, or how this could be achieved?
SoundCloud is pretty much a pure JS site.
As you said, there is no ID of the song loaded with the HTML. Songs are recognized by the page URL. This is done via this URL (example):
https://api.sndcdn.com/resolve?url=https%3A//soundcloud.com/hoodinternet/joywave-tongues-hood-internet-remix&_status_code_map%5B302%5D=200&_status_format=json&client_id=YOUR_CLIENT_ID
This returns something like this:
{"status":"302 - Found","location":"https://api.soundcloud.com/tracks/100270342?client_id=YOUR_CLIENT_ID"}
Next up it loads the location URL, from the JSON above. This returns a bunch of information about the track, including:
stream_url: "https://api.soundcloud.com/tracks/100270342/stream"
Then it loads this URL:
https://api.sndcdn.com/i1/tracks/100270342/streams?client_id=YOUR_CLIENT_ID
Which returns a response like this:
{"http_mp3_128_url":"https://ec-media.soundcloud.com/2gNVBYiZ06bU.128.mp3?ff61182e3c2ecefa438cd021sdf02d0e385713f0c1faf3b0339595664fe070de810d30a8e3a1186eda958909e9ed97799adfeceabc135efac83aee4271217a108450591db3b88\u0026AWSAccessKeyId=AKIAsdfJ4IAZE5EOIdsf7PA7VQ\u0026Expires=1374883403\u0026Signature=%2B1%2B7dfdfLN4NWP3C3bNF3gizSEVIU%3D"}
So that's how they hide their stream URLs. The only non-obvious part is that they find the song ID by hitting an API with the page URL as a parameter. The same can be done with download URLs on tracks that support it.
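The chain above can be sketched as a few small Python helpers. The endpoints, parameter names and JSON fields are copied from the examples in this answer; the function names are made up, and nothing here is an official client. The sketch only builds URLs and parses responses, so the actual HTTP requests (and a real client_id) are left to the reader, and the endpoints may no longer behave this way.

```python
import json
from urllib.parse import urlencode, urlsplit


def build_resolve_url(page_url, client_id):
    """Step 1: ask the resolve endpoint which track a page URL points to."""
    params = {
        "url": page_url,
        "_status_code_map[302]": "200",
        "_status_format": "json",
        "client_id": client_id,
    }
    return "https://api.sndcdn.com/resolve?" + urlencode(params)


def track_id_from_resolve(body):
    """Step 2: pull the numeric track ID out of the 'location' field."""
    location = json.loads(body)["location"]
    # The path looks like /tracks/100270342; the ID is its last segment.
    return urlsplit(location).path.rstrip("/").split("/")[-1]


def build_streams_url(track_id, client_id):
    """Step 3: the URL that returns the signed stream locations."""
    query = urlencode({"client_id": client_id})
    return f"https://api.sndcdn.com/i1/tracks/{track_id}/streams?{query}"


def mp3_url_from_streams(body):
    """Step 4: read the signed MP3 URL from the streams response."""
    return json.loads(body)["http_mp3_128_url"]


# Sample resolve response, taken from the example above.
resolve_body = '{"status":"302 - Found","location":"https://api.soundcloud.com/tracks/100270342?client_id=YOUR_CLIENT_ID"}'
track_id = track_id_from_resolve(resolve_body)
streams_url = build_streams_url(track_id, "YOUR_CLIENT_ID")
print(streams_url)
```

The point for the original question is that the final MP3 URL is never in the page. It is handed out per request and is signed with an expiry time, so copying it from the network tab only works for a short while.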
If you go to SoundCloud.com and open up your debugger (Chrome is what I'm using), look at the "Network" tab and you'll see a script calling audio?anonymous_id#########.
This is structured like a REST call, meaning they pass an id to a service on their backend, and that returns the audio output anonymously.
They changed the media center address, and now they stream from a link like the one below. But access to this URL is restricted.
https://cf-media.sndcdn.com/Exbr0RDsakIP.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vRXhicjBSRHNha0lQLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE0NzgyOTg5NTV9fX1dfQ__&Signature=niisPQ5NHUqclFI9Mb-eH1BJvOC~0zUZme8CLMkocXMs2zCWe2U2~chPYDydJuYYV3iFtUjqwCK~t~~kQg2o5TKx0~iUSZ1E4ZNBbhvHJWqBliILbEd2gZzBnrHtS0nBNCMfIuUVWmkMtWAEWXI7NyvOBPqJab8KZR8qkFnleyzefHfssxPGWV8sW09en1VkjDRPasHRmc~w22lSpF3dWqZAFbocRFZGLS-h5eXj~Qin-kxMo2DgxHE0K-Svg4BPAJ83s408SkruRq3q3B46IBxmR4mDfx4U8T~tN1mvQZGWtXESm~rIY8K40ZSwdTlOE8eMiogFsjH5HzXvc3pBFA__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ
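To see why access is restricted, you can decode the Policy parameter of that link. The sketch below assumes it uses the CloudFront signed-URL convention, where the policy is base64 with "+", "=" and "/" replaced by "-", "_" and "~". The function name is made up, and the Signature and Key-Pair-Id parameters are left out of the sample URL only to keep it short.

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit


def decode_policy(signed_url):
    """Return the JSON policy embedded in a signed URL's Policy parameter."""
    encoded = parse_qs(urlsplit(signed_url).query)["Policy"][0]
    # Undo the URL-safe substitutions before standard base64 decoding.
    standard = encoded.translate(str.maketrans("-_~", "+=/"))
    return json.loads(base64.b64decode(standard))


# The Policy value from the link above (other parameters omitted).
url = (
    "https://cf-media.sndcdn.com/Exbr0RDsakIP.128.mp3?Policy="
    "eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20v"
    "RXhicjBSRHNha0lQLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7"
    "IkFXUzpFcG9jaFRpbWUiOjE0NzgyOTg5NTV9fX1dfQ__"
)
policy = decode_policy(url)
statement = policy["Statement"][0]
print(statement["Resource"])
print(statement["Condition"]["DateLessThan"]["AWS:EpochTime"])
```

The decoded policy names the MP3 as the resource and carries a DateLessThan condition with an epoch time in November 2016. After that moment the link stops working, and the policy cannot be edited without invalidating the Signature.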