I am trying to hide the URLs of the audio streams for my HTML5 player and was really struggling to think of a way to do so. Then I realised that SoundCloud must hide the URLs of their streams. So I went to SoundCloud, opened up the console and played a track, but I couldn't see any obvious way that the URL is hidden. After this I took a look at the DOM tree to see if there was any kind of audio information in there, but I found nothing! There's not even an ID for the player/audio, so I'm very confused as to how SoundCloud have done it.
So far I have done the best I can to hide the audio URL. I put an ID in the DOM for the track, read that ID when the play button is clicked, and retrieve the URL for that ID from the database. The obvious problem is that anyone willing enough can just go to the console and get the URL from the network events.
I am not trying to get past SoundCloud's security to download tracks I shouldn't be. I'm just curious as to how they've hidden the URL. I'm also curious as to how each track is distinguished, as there's nothing in the DOM distinguishing them (not that I found on my brief look, anyway).
So, in short, does anyone have any ideas on how SoundCloud has achieved this, or how this could be achieved?
SoundCloud is pretty much a pure JS site.
As you said, no song ID is loaded with the HTML. Songs are recognized by the page URL. This is done via a URL like this (example):
https://api.sndcdn.com/resolve?url=https%3A//soundcloud.com/hoodinternet/joywave-tongues-hood-internet-remix&_status_code_map%5B302%5D=200&_status_format=json&client_id=YOUR_CLIENT_ID
This returns something like this:
{"status":"302 - Found","location":"https://api.soundcloud.com/tracks/100270342?client_id=YOUR_CLIENT_ID"}
Next up it loads the location URL, from the JSON above. This returns a bunch of information about the track, including:
stream_url: "https://api.soundcloud.com/tracks/100270342/stream"
Then it loads this URL:
https://api.sndcdn.com/i1/tracks/100270342/streams?client_id=YOUR_CLIENT_ID
Which returns a response like this:
{"http_mp3_128_url":"https://ec-media.soundcloud.com/2gNVBYiZ06bU.128.mp3?ff61182e3c2ecefa438cd021sdf02d0e385713f0c1faf3b0339595664fe070de810d30a8e3a1186eda958909e9ed97799adfeceabc135efac83aee4271217a108450591db3b88\u0026AWSAccessKeyId=AKIAsdfJ4IAZE5EOIdsf7PA7VQ\u0026Expires=1374883403\u0026Signature=%2B1%2B7dfdfLN4NWP3C3bNF3gizSEVIU%3D"}
So that's how they hide their stream URLs. The only non-obvious part is that they find the song ID by hitting an API with the page URL as a parameter. The same can be done with download URLs on tracks that support it.
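The lookup chain described above can be sketched without touching the network. This is a minimal Python sketch, not SoundCloud's actual client code: the function names are made up for illustration, the endpoints and the sample response are copied from the answer, and the API may well have changed since.

```python
import json
from urllib.parse import quote, urlencode, urlsplit


def build_resolve_url(page_url, client_id):
    """Build the resolve request for a track page URL (endpoint as quoted above)."""
    params = {
        "url": page_url,
        "_status_code_map[302]": "200",
        "_status_format": "json",
        "client_id": client_id,
    }
    # safe="/" leaves slashes unescaped, matching the example URL in the answer.
    return "https://api.sndcdn.com/resolve?" + urlencode(params, safe="/", quote_via=quote)


def track_id_from_resolve(response_text):
    """Pull the track ID out of the 'location' field of the resolve response."""
    location = json.loads(response_text)["location"]
    return urlsplit(location).path.rsplit("/", 1)[-1]


def build_streams_url(track_id, client_id):
    """Build the request that returns the signed media URLs for a track."""
    query = urlencode({"client_id": client_id})
    return f"https://api.sndcdn.com/i1/tracks/{track_id}/streams?{query}"


if __name__ == "__main__":
    # Sample resolve response, copied from the answer.
    sample = (
        '{"status":"302 - Found",'
        '"location":"https://api.soundcloud.com/tracks/100270342?client_id=YOUR_CLIENT_ID"}'
    )
    page = "https://soundcloud.com/hoodinternet/joywave-tongues-hood-internet-remix"
    print(build_resolve_url(page, "YOUR_CLIENT_ID"))
    print(build_streams_url(track_id_from_resolve(sample), "YOUR_CLIENT_ID"))
```

The point of the sketch is that nothing here is secret: every step is an ordinary HTTP request whose parameters come from the previous response, so the stream URL is indirect rather than truly hidden.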
If you go to SoundCloud.com, open up your debugger (Chrome is what I'm using) and look at the "Network" tab, you'll see a script calling audio?anonymous_id#########.
This is structured like a REST call, meaning they pass an id to a service on their backend, and that returns the audio output anonymously.
They changed the media server address and are now streaming from links like the one below, but access to this URL is restricted.
https://cf-media.sndcdn.com/Exbr0RDsakIP.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vRXhicjBSRHNha0lQLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE0NzgyOTg5NTV9fX1dfQ__&Signature=niisPQ5NHUqclFI9Mb-eH1BJvOC~0zUZme8CLMkocXMs2zCWe2U2~chPYDydJuYYV3iFtUjqwCK~t~~kQg2o5TKx0~iUSZ1E4ZNBbhvHJWqBliILbEd2gZzBnrHtS0nBNCMfIuUVWmkMtWAEWXI7NyvOBPqJab8KZR8qkFnleyzefHfssxPGWV8sW09en1VkjDRPasHRmc~w22lSpF3dWqZAFbocRFZGLS-h5eXj~Qin-kxMo2DgxHE0K-Svg4BPAJ83s408SkruRq3q3B46IBxmR4mDfx4U8T~tN1mvQZGWtXESm~rIY8K40ZSwdTlOE8eMiogFsjH5HzXvc3pBFA__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ
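The Policy, Signature and Key-Pair-Id parameters are the ones used by Amazon CloudFront signed URLs, which would explain the restriction: the link is only valid until the time stated in the policy. A minimal Python sketch, assuming the URL really is a CloudFront signed URL (the function name is made up, and the Signature and Key-Pair-Id parameters are left out of the example for brevity), decodes the policy from the link above:

```python
import base64
import json
from urllib.parse import parse_qs, urlsplit

# The URL quoted above, with only the Policy parameter kept.
EXAMPLE_URL = (
    "https://cf-media.sndcdn.com/Exbr0RDsakIP.128.mp3?Policy="
    "eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20v"
    "RXhicjBSRHNha0lQLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7"
    "IkFXUzpFcG9jaFRpbWUiOjE0NzgyOTg5NTV9fX1dfQ__"
)


def decode_policy(signed_url):
    """Decode the Policy query parameter of a CloudFront-style signed URL."""
    policy = parse_qs(urlsplit(signed_url).query)["Policy"][0]
    # CloudFront's URL-safe base64 variant uses '-' for '+', '_' for '=' and '~' for '/'.
    standard = policy.replace("-", "+").replace("_", "=").replace("~", "/")
    return json.loads(base64.b64decode(standard))


if __name__ == "__main__":
    statement = decode_policy(EXAMPLE_URL)["Statement"][0]
    print(statement["Resource"])
    print(statement["Condition"]["DateLessThan"]["AWS:EpochTime"])
```

The decoded policy names the resource and a DateLessThan condition holding a Unix timestamp, so a copied link stops working once that time has passed.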
Related
You would think my problem would be so commonplace that there would be solutions all over the internet for it. But I can't find anything that really answers my question.
Let me summarise my situation:
I am using Open UI5.
I am coding an app which retrieves documents from various external websites. I want to display these documents inside my app rather than navigate to them, so I display the documents in an iframe. I haven't found any other way.
Some file types can be displayed natively, such as PDFs. Others, like Word, cannot. The easiest way I have found of displaying these is by using Google Docs, which means changing the URL of the iframe's src from this:
http://example.com/my-target-doc.docx
to this:
http://docs.google.com/gview?url=example.com/my-target-doc.docx&embedded=true
Some of the external domains I retrieve the documents from require authentication. Therefore, I cannot set the iframe's src to http://docs.google.com/gview?url=example.com/my-target-doc.docx&embedded=true directly, because Google Docs would attempt to display the authentication page. I must keep the original URL and then, once the user is authenticated, replace the document URL with the Google Docs version of the same URL.
What I am trying to do, then, is use the iframe's "onload" event to get the currently loaded page's address and, if it is a .doc/.docx/.ppt etc., replace that URL with the Google Docs version of it.
The difficulty is that there is no extension at the end of the URL which points to the document - none of the URLs I need to use end with ".doc", ".ppt" or whatever, so parsing the URL is out.
So this is my question: is there a way in JavaScript to get the type of the content being returned? To be fair, I am pretty doubtful there is. Other ideas or alternatives are welcome. I am still actively looking for some.
Thanks!
Have you already looked at the Content-Type HTTP header? It can be read with JS, but you will probably have to request the file asynchronously to do so.
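Once the Content-Type value is in hand (for instance from the response headers of an asynchronous request), the decision itself is simple. The following is a minimal Python sketch of that decision only, not a drop-in solution for the iframe: the function names and the set of MIME types are assumptions chosen for illustration, and the viewer URL is the one from the question.

```python
from urllib.parse import urlencode

# MIME types assumed not to render natively in the browser (Word and PowerPoint).
OFFICE_TYPES = {
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.ms-powerpoint",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def media_type(content_type_header):
    """Strip parameters such as '; charset=utf-8' and normalise the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def iframe_src_for(document_url, content_type_header):
    """Return the URL to load in the iframe, based on the Content-Type header."""
    if media_type(content_type_header) in OFFICE_TYPES:
        # Percent-encode the document URL so its own query string survives.
        query = urlencode({"url": document_url, "embedded": "true"})
        return "http://docs.google.com/gview?" + query
    return document_url


if __name__ == "__main__":
    print(iframe_src_for("http://example.com/doc?id=7", "application/msword"))
    print(iframe_src_for("http://example.com/doc?id=8", "application/pdf"))
```

Because the check uses the header rather than the file extension, it works for URLs that do not end in ".doc" or ".ppt".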
I would like to proof-of-concept something like the following: POST a message with a file name in it to my website (this POST could be from many types of devices, and from anywhere). My website 'gets' the message and loads the file that matches the incoming filename. It keeps 'listening' so that I can continue to pass filename strings for other images to load.
For the test, I will just be displaying jpegs in the website (assume all of the images would already exist on my website).
A code solution would be GREAT, but knowing that may be greedy: what are the terms I should be looking for to accomplish this, so that I can Google and cobble something together?
I am attempting to make a request to the SoundCloud API. When I get the response, I set the stream_url as the source of an <audio> element.
This works:
http://matthiasdv.org/beta/
But not always... When you search for 'Bonobo' for example, you can play the first few tracks without any issue. But when you try to play 'London Grammar - Hey Now (Bonobo remix)' - the 7th result - it won't play. It throws no errors whatsoever.
I've been tinkering around with Chrome's webdev-tools and under the network tab I see the requests being made. I found that tracks that DO play have a short Request Url, like this:
https://ec-media.sndcdn.com/vR5ukuOzyLbw.128.mp3?f10880d39085a94a0418a7ef69b03d522cd6dfee9399eeb9a522029f6bfab939b9ae57af14bba24e44e1542924c205ad28a52352010cd0e7dd461e9243ab54dc0f0bba897d
And the ones that don't look like this:
https://cf-media.sndcdn.com/8PCswwlkswOd.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vOFBDc3d3bGtzd09kLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE0MzM0Mjc2MDN9fX1dfQ__&Signature=cD-XVhnvQnIATkfrBDDVy0Q7996C8DymwxRLwBBduab0~L0MynF1ftcMky~21T8Q-gCZ2~dMK8dz7uVxvJTIJgXPxEZvhNtbvescMK6iFMg-xSAty-4OhJYjrIZJ2j8NE4uNA4Ml7MWbWcQw4KtUtpZitOQuguS3DPFDII3VF-dvzb2L~xG-G8Uu3uOnI1WhnAAfhf1QWMO7swwB89HtcCiuVBmfluG28ELrJEq-au8mqIMB3sLTno6nUuTtpHXR2ayXBsYcYLLJVXa3Ul8p1rhLS5XWHKWXY8xug4jwey27~C5PVAomK6Z5lJx-mz-0zYs4riUYtl0zACbZ1OfwTQ__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ
Now at first glance I figured it was an encoding issue, but wrapping a quick encodeURI() around the ajax url did not work.
Furthermore I do not understand where these urls come from. In my code I am directing my ajax request towards, for example:
https://api.soundcloud.com/tracks/140326936/stream?client_id=5c6ceaa17461a1c79d503b345a26a54e
Thus, the request URL in the GET request (as found under 'Network' in Chrome's webdev tools) makes no sense to me. Is SoundCloud redirecting GET requests to a CDN host? One more thing I've noticed is that each time TWO requests are fired instead of one. The first one is always cancelled and contains a 'Provisional headers are shown' warning. I believe this is because I am setting crossOrigin = "anonymous"; otherwise certain browsers would not load the content.
My guess at the cause is that when the URL is set as the src attribute of the element, an event listener is fired in the dancer.js library, which handles the Audio API and the playback (https://github.com/jsantell/dancer.js/). It may be that encodeURI() is required somewhere in the library.
I decided to ask the question anyhow because I don't understand how the request URLs above are formed, why two requests are fired instead of one, and why the first is always cancelled.
Any hints which may solve the playback issue are more than welcome too...
When you run the request for
https://api.soundcloud.com/tracks/140326936/stream?client_id=5c6ceaa17461a1c79d503b345a26a54e
you get an HTTP 302 Found response from the server, which is a URL redirect (http://en.wikipedia.org/wiki/HTTP_302). This causes your browser to load from the new URL that the server returns, hence the two requests you see. The server basically says "yeah, I know where to find that file, ask that guy over there".
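The two-request pattern can be reproduced locally. The following is a minimal Python sketch that uses a stub server on the loopback interface as a stand-in for the API; the class name, function names and the media path are hypothetical, and only the 302-then-200 sequence is meant to mirror what the browser does.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

MEDIA_PATH = "/media/example.128.mp3"  # hypothetical stand-in for the CDN URL


class StubApi(BaseHTTPRequestHandler):
    """Answers /tracks/... with a 302 and everything else with a 200."""

    def do_GET(self):
        if self.path.startswith("/tracks/"):
            self.send_response(302)
            self.send_header("Location", MEDIA_PATH)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"not real audio"
            self.send_response(200)
            self.send_header("Content-Type", "audio/mpeg")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(host, port, path, max_hops=5):
    """Follow 302 redirects by hand, recording (status, path) for every request."""
    hops = []
    for _ in range(max_hops):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        location = response.getheader("Location")
        conn.close()
        hops.append((response.status, path))
        if response.status != 302 or location is None:
            break
        path = location
    return hops


def demo():
    """Start the stub server on a free port, fetch one track and return the hops."""
    server = HTTPServer(("127.0.0.1", 0), StubApi)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_port, "/tracks/140326936/stream")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, path in demo():
        print(status, path)
```

A browser performs the second request automatically, which is why the Network tab shows a URL that never appears in your own code.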
The reason one works and the other doesn't is, I'd think, that https://ec-media.sndcdn.com sets the Access-Control headers while https://cf-media.sndcdn.com doesn't. Since you set crossOrigin = "anonymous", the browser will not use a cross-origin response that lacks those headers. This is an issue with the server configuration and unfortunately nothing you can control from the client side. Dunno if it's a deliberate move by SoundCloud or if it's something you could ask them about.
I have tried PHP Simple HTML DOM on this issue with no success.
Now I have moved to DOMDocument and DOMXPath, and this does seem promising.
Here is my issue:
I am trying to scrape data from a page which is loaded via a web service request after the page initially shows. It is only milliseconds but because of this, normal scraping shows a template value as opposed to the actual data.
I have found the endpoint URL using the Network tab of Chrome's developer tools. If I enter that URL into the browser address bar, the data displays nicely in JSON format. All good.
My problem arises because any time the site is revisited or the page refreshed, the suffix of the endpoint URL is randomly generated, so I can't hard-code this URL into my PHP file. For example, the end of the URL is "?=253648592" on first visit, but on refresh it could be "?=375482910". The base of the URL is static.
Without getting into headless browsers (I tried and MY head hurts!), is there a way to have XPath find this random URL when the page loads?
Sorry for being so long-winded but I wanted to explain as best I could.
It's probably much easier and faster to just use a regex if you only need one item/value from the HTML. I would like to give an example, but for that I would need a larger snippet of the HTML that contains the endpoint you want to fetch.
Is it possible to give a snippet of the HTML that contains the endpoint?
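Until the real markup is known, the regex idea can only be sketched against made-up input. The following is a minimal Python sketch, assuming the full endpoint URL appears literally somewhere in the page source; the function name, the sample HTML and the example.com base URL are all hypothetical, and only the "?=" plus digits suffix comes from the question.

```python
import re

# Hypothetical page fragment; the real HTML was not posted.
SAMPLE_HTML = '<script>var dataUrl = "https://example.com/api/data?=253648592";</script>'


def find_endpoint(html, base_url):
    """Return the first URL that starts with base_url and ends in '?=<digits>'."""
    pattern = re.escape(base_url) + r"\?=\d+"
    match = re.search(pattern, html)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_endpoint(SAMPLE_HTML, "https://example.com/api/data"))
```

If the suffix is generated by script at run time and never appears in the downloaded HTML, no regex or XPath query over that HTML will find it.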
Note: The question is not how to fix the problem, as that is documented elsewhere on SO (e.g., Integrating Facebook to the leads to blank pages on some browsers / fb_xd_fragment).
1) What causes this, and under what conditions is it triggered?
2) More importantly, does this affect end users at all? For instance, how does this bug affect the URL shared by someone who clicks the FB Like button? If someone clicks the FB Like button from URL A, does URL A still get shared (but with "fb_xd_fragment" appended), or does URL A become your root URL (with "fb_xd_fragment")? In our logs, all the URLs appear as the root URL with "fb_xd_fragment" appended, so we're not sure if this is because people are clicking the Like button from the home page, or if all the shared URLs get morphed into the root URL.
Basically, whenever you use the JS API, it opens your site in another iframe to use as a cross-domain receiver. If seeing this bothers you, you can set a custom channel URL and it will use that instead. More information at http://developers.facebook.com/docs/reference/javascript/FB.init/