You now have to pay to use the Google Translate API. I'm happy to pay for the service, but I can't find a way to use the TTS (text-to-speech) endpoint. This is what I'm doing:
var GoogleTranslate = function() {
    var key = "myapikey";
    this.speak = function(words) {
        var url = "http://translate.google.com/translate_tts?tl=es&q=" + escape(words) + "&key=" + key;
        new Audio(url).play();
    };
};
but when I do new GoogleTranslate().speak("hola"), the requests to http://translate.google.com/translate_tts never return a response. How do I get this working?
I haven't tried your code yet, so I'm not sure whether you need to wait for the sound to load before playing it (most likely you do), but I've written an article about this service recently. The part that matters here is the following:
...if your browser forwards a Referer header with any value other than an empty string (meaning it tells the service which page you clicked the link on) then [Google] will return a 404 (Not Found) http error...
Read the entire article here: Embedding text-to-speech into HTML5 games
So in fact the service is still there; you just need to hide your Referer header. One way to do that is to create a small gateway script, and the source for one is right in the article.
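As a rough illustration only (this is my own sketch, not the article's script; the port and content type are arbitrary assumptions), a minimal Node.js gateway along those lines might look like this:

// Hypothetical referer-hiding gateway: proxies the TTS request without
// forwarding a Referer header. Run with: node gateway.js, then request
// http://localhost:8080/?tl=es&q=hola from your page.
var http = require('http');

http.createServer(function (req, res) {
    var query = req.url.split('?')[1] || '';
    // No Referer header is set on this outgoing request.
    http.get('http://translate.google.com/translate_tts?' + query, function (tts) {
        // Assumes the service answers with MP3 audio.
        res.writeHead(tts.statusCode, { 'Content-Type': 'audio/mpeg' });
        tts.pipe(res);
    }).on('error', function (err) {
        res.writeHead(502);
        res.end(err.message);
    });
}).listen(8080);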
I'm trying to write a web extension that intercepts requests to URLs on a locally provided list, fetches each URL's response, analyzes it in a certain way, and based on the analysis results blocks or allows the request.
Is that even possible?
The browser doesn't matter.
If it's possible, could you provide some examples?
I tried doing it with Chrome extensions, but it seems it's not possible there.
I've heard it's possible in Firefox, though.
I think this is only possible using the old webRequestBlocking API, which Chrome is removing as part of Manifest V3. Fortunately, Firefox is planning to continue supporting blocking web requests even as they transition to Manifest V3 (read more here).
In terms of implementation, I would highly recommend referring to the MDN documentation for webRequest, in particular their section on modifying responses and their documentation for the filterResponseData method.
Mozilla has also provided a great example project that demonstrates how to achieve something very close to what I think you want to do.
Below I've modified their background.js code slightly so it is a little closer to what you want to do:
function listener(details) {
    // mySpecialUrls is assumed to hold your locally provided URL list.
    if (mySpecialUrls.indexOf(details.url) === -1) {
        // Ignore this url, it's not on our list.
        return {};
    }
    let filter = browser.webRequest.filterResponseData(details.requestId);
    let decoder = new TextDecoder("utf-8");
    let encoder = new TextEncoder();
    filter.ondata = event => {
        let str = decoder.decode(event.data, {stream: true});
        // Just change any instance of Example in the HTTP response
        // to WebExtension Example.
        str = str.replace(/Example/g, 'WebExtension Example');
        filter.write(encoder.encode(str));
        filter.disconnect();
    };
    // This is a BlockingResponse object; you can set parameters here to
    // e.g. cancel the request if you want to.
    // See: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/BlockingResponse#type
    return {};
}
browser.webRequest.onBeforeRequest.addListener(
    listener,
    // 'main_frame' means this will only affect requests for the main frame of
    // the browser (e.g. the HTML for a page rather than the images, CSS, etc.
    // that are loaded afterwards). You might want to look into whether you
    // want to expand this.
    {urls: ["*://*/*"], types: ["main_frame"]},
    ["blocking"]
);
Correction:
The above example only works properly if the response data fits in one chunk. If it is larger (and you still want to inspect the entirety of the response data), you need to put all of the data into a buffer and then work on it once all of it has been received. See the documentation here for more information: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/StreamFilter/ondata#webextension_examples (the code section titled "This example combines all buffers into a single buffer" would be of most interest to you, I think).
In terms of using this API to block responses: data is only passed through to the browser if you call filter.write(), so if you don't like the response you can simply not call it (just call filter.close()) and an empty response will be returned. You can also return only part of the full response body by calling filter.write() with just the bits you want to keep.
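For illustration, here is a minimal sketch of that buffering approach (shouldBlock() is a hypothetical stand-in for whatever analysis you perform; the URL-list check from above is omitted for brevity):

function listener(details) {
    let filter = browser.webRequest.filterResponseData(details.requestId);
    let decoder = new TextDecoder("utf-8");
    let chunks = [];
    filter.ondata = event => {
        // Collect every chunk instead of writing it straight through.
        chunks.push(event.data);
    };
    filter.onstop = () => {
        // Decode the accumulated chunks into one string now that the
        // full response has arrived.
        let full = chunks.map(c => decoder.decode(c, {stream: true})).join("")
                 + decoder.decode();
        if (shouldBlock(full)) {
            // Write nothing: the page receives an empty response.
            filter.close();
        } else {
            filter.write(new TextEncoder().encode(full));
            filter.close();
        }
    };
    return {};
}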
I've seen some answers to this that refer the asker to other libraries (like PhantomJS), but I'm here wondering if it is at all possible to do this in just Node.js.
Consider my code below. It requests a web page using request, then uses cheerio to explore the DOM and scrape the page for data. It works flawlessly, and if everything had gone as planned, I believe it would have output a file as I imagined it in my head.
The problem is that the page I am requesting builds the table I'm looking at asynchronously, using either AJAX or JSONP; I'm not entirely sure how .jsp pages work.
So here I am trying to find a way to "wait" for this data to load before I scrape it for my new file.
var cheerio = require('cheerio'),
    request = require('request'),
    fs = require('fs');

// Go to the page in question
request({
    method: 'GET',
    url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
    if (err) return console.error(err);
    // Tell cheerio to load the HTML
    var $ = cheerio.load(body);
    // Create an empty object to write to the file later
    var toSort = {};
    // Iterate over the DOM and fill the toSort object
    $('#emb table td.list_right').each(function() {
        var row = $(this).parent();
        toSort[$(this).text()] = {
            [$("#lastdate").text()]: $(row).find(".idx1").html(),
            [$("#currdate").text()]: $(row).find(".idx2").html()
        };
    });
    // Write/overwrite a new file
    var stream = fs.createWriteStream("/tmp/shipping.txt");
    var toWrite = "";
    stream.once('open', function(fd) {
        toWrite += "{\r\n";
        for (var i in toSort) {
            toWrite += "\t" + i + ": { \r\n";
            for (var j in toSort[i]) {
                toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
            }
            toWrite += "\t" + "}, \r\n";
        }
        toWrite += "}";
        stream.write(toWrite);
        stream.end();
    });
});
The expected result is a text file with information formatted like a JSON object. It should contain multiple entries like this:
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
"2016-09-29": 26.7,
"2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load asynchronously (the dates and values do), I get a messed-up object.
I tried just setting a setTimeout in various places in the code. The script will only be touched by developers who can afford to run it several times if it occasionally fails, so while not ideal, even a setTimeout (of up to maybe 5 seconds) would be good enough.
It turns out the setTimeouts don't work. I suspect that once I request the page, I'm stuck with a snapshot of the page "as is" when I receive it, and I'm in fact not looking at a live thing whose dynamic content I can wait for.
I've considered investigating how to intercept the packets as they come, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference even if you increase it to an hour. The problem here is that you are making a request against this url:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and their server returns the HTML, and in this HTML there are the JS and CSS imports. That is where your case ends: you just have the HTML and that's it. A browser, by contrast, knows how to parse the HTML document, understand the JavaScript it references, and execute it, and this is exactly your problem: your program cannot execute the JavaScript that comes with the HTML. You need to find or write a scraper that is able to run JavaScript. I just found this similar issue on Stack Overflow:
Web-scraping JavaScript page with Python
The answer there suggests https://github.com/niklasb/dryscrape, and it seems that this tool is able to run JavaScript. It is written in Python, though.
You are trying to scrape the original page, which doesn't include the data you need.
When the page is loaded, the browser evaluates the JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, like PhantomJS does.
The other (and the one you seem to be interested in) is to investigate the page's network activity and understand what additional requests you should perform to get the data you need.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a cache-busting parameter, to prevent caching.
jc is the name of a JS wrapper function which should be invoked with the result (https://en.wikipedia.org/wiki/JSONP).
So, by scraping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing these two additional requests, you will be able to combine them into the same data structure you see in the browser.
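To illustrate, a rough sketch of fetching and unwrapping the first of those JSONP responses with the request module you are already using (cb is an arbitrary callback name of my choosing, and the endpoint's parameters may have changed since this was written):

var request = require('request');

var url = 'http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast' +
          '?SpecifiedDate=&jc=cb&_=' + Date.now();

request(url, function (err, response, body) {
    if (err) return console.error(err);
    // The body looks like cb({...}); strip the wrapper to get plain JSON.
    var json = body.replace(/^[^(]*\(/, '').replace(/\)[;\s]*$/, '');
    console.log(JSON.parse(json));
});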
I am writing a web crawler. I extracted the heading and main discussion of this link, but I am unable to find any of the comments (Ctrl+U → Ctrl+F, searching for the comment text). I think the comments are rendered with JavaScript. Can I extract them?
RT is using a service from spot.im for comments.
You need to make two POST requests: first to https://api.spot.im/me/network-token/spotim to get a token, then to https://api.spot.im/conversation-read/spot/sp_6phY2k0C/post/353493/get to get the comments as JSON.
I wrote a quick script to do this:
import requests
import re
import json

def get_rt_comments(article_url):
    spotim_spotId = 'sp_6phY2k0C'  # spot.im id for RT
    post_id = re.search('([0-9]+)', article_url).group(0)
    r1 = requests.post('https://api.spot.im/me/network-token/spotim').json()
    spotim_token = r1['token']
    payload = {
        "count": 25,  # number of comments to fetch
        "sort_by": "best",
        "cursor": {"offset": 0, "comments_read": 0},
        "host_url": article_url,
        "canonical_url": article_url
    }
    r2_url = 'https://api.spot.im/conversation-read/spot/' + spotim_spotId + '/post/' + post_id + '/get'
    r2 = requests.post(r2_url, data=json.dumps(payload), headers={'X-Spotim-Token': spotim_token, "Content-Type": "application/json"})
    return r2.json()

if __name__ == '__main__':
    url = 'https://www.rt.com/usa/353493-clinton-speech-affairs-silence/'
    comments = get_rt_comments(url)
    print(comments)
Yes, if it can be viewed with a web browser, you can extract it.
If you look at the source, it is really an iframe that loads a piece of JavaScript, which then creates a new script tag in the document whose source loads bundle.js, the actual commenting software. This in turn fetches the actual comments.
Instead of going through this manually, you could consider using, for example, WebKit to create a headless browser that executes the JavaScript like an ordinary browser. Then you can scrape from that, instead of having to make your crawler fetch the external resources manually.
Examples of such headless browsers are Spynner, Dryscrape, or the PhantomJS-derived PhantomPy (the latter seems to be an abandoned project now).
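For example, a rough PhantomJS sketch of that approach might look like the following (the 5-second delay is an arbitrary guess at how long the comment widget needs to fetch its data):

// Run with: phantomjs comments.js
// Renders the page, runs its JavaScript, then dumps the rendered HTML
// so your crawler can parse the comments out of it.
var page = require('webpage').create();

page.open('https://www.rt.com/usa/353493-clinton-speech-affairs-silence/', function (status) {
    if (status !== 'success') {
        console.error('Failed to load page');
        phantom.exit(1);
    }
    // Give the comment widget some time to fetch and render its data.
    setTimeout(function () {
        var html = page.evaluate(function () {
            return document.documentElement.outerHTML;
        });
        console.log(html);
        phantom.exit();
    }, 5000);
});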
Has anyone figured out a way to get the number of video views with the YouTube API into Google Docs?
Up until recently I was successfully using the method outlined in the YouTube View Count in Google Spreadsheet blog post by David Toy.
But it stopped working recently and I began to get the below error:
Request failed for https://gdata.youtube.com/feeds/api/videos/Prv2-9U39K8?v=2&alt=json returned code 410.
Truncated server response: <errors xmlns='http://schemas.google.com/g/2005'>
<error><domain>GData</domain><code>NoLongerAvailableException</code>
<internalReason>No longer avai...
(use muteHttpExceptions option to examine full response) (line 2, file "Code")
Has anyone run into this? If so any ideas on how to fix it would be really helpful.
Thanks!
The script from that blog post uses v2 of the API, which has now been deprecated and switched off. The new version would be:
function getYoutubeViews(videoId) {
    var url = "https://www.googleapis.com/youtube/v3/videos?part=statistics&id=" + videoId;
    url = url + "&key={YOUR-API-KEY}";
    var videoListResponse = UrlFetchApp.fetch(url);
    var json = JSON.parse(videoListResponse.getContentText());
    return json["items"][0]["statistics"]["viewCount"];
}
In order to make this work you will need an API key. Here is the guide on the YouTube developers site which explains how to get the key. Once you have obtained your key simply replace {YOUR-API-KEY} in the code above with it.
To call the function from a Google spreadsheet, select a cell and enter: =getYoutubeViews("{YOUR-VIDEO-ID}"). For instance, =getYoutubeViews("IiPJsI8pl8Q").
tl;dr: I want a bookmarklet that opens, in a new tab, a random link (with specified multiple HTML classes) from a specified domain, with code that works with current logins. Thank you.
Short version of the butchered code:
javascript:
(
var % 20 site = domain.com
function() {
window.location.host == site
void(window.open(document.links[Math.floor(document.querySelectorAll("a.class1, a.class2"))].href, '_blank'))
}();
//beautified with: http://jsbeautifier.org/
To whom it may concern:
I have searched around for a while and even considered switching services, but although some solutions come close or are similar to my particular request, none address everything the request entails:
Execute the script on a specific domain, even when no page from said domain is currently open. If login authentication is required to attain the information or data needed for execution, read from or work in conjunction with the existing session.
Fetch from a specific domain host a random link, out of all links on that domain with a certain HTML class (or indeed otherwise), using preferably CSS selectors.
Open the result in a new tab.
From butchering together such similar snippets, the result became something like this:
//bookmarklet
javascript:
//anonymous function+ wrapped code before execution
(
// function global variables for quick substitution
var %20 site = domain.com
function(){
//set domain for script execution
window.location.host == site
//open new tab for
void(window.open(document.links
//random link
[Math.floor
//with specific classes (elements found with css selectors)
(document.querySelectorAll("a.class1, a.class2"))
]//end random-query
.href,'_blank' //end page-open
)//end link-open
)//end "void"
}//end function definition
//execute
();
//(tried) checked with:
//http://www.javascriptlint.com/online_lint.php
Lastly, I have at most basic CSS knowledge. I apologise if this request has anybody headdesking, palming or otherwise in gtfo mode. It is only too sad there is apparently no tag for "Warning: I DIY-ed this stuff" on Stack Exchange. However, I would still like answers that go into a bit of depth explaining why and what each correction and modification is.
Thank you presently, for your time and effort.
Theoretically, the following code should do what you want:
window.addEventListener('load', function () {
    var query = 'a.class1[href], a.class2[href]';
    var candidates = document.querySelectorAll(query);
    var choice = Math.floor(Math.random() * candidates.length);
    window.open(candidates.item(choice).href, 'randomtab');
}, true);
window.location.href = 'http://domain.com';
But it doesn't, because the possibility to retain event listeners across a page unload could be abused and browsers protect you against such abuse.
Instead, you can manually load the domain of your choice and then click a simpler bookmarklet with the following code:
var query = 'a.class1[href], a.class2[href]';
var candidates = document.querySelectorAll(query);
var choice = Math.floor(Math.random() * candidates.length);
window.open(candidates.item(choice).href, 'randomtab');
You could wrap the above in javascript:(function ( ) { ... })(); and minify as before, but it already works if you just minify it and only slap a javascript: in front.
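For example, the minified bookmarklet would look something like this (a.class1 and a.class2 are the placeholder selectors from your question):

javascript:(function(){var c=document.querySelectorAll('a.class1[href], a.class2[href]');window.open(c.item(Math.floor(Math.random()*c.length)).href,'randomtab');})();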
I understand your situation of being an absolute beginner posting "DIY" code, but I'm still not going to explain step by step why this code works and yours doesn't. The first version of the code above is complex to explain to a beginner, and the list of issues with the code in the question is too long to discuss in full. You'll be better off studying more JavaScript; a good resource with tutorials is MDN.