I'm trying to create an application to transcribe some wav files using Cloud Functions and the Cloud Speech API. The official documentation shows how to do this ( https://cloud.google.com/speech-to-text/docs/async-recognize). However, Cloud Functions has a processing time limit (up to 540 seconds), and waiting for the transcription of long wav files can exceed it. I'm looking for a way to resume the job.
The official document shows the following code. (I'm using node for cloud functions)
// Detects speech in the audio file. This creates a recognition job that you
// can wait for now, or get its result later.
const [operation] = await client.longRunningRecognize(request);
// Get a Promise representation of the final result of the job
const [response] = await operation.promise();
client.longRunningRecognize() sends a request and returns the operation information within a few seconds, while operation.promise() waits until the transcription API finishes. For large files this can take more than 540 seconds, and the process may be killed at that line. So I want to somehow resume processing using the 'operation' object in another process. I tried serializing the 'operation' object to a file and loading it afterwards, but serialization cannot include functions, so operation.promise() is lost. How can I solve this problem?
Here is how to do it (the code is in PHP, but the idea and the classes are the same).
$client = new SpeechClient([
    'credentials' => json_decode(file_get_contents('keys.json'), true)
]);
// Start the job; this call returns almost immediately
$operation = $client->longRunningRecognize($config, $audio);
$operationName = $operation->getName();
Now the job has started and you can save "$operationName" somewhere (say in DB) to be used in another process.
In another process
$client = new SpeechClient([
    'credentials' => json_decode(file_get_contents('keys.json'), true)
]);
CloudSpeech::initOnce();
// Resume the job using the operation name saved earlier
$newOperationResponse = $client->resumeOperation($operationName, 'LongRunningRecognize');
if ($newOperationResponse->operationSucceeded()) {
    $result = $newOperationResponse->getResult();
}
...
Notice: make sure to pass "LongRunningRecognize" as the resume operation name and NOT "longRunningRecognize" (the first letter must be uppercase, contrary to the documentation: https://github.com/googleapis/google-cloud-php-speech/blob/master/src/V1/Gapic/SpeechGapicClient.php#L312).
Otherwise the response will stay protobuf encoded (https://github.com/googleapis/google-cloud-php-speech/blob/master/src/V1/Gapic/SpeechGapicClient.php#L135).
This answer helped to find the final solution https://stackoverflow.com/a/57209441/932473
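Since the original question is about Node, here is roughly the same idea sketched in JavaScript. The only thing that needs to survive between invocations is the operation name (a plain string); the helper checkLongRunningRecognizeProgress exists in recent versions of @google-cloud/speech, but treat the exact name as an assumption and check your client version.
// Sketch for Node.js: persist only operation.name and look the job up later.
const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();

// First invocation: start the job and save its name (e.g. in Firestore or a DB).
async function startJob(request) {
  const [operation] = await client.longRunningRecognize(request);
  return operation.name;
}

// A later invocation (another function, a scheduled job, ...) resumes by name.
async function checkJob(operationName) {
  const decoded = await client.checkLongRunningRecognizeProgress(operationName);
  if (decoded.done) {
    return decoded.result; // LongRunningRecognizeResponse with the transcripts
  }
  return null; // still running; check again later
}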
If your job is going to take more than 540 seconds, Cloud Functions is not really the best solution for this problem. Instead, you may want to consider using Cloud Functions as just a triggering mechanism, then offload the work to App Engine or Compute Engine, using Pub/Sub to send it the relevant data (e.g. the location of the file in Cloud Storage and any other metadata needed to make the speech recognition request).
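As a rough sketch of that triggering pattern (assuming a recent @google-cloud/pubsub client; the topic name transcription-jobs, the message fields, and the worker are assumptions, not a definitive implementation):
// Sketch: a Cloud Function that only enqueues the job and exits quickly.
const {PubSub} = require('@google-cloud/pubsub');
const pubsub = new PubSub();

// Triggered by a finalized object in Cloud Storage.
exports.enqueueTranscription = async (file, context) => {
  const job = {
    bucket: file.bucket,
    name: file.name,
    languageCode: 'en-US',
  };
  // Hypothetical topic; a worker on App Engine / Compute Engine subscribes to it,
  // calls longRunningRecognize, and can wait as long as it needs.
  await pubsub.topic('transcription-jobs').publishMessage({json: job});
};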
Related
I have a client who has 130k books (~4 terabytes) and wants a site he can upload them to, to make an online library. How can I make it possible for him to upload them automatically, or at least to upload multiple books at a time? I'll be using Node.js + MySQL.
I might suggest using object storage alongside MySQL to speed up book indexing and retrieval, but that is entirely up to you.
HTTP is a streaming protocol, and node, interestingly, has streams.
This means that, when you send an HTTP request to a Node server, Node internally handles it as a stream. Theoretically, you can therefore upload massive books to your web server while only holding a fraction of each one in memory at a time.
The first thing to note is that books can be very large. To process them efficiently, we must handle the metadata (name, author, etc.) and the content separately.
One example, using an Express-like framework, could look like this (pseudo-code):
app.post('/begin/:bookid', (req, res) => {
  // req.body is the parsed JSON metadata (e.g. via express.json())
  MySQL.addColumn(req.params.bookid, req.body.name, req.body.author) // placeholder for inserting the metadata row
  res.sendStatus(201)
})

app.put('/upload/:bookid', (req, res) => {
  // If we use MySQL to store the books:
  //   MySQL.addColumn(req.params.bookid, req.body)
  // Or, if we use object storage, stream the request body straight through:
  let uploader = new StorageUploader(req.params.bookid) // placeholder upload stream
  req.pipe(uploader)
  uploader.on('finish', () => res.sendStatus(200))
})
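For completeness, client-side usage of these hypothetical routes could be as simple as two fetch calls:
// Hypothetical client-side usage of the two routes above.
async function uploadBook(bookId, file, metadata) {
  // 1. Register the metadata first
  await fetch(`/begin/${bookId}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(metadata), // { name, author, ... }
  });
  // 2. Then send the raw content; the browser streams the File object
  await fetch(`/upload/${bookId}`, {
    method: 'PUT',
    body: file, // a File taken from <input type="file">
  });
}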
If you need inspiration, look at how WeTransfer has built their API. They deal with lots of data daily; their solution might be helpful to you.
Remember: your client likely won't want to use Postman to upload their books. Build a simple website for them in Svelte or React.
I am working on a scheduled-report use case where I use a Lambda to generate a specific report and write it as a CSV to an S3 bucket. (One Lambda generates the report: it does some database lookups, performs some operations, builds a CSV from the results, and writes it to the S3 bucket.)
The Lambda is triggered by SQS, performs the operation, and generates the file on S3; this works fine.
Extending this, I now want to expose an API that hits the same Lambda with the same parameters, obtains the report information, and returns the data to the frontend (which will generate the CSV file).
Now I am confused about how to make it work both ways, so that it handles both SQS and this API.
Essentially I want to turn the async processor into a sync one for the API path: the API should respond quickly, but at the same time the SQS path has to keep working without compromising the report generation time.
Below is the code where I actually process the report generation and write to the S3 bucket (this works with the SQS trigger).
Now I am looking for the API solution mentioned above.
/**
 * handler function
 */
exports.handler = async (event, context) => {
  for (const { receiptHandle, messageId, body } of event.Records) {
    // --------
    // Logic that writes the CSV data to the S3 bucket and, once it's done,
    // sends a trigger to an SQS queue on another server.
    // --------
  }
  return { "meta": { "message": "success" }, "data": JSON.stringify('Sent to SQS Successfully...') };
};
The first thing is how to identify whether the call came from the API or from SQS, and then process it accordingly.
[Note: I am using React for the frontend; the API call may return a lot of data, which will be turned into a CSV on the frontend.]
I am trying to avoid duplicating the Lambda, which I believe would solve this issue. Can anyone help me with this?
Edit -
Meaning: when the SQS trigger sends message data carrying the report name and IDs, the Lambda function generates the data for those report IDs and writes/uploads it to the S3 bucket (this is the existing SQS-triggered path).
Now I am looking for the same thing from the same Lambda: I make an API call with the same parameters, it returns the data, and I can generate the CSV on the frontend whenever the API gets hit.
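Not an authoritative answer, but one common pattern is to keep a single Lambda and branch on the shape of the incoming event: SQS deliveries have a Records array whose items carry eventSource: 'aws:sqs', while API Gateway proxy events have a body plus httpMethod / requestContext instead. A minimal sketch, with generateReport and writeCsvToS3 standing in for the existing report logic (both hypothetical names):
// Sketch: one handler, two entry points (SQS and API Gateway).
exports.handler = async (event) => {
  // SQS trigger: asynchronous path, result goes to S3 as before
  if (Array.isArray(event.Records) && event.Records[0].eventSource === 'aws:sqs') {
    for (const record of event.Records) {
      const params = JSON.parse(record.body);
      const report = await generateReport(params); // hypothetical helper
      await writeCsvToS3(report);                  // hypothetical helper
    }
    return { meta: { message: 'success' } };
  }

  // API Gateway trigger: synchronous path, return the data to the frontend
  const params = JSON.parse(event.body || '{}');
  const report = await generateReport(params);
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(report), // the frontend builds the CSV from this
  };
};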
Hi there and thanks for reading this.
I'm learning how to work with Dialogflow and Firebase Realtime Database and I like these platforms a lot.
I created a very simple DB structure on Firebase with 7 fields and in my agent I query them with a very simple fulfillment.
It seems to be working, but the "first query" I do each day takes about 5000 ms, so the DB doesn't respond in time; from the second query onwards it works almost in real time, so it seems to be sleeping or something.
In today's test, on the first query I read this in the Dialogflow log: "webhook_latency_ms": 4663, but at least it worked; generally it doesn't.
It seems like there's some uncertainty about getting data from the DB.
Any suggestion would be very appreciated.
The Realtime Database structure is this:
serviceAccount
  bitstream: "pluto"
  cloud: "paperino"
  data center: "gastone"
  datacenter: "gastone"
  ull: "bandabassotti"
  vula: "minnie"
  wlr: "pippo"
and this is how I query Firebase:
const servizi = agent.parameters.elencoServiziEntity;
return admin.database().ref("serviceAccount").once("value").then((snapshot) => {
  var accountName = snapshot.child(`${servizi}`).val();
  agent.add(`L'Account Manager del Servizio ${servizi} si chiama: ${accountName}`);
  console.log(`${servizi}`);
});
The webhook latency isn't always related to the database call - it includes the time that may be required to start the webhook itself. If you're using Firebase Cloud Functions or the Dialogflow Built-In Code Editor (which uses Google Cloud Functions), there is a "cold start" time required to start the function. If your webhook is running somewhere else, on AWS Lambda for example, you may have network latency in addition to the cold start time.
There is very little you can do about this. If you're running with one of Google's Cloud Function solutions, make sure you're running them in the us-central1 region, which is close to where Dialogflow also runs. To avoid the cold start completely, run a persistent server.
Usually, however, the latency and cold start time shouldn't be that long. Which suggests that your code is also taking a while to run. You may wish to look at your logs to see why execution time is taking so long - the call to the Firebase RTDB may be part of it, but there may be other things causing a slowdown that you don't show in your code.
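One quick way to narrow that down is to log the duration of the database call separately, so cold start and DB latency show up as distinct numbers in the Cloud Functions logs. A sketch around the handler from the question:
// Sketch: time just the RTDB read.
const dbStart = Date.now();
return admin.database().ref("serviceAccount").once("value").then((snapshot) => {
  console.log(`RTDB read took ${Date.now() - dbStart} ms`);
  const accountName = snapshot.child(servizi).val();
  agent.add(`L'Account Manager del Servizio ${servizi} si chiama: ${accountName}`);
});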
One thing you are doing in your call to Firebase is pulling in the entire record, instead of just pulling in the one field that the user is asking for. This does require more data to be marshaled, which takes more time. (Is it taking a lot more time? Probably not. But milliseconds count.)
If you just need the one field from the record the user has asked for, you can get a reference to the child itself and then do the query on this reference. It might look like this:
const servizi = agent.parameters.elencoServiziEntity;
return admin.database()
  .ref("serviceAccount")
  .child(servizi)
  .once("value")
  .then((snapshot) => {
    const accountName = snapshot.val();
    agent.add(`L'Account Manager del Servizio ${servizi} si chiama: ${accountName}`);
    console.log(`${servizi}`);
  });
A web client should only expose some features when a backend API is up and running. Therefore, I'm looking for a clean way to monitor the availability of this backend.
As a quick fix, I made a timer-based function that performs a basic GET on the API root. It's not very clean, generates lots of traffic, and pollutes the JavaScript console with errors when the server is down.
How should one deal with such situation?
You can trigger something along the lines of this when you need it:
function checkServerStatus() {
  setServerStatus("unknown");
  var img = document.body.appendChild(document.createElement("img"));
  img.onload = function () {
    setServerStatus("online");
  };
  img.onerror = function () {
    setServerStatus("offline");
  };
  img.src = "http://myserver.com/ping.gif";
}
Make ping.gif small (1 pixel) to make it as fast as possible.
Of course you can do it more smoothly by hitting an API endpoint that returns true with a really small response time, but that requires some coding on the back end; this approach only needs you to place a 1-pixel gif image in the right directory on the server. You can use any picture already present on the server, but expect more traffic and longer times as the image grows larger.
Now put this in a function that calls it on an interval, or simply call it whenever you need to check the status; it's up to you.
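For example, a minimal polling wrapper (the 30-second interval is arbitrary):
// Check once on page load, then poll in the background.
checkServerStatus();
setInterval(checkServerStatus, 30 * 1000);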
If you need the server to send your app a notification when it's down, then you need to implement push technology:
https://en.wikipedia.org/wiki/Push_technology
Ideally, you would have a highly reliable server with a fast response rate that pings the desired server at some interval to determine whether it is up, and then uses push to get that information to your app. This way the third server only sends you a push when the status of your app server changes. Ideally, this server's requests have high priority in your app server's queue, and the two servers are well connected and close to each other, but not on the same network in case that network fails.
Recommendation:
The first approach should serve you well, since it's simple to implement and requires the least amount of knowledge.
Consider the second if:
You need a really small checking interval, which would make your application slower and network traffic higher
You have multiple applications that need the same check, making the load heavier on each application, the network AND the server. The second approach lets a single ping determine the truth for all apps (see the sketch below).
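For illustration, the client side of the second approach might look like this if the monitoring server exposed a WebSocket; the URL and the message format are assumptions:
// Hypothetical push channel from a third monitoring server.
const socket = new WebSocket('wss://monitor.example.com/status');

socket.onmessage = function (event) {
  // e.g. the monitor sends "online" / "offline" whenever the API's state changes
  setServerStatus(event.data);
};

socket.onerror = function () {
  // the monitor itself is unreachable; fall back to the image ping above
  setServerStatus('unknown');
};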
To limit the number of requests, a simple solution is to use server-sent events. This protocol, used on top of HTTP, allows the server to push multiple updates in response to the same client request.
Client-side code (JavaScript):
var evtSource = new EventSource("backend.php");
evtSource.onmessage = function(e) {
console.log('status:' + e.data);
}
evtSource.onerror = function(e) {
// add some retry then display error to the user
}
Back-end code (PHP; server-sent events are also supported by other languages):
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
while (1) {
    // Every 30 s, send an OK status; SSE messages need the "data:" prefix
    // and a blank line as terminator
    echo "data: OK\n\n";
    ob_flush();
    flush();
    sleep(30);
}
In both cases this limits the number of requests (only 1 per "session"), but you will have 1 open socket per client, which can also be too heavy for your server.
If you really want to lower the workload, you should delegate it to an external monitoring platform that can expose an API publishing the backend status.
One may already exist if your backend is hosted on a cloud platform.
I want to gather some information using the visitors of my websites.
What I need is for each visitor to ping 3 different hostnames and then save the following info into a DB.
Visitor IP, latency 1, latency 2, latency 3
Of course everything has to be transparent for the visitor without interrupting him in any way.
Is this possible? Can you give me an example? Are there any plugins for jQuery, or something else to make it easier?
EDIT
This is what I have so far: jsfiddle.net/dLVG6, but the data is too random; it jumps from 50 to 190.
This is going to be more of a pain than you might think.
Your first problem is that Javascript doesn't have ping. Mostly what Javascript is good at is HTTP and a few cousin protocols.
Second problem is that you can't just issue some ajax requests and time the results (that would be way too obvious). The same origin policy will prevent you from using ajax to talk to servers other than the one the page came from. You'll need to use JSONP, or change the src of an image tag, or something else more indirect.
Your third problem is that you don't want to do anything that will result in a lot of data being returned. You don't want data transfer time or extensive server processing to interfere with measuring latency.
Fourth, you can't ask for URLs that might be cached. If the object happened to be in the cache, you would get really low "latency" measurements but it wouldn't be meaningful.
My solution was to use an image tag with no src attribute. On document load, set the src to point to a valid server but use an invalid port. Generally, it is faster for a server to simply reject your connection than to generate a proper 404 error response. All you have to do then is measure how long it takes to get the error event from the image.
From the Fiddle:
var start = new Date().getTime();
$('#junkOne').attr('src', 'http://fate.holmes-cj.com:8886/').error(function () {
  var end = new Date().getTime();
  $('#timer').html("" + (end - start) + "ms");
});
The technique could probably be improved. Here's some ideas:
Use IP address instead of DNS host name.
Do the "ping" multiple times, throw out the highest and lowest scores, then average the rest.
If your web page has a lot of heavy processing going on, try to run the tests when you think the UI load is lightest.
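A sketch of the second idea (trim the extremes, average the rest), reusing the invalid-port image trick; the host and port are the same placeholders as in the Fiddle:
// Run the image-error "ping" n times, drop the best and worst samples,
// and average the remainder.
function pingOnce() {
  return new Promise(function (resolve) {
    var start = Date.now();
    var img = new Image();
    img.onerror = function () { resolve(Date.now() - start); };
    img.src = 'http://fate.holmes-cj.com:8886/?t=' + start; // cache-buster
  });
}

async function averagePing(n) {
  var samples = [];
  for (var i = 0; i < (n || 7); i++) {
    samples.push(await pingOnce());
  }
  samples.sort(function (a, b) { return a - b; });
  var trimmed = samples.slice(1, -1); // throw out the highest and lowest
  return trimmed.reduce(function (sum, x) { return sum + x; }, 0) / trimmed.length;
}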
With jQuery you could:
use $.ajax(url, settings) (http://api.jquery.com/jQuery.ajax/), take timestamps in beforeSend and in complete via Date.now(), and subtract them; then you have the time for the request (not exactly a "ping", though).
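For example (the URL is a placeholder):
// Rough request timing with jQuery; not a true ICMP ping.
var sendTime;
$.ajax('https://example.com/ping', {
  beforeSend: function () { sendTime = Date.now(); },
  complete: function () {
    console.log((Date.now() - sendTime) + ' ms for the full request');
  }
});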
2021:
Tried this again for a React app I'm building. I don't think the accuracy is too great.
const ping = () => {
  var start = new Date().getTime();
  api.get('/ping').then((res) => {
    console.log(res);
    var end = new Date().getTime();
    console.log(`${end - start} ms`);
  }, (err) => {
    console.log(err);
  });
};
Wrote my own little API, but I suppose there's just way too much going on during the request.
In the terminal I get about a 23 ms ping to my server; using this it shoots up to around 200-500 ms.
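Part of that gap is that the measurement covers the whole HTTP round trip (connection setup, server work, response parsing), not just network latency, and Date.now() only has millisecond resolution. In the browser, performance.now() at least gives sub-millisecond timestamps; a variant of the same check, with api being the same hypothetical client as above:
// Same idea with performance.now().
const ping = async () => {
  const start = performance.now();
  try {
    await api.get('/ping');
    console.log(`${(performance.now() - start).toFixed(1)} ms`);
  } catch (err) {
    console.log(err);
  }
};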