tensorflowjs - universal sentence encoder taking 8 seconds to encode input - javascript

I'm running some tests using TensorFlow.js. I have this code to provide the input data to train the model:
import * as use from '@tensorflow-models/universal-sentence-encoder';

const encodeData = async (data) => {
  const texts = data.map(d => d.text.toLowerCase());
  const trainingData = await use.load()
    .then(model => model.embed(texts))
    .catch((err) => console.log(err));
  return trainingData;
};
It works okay, though. The problem is that when I want to predict something from a text, I also need to encode that text first. For example, when I do:
const dummyData = await encodeData([{ text: 'watching movie tonight?' }])
console.log(model.predict(dummyData).dataSync())
it takes about 8 seconds to encode this one string before the prediction runs.
If I want to put this website live, I can't make the user wait 8 seconds for a result; that is too long. Is there something I can do, or is there a faster library I could use instead of this one? It is really slow.
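Most of that time likely goes into use.load() itself, since encodeData downloads and initializes the model again on every call. A minimal sketch of one common fix, assuming that is indeed the bottleneck (the getModel helper below is illustrative, not part of the original code):

// Load the model once and cache the promise so later calls reuse it.
let modelPromise = null;
const getModel = () => {
  if (!modelPromise) {
    modelPromise = use.load();
  }
  return modelPromise;
};

const encodeData = async (data) => {
  const texts = data.map(d => d.text.toLowerCase());
  const model = await getModel();   // fast after the first call
  return model.embed(texts);        // resolves to the embeddings tensor
};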

Related

Is it possible to loop through an API, print separate results, and then combine them into a single variable?

I'm trying to read the sentiment of multiple Reddit posts. I've got the idea working using 6 API calls, but I think we can refactor it down to 2 calls.
The wall I'm hitting: is it possible to loop through multiple API calls (one loop for each Reddit post we're scraping), print the results, and then add them into a single variable?
The last part is where I'm stuck. After looping through the API, I get separate outputs for each loop and I don't know how to add them into a single variable…
Here’s a simple version of what the code looks like:
import React, { useState, useEffect } from 'react';

function App() {
  const [testRedditComments, setTestRedditComments] = useState([]);

  const URLs = [
    'https://www.reddit.com/r/SEO/comments/tepprk/is_ahrefs_worth_it/',
    'https://www.reddit.com/r/juststart/comments/jvs0d1/is_ahrefs_worth_it_with_these_features/',
  ];

  useEffect(() => {
    URLs.forEach((URL) => {
      fetch(URL + '.json').then((res) => {
        res.json().then((data) => {
          if (data != null) setTestRedditComments(data[1].data.children);
        });
      });
    });
  }, []);

  // This below finds the Reddit comments and puts them into an array
  const testCommentsArr = testRedditComments.map(
    (comments) => comments.data.body
  );

  // This below takes the Reddit comments and turns them into a string
  const testCommentsArrToString = testCommentsArr.join(' ');
  console.log(testCommentsArrToString);
I've tried multiple approaches to adding the strings together, but I've sunk a bunch of time into it. Does anyone know how this works, or is there a simpler way to accomplish this?
Thanks for your time and if you need any clarification let me know.
-Josh
const URLs = [
  "https://www.reddit.com/r/SEO/comments/tepprk/is_ahrefs_worth_it/",
  "https://www.reddit.com/r/juststart/comments/jvs0d1/is_ahrefs_worth_it_with_these_features/",
];

Promise.all(
  URLs.map(async (url) => {
    const resp = await fetch(url + ".json");
    return resp.json();
  })
).then((res) => console.log(res));
I used Promise.all to collect the responses, and attached a React sandbox with the same.
Based on your requirements, you can use the state value directly, or prepare your API response before setting it to state.
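Building on that, a minimal sketch of how the combined result could be produced inside the asker's useEffect, assuming the goal is a single state update plus one joined string (the filter(Boolean) step for missing comment bodies is an addition):

useEffect(() => {
  Promise.all(
    URLs.map(async (url) => {
      const resp = await fetch(url + '.json');
      return resp.json();
    })
  ).then((results) => {
    // Flatten the comment listings from every post into one array,
    // then join the comment bodies into a single string.
    const allComments = results.flatMap((data) => data[1].data.children);
    const combinedText = allComments
      .map((comment) => comment.data.body)
      .filter(Boolean)
      .join(' ');
    setTestRedditComments(allComments); // one state update with everything
    console.log(combinedText);
  });
}, []);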

Pulling Articles from Large JSON Response

I'm trying to code something which tracks the Ontario Immigrant Nominee Program Updates page for updates and then sends an email alert if there's a new article. I've done this in PHP but I wanted to try and recreate it in JS because I've been learning JS for the last few weeks.
The OINP has a public API, but the entire body of the webpage is stored in the JSON response (you can see this here: https://api.ontario.ca/api/drupal/page%2F2020-ontario-immigrant-nominee-program-updates?fields=body)
Looking through the safe_value - the common trend is that the Date / Title is always between <h3> tags. What I did with PHP was create a function that stored the text between <h3> into a variable called Date / Title. Then - to store the article body text I just grabbed all the text between </h3> and </p><h3> (basically everything after the title, until the beginning of the next title), stored it in a 'bodytext' variable and then iterated through all occurrences.
I'm stumped figuring out how to do this in JS.
So far - trying to keep it simple, I literally have:
const fetch = require("node-fetch");

fetch(
  "https://api.ontario.ca/api/drupal/page%2F2020-ontario-immigrant-nominee-program-updates?fields=body"
)
  .then((result) => {
    return result.json();
  })
  .then((data) => {
    let websiteData = data.body.und[0].safe_value;
    console.log(websiteData);
  });
This outputs all of the body. Can anyone point me in the direction of a library / some tips that can help me:
Read through the entire safe_value response and break each article (Date / Title + article body) down into an array.
I'm probably then just going to upload each article into MongoDB and have it checked twice daily -> if there's a new article, I'll send an email notification.
Any advice is appreciated!!
Thanks,
You can use a regex to get the content of the tags, e.g.
/<h3>(.*?)<\/h3>/g.exec(data.body.und[0].safe_value)[1]
returns August 26, 2020
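If you need every date rather than just the first match, a small sketch using String.prototype.matchAll with the same pattern (assuming a Node or browser version that supports matchAll):

const html = data.body.und[0].safe_value;
// Collect the text inside every <h3>...</h3> pair.
const dates = [...html.matchAll(/<h3>(.*?)<\/h3>/g)].map((m) => m[1]);
console.log(dates); // e.g. ["August 26, 2020", ...]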
With some regex you can get this done pretty easily.
I wasn't sure exactly what the "date / title / content" parts were, but this shows how to parse some of the HTML.
I also changed the code to async / await. That's more of a personal preference; the code should work the same with then / catch.
const fetch = require("node-fetch"); // same dependency as in the question

(async () => {
  try {
    // Make request
    const response = await fetch("https://api.ontario.ca/api/drupal/page%2F2020-ontario-immigrant-nominee-program-updates?fields=body");
    // Parse response into json
    const data = await response.json();
    // Get the parsed data we need
    const websiteData = data.body.und[0].safe_value;
    // Split the html into separate articles (every <h2> is the start of a new article)
    const articles = websiteData.split(/(?=<h2)/g);
    // Get the data for each article
    const articleInfo = articles.map((article) => {
      // Everything between the first <h3> tags is the date
      const date = /<h3>(.*)<\/h3>/m.exec(article)[1];
      // Everything between the first <h4> tags is the title
      const title = /<h4>(.*)<\/h4>/m.exec(article)[1];
      // Everything between the first <p> and the last </p> is the content of the article
      const content = /<p>(.*)<\/p>/m.exec(article)[1];
      return { date, title, content };
    });
    // Show results
    console.log(articleInfo);
  } catch (error) {
    // Show error if there are any
    console.log(error);
  }
})();
I just completed creating a .NET Core worker service for this.
The value you are looking for is "metatags.description.og:updated_time.#attached.drupal_add_html_head..#value"
The idea is that if the last-updated value changes, you send an email notification!
Try this in your JavaScript:
fetch(`https://api.ontario.ca/api/drupal/page%2F2021-ontario-immigrant-nominee-program-updates`)
  .then((result) => {
    return result.json();
  })
  .then((data) => {
    let lastUpdated = data.metatags["og:updated_time"]["#attached"].drupal_add_html_head[0][0]["#value"];
    console.log(lastUpdated);
  });
I will be happy to add you to the email list for the app I just created!
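A minimal sketch of that change check, assuming the last seen value is kept in a local file and the notification itself is left as a placeholder:

const fetch = require("node-fetch");
const fs = require("fs");

const STATE_FILE = "./last-updated.txt"; // hypothetical local store for the last seen value

const checkForUpdate = async () => {
  const response = await fetch("https://api.ontario.ca/api/drupal/page%2F2021-ontario-immigrant-nominee-program-updates");
  const data = await response.json();
  const lastUpdated = data.metatags["og:updated_time"]["#attached"].drupal_add_html_head[0][0]["#value"];

  const previous = fs.existsSync(STATE_FILE) ? fs.readFileSync(STATE_FILE, "utf8") : null;
  if (previous !== lastUpdated) {
    fs.writeFileSync(STATE_FILE, lastUpdated);
    // Placeholder: send your email / alert here.
    console.log("Page updated at", lastUpdated);
  }
};

checkForUpdate();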

How to stay within 2 GET requests per second with Axios (Shopify API)

I have about 650 products, and each product has a lot of additional information stored in metafields. I need all the metafield info to be stored in an array so I can filter through certain bits of it and display them to the user.
In order to get all the metafield data, you need to make one API call per product using the product id, like so: /admin/products/#productid#/metafields.json
So what I have done is get all the product ids, then run a 'for in' loop and make one call at a time. The problem is that I run into a 429 error because I end up making more than 2 requests per second. Is there any way to get around this, for example with some sort of queuing system?
let products = []
let requestOne = `/admin/products.json?page=1&limit=250`
let requestTwo = `/admin/products.json?page=2&limit=250`
let requestThree = `/admin/products.json?page=3&limit=250`
// let allProducts will return an array with all products
let allProducts
let allMetaFields = []
let merge

$(document).ready(function () {
  axios
    .all([
      axios.get(`${requestOne}`),
      axios.get(`${requestTwo}`),
      axios.get(`${requestThree}`),
    ])
    .then(
      axios.spread((firstResponse, secondResponse, thirdResponse) => {
        products.push(
          firstResponse.data.products,
          secondResponse.data.products,
          thirdResponse.data.products
        )
      })
    )
    .then(() => {
      // all 3 responses into one array
      allProducts = [].concat.apply([], products)
    })
    .then(function () {
      for (const element in allProducts) {
        axios
          .get(
            `/admin/products/${allProducts[element].id}/metafields.json`
          )
          .then(function (response) {
            let metafieldsResponse = response.data.metafields
            allMetaFields.push(metafieldsResponse)
          })
      }
    })
    .then(function () {
      console.log("allProducts: " + allProducts)
      console.log("allMetaFields: " + allMetaFields)
    })
    .catch((error) => console.log(error))
})
When you hit a 429 error, check the Retry-After header and wait for the number of seconds specified there.
You can also use the X-Shopify-Shop-Api-Call-Limit header in each response to understand how many requests are left until you exceed the bucket size limit.
See more details here: REST Admin API rate limits
By the way, you're using page-based pagination, which is deprecated and will become unavailable soon.
Use cursor-based pagination instead.
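A minimal sketch of that advice with axios, requesting the metafields one product at a time and backing off when a 429 comes back; the delay helper and the 500 ms spacing are illustrative choices, not values from the Shopify docs:

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const fetchAllMetafields = async (allProducts) => {
  const allMetaFields = [];
  for (const product of allProducts) {
    try {
      const response = await axios.get(`/admin/products/${product.id}/metafields.json`);
      allMetaFields.push(response.data.metafields);
      await delay(500); // stay under ~2 requests per second
    } catch (error) {
      if (error.response && error.response.status === 429) {
        // Shopify tells you how long to wait before retrying.
        const retryAfter = Number(error.response.headers["retry-after"] || 1);
        await delay(retryAfter * 1000);
        const retry = await axios.get(`/admin/products/${product.id}/metafields.json`);
        allMetaFields.push(retry.data.metafields);
      } else {
        throw error;
      }
    }
  }
  return allMetaFields;
};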

How many requests can Node-Express fire off at once?

I have a script that is pulling 25,000 records from AWS Athena, which is basically a PrestoDB relational SQL database. Let's say that I'm generating a request for each one of these records, which means I have to make 25,000 requests to Athena; then, when the data comes back, I have to make 25,000 requests to my Redis cluster.
What would be the ideal amount of requests to make at one time from node to Athena?
The reason I ask is because I tried to do this by creating an array of 25,000 promises and then calling Promise.all(promiseArray) on it, but the app just hanged forever.
So I decided instead to fire off 1 at a time and use recursion to splice the first index out and then pass the remaining records to the calling function after the promise has been resolved.
The problem with this is that it takes forever. I took about an hour break and came back and there were 23,000 records remaining.
I tried to google how many requests Node and Athena can handle at once, but I came up with nothing. I'm hoping someone might know something about this and be able to share it with me.
Thank you.
Here is my code just for reference:
As a side note, what I would like to do differently is, instead of sending one request at a time, send 4, 5, 6, 7 or 8 at a time depending on how fast they execute.
Also, how would a Node cluster affect the performance of something like this?
exports.storeDomainTrends = () => {
  return new Promise((resolve, reject) => {
    athenaClient.execute(`SELECT DISTINCT the_column from "the_db"."the_table"`,
      (err, data) => {
        var getAndStoreDomainData = (records) => {
          if (records.length) {
            return new Promise((resolve, reject) => {
              var subrecords = records.splice(0, 1)[0]
              athenaClient.execute(`
                SELECT
                  field,
                  field,
                  field,
                  SUM(field) as field
                FROM "the_db"."the_table"
                WHERE the_field IN ('Month') AND the_field = '` + subrecords.domain_name + `'
                GROUP BY the_field, the_field, the_field
              `, (err, domainTrend) => {
                if (err) {
                  console.log(err)
                  reject(err)
                }
                redisClient.set(('Some String' + domainTrend[0].domain_name), JSON.stringify(domainTrend))
                resolve(domainTrend);
              })
            })
            .then(res => {
              getAndStoreDomainData(records);
            })
          }
        }
        getAndStoreDomainData(data);
      })
  })
}
Using the lib your code could look something like this:
const Fail = function (reason) { this.reason = reason; };
const isFail = x => (x && x.constructor) === Fail;

const distinctDomains = () =>
  new Promise(
    (resolve, reject) =>
      athenaClient.execute(
        `SELECT DISTINCT domain_name from "endpoint_dm"."bd_mb3_global_endpoints"`,
        (err, data) =>
          (err)
            ? reject(err)
            : resolve(data)
      )
  );

const domainDetails = domain_name =>
  new Promise(
    (resolve, reject) =>
      athenaClient.execute(
        `SELECT
          timeframe_end_date,
          agg_type,
          domain_name,
          SUM(endpoint_count) as endpoint_count
        FROM "endpoint_dm"."bd_mb3_global_endpoints"
        WHERE agg_type IN ('Month') AND domain_name = '${domain_name}'
        GROUP BY timeframe_end_date, agg_type, domain_name`,
        (err, domainTrend) =>
          (err)
            ? reject(err)
            : resolve(domainTrend)
      )
  );

const redisSet = keyValue =>
  new Promise(
    (resolve, reject) =>
      redisClient.set(
        keyValue,
        (err, res) =>
          (err)
            ? reject(err)
            : resolve(res)
      )
  );

const process = batchSize => limitFn => resolveValue => domains =>
  Promise.all(
    domains.slice(0, batchSize)
      .map(//map domains to promises
        domain =>
          //maximum 5 active connections
          limitFn(domainName => domainDetails(domainName))(domain.domain_name)
            .then(
              domainTrend =>
                //the redis client documentation makes no sense whatsoever
                //https://redis.io/commands/set
                //no mention of a callback
                //https://github.com/NodeRedis/node_redis
                //mentions a callback, since we need the return value
                //and best to do it async we will use callback to promise
                redisSet([
                  `Endpoint Profiles - Checkin Trend by Domain - Monthly - ${domainTrend[0].domain_name}`,
                  JSON.stringify(domainTrend)
                ])
            )
            .then(
              redisReply => {
                //here is where things get unpredictable, set is documented as
                // a synchronous function returning "OK" or a function that
                // takes a callback but no mention of what that callback receives
                // as response, you should try with one or two records to
                // finish this on reverse engineering because documentation
                // fails 100% here and cannot be relied upon.
                console.log("bad documentation of redis client... reply is:", redisReply);
                return (redisReply === "OK")
                  ? domain
                  : Promise.reject(`Redis reply not OK:${redisReply}`)
              }
            )
            .catch(//catch failed, save error and domain of failed item
              e =>
                new Fail([e, domain])
            )
      )
  ).then(
    results => {
      console.log(`got ${batchSize} results`);
      const left = domains.slice(batchSize);
      if (left.length === 0) {//nothing left
        return resolveValue.concat(results);
      }
      //recursively call process until done
      return process(batchSize)(limitFn)(resolveValue.concat(results))(left)
    }
  );

const max5 = lib.throttle(5);//max 5 active connections to athena

distinctDomains()//you may want to limit the results to 50 for testing
  //you may want to limit batch size to 10 for testing
  .then(process(1000)(max5)([]))//we have 25000 domains here
  .then(
    results => {//have 25000 results
      const successes = results.filter(x => !isFail(x));
      //array of failed items, a failed item has a .reason property
      // that is an array of 2 items: [the error, domain]
      const failed = results.filter(isFail);
    }
  )
You should figure out what the redis client actually does; I tried to work it out from the documentation, but I may as well have asked my goldfish. Once you've reverse engineered the client behaviour, it is best to try with a small batch size to see if there are any errors. You have to import lib to use it; you can find it here.
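The lib itself isn't shown here; a minimal sketch of what a throttle(max) helper with the call style used above could look like (an assumption about its behaviour, not the library's actual source):

// Limit how many promise-returning calls run at the same time.
const throttle = (max) => {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, arg, resolve, reject } = queue.shift();
    Promise.resolve(fn(arg))
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };
  // Matches the call style used above: throttle(5)(fn)(arg)
  return (fn) => (arg) =>
    new Promise((resolve, reject) => {
      queue.push({ fn, arg, resolve, reject });
      next();
    });
};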
I was able to take what Kevin B said and find a much quicker way to query the data. What I did was change the query so that I could get the trend for all domains from Athena in one go. I ordered it by domain_name and then consumed it as a Node stream so that I could separate each domain name out into its own JSON as the data was coming in.
Anyway, this is what I ended up with.
exports.storeDomainTrends = () => {
  return new Promise((resolve, reject) => {
    var streamObj = athenaClient.execute(`
      SELECT field,
        field,
        field,
        SUM(field) AS field
      FROM "db"."table"
      WHERE field IN ('Month')
      GROUP BY field, field, field
      ORDER BY field desc`).toStream();

    var data = [];

    streamObj.on('data', (record) => {
      if (!data.length || record.field === data[0].field) {
        data.push(record)
      } else if (data[0].field !== record.field) {
        redisClient.set(('Key'), JSON.stringify(data))
        data = [record]
      }
    })

    streamObj.on('end', resolve);
    streamObj.on('error', reject);
  })
  .then()
}

Node.js, bitfinex-api-node: how to set up correctly to process data from the websocket connection

Please excuse the imprecise title for this question; I am not an experienced programmer, and even less so in Node.js.
My intent is a simple one: I want to use the bitfinex-api-node package (a Node.js wrapper for the Bitfinex cryptocurrency exchange), which I installed via npm, to read price data for various currency pairs from the exchange in order to work out better trading strategies.
The example code provided in the readme.md works fine. This is a stripped-down version that creates a BFX object, subscribes to the ticker of a given currency pair, and constantly outputs ticker data:
const BFX = require('bitfinex-api-node')

const API_KEY = 'secret'
const API_SECRET = 'secret'

const opts = {
  version: 2,
  transform: true
}

const bws = new BFX(API_KEY, API_SECRET, opts).ws

bws.on('open', () => {
  bws.subscribeTicker('BTCUSD')
})

bws.on('ticker', (pair, ticker) => {
  console.log('Ticker:', ticker)
})

bws.on('error', console.error)
So far so good. Now, for the sake of a simple example, let's say I want to get the current price of two currency pairs (BTC/USD, ETH/USD), add them, and display the result. My obviously naive approach is this:
const BFX = require('bitfinex-api-node')

const API_KEY = 'secret'
const API_SECRET = 'secret'

const opts = {
  version: 2,
  transform: true
}

const bws1 = new BFX(API_KEY, API_SECRET, opts).ws
const bws2 = new BFX(API_KEY, API_SECRET, opts).ws

var priceBTCUSD;
var priceETHBTC;

bws1.on('open', () => {
  bws1.subscribeTicker('BTCUSD')
})

bws2.on('open', () => {
  bws2.subscribeTicker('ETHUSD')
})

bws1.on('ticker', (pair, ticker) => {
  //console.log('Ticker1:', ticker.LAST_PRICE)
  priceBTCUSD = ticker.LAST_PRICE
})

bws2.on('ticker', (pair, ticker) => {
  //console.log('Ticker2:', ticker.LAST_PRICE)
  priceETHBTC = ticker.LAST_PRICE
})

bws1.on('error', console.error)
bws2.on('error', console.error)

//HERE IT COMES:
console.log(priceBTCUSD + priceETHBTC)
where the resulting output of the last line is "NaN". It seems the last line, which logs the desired result to the console, is executed before the BFX objects establish a connection and receive any data.
How do I set this up properly? How can I retrieve data from the received data stream? Do I really need one BFX websocket object per currency pair? How would I read the price data once, close the websocket connection (which is not needed after reading the price once), and reconnect to read the price for a different currency pair?
Thank you! Feel free to request more data if my question isn't clear enough.
Kind regards,
s
Oh, your console.log is too soon there. Try this (I skipped a few lines):
bws1.on('ticker', (pair, ticker) => {
  //console.log('Ticker1:', ticker.LAST_PRICE)
  priceBTCUSD = ticker.LAST_PRICE;
  printResults();
})

bws2.on('ticker', (pair, ticker) => {
  //console.log('Ticker2:', ticker.LAST_PRICE)
  priceETHBTC = ticker.LAST_PRICE
  printResults();
})

bws1.on('error', console.error)
bws2.on('error', console.error)

//HERE IT COMES:
function printResults() {
  if (priceBTCUSD && priceETHBTC)
    console.log(priceBTCUSD + priceETHBTC)
}
Now, this is not the best approach, but it gets you off the ground. A better way is to request both prices over the same websocket, so that when you get both prices back, you call this function to calculate your result.
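A minimal sketch of that single-socket idea, reusing the opts from the question; the exact pair strings and ticker payload may differ between bitfinex-api-node versions:

const bws = new BFX(API_KEY, API_SECRET, opts).ws

const prices = {}

bws.on('open', () => {
  // One connection, two subscriptions.
  bws.subscribeTicker('BTCUSD')
  bws.subscribeTicker('ETHUSD')
})

bws.on('ticker', (pair, ticker) => {
  prices[pair] = ticker.LAST_PRICE
  // Only compute once both prices have arrived.
  if (prices.BTCUSD !== undefined && prices.ETHUSD !== undefined) {
    console.log(prices.BTCUSD + prices.ETHUSD)
  }
})

bws.on('error', console.error)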
