I need to find a way to identify if an item in an array is a noun. I really can't think of any way to do this.
The first thing I did was ignore all words ending in "ly". But many words that aren't nouns don't end in "ly" either. Is there a better way to do this? Is there a JavaScript library that can do this?
What you're looking for is called a POS (part-of-speech) tagger. The pos Node.js module does exactly that.
The natural Node.js module also comes with a WordNet interface, which you can use.
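For example, a minimal sketch with the pos module (assuming Node.js and npm install pos; the tags follow the Penn Treebank set, so anything starting with NN is a noun):

```js
// Rough sketch: tag each word and keep the ones tagged as nouns (NN, NNS, NNP, NNPS).
const pos = require('pos');

const words = new pos.Lexer().lex('The quick brown fox jumps over the lazy dog');
const taggedWords = new pos.Tagger().tag(words); // array of [word, tag] pairs

const nouns = taggedWords
  .filter(([word, tag]) => tag.startsWith('NN'))
  .map(([word]) => word);

console.log(nouns); // expected to include words like 'fox' and 'dog'
```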
I think you can use this:
http://wordnet.princeton.edu/wordnet/man/wngloss.7WN.html
It has endpoints for JSON and XML (you can make a simple GET request for each word and read the word's type from the returned JSON), e.g.
http://chriscargile.com/dictionary/json/cow
I am thinking of implementing a new project that has an import/export feature. First, I will have an array of around 45 objects. The object structure is simple, like this:
{"id": "someId", "quantity": 3}
So, in order to make it exportable, I will have to change the whole array of these objects into one single string first. For this part, I think I will use JSON.stringify(). After that, I want to make the string as short as possible for the users (they copy the string and paste it to share with other users, who import it back to get the original array). I know this part is not necessary, but I really want to make it as short as possible. Hence the question: how do I convert an array of objects to the shortest possible string?
Any technique such as encoding, encryption, or hashing is acceptable as long as it is reversible to the original data.
By "shortest possible", I mean any solution that is shorter than plain stringification. I will accept the one that gives the shortest string to import.
I tried text minification but it gives almost the same result as the original text. I also tried encryption but it still gives a relatively long result.
Note: The string for import (that comes from export) can be human-readable or unreadable. It does not matter.
Deleting JSON's optional SPACE after the : colon and the , comma is a no-brainer. Let's assume you have already minified in that way.
xz compression is generally helpful.
Perhaps you know some strings that are very likely to appear repeatedly in the input doc. That might include:
"id":
"quantity":
Construct a prefix document which mentions such terms. The sender will compress prefix + doc, strip the initial unchanging bytes, and send the rest. The receiver will accept those bytes via TCP, prepend the unchanging bytes, and decompress.
Why does this improve compression ratios? Lempel-Ziv and related schemes maintain a dictionary, and transmit integer indexes into that dictionary in order to indicate common words. A word can be fairly long, even longer than "quantity"; the longer it is, the greater the savings. If sender and receiver both know, beforehand, a set of words that belong in the dictionary, we can avoid sending the raw text of those words. Your Chrome browser already compresses web headers in this way each time you do a Google search.
Finally, you might want to base64 encode the compressed output.
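To make the shared-dictionary idea concrete, here is a minimal Node.js sketch; it uses zlib's preset-dictionary option rather than the xz prefix trick described above, and the dictionary contents and helper names are just examples:

```js
// Sketch: both sides agree on the same dictionary of strings expected to repeat.
const zlib = require('zlib');

const dictionary = Buffer.from('"id":"quantity":');

function exportString(objects) {
  const json = JSON.stringify(objects); // already free of optional spaces
  return zlib.deflateSync(json, { dictionary }).toString('base64'); // copy/paste friendly
}

function importString(text) {
  const compressed = Buffer.from(text, 'base64');
  return JSON.parse(zlib.inflateSync(compressed, { dictionary }).toString());
}

const data = [{ id: 'someId', quantity: 3 }, { id: 'otherId', quantity: 8 }];
const exported = exportString(data);
console.log(exported.length, exported);
console.log(importString(exported));
```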
Ignore compression, and use a database instead, in the way that tinyurl.com has been doing for quite a long time:
1. Set a serial counter to 1.
2. Accept a new object, or set of objects.
3. Ask the DB if you've seen this input before.
4. If not, store it under a brand new serial ID.
5. Send the matching ID to the receiving end. It can query the central database when it gets a new ID, and it can cache such results for future use.
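A minimal in-memory sketch of that flow, with a Map standing in for the real database (all names are hypothetical):

```js
// Sender side: hand out a short serial ID per distinct payload.
const byPayload = new Map(); // JSON string -> id
const byId = new Map();      // id -> JSON string
let serial = 1;

function exportId(objects) {
  const json = JSON.stringify(objects);
  if (!byPayload.has(json)) {     // seen this input before?
    const id = serial++;
    byPayload.set(json, id);
    byId.set(id, json);
  }
  return byPayload.get(json);     // short ID the user shares
}

// Receiver side: look the payload up by ID (and cache it in practice).
function importId(id) {
  return JSON.parse(byId.get(id));
}

const id = exportId([{ id: 'someId', quantity: 3 }]);
console.log(id, importId(id));
```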
You might opt for a simple CSV export. The export string becomes, if you use the pipe separator, something like:
id|quantity\nsomeId|3\notherId|8
which is the equivalent of
[{"id":"someId","quantity":3},{"id":"otherId","quantity":8}]
This approach removes the redundant id and quantity keys from each record and removes the unnecessary double quotes.
The downside is that all your records should have the same data structure, but that is generally the case.
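A rough sketch of such an export/import pair, assuming every record has exactly the id and quantity keys:

```js
// Export: header row with the keys, then one pipe-separated row per record.
function toCsv(objects) {
  const keys = Object.keys(objects[0]);
  const rows = objects.map(o => keys.map(k => o[k]).join('|'));
  return [keys.join('|'), ...rows].join('\n');
}

// Import: split the header back into keys and rebuild the objects.
function fromCsv(text) {
  const [header, ...rows] = text.split('\n');
  const keys = header.split('|');
  return rows.map(row => {
    const values = row.split('|');
    return Object.fromEntries(keys.map((k, i) =>
      [k, k === 'quantity' ? Number(values[i]) : values[i]]));
  });
}

const data = [{ id: 'someId', quantity: 3 }, { id: 'otherId', quantity: 8 }];
const csv = toCsv(data);
console.log(csv);           // id|quantity\nsomeId|3\notherId|8
console.log(fromCsv(csv));  // original array again
```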
I am trying to create a scenario that will work every time, but I do not know how to deal with the uniquely hashed JavaScript and CSS files. I could not find any answer in the documentation about that.
What I want specifically is the ability to pass a regex into my get, but that is not possible since it only takes a string.
.get("/dist/precache-manifest.3efd6185a8d8559962673d45aed7ae98.js")
.headers(headers_0)
I expect there is a way to somehow capture the URL with a regex and then use it in my get above. Is there a way to do that in a Gatling scenario?
I found a way, but it's a hack and it took a lot of time. I am answering this because someone might want to use it. However, this could be considered a bug.
.get("").queryParam("", _ =>regex("""\/dist\/precache-manifest.[A-Za-z0-9]+.js"""))
.headers(headers_0),
I was working on a forum and thought of making a tag generator, something like Quora.com but simpler. So, first I "purified" the string, meaning I removed some irrelevant words like "for", "in"...
But I couldn't figure out how to get only the nouns in the string. For example, this thread's title, "Is there a PHP or JS algorithm that can filter out the nouns on a string?", would give us:
PHP
JS
algorithm
nouns
string
This is more or less good and accurate. But I also don't want to use a noun list, because I don't want to waste half of my years writing one. I'd also be glad if you know of any good noun lists. Thank you.
You need a "lexical dictionary" (dictionary that maintains metadata about words and connections between them) like Princeton Wordnet. This is an english word semantic database you can use to query and compare things like nouns / verbs or even synonyms / hypernyms.
This obviously would run on your server. You would have to parse the strings on the server side (you could use Ajax if you want it to look like its on the client). There is no feasible way to maintain an entire english dictionary in browser memory, and to search through it, with anything resembling good performance.
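For instance, a rough server-side sketch using the natural module's WordNet interface (assumes the natural and wordnet-db packages are installed; isNoun is a made-up helper):

```js
// Keep a word as a tag candidate if WordNet lists a noun sense ('n') for it.
const natural = require('natural');
const wordnet = new natural.WordNet();

function isNoun(word) {
  return new Promise((resolve) => {
    wordnet.lookup(word, (results) => {
      resolve(results.some((result) => result.pos === 'n'));
    });
  });
}

(async () => {
  const words = ['algorithm', 'filter', 'string'];
  for (const word of words) {
    console.log(word, await isNoun(word));
  }
})();
```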
I am working on a web app where I need to get the user's zip code and see if it matches one of the zips in a CSV file of 6,000 zip codes (if yes, enter; else display an error). I was going to do this in SQL with a query against the user's input, but wanted to know if there is a better approach (speed and other suggestions). JavaScript preferred. Thanks.
SQL's probably the easiest/fastest.
As a fun experiment, though, if you really wanted to do it in JavaScript, you could read in the file and create an index of sorts by splitting each number as a string into its characters, and then creating a multidimensional hash for the lookup.
You could then compare that performance to a simple Array.indexOf() call.
Those are probably reasonably fast, but it's more work than needed; just go with the SQL. ;)
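If you do want the fun experiment, here is a rough sketch of the character-by-character lookup (effectively a trie) next to the Array.indexOf baseline; the sample zips are placeholders for the 6,000 from the CSV:

```js
const zips = ['10001', '10002', '94103']; // imagine the codes read from the CSV

// Build the "multidimensional hash": one nested object level per digit.
const trie = {};
for (const zip of zips) {
  let node = trie;
  for (const ch of zip) {
    node = node[ch] || (node[ch] = {});
  }
  node.end = true; // mark a complete zip code
}

function hasZipTrie(zip) {
  let node = trie;
  for (const ch of zip) {
    node = node[ch];
    if (!node) return false;
  }
  return node.end === true;
}

// The simple baseline to compare against:
function hasZipIndexOf(zip) {
  return zips.indexOf(zip) !== -1;
}

console.log(hasZipTrie('10002'), hasZipIndexOf('10002')); // true true
console.log(hasZipTrie('99999'), hasZipIndexOf('99999')); // false false
```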
How do I split a word into syllables using JavaScript? Is there any API for that? Any help will be appreciated.
Now, back to being constructive: what I suggest is that you find a couple of online dictionary sites and look at their APIs (I know dictionary.com has a free API) and see if you can use one to get just the syllable split for a word from a lookup.
Unfortunately, from what I have read, it looks like you would really need a dictionary of words already split to check against, and there aren't any standalone versions out there.
Be the first and post it somewhere! :)