My question is simple: how can I return a JSON structure of all list items on any Wikipedia page whose title begins with "List of"? If that is not feasible through the Wikipedia API, what is the best way to parse the wiki HTML/XML into what I need? (Note: the parsing does not have to be perfect.)
There are roughly 225,000 of these pages, and they mostly seem to follow one of these four styles. For example:
https://en.wikipedia.org/wiki/List_of_Star_Trek%3A_The_Next_Generation_episodes
https://en.wikipedia.org/wiki/List_of_car_brands
https://en.wikipedia.org/wiki/List_of_Presidents_of_the_United_States
https://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_goalscorers
Specifically, I would like something I can use, like:
Star Trek: The Next Generation episodes ->
  season 1 ->
    Encounter at Farpoint
    Encounter at Farpoint
    The Naked Now
    ...
  season 2 ->
    The Child
    Where Silence Has Lease
    Elementary, Dear Data
    ...
  ...
...
The closest solution I have come up with so far is Axios calls to the Wikipedia API's parse action, which I would need to run for each section. Furthermore, despite setting the JSON format parameter, I still receive the list items as XML/HTML in the "text" property of the returned JSON. Parsing this becomes difficult across all the different page types. Any suggestions on how to parse the various wiki list formats would be helpful if a pure JSON return is not possible.
Any suggestions to accomplish my goal? I am using Vue.js with Node.js.
Maybe there is a library that could help?
Maybe a GET request on the URL to fetch a full HTML dump would work better?
Maybe there is a wiki dump of just the list pages that I could parse into Firestore?
The concept of Wikidata solves this issue; however, it is still nowhere near the maturity level needed to provide much value here. In maybe 3-5 years it could avoid this problem altogether.
At this time, the quick-and-dirty way to answer this question is to grab all the links on a given Wikipedia page through the API, then either filter programmatically or have the user do so. This works because the vast majority of Star Trek episodes, presidents, and car brands on a given list are linked to their individual Wikipedia pages.
I used the following API query to get all links on a Wikipedia page (using its pageid):
const axios = require('axios'); // or: import axios from 'axios'

// Query the links generator: fetch up to 500 pages linked from the given
// page, along with their Wikidata terms (descriptions) and thumbnails.
axios({
  method: 'get',
  url: 'https://en.wikipedia.org/w/api.php',
  params: {
    action: 'query',
    format: 'json',
    prop: 'pageterms|pageimages',
    origin: '*',
    generator: 'links',
    gpllimit: '500',
    redirects: 'true',
    pageids: pageidIn, // pageid of the "List of ..." page
    piprop: 'thumbnail',
    formatversion: 2
  }
});
Then save off response.data.query.pages[i].terms.description and response.data.query.pages[i].title into a class object of results.
Then I added an additional search field so the user can filter their prior results. If they enter "episode", it gets me what I need, since the word "episode" is typically in the response.data.query.pages[i].terms.description field of the page.
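Concretely, that handling might look roughly like the sketch below (collectResults and searchTerm are just illustrative names; the terms field is guarded because some linked pages have no description):

// A sketch of handling the response from the axios call above.
// `searchTerm` (e.g. "episode") comes from the extra search field
// described above; pageterms returns descriptions as an array.
function collectResults(response, searchTerm) {
  const pages = response.data.query.pages; // formatversion 2: an array of pages
  return pages
    .map(page => ({
      title: page.title,
      description: (page.terms && page.terms.description && page.terms.description[0]) || ''
    }))
    .filter(page => page.description.includes(searchTerm));
}

// Usage: axios({ ...params above... }).then(res => console.log(collectResults(res, 'episode')));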
The only drawback is that this solution won't return list entries that don't have their own Wikipedia page. But for the sake of simplicity, I will accept that.
I didn't want to ask such a basic question, but I can't seem to find the answer on my own.
How can I query a specific plant without knowing its id, such as by using its common name or binomial name?
I see that according to the documentation the path is /api/species/{id}. It might be my inexperience with using APIs, but I'm left clueless about how to query a specific plant without prior knowledge of its id.
Would anyone be able to give me an explanation of how it works or, even better, link me to an article to fill the gaps in my API knowledge?
My current knowledge stems from the freeCodeCamp JSON APIs and AJAX short course, which doesn't help when faced with this sort of documentation.
I got the answer from another forum and then complemented it by learning more about REST APIs: APIs for Beginners - How to use an API (Full Course / Tutorial).
At the basic level, in the form of a URL:
You start with the provided URL - https://trefle.io/api
Select which type of data you want, such as Kingdom, subkingdom, division, etc., all the way down to plants or species - https://trefle.io/api/plants. As the documentation mentions, "Plants are all main species, without all the varieties, cultivars, subspecies and forms", while species will give you everything that belongs to that species, meaning several plants.
You provide the parameters that you want, some being required and others optional. To know which parameters to use, check the documentation for that endpoint (token, page_size, page, etc.).
The first parameter goes after the base URL with a ? - https://trefle.io/api/plants?q=strawberry - and any following parameters are separated with a & - https://trefle.io/api/plants?q=strawberry&token=yourAPIkey. You can add whichever parameters interest you, such as showing only items with complete data or a minimum pH.
But do keep in mind that the database still seems to be particularly incomplete.
Also, this is in the form of a URL; in practice this would be done differently. I recommend watching the video I linked above, where he goes through a lot of this information and showcases helpful tools such as Postman.
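In code, with axios from Node, the same request might look like this minimal sketch (the q and token parameters are the ones shown in the URLs above; the exact response shape is an assumption):

const axios = require('axios');

// Search Trefle for plants matching "strawberry"; token is your API key.
axios.get('https://trefle.io/api/plants', {
  params: {
    q: 'strawberry',
    token: 'yourAPIkey'
  }
})
  .then(response => {
    // The response body is assumed to be a list of matching plant records.
    console.log(response.data);
  })
  .catch(error => console.error(error));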
I'm working with this API from a third party
(note: it returns a LARGE dataset):
https://www.saferproducts.gov/RestWebServices/Recall?format=json
I know I can get this giant object like so:
$.getJSON('https://www.saferproducts.gov/RestWebServices/Recall?format=json', function(json, textStatus) {
  console.log(json);
});
This returns ~7000 objects. There is no way in the API call to dictate how many objects you want returned. It's all or nothing.
Here's the question...can I use getJSON (or similar) to only get the first 5 objects and stop without having to load the entire JSON file first?
I did something similar a while ago. I used PHP to fetch a web page of an API and then cache it. Via PHP logic, I stored a variable inside a text file which contained all the information from the page, and I had another file that stored the timestamp. Then, when the page was called, PHP would check the timestamp to see how old it was. If it was too old, it would recache the page and return the relevant information; if it was still valid, it would just return the cached information. If you only want the first 5, that wouldn't be too hard to add to the PHP logic. Then jQuery would query the PHP page.
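The answer above describes a PHP implementation; since the question is in JavaScript, here is the same caching idea sketched in Node (the file name, the one-hour expiry, and the slice to 5 items are all illustrative choices):

const fs = require('fs');
const axios = require('axios');

const CACHE_FILE = 'recalls-cache.json'; // illustrative file name
const MAX_AGE_MS = 60 * 60 * 1000;       // recache after one hour

// Return the first `count` recalls, refetching only when the cache is stale.
async function getRecalls(count) {
  let cached = null;
  try {
    cached = JSON.parse(fs.readFileSync(CACHE_FILE, 'utf8'));
  } catch (e) {
    // no cache yet, or it is unreadable
  }
  if (!cached || Date.now() - cached.timestamp > MAX_AGE_MS) {
    const response = await axios.get(
      'https://www.saferproducts.gov/RestWebServices/Recall?format=json'
    );
    cached = { timestamp: Date.now(), data: response.data };
    fs.writeFileSync(CACHE_FILE, JSON.stringify(cached));
  }
  // Assumes the API returns a JSON array, as the question describes.
  return cached.data.slice(0, count);
}

getRecalls(5).then(recalls => console.log(recalls));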
They don't have anything called out in their documentation for limiting the returned results. I think their intent is for you to narrow down your search so you're not fetching every single item. You could always email them and ask; as Mike McCaughan said, if they don't have a "limit" baked in, then no, it's not possible.
It also looks like they offer weekly downloads from which you could build your own API and add a limit parameter:
https://www.fda.gov/%20Safety/Recalls/EnforcementReports/default.htm
References:
https://github.com/presidential-innovation-fellows/recalls-api/wiki/data-sources
https://www.cpsc.gov/Recalls/CPSC-Recalls-Application-Program-Interface-API-Information/
https://www.cpsc.gov/Global/info/Recall/CPSC-Recalls-Retrieval-Web-Services-Programmers-Guide_3-25-2015.pdf
If there really is no option for limiting that call, then I'd suggest caching, showing some kind of processing wheel while the call takes place, or narrowing your query. They have options for filtering that may work for you, such as the following (see the sketch after this list):
RecallNumber
RecallDateStart
RecallDateEnd
LastPublishDateStart
LastPublishDateEnd
RecallURL
RecallTitle
ConsumerContact
RecallDescription
ProductName
ProductDescription
ProductModel
ProductType
RecallInconjunctionCountry
ImageURL
Injury
ManufacturerCountry
UPC – see caveat below
Hazard
Manufacturer
Remedy
Retailer
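For example, a sketch of narrowing by date with the filters above (assuming they are passed as query-string parameters, as the programmer's guide linked earlier describes; the dates are placeholders):

$.getJSON('https://www.saferproducts.gov/RestWebServices/Recall', {
  format: 'json',
  RecallDateStart: '2017-01-01',
  RecallDateEnd: '2017-06-30'
}, function(json) {
  // Even with a narrower result set, you can still keep just the first 5.
  console.log(json.slice(0, 5));
});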
I've been searching for a while for a way to retrieve something like a "main category" for each Wikipedia article. I'm using the Wikipedia API to retrieve the data, but I'm getting an array of multiple category objects instead of one concise category.
I've seen people implement this. For example, Facebook on this page shows "Harry Potter and the Deathly Hallows: Part II", and above the title there is a category that says "MOVIE". It applies to everything: it could be "BOOKS", "MUSIC", "ARTISTS", "ANIMALS", which is what I'd like to get when using the API. I want this because I want to make searches using this specific category. (I know that Facebook is probably consuming Wikipedia's API, because the page says "FROM WIKIPEDIA, THE FREE ENCYCLOPEDIA", and it is like this every time you find something that is essentially a copy and paste of the original Wikipedia article.)
I've been reading the docs that the Wikipedia/MediaWiki API offers for quite a while but haven't found anything that helps so far. I've also read this question, but the answer isn't really helpful in my case, and it's from two years ago.
Here is an example of how I'm consuming the API; here I search for "Harry Potter" and limit the request to 3 results:
https://es.wikipedia.org/w/api.php?format=jsonfm&action=query&generator=search&gsrnamespace=0&gsrsearch=Harry%20Potter&gsrlimit=3&prop=pageimages|categories&pilimit=max&utf8=1&exlimit=max
Any help or recommendation about how to fulfill this approach is appreciated.
Wikipedia has no concept of one category being more "main" than the others, and the ordering does not help either (it reflects the order in the source, which typically means automatically generated categories first and important categories at the end). Your best bet is probably to use the Wikidata API and fetch the value of the "instance of" (P31) property. E.g. Harry Potter and the Deathly Hallows: Part 2 is an instance of "film".
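For example, a rough sketch of looking that up with the Wikidata API's wbgetentities action (treat the claim-traversal details as an assumption and add error handling for missing pages):

const axios = require('axios');

// Resolve an English Wikipedia title to its Wikidata item and read the
// "instance of" (P31) claims. Each value is a Q-id such as Q11424 ("film");
// resolve labels with a second wbgetentities call if you need readable names.
async function instanceOf(title) {
  const { data } = await axios.get('https://www.wikidata.org/w/api.php', {
    params: {
      action: 'wbgetentities',
      sites: 'enwiki',
      titles: title,
      props: 'claims',
      format: 'json',
      origin: '*'
    }
  });
  const entity = Object.values(data.entities)[0];
  const claims = (entity.claims && entity.claims.P31) || [];
  return claims.map(c => c.mainsnak.datavalue.value.id);
}

instanceOf('Harry Potter and the Deathly Hallows – Part 2')
  .then(ids => console.log(ids));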
I'm looking for a simple way to get data about a university (name, native_name, city, etc.) from the Wikipedia infobox after a user selects a university from Freebase Suggest. The dataset returned from Freebase, however, is very small and unfortunately doesn't include the Wikipedia link.
Currently I am using the "name" property and making an AJAX request to http://live.dbpedia.org/data/"+name+".json. This often works, but while doing some tests it turned out the name doesn't always map directly to the correct page. Let me split my question into a few parts to make myself clear:
Is it possible to configure the Freebase Suggest plugin so that the response includes the Wikipedia link?
OR is there a similar plugin that queries DBpedia directly and is as simple and user-friendly as Freebase's?
OR, as a plan B, is there a way to send a request to live.dbpedia.org so that it only returns the JSON after redirects? On the Wikipedia API I can send a "redirects" variable that does this. But then I'd have to parse the data myself…
The problem with plan B is that nothing guarantees that the Freebase object's name will ever lead me to the correct Wikipedia page, even after the redirects…
I swear I've read a lot of API documentation but everything is extremely confusing and I chose not to read long tutorials about RDF, SPARQL and MQL because I really don't think the solution should be so complicated. I'm asking here because I really hope I'm missing a simple solution…
UPDATE
{
  "id": "/en/babes-bolyai_university",
  "lang": "en",
  "mid": "/m/08c4bf",
  "name": "Babeş-Bolyai University",
  "notable": {
    "id": "/education/university",
    "name": "College/University"
  },
  "score": 37.733559
}
This is the result I get after selecting "Babeş-Bolyai University" in the suggest widget.
SOLUTION
I assumed I couldn't configure the Suggest widget to return more data, so after getting the Freebase ID of the object I just send another request, this time with a query specifically asking for the Wikipedia ID. I didn't know any MQL and couldn't find the name of the Freebase field holding the Wikipedia ID. Maybe I'm stupid, but the Freebase documentation really confuses me. In any case, Tom Morris' answer and this question helped me build the query that returned what I wanted:
https://www.googleapis.com/freebase/v1/mqlread?query={"id":"/en/babes-bolyai_university","key":{"namespace":"/wikipedia/en_title","value":null}}
The strings in the result come with numeric codes for special Unicode characters, though (in my case Babe$0219-Bolyai_University). I've been able to convert the code with String.fromCharCode(parseInt(219, 16)), but if someone knows of a way to convert the whole string, that would be helpful. Otherwise I can just write my own function replacing the "$dddd" pattern with the corresponding character.
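For what it's worth, a small sketch of that replacement function, assuming every escape has the four-hex-digit $XXXX form described above:

// Unescape Freebase MQL keys of the form "Babe$0219-Bolyai_University".
function unescapeMqlKey(key) {
  return key.replace(/\$([0-9A-Fa-f]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

console.log(unescapeMqlKey('Babe$0219-Bolyai_University')); // "Babeș-Bolyai_University"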
Thanks for the help!
There isn't a DBpedia autosuggest comparable to Freebase Suggest as far as I'm aware.
Anything that's in Freebase can be retrieved with Suggest by using an MQL expression in the output parameter. For simple things, e.g. names and aliases, the MQL is basically just a JSON snippet containing the relevant property name.
EDIT: The output parameter doesn't actually appear to be documented in the context of Suggest, but anything that isn't a Suggest parameter gets passed through transparently to the Freebase Search API, so you can use all of the stuff described here: https://developers.google.com/freebase/v1/search-output. You can get as much or as little information as you require returned with each suggestion.
If you do need to query DBpedia, you should be using the Wikipedia/DBpedia key, which is not necessarily the same as the name. For English Wikipedia, the key is in the namespace /wikipedia/en, or, if you want the numeric Wikipedia ID, in the namespace /wikipedia/en_id. Replace the 'en' with the appropriate language code if you want to query other language Wikipedias. These keys have some non-ASCII characters escaped, so if you need to unescape them, you can use the documentation here: http://wiki.freebase.com/wiki/MQL_key_escaping
You can update the "flyout_service_path" parameter. There is a description in the Freebase Suggest documentation (https://developers.google.com/freebase/v1/suggest). I'm using this configuration to get all the keys of an entity:
$(inputClass).suggest({
  "key": _FREEBASE_API_KEY_BROWSER_APP,
  "flyout_service_path": "/search?filter=(all mid:${id})&output=(notable:/client/summary description type /type/object/key)&key=${key}"
}).bind("fb-select", this.fbSelectedHandler);
In the Freebase response I can now see, in the "output" parameter, the "/type/object/key" field with all the keys of the entity (Wikipedia, etc.).
My question now is: how can I acquire this data from the output? In the "fb-select" event, the "data" variable doesn't carry these fields.
Some help, please…
Users will be able to write some documents. Those documents will consist of chapters (a one-to-many relation).
Normally I would do this by creating separate views for creating a chapter and a document.
How do I implement a web page that allows editing this "composite" view, where I can edit document details but also create chapters without visiting different pages? Also, how can I ensure that I pass along the order of chapters the user has arranged (by moving chapters freely up and down)?
(Sorry if this question has already been asked and answered, but I don't even know how to search for it, since I don't know the proper keywords beyond "AJAX"; help in naming my requirement would also be welcome!)
Backend server applications based on REST principles work nicely with Ajax client-side implementations.
For example, your URLs could be:
/book/1
/book/1/chapters
/book/1/chapter/1
You could set it up so that a POST to /book/1/chapters would add a chapter. A GET on that same URL would return all chapters. A GET on /book/1/chapter/1/ would only return chapter 1. A PUT on /book/1/chapter/1/ would update an existing chapter. This is a "RESTful" architecture:
http://en.wikipedia.org/wiki/Representational_state_transfer
This is an interesting introduction: http://tomayko.com/writings/rest-to-my-wife
This is a big subject, but if you create the right backend server architecture you will find your job a lot easier. Hope this helps answer your question.
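As a rough sketch of the verb/URL mapping described above (shown with Express in Node purely for illustration; swap in your own framework's routing and real persistence):

const express = require('express');
const app = express();
app.use(express.json());

// GET all chapters of a book, in their stored order
app.get('/book/:bookId/chapters', (req, res) => {
  res.json([]); // look up and return the book's chapters here
});

// POST a new chapter to a book
app.post('/book/:bookId/chapters', (req, res) => {
  res.status(201).json(req.body); // create a chapter from the request body
});

// GET one chapter
app.get('/book/:bookId/chapter/:chapterId', (req, res) => {
  res.json({ id: req.params.chapterId });
});

// PUT an updated chapter, including its position if you track ordering
app.put('/book/:bookId/chapter/:chapterId', (req, res) => {
  res.json(req.body);
});

app.listen(3000);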
OK, partial solution.
Just Google "nested forms Ruby on Rails". There are plenty of examples, all in Ajax, all easy.