I have a 10MB JSON file of the following structure (10k entries):
{
entry_1: {
description: "...",
offset: "...",
value: "...",
fields: {
field_1: {
offset: "...",
description: "...",
},
field_2: {
offset: "...",
description: "...",
}
}
},
entry_2:
...
...
...
}
I want to implement an autocomplete input field that fetches suggestions from this file as fast as possible while searching multiple attributes.
For example, finding all entry names, field names and descriptions that contain some substring.
Method 1:
I tried flattening the nesting into an array of strings:
"entry_1|description|offset|value|field_1|offset|description",
"entry_1|description|offset|value|field_2|offset|description",
"entry_2|..."
and performing a case-insensitive partial string match; the query took about 900 ms.
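For reference, Method 1 might look roughly like the following sketch. It assumes the structure shown above (entries with description, offset, value and a fields map) and uses made-up sample values; the helper names flatten and search are hypothetical. It lowercases every row once up front, so each keystroke only does an includes() scan instead of lowercasing all the strings again.

```javascript
// Made-up sample data in the shape described above.
const DATA = {
  entry_1: {
    description: "First entry",
    offset: "0x00",
    value: "10",
    fields: {
      field_1: { offset: "0x01", description: "Alpha flag" },
      field_2: { offset: "0x02", description: "Beta flag" },
    },
  },
  entry_2: {
    description: "Second entry",
    offset: "0x10",
    value: "20",
    fields: {
      field_1: { offset: "0x11", description: "Gamma flag" },
    },
  },
};

// Build one pipe-delimited row per (entry, field) pair, as in Method 1.
function flatten(data) {
  const rows = [];
  for (const [entryName, entry] of Object.entries(data)) {
    for (const [fieldName, field] of Object.entries(entry.fields || {})) {
      rows.push([
        entryName, entry.description, entry.offset, entry.value,
        fieldName, field.offset, field.description,
      ].join("|"));
    }
  }
  return rows;
}

const rows = flatten(DATA);
// Lowercase once up front, not on every query.
const lowerRows = rows.map((r) => r.toLowerCase());

// Case-insensitive partial match; returns the original (not lowercased) rows.
function search(term) {
  const needle = term.toLowerCase();
  return rows.filter((_, i) => lowerRows[i].includes(needle));
}

console.log(search("gamma"));
```

In this sketch an entry with no fields produces no rows at all, so entry-only matches would need an extra row per entry.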
Method 2:
I tried XPath-based JSON querying (using defiant.js):
var snapshot = Defiant.getSnapshot(DATA);
var found = JSON.search(snapshot, '//*[contains(fields, "substring")]');
The query took about 600 ms, and that was for a single attribute (fields).
Are there other options that will get me under 100 ms? I control the file format, so I can turn it into XML or any other format; the only requirement is speed.
Since you are searching for substrings of values, using IndexedDB as suggested is not a good fit. Instead, try flattening the values to text, where each key in the object becomes one line and its fields are separated by "::":
{
key1:{
one:"one",
two:"two",
three:"three"
},
key2:{
one:"one 2",
two:"two 2",
three:"three 2"
}
}
becomes:
key1::one::two::three
key2::one 2::two 2::three 2
Then use a regular expression to search the text after each "keyN::" prefix and collect the keys of all matching lines. Finally, map those keys back to the objects: if key1 is the only match, you'd return [data.key1].
Here is an example with 10,000 keys of sample data (a search on a laptop takes a couple of milliseconds, but I have not tested it with mobile throttling):
//array of words, used as value for data.rowN
const wordArray = ["actions","also","amd","analytics","and","angularjs","another","any","api","apis","application","applications","are","arrays","assertion","asynchronous","authentication","available","babel","beautiful","been","between","both","browser","build","building","but","calls","can","chakra","clean","client","clone","closure","code","coherent","collection","common","compiler","compiles","concept","cordova","could","created","creating","creation","currying","data","dates","definition","design","determined","developed","developers","development","difference","direct","dispatches","distinct","documentations","dynamic","easy","ecmascript","ecosystem","efficient","encapsulates","engine","engineered","engines","errors","eslint","eventually","extend","extension","falcor","fast","feature","featured","fetching","for","format","framework","fully","function","functional","functionality","functions","furthermore","game","glossary","graphics","grunt","hapi","has","having","help","helps","hoisting","host","how","html","http","hybrid","imperative","include","incomplete","individual","interact","interactive","interchange","interface","interpreter","into","its","javascript","jquery","jscs","json","kept","known","language","languages","library","lightweight","like","linked","loads","logic","majority","management","middleware","mobile","modular","module","moment","most","multi","multiple","mvc","native","neutral","new","newer","nightmare","node","not","number","object","objects","only","optimizer","oriented","outside","own","page","paradigm","part","patterns","personalization","plugins","popular","powerful","practical","private","problem","produce","programming","promise","pure","refresh","replace","representing","requests","resolved","resources","retaining","rhino","rich","run","rxjs","services","side","simple","software","specification","specifying","standardized","styles","such","support","supporting","syntax","text","that","the","their","they","toolkit","top","tracking",
"transformation","type","underlying","universal","until","use","used","user","using","value","vuejs","was","way","web","when","which","while","wide","will","with","within","without","writing","xml","yandex"];
//get random number
const rand = (min,max) =>
Math.floor(
(Math.random()*(max-min))+min
)
;
//return object: {one:"one random word from wordArray",two:"one rand...",three:"one r..."}
const threeMembers = () =>
["one","two","three"].reduce(
(acc,item)=>{
acc[item] = wordArray[rand(0,wordArray.length)];
return acc;
}
,{}
)
;
var i = -1;
const data = {};
//create data: {row0:threeMembers(),row1:threeMembers()...row9999:threeMembers()}
while(++i<10000){
data[`row${i}`] = threeMembers();
}
//convert the data object to string "row0::word::word::word\nrow1::...\nrow9999..."
const dataText = Object.keys(data)
.map(x=>`${x}::${data[x].one}::${data[x].two}::${data[x].three}`)
.join("\n")
;
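//search for something (example: searching for "script" matches javascript and ecmascript)
//flags "igm": i = case insensitive, g = find all matches, m = ^ matches at the start of every line
//return array of data[matched key]
window.searchFor = search => {
//escape regexp special characters so the search text is matched literally
const escaped = search.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
//capture the key before the first "::" and only look for the search text after it
const r = new RegExp(`^([^:\\n]*)::.*${escaped}`,"igm")
,ret=[];
var result = r.exec(dataText);
while(result !== null){
ret.push(result[1]);
result = r.exec(dataText);
}
return ret.map(x=>data[x]);
};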
//example search for "script"
console.log(searchFor("script"));
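If building a RegExp on every keystroke ever becomes a concern, the same flattened-text idea can be sketched with plain string scanning: lowercase the text once, find each occurrence of the search term with indexOf, and walk back to the start of the line to recover the key. This is a hypothetical variant, not the code above; the function name searchByScan and the tiny sample data are assumptions, and it assumes lowercasing does not change the string length (true for ASCII text). Like the regexp version, it only counts hits after the first "::" on each line.

```javascript
// Small hypothetical dataset in the same shape as `data` above.
const data = {
  row0: { one: "javascript", two: "node", three: "api" },
  row1: { one: "html", two: "xml", three: "ecmascript" },
  row2: { one: "rxjs", two: "vuejs", three: "babel" },
};

// Same flattening: "rowN::one::two::three" lines joined by "\n".
const dataText = Object.keys(data)
  .map((k) => `${k}::${data[k].one}::${data[k].two}::${data[k].three}`)
  .join("\n");
// Lowercased copy, built once (assumes ASCII so indexes line up with dataText).
const lowerText = dataText.toLowerCase();

function searchByScan(term) {
  const needle = term.toLowerCase();
  const keys = [];
  if (needle === "") return keys;
  let pos = lowerText.indexOf(needle);
  while (pos !== -1) {
    // Boundaries of the line that contains this hit.
    const lineStart = lowerText.lastIndexOf("\n", pos) + 1;
    let lineEnd = lowerText.indexOf("\n", pos);
    if (lineEnd === -1) lineEnd = lowerText.length;
    const sep = dataText.indexOf("::", lineStart);
    // Only count hits that lie after the "key::" prefix.
    if (sep !== -1 && sep < lineEnd && pos >= sep + 2) {
      keys.push(dataText.slice(lineStart, sep));
      // Skip the rest of the line so each key is reported once.
      pos = lowerText.indexOf(needle, lineEnd);
    } else {
      pos = lowerText.indexOf(needle, pos + 1);
    }
  }
  return keys.map((k) => data[k]);
}

console.log(searchByScan("script"));
```

Whether this is actually faster than the regexp in a given browser is something to measure rather than assume.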
Update: The answer below was correct and helped me change the code. The issue now lies with Mongoose and MongoDB: Mongoose appears to create the collection but never actually writes the documents.
Why can't I bind the item nested in req.body.geo.coordinates.0 or .1? req.body.geo.coordinates shows up as an object holding the two coordinate numbers. So far I can get the server to write all the other fields except the geo field.
I assumed I could use req.body.geo.coordinates.0 because index zero would be the first number in the array, right? I must be thinking about req.body incorrectly and trying to access it in a way that isn't correct.
How can I access these coordinates so that I can put them in the correct place in the Mongoose schema? I tried loading it as a whole object, but Mongoose wouldn't write the geo portion and used the default instead.
As always, thank you for your responses.
Relevant part of the Mongoose schema:
var itemsSchema = new Schema({item:{
user_name: {type: String, default: 'badboy for life'},
city: {type:String},
geo:{
gtype: {type:String, default: "Point"},
coordinates: {longitude: {type:Number, default: -97.740678},
latitude: {type:Number, default: 30.274026 }},
},
title: {type: String, match: /[a-z]/},
desc: {type:String},
cost:{type: Number, index: true},
//Still can't get the image to load into mongoose correctly I'm assuming
// i'm doing it wrong but that's a question for another day.
//image: {
// mime: {type:String},
// bin: Buffer,
// }
}});
The Express app.post handler that sends the data to the MongoDB server. It currently produces the default values:
app.post('/js/dish', parser, function(req, res) {
var a = req.body.user_name;
var b = req.body.title;
var c = req.body.cost;
var d = req.body.ingdts;
var e = req.body.location;
var f = req.body.geo.coordinates.0;
var g = req.body.geo.coordinates.1;
// create an object, information comes from AJAX request from Angular
console.log(req.body.geo["coordinates"]);
console.log("hey the object request has this in it: "+ req, req.method);
Item.create({"item.user_name":a, "item.title": b, "item.cost": c, "item.desc": d, "item.city": e, "item.geo.coordinates.latitude":f,"item.geo.coordinates.longitude":g},
function(err, item) {
if (err){
res.send(err);
}else{
// get and return all the items after you create another
getItems(res);
}
});
});
Map service in Angular, which gets and sets the GPS position:
(function(){
var mapSrvc = angular.module("mapService",[]);
console.log('map factory baby!')
var position = [];
var mapPopulation = [];
mapSrvc.service('Maping',['$http',function(){
return{
getUserGPS:function(){
return position;
},
setUserGPS:function(lat,long){
return position = [lat,long];
},
getMapPop:function(dbLocs){
return mapPopulation = dbLocs;
}
}
}])
})();
The Angular controller that sets the value and sends it to the Express part of the app:
$scope.createItem = function(item){
if($scope.items.text != false){
var itemCord = Maping.getUserGPS();
$scope.items={
user_name: item.user_name,
title: item.title,
cost: item.cost,
location: item.location,
ingdts: item.ingdts,
geo:{coordinates:{longitude:itemCord[1],latitude:itemCord[0]}}
}
console.log("breadcrumb to see if item was defined: ",$scope.items)
$scope.loading = true;
Items.create($scope.items)
.success(function(data){
Items.get()
.success(function(data){
$scope.items= data;
$scope.loading = false;
console.log(data);
})
.error(function(err){
console.log('Error: ' + err);
});
$scope.loading=false;
$scope.items= {};
$scope.items= data;
})
.error(function(err){
console.log('Error: ' + err);
});
}
};
Your coordinates object in your schema is an object literal, not an array.
Array elements can be accessed by index, but not in the way you assumed. In an array, you access elements with bracket notation; consider the following:
var foo = ["bar", "baz"];
console.log(foo[0]);
For an object, however, this doesn't work the same way unless you actually have keys named 0 and 1.
If you want to access longitude and latitude with the bracket notation above, the coordinates object in your schema would have to be an array. However, that would make naming the values longitude and latitude rather pointless. I'm talking about this part (I removed the comment to make it easier to follow):
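A short illustration of the difference, using the schema's default values as sample data:

```javascript
// An object literal shaped like the coordinates sent from Angular.
const coordinates = { longitude: -97.740678, latitude: 30.274026 };

// `coordinates.0` would be a SyntaxError: a property name after a dot
// cannot start with a digit. coordinates[0] looks up a key named "0",
// which this object doesn't have.
console.log(coordinates[0]); // undefined

// Objects are read by key, with dot or bracket notation.
console.log(coordinates.longitude);   // -97.740678
console.log(coordinates["latitude"]); // 30.274026

// Only an object whose keys really are 0 and 1 answers to [0] and [1].
const indexed = { 0: -97.740678, 1: 30.274026 };
console.log(indexed[1]); // 30.274026
```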
coordinates: {longitude: {type:Number, default: -97.740678}, latitude: {type:Number, default: 30.274026 }},
You have another option which may be more suitable. Just access the values in the coordinates object by key. Do you know how object literals work in JavaScript? Instead of accessing values by number (or index) you access them by the names you gave them (or the keys). So you'd instead access the values like this:
var f = req.body.geo.coordinates.latitude;
var g = req.body.geo.coordinates.longitude;
... because that's what you named them.
To summarize, you have three options: name the keys 0 and 1 instead of longitude and latitude in the coordinates object (you probably should not do this), turn the coordinates object into an array as in the foo example above, or access the keys by the names you gave them, as shown directly above.
I have two documents: one with a tree structure and another that relates to the first. I'm trying to join these two docs by foreign key and primary key, but I can't get the actual results; the view only displays null values.
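Putting that together, the handler's mapping step could be pulled into a small function like the sketch below. The name buildItemDoc is hypothetical; it assumes the body arrives as JSON (so the coordinates are numbers) and reuses the dotted keys from the question's Item.create call. Latitude and longitude are read by name, and each is only set when it is a real number, so a missing GPS fix leaves the key out instead of passing undefined.

```javascript
// Hypothetical helper: map the parsed JSON body sent by the Angular
// controller onto the dotted paths used in the question's Item.create call.
function buildItemDoc(body) {
  const coords = (body.geo && body.geo.coordinates) || {};
  const doc = {
    "item.user_name": body.user_name,
    "item.title": body.title,
    "item.cost": body.cost,
    "item.desc": body.ingdts,
    "item.city": body.location,
  };
  // coordinates is an object, so read it by key name rather than by index.
  const isNum = (v) => typeof v === "number" && Number.isFinite(v);
  if (isNum(coords.latitude)) doc["item.geo.coordinates.latitude"] = coords.latitude;
  if (isNum(coords.longitude)) doc["item.geo.coordinates.longitude"] = coords.longitude;
  return doc;
}

const doc = buildItemDoc({
  user_name: "sample user",
  title: "taco",
  cost: 5,
  ingdts: "beans",
  location: "Austin",
  geo: { coordinates: { longitude: -97.74, latitude: 30.27 } },
});
console.log(doc["item.geo.coordinates.latitude"]);  // 30.27
console.log(doc["item.geo.coordinates.longitude"]); // -97.74
```

In the route you would then pass buildItemDoc(req.body) as the first argument to Item.create.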
First doc:
{
"name": "one",
"root": {
"level1" : {
"level2" : {
"level3" : {
"itemone": "Randomkey1",
"itemtwo": "Randomkey2"
}
}
}
},
"type": "firstdoc"
}
Second doc:
{
"name" : "two",
"mapBy" : "Randomkey1",
"type" : "seconddoc"
}
I've written a map function that lists all the keys at level 1, 2 or 3. Now I want to join the first doc and the second doc using that key. I've tried two ways. With the first, I get all the (Root, Randomkey) and (docName, Randomkey1) pairs, but it doesn't do any join. I'm looking for a result like
(Root, docName)
Could someone help me fix this?
Map function:
function(doc) {
if (doc.type === 'firstdoc' || doc.type === 'seconddoc' ) {
var rootObj = doc.root;
for (var level1 in rootObj) {
var level2Obj = rootObj[level1];
for (var level2 in level2Obj) {
var keys = new Array();
var level3Obj = level2Obj[level2];
for (var i in level3Obj) {
var itemObj = level3Obj[i];
for (var j in itemObj) {
keys.push(itemObj[j]);
emit(doc.name, [itemObj[j], 0]);
var firstDocName = doc.name;
//This gives null values
if (doc.type === 'seconddoc' && doc.mapBy === itemObj[j]) {
emit(firstDocName , doc);
}
}
}
}
}
}
//This just lists keys to me
if (doc.type === 'seconddoc') {
emit([doc.mapBy, 1] , doc);
}
}
To simulate joins, emit a value that is an object containing an _id field, and make that _id point to an actual document's _id. Then query the view with include_docs=true to pull in the related documents. Example with many-to-many here: http://danielwertheim.se/couchdb-many-to-many-relations/
If this is not applicable, you can do a two-step manual join: first return custom keys from your view, then make a second query against the _all_docs view with multiple keys specified.
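A minimal sketch of that approach for the question's documents. CouchDB follows only an _id, so this assumes the second doc stores the first doc's _id in a hypothetical field firstDocId (the question's mapBy holds a value from inside the tree, not an _id); the ids first-1 and second-1 are also made up. The map function is plain ES5; the emit() stub and the include_docs simulation below it are only a local harness for trying it out, not CouchDB itself.

```javascript
// CouchDB map function (ES5). A value of the form {_id: ...} makes a query
// with include_docs=true return that document instead of the emitting one.
function map(doc) {
  if (doc.type === "seconddoc" && doc.firstDocId) {
    emit(doc.mapBy, { _id: doc.firstDocId });
  }
}

// Local harness (not CouchDB): collect the rows emitted for sample docs.
const rows = [];
let currentId;
function emit(key, value) { rows.push({ id: currentId, key: key, value: value }); }

const docs = [
  { _id: "first-1", type: "firstdoc", name: "one" },
  { _id: "second-1", type: "seconddoc", name: "two", mapBy: "Randomkey1", firstDocId: "first-1" },
];
docs.forEach((d) => { currentId = d._id; map(d); });

// Simulate include_docs=true: resolve each row's value._id to a document.
const byId = Object.fromEntries(docs.map((d) => [d._id, d]));
const joined = rows.map((r) => ({ id: r.id, key: r.key, doc: byId[r.value._id] }));
console.log(joined);
```

Each resulting row pairs the emitting second doc (id) with the joined first doc (doc), which is the (docName, Root)-style pairing the question is after.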
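The two-step version could be sketched as follows. The first query is assumed to return view rows whose values are the _ids you want; the database name mydb and all sample ids and responses are hypothetical. The second step is a single POST to _all_docs with those ids in a keys array and include_docs=true. The HTTP call itself is left out; the request object can be sent with fetch or any other HTTP client.

```javascript
// Hypothetical step 1 result: view rows ({id, key, value}) whose values
// are the _ids of the documents to fetch.
const step1Rows = [
  { id: "second-1", key: "Randomkey1", value: "first-1" },
  { id: "second-2", key: "Randomkey2", value: "first-1" },
];

// Step 2: build one POST to _all_docs with the collected ids.
function buildAllDocsRequest(dbName, viewRows) {
  // De-duplicate the ids while keeping their order.
  const keys = [...new Set(viewRows.map((row) => row.value))];
  return {
    method: "POST",
    path: `/${encodeURIComponent(dbName)}/_all_docs?include_docs=true`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ keys: keys }),
  };
}

// Combine the step 1 rows with the _all_docs response rows. Unknown ids come
// back without a doc (error rows), and deleted ones with doc: null, so both
// are skipped when building the lookup.
function joinRows(viewRows, allDocsRows) {
  const docsById = new Map(
    allDocsRows.filter((r) => r.doc).map((r) => [r.id, r.doc]));
  return viewRows.map((r) => ({
    from: r.id,
    key: r.key,
    doc: docsById.get(r.value) || null,
  }));
}

const req = buildAllDocsRequest("mydb", step1Rows);
// Hypothetical response rows for that request.
const allDocsRows = [
  { id: "first-1", key: "first-1", value: { rev: "1-abc" },
    doc: { _id: "first-1", name: "one" } },
];
console.log(req.body);
console.log(joinRows(step1Rows, allDocsRows));
```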
This is quite late, but for this kind of tree structure the documents should be stored separately, for example:
{
"_id": "firstDoc",
"type": "rootLevel"
}
{
"_id": "secondDoc",
"type": "firstLevel",
"parent": "firstDoc"
}
{
"_id": "thirdDoc",
"type": "firstLevel",
"parent": "firstDoc"
}
The different levels can then be joined with a map/reduce view; make sure you use it properly. Also use logging (view functions can call log()) so you can see the order in which CouchDB calls your map and reduce functions.
Furthermore, the map function should only emit the data you actually need. For example, if you want to emit level3, put doc.root.level1.level2.level3 in the value part of emit.
For more detail about joins, see "CouchDB Join using Views (map/reduce)".
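One common way to do such a join is view collation: emit the parent under [parentId, 0] and each child under [parentId, 1], so every parent row is immediately followed by its children. Querying the view with startkey=["firstDoc"]&endkey=["firstDoc",{}]&include_docs=true should then return firstDoc followed by secondDoc and thirdDoc. The sketch assumes the ids above are the documents' CouchDB _id values; the emit() stub and the simplified sort are only a local stand-in, since CouchDB sorts view rows itself.

```javascript
// CouchDB map function (ES5): a parent sorts under [ownId, 0],
// its children under [parentId, 1].
function map(doc) {
  if (doc.type === "rootLevel") {
    emit([doc._id, 0], null);
  } else if (doc.parent) {
    emit([doc.parent, 1], null);
  }
}

// Local harness (not CouchDB): run the map over the sample docs.
const rows = [];
let currentId;
function emit(key, value) { rows.push({ id: currentId, key: key, value: value }); }

const docs = [
  { _id: "thirdDoc", type: "firstLevel", parent: "firstDoc" },
  { _id: "firstDoc", type: "rootLevel" },
  { _id: "secondDoc", type: "firstLevel", parent: "firstDoc" },
];
docs.forEach((d) => { currentId = d._id; map(d); });

// Simplified stand-in for CouchDB's view collation: compare the key's
// string part, then its number part, then the doc id for equal keys.
// Real CouchDB collation covers many more types and string rules.
function cmp(a, b) { return a < b ? -1 : a > b ? 1 : 0; }
rows.sort((a, b) =>
  cmp(a.key[0], b.key[0]) || cmp(a.key[1], b.key[1]) || cmp(a.id, b.id));

console.log(rows.map((r) => r.id));
```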
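For the question's first doc, that could look like the sketch below, which emits only the level3 object keyed by the doc's name. Each level is checked because a map function runs against every document in the database. The emit() stub is a local test harness, not part of CouchDB.

```javascript
// CouchDB map function (ES5): emit just the level3 object of a "firstdoc".
function map(doc) {
  var root = doc.root;
  if (doc.type === "firstdoc" && root && root.level1 &&
      root.level1.level2 && root.level1.level2.level3) {
    emit(doc.name, root.level1.level2.level3);
  }
}

// Local harness (not CouchDB): collect emitted rows.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

map({
  name: "one",
  type: "firstdoc",
  root: { level1: { level2: { level3: { itemone: "Randomkey1", itemtwo: "Randomkey2" } } } },
});
// The second doc has no tree, so nothing is emitted for it.
map({ name: "two", type: "seconddoc", mapBy: "Randomkey1" });

console.log(rows);
```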