ElasticSearch Query Creation - javascript

I am having an inordinate amount of trouble trying to structure an ElasticSearch query.
I need to get the top 25 tweets based on the sum of two fields, favoritesCount and retweetCount. I also need to be able to specify a date range on the field postedTime.
So far the closest I have been able to come is
query = {
'size' : 25,
'fields' : ['retweetCount', 'favoritesCount', 'preferredUsername', 'body'],
"query": {
"range": {
"postedTime" : {
'gte' : 'now-24M',
'lte' : 'now'
}
}
},
"sort" : {
"retweetCount" : {"order" : "desc"},
"type" : "number",
}
};
This query returns too many results and sorts on the total number of tweets with the same retweet count. I also cannot figure out why this query doesn't return just the fields specified in the query.
Ideally, the query would return only the fields ['retweetCount', 'favoritesCount', 'preferredUsername', 'body'].

This is the query that you should trigger:
curl -X GET 'http://localhost:9200/index_name/type/_search' -d '{
  "size": 25,
  "fields": ["retweetCount", "favoritesCount", "preferredUsername", "body"],
  "query": {
    "range": {
      "postedTime": {
        "gte": "now-24M",
        "lte": "now"
      }
    }
  },
  "sort": [
    {"retweetCount": {"order": "desc"}}
  ]
}'
You are embedding everything inside the query hash. This is not needed.
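The question also asks for a ranking by the sum of retweetCount and favoritesCount, which a plain field sort does not provide. A minimal sketch of one way to do that is a script-based sort. It assumes a recent Elasticsearch version (Painless scripting, and "_source" instead of "fields" to limit the returned fields); the function name buildTopTweetsQuery is made up for illustration.

```javascript
// Sketch only: builds a search body that ranks tweets by
// retweetCount + favoritesCount with a script-based sort.
// Assumes a recent Elasticsearch version; adjust for older clusters.
function buildTopTweetsQuery(from, to, size) {
  return {
    size: size,
    // Limit the returned document fields.
    _source: ["retweetCount", "favoritesCount", "preferredUsername", "body"],
    query: {
      range: {
        postedTime: { gte: from, lte: to }
      }
    },
    sort: [
      {
        _script: {
          type: "number",
          script: {
            lang: "painless",
            source: "doc['retweetCount'].value + doc['favoritesCount'].value"
          },
          order: "desc"
        }
      }
    ]
  };
}

const body = buildTopTweetsQuery("now-24M", "now", 25);
console.log(JSON.stringify(body, null, 2));
```

The printed object can be sent as the request body of the _search call. Script sorts are evaluated per matching document, so they are likely slower than sorting on a stored field.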

Related

Is there any way to take nearest location value from an elasticsearch index?

I have 2 Elasticsearch indices, one named "userlocation" and another named "locationvalues":
"userlocation" : {
"aliases" : { },
"mappings" : {
"properties" : {
"_class" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"latitude" : {
"type" : "float"
},
"longitude" : {
"type" : "float"
},
"timestamp" : {
"type" : "long"
}
}
},
{
"locationvalues" : {
"aliases" : { },
"mappings" : {
"properties" : {
"LocationLat" : {
"type" : "double"
},
"LocationLong" : {
"type" : "double"
},
"Source" : {
"type" : "text"
},
"TimeStamp" : {
"type" : "date",
"format" : "epoch_millis"
},
"Value" : {
"type" : "double"
}
}
},
Is there any way to fetch from locationvalues just the location nearest to each userlocation entry at the corresponding timestamp (+- 10 minutes)? I must point out that the timestamps in userlocation have different values from the ones in locationvalues.
Timestamps are in Unix epoch milliseconds.
locationvalues has more than 100k elements per month and userlocation more than 5000.
It seems that you want to pull a report rather than the result of a single Elasticsearch query. What you want to achieve is not possible with a single query. You would need to write a client application that first queries for all existing "user locations" and then submits a query for every single "user location" to the location values index.
Elasticsearch can calculate the distance between geo-locations and also sort results based on geo-distance. Elasticsearch also supports date math to easily query for date ranges such as "within 10 minutes", etc.
But Elasticsearch can only do so if the data is stored in the proper format. Regardless of who created the Elasticsearch index, that mapping does not look ideal. Longitude and latitude information should not be stored as 2 float fields, but rather as a single geo_point field.
In order to support date range queries you need to store timestamps as proper date fields (and not as long). Via the format parameter you can control the valid date/time formats you're planning to send to Elasticsearch (there can be multiple format strings). Elasticsearch will then use this information not just to validate the timestamps but also to convert them into the internally used epoch_millis representation for storing.
Even if you can't change any of the existing mappings, in order to maintain backward compatibility, you could consider adding new "multi-fields" to your mapping to ensure your data is stored in the proper format. But you would need write privileges (along with the privilege to execute an _update_by_query request) to get your index mappings fixed first.
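As a rough sketch of the per-user query described above: for each user location, the client could build a request that filters locationvalues to a +- 10 minute window and sorts by geo-distance, keeping only the closest hit. This assumes the index has been remapped so that the coordinates live in a geo_point field; the field name "location" and the function name buildNearestValueQuery are hypothetical.

```javascript
// Sketch only. Assumes locationvalues has a geo_point field named
// "location" (hypothetical) and a date field "TimeStamp" in epoch_millis.
const TEN_MINUTES_MS = 10 * 60 * 1000;

function buildNearestValueQuery(user) {
  return {
    size: 1, // keep only the single nearest hit
    query: {
      range: {
        TimeStamp: {
          gte: user.timestamp - TEN_MINUTES_MS,
          lte: user.timestamp + TEN_MINUTES_MS,
          format: "epoch_millis"
        }
      }
    },
    sort: [
      {
        _geo_distance: {
          location: { lat: user.latitude, lon: user.longitude },
          order: "asc",
          unit: "m"
        }
      }
    ]
  };
}

// Example user location document (made-up values).
const query = buildNearestValueQuery({
  latitude: 44.43,
  longitude: 26.1,
  timestamp: 1565815876803
});
console.log(JSON.stringify(query, null, 2));
```

With more than 5000 user locations this means as many queries, so batching them (for example through the multi-search API) may be worth considering.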

Firebase realtime database - orderByChild not returning any result

I am using the Firebase Realtime Database functions .orderByChild() and .equalTo() to fetch data nodes to be updated. However, I cannot seem to get any results back from my queries.
My database structure
{
"4QEg0TWDbESiMX8Cu8cvUCm17so2" : {
"-LmGtXsgJbAvVS8gv5-E" : {
"createdAt" : 1565815876803,
"message" : "Hello",
"isSender" : false,
"sender" : {
"alias" : "",
"name" : "Person A",
"sid" : "mUO3DtYY2yRw3zkv4EmTlfldB3S2"
},
"sysStatus" : 0
},
"-LmGtt4nyuygG9B4s6__" : {
"createdAt" : 1565815967746,
"message" : "Hej!",
"isSender" : true,
"sender" : {
"alias" : "",
"name" : "Person B",
"sid" : "4QEg0TWDbESiMX8Cu8cvUCm17so2"
},
"sysStatus" : 0
},
"-LmJxcvL_Y7JmojxPiiK" : {
"createdAt" : 1565867281849,
"isSender" : true,
"message" : "111",
"sender" : {
"alias" : "",
"name" : "Person B",
"sid" : "4QEg0TWDbESiMX8Cu8cvUCm17so2"
},
"sysStatus" : 0
}
},
"mUO3DtYY2yRw3zkv4EmTlfldB3S2" : {
"222" : {
"createdAt" : 1565867281849,
"isSender" : true,
"message" : "Test",
"sender" : {
"alias" : "",
"name" : "Person A",
"sid" : "mUO3DtYY2yRw3zkv4EmTlfldB3S2"
},
"sysStatus" : 0
},
"333" : { <-- This is one node I wish to fetch
"createdAt" : 1565815967746,
"isSender" : false,
"message" : "123",
"sender" : {
"alias" : "",
"name" : "Person B",
"sid" : "4QEg0TWDbESiMX8Cu8cvUCm17so2" <-- This is the value I am trying to match
},
"sysStatus" : 0
}
},
"rKNUGgdKqdP68T0ne6wmgJzcCE82" : {
"-Lm_GTmNHyxqCqQl4D4Z" : { <-- This is one node I wish to fetch
"createdAt" : 1566140917160,
"isSender" : false,
"message" : "Sesame",
"sender" : {
"alias" : "",
"name" : "Person B",
"sid" : "4QEg0TWDbESiMX8Cu8cvUCm17so2" <-- This is the value I am trying to match
},
"sysStatus" : 5
}
}
}
My code
dbRoot.child('messages')
.orderByChild('sid')
.equalTo(userId)
.once('value', (senderSnapshot) => {
console.log('senderSnapshot', senderSnapshot.val())
console.log('senderSnapshot amount', senderSnapshot.numChildren())
senderSnapshot.forEach((sender)=>{
//Do the work!
})
})
The code logs
senderSnapshot null
senderSnapshot amount 0
I have manually checked that there are several nodes where "sid" is set to the "userId" I am looking for.
Why am I not getting any results back from my query?
It seems like I have to search dbRoot.child('messages/rKNUGgdKqdP68T0ne6wmgJzcCE82') to get my value. :/ (And then repeat the search for each user)
How much extra data overhead would it be to download/collect all contacts and then loop through each user's contacts?
Firebase Database queries allow you to search through the direct child nodes at a certain location for a value at a fixed path under each child. So: if you know the user whose message you want to search through, you can do:
dbRoot
.child('messages')
.child('mUO3DtYY2yRw3zkv4EmTlfldB3S2') // the user ID
.orderByChild('sender/sid')
.equalTo(userId)
Note the two changes I made here from your code:
We now start the search from messages/mUO3DtYY2yRw3zkv4EmTlfldB3S2, so that it only searches the messages for that one user.
We now order/filter on the sender/sid property for each child under messages/mUO3DtYY2yRw3zkv4EmTlfldB3S2.
By combining these we are essentially searching a flat list of child nodes, for a specific matching value at a path under each child node.
There is no way in your current data structure to find all messages for a specific sender/sid value across all users. To allow that you'll have to add an additional data structure, where you essentially invert the current data. Since keys cannot contain a "/", each message path is stored as a nested owner ID and message ID:
"sender_messages": {
  "mUO3DtYY2yRw3zkv4EmTlfldB3S2": {
    "4QEg0TWDbESiMX8Cu8cvUCm17so2": {
      "-LmGtXsgJbAvVS8gv5-E": true
    },
    "mUO3DtYY2yRw3zkv4EmTlfldB3S2": {
      "222": true
    }
  },
  "4QEg0TWDbESiMX8Cu8cvUCm17so2": {
    "4QEg0TWDbESiMX8Cu8cvUCm17so2": {
      "-LmGtt4nyuygG9B4s6__": true,
      "-LmJxcvL_Y7JmojxPiiK": true
    },
    "mUO3DtYY2yRw3zkv4EmTlfldB3S2": {
      "333": true
    },
    "rKNUGgdKqdP68T0ne6wmgJzcCE82": {
      "-Lm_GTmNHyxqCqQl4D4Z": true
    }
  }
}
Now you can find the messages for a specific sender/sid by reading:
dbRoot
.child('sender_messages')
.child('4QEg0TWDbESiMX8Cu8cvUCm17so2') // the sender ID
And then looping over the results, and loading the individual messages from each path as needed.
This is quite common in NoSQL databases: you'll often have to modify your data structure to allow the use-cases you want to add to your app.
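To make the inversion concrete, here is a small sketch in plain JavaScript, with no Firebase calls. It assumes the index is stored nested as senderId -> ownerId -> messageId -> true, because Realtime Database keys cannot contain "/". The function names buildSenderIndex and listMessagePaths and the sample IDs are made up for illustration.

```javascript
// Sketch only: invert a messages tree (ownerId -> messageId -> message)
// into a sender index (senderId -> ownerId -> messageId -> true).
function buildSenderIndex(messages) {
  const index = {};
  for (const [ownerId, ownerMessages] of Object.entries(messages)) {
    for (const [messageId, message] of Object.entries(ownerMessages)) {
      const senderId = message.sender.sid;
      index[senderId] = index[senderId] || {};
      index[senderId][ownerId] = index[senderId][ownerId] || {};
      index[senderId][ownerId][messageId] = true;
    }
  }
  return index;
}

// List the database paths of all messages sent by one sender.
function listMessagePaths(index, senderId) {
  const paths = [];
  for (const [ownerId, ids] of Object.entries(index[senderId] || {})) {
    for (const messageId of Object.keys(ids)) {
      paths.push(`messages/${ownerId}/${messageId}`);
    }
  }
  return paths;
}

// Tiny made-up data set with the same shape as the question's tree.
const sampleMessages = {
  userA: {
    m1: { message: "Hello", sender: { sid: "userB" } },
    m2: { message: "Hej!", sender: { sid: "userA" } }
  },
  userB: {
    m3: { message: "Test", sender: { sid: "userB" } }
  }
};

const senderIndex = buildSenderIndex(sampleMessages);
console.log(listMessagePaths(senderIndex, "userB"));
```

In a real app the index would normally be written together with each new message (for instance in one multi-location update) rather than rebuilt from the full tree.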
See also:
Firebase query if child of child contains a value
Firebase Query Double Nested
Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly (to learn why loading the additional messages is not nearly as slow as you may initially think)

MongoDB: Get all nearby with filter

In MongoDB, I have models of User, Token, and Boost.
A user can have one or more tokens and one or more boosts.
Token has a 2dsphere location field.
And Boost has startTime and stopTime Date fields.
A user is said to have an active boost if Date.now() is greater than boost.startTime() and less than boost.stopTime().
What Mongo aggregation can I write to fetch me all the tokens near a particular location that belong to users with at least one active boost?
Based on your question, I have created some mock data.
token collection:
{
"_id" : ObjectId("5b97541c6af22cc65216ffd8"),
"userid" : "5b9753726af22cc65216ffd6",
"location" : {
"longitude" : 80.250875,
"latitude" : 13.052519
}
},
{
"_id" : ObjectId("5b97543a6af22cc65216ffd9"),
"userid" : "5b97537e6af22cc65216ffd7",
"location" : {
"longitude" : 80.249995,
"latitude" : 13.051819
}
}
boost collection :
{
"_id" : ObjectId("5b9754796af22cc65216ffda"),
"startTime" : ISODate("2018-09-11T05:36:57.149Z"),
"stopTime" : ISODate("2018-09-11T05:36:57.149Z"),
"userid" : "5b9753726af22cc65216ffd6"
},
{
"_id" : ObjectId("5b9754b46af22cc65216ffdb"),
"startTime" : ISODate("2018-10-08T18:30:00.000Z"),
"stopTime" : ISODate("2018-10-08T18:30:00.000Z"),
"userid" : "5b97537e6af22cc65216ffd7"
}
Users collection :
{
"_id" : ObjectId("5b9753726af22cc65216ffd6"),
"userName" : "user111"
},
{
"_id" : ObjectId("5b97537e6af22cc65216ffd7"),
"userName" : "user222"
}
The aggregate query to fetch all the tokens near a particular location that belong to users with at least one active boost is:
db.token.aggregate([
  {
    "$geoNear": {
      "near": { type: "Point", coordinates: [80.248797, 13.050599] },
      // Write the computed distance to its own field so it does not
      // overwrite the stored "location" coordinates.
      "distanceField": "distance",
      "maxDistance": 1000,
      "includeLocs": "location",
      "spherical": true
    }
  },
  {
    "$lookup": {
      "from": "boost",
      "localField": "userid",
      "foreignField": "userid",
      "as": "boostDocs"
    }
  },
  {"$unwind": "$boostDocs"},
  {"$match": {"$and": [
    {"boostDocs.startTime": {"$lte": new Date("11/09/2018")}},
    {"boostDocs.stopTime": {"$gte": new Date("10/09/2018")}}
  ]}}
])
Notice that the location match is at the top of the query, as $geoNear only works if it is the first stage of the aggregation pipeline.
The dates that I've used for comparison are just there to check that my query works (note that a string such as "11/09/2018" is typically parsed as month/day/year). You can specify your own date or new Date() as per your requirement.
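For the "active boost" condition from the question, both bounds should be compared against the same instant. The sketch below builds such a pipeline as a plain array. It swaps the $unwind plus $match pair for a single $match with $elemMatch, so that a token is returned once even if its user has several active boosts. The function name buildActiveBoostPipeline and the "distance" output field are assumptions; the collection and field names follow the mock data.

```javascript
// Sketch only: pipeline for tokens near a point whose user has at
// least one boost with startTime < now < stopTime.
function buildActiveBoostPipeline(longitude, latitude, maxDistanceMeters, now) {
  return [
    {
      // $geoNear has to be the first stage of the pipeline.
      $geoNear: {
        near: { type: "Point", coordinates: [longitude, latitude] },
        distanceField: "distance",
        maxDistance: maxDistanceMeters,
        spherical: true
      }
    },
    {
      $lookup: {
        from: "boost",
        localField: "userid",
        foreignField: "userid",
        as: "boostDocs"
      }
    },
    {
      // Keep tokens where a single boost document satisfies both bounds.
      $match: {
        boostDocs: {
          $elemMatch: { startTime: { $lt: now }, stopTime: { $gt: now } }
        }
      }
    }
  ];
}

const now = new Date();
const pipeline = buildActiveBoostPipeline(80.248797, 13.050599, 1000, now);
console.log(JSON.stringify(pipeline, null, 2));
```

The result can be passed to db.token.aggregate(pipeline). Passing a Date object rather than the number returned by Date.now() matters here, because the boost fields are stored as dates.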

How to query Firebase based on boolean value?

"vehicles" : {
"-KbwlIGLm6dxffPJoJJB" : {
"fuel" : "petrol",
"groups" : {
"-KdWh_9KVF16efcSdrji" : true,
"-Kdb1G720MDgbuR3nPBL" : true
},
"make" : "Honda",
"model" : "City",
"name" : "Honda City",
"speed_limit" : 100,
"tank_size" : 32,
"type" : "car"
},
"-KdU-BlfEdqzKxFjGI3D" : {
"fuel" : "petrol",
"groups" : {
"-KdWh_9KVF16efcSdrji" : true
},
"make" : "yamaha",
"model" : "FZ",
"name" : "Yamaza FZ",
"speed_limit" : 60,
"tank_size" : 12,
"type" : "bike"
}
}
I want to retrieve the vehicles whose groups node contains the key -KdWh_9KVF16efcSdrji with the value true.
vehicles_ref.child("groups").orderByChild("-KdWh_9KVF16efcSdrji").equalTo(true).on
("value", function (snapshot) {
console.log(snapshot.val());
});
But currently I'm getting null for the above criteria.
I have changed the query to:
vehicles_ref.orderByChild("/groups/-KdWh_9KVF16efcSdrji").equalTo(true).on("value", function (snapshot) {
console.log(snapshot.val());
});
Now I am getting results, but also this warning:
FIREBASE WARNING: Using an unspecified index. Consider adding ".indexOn": "groups/-KdWh_9KVF16efcSdrji" at /vehicles to your security rules for better performance
How do I add an index to remove this warning?
I ran into the same warning when filtering queries for items with the same ID. The ".indexOn" in the warning message refers to a rule in your database security rules: add ".indexOn" to the node whose children you are ordering or filtering. For example, take this data:
{
"scores": {
"bruhathkayosaurus" : 55,
"lambeosaurus" : 21,
"linhenykus" : 80,
"pterodactyl" : 93,
"stegosaurus" : 5,
"triceratops" : 22
}
}
Because the names of the dinosaurs are just the keys, we can add a .value rule to our /scores node to optimize our queries:
{
"rules": {
"scores": {
".indexOn": ".value"
}
}
}
For more details, see the following link:
https://firebase.google.com/docs/database/security/indexing-data?hl=es-419
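Applied to the vehicles node from the question, the warning's suggestion could look like the sketch below, which generates the rules fragment for a list of group IDs. The function name buildVehicleIndexRules is made up, and the output only covers ".indexOn"; it is meant to be merged into your existing rules rather than replace them.

```javascript
// Sketch only: build the ".indexOn" rules fragment for /vehicles,
// with one index entry per group ID that is queried.
function buildVehicleIndexRules(groupIds) {
  return {
    rules: {
      vehicles: {
        ".indexOn": groupIds.map((id) => `groups/${id}`)
      }
    }
  };
}

const rules = buildVehicleIndexRules([
  "-KdWh_9KVF16efcSdrji",
  "-Kdb1G720MDgbuR3nPBL"
]);
console.log(JSON.stringify(rules, null, 2));
```

Every group ID needs its own entry, because the index path is fixed in the rules. If groups are created dynamically, an inverted structure (group ID -> vehicle IDs) may be easier to maintain than one index per group.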

taking the difference between adjacent documents in mongoDB

How do I take the difference between adjacent records in mongoDB using javascript? For example, if I have the following three documents in a collection:
{
"_id" : ObjectId("50ed90a55502684f440001ac"),
"time" : ISODate("2013-02-13T15:45:41.148Z")
}
{
"_id" : ObjectId("50ed90a55502684f440001ad"),
"time" : ISODate("2013-02-13T15:45:42.148Z")
}
{
"_id" : ObjectId("50ed90a55502684f440001ae"),
"time" : ISODate("2013-02-13T15:45:45.148Z")
}
I want to take the difference in the "time" field between adjacent values to get:
{
"_id" : ObjectId("50ed90a55502684f440001ac"),
"time" : ISODate("2013-02-13T15:45:41.148Z"),
"time_difference" : null
}
{
"_id" : ObjectId("50ed90a55502684f440001ad"),
"time" : ISODate("2013-02-13T15:45:42.148Z"),
"time_difference" : 1
}
{
"_id" : ObjectId("50ed90a55502684f440001ae"),
"time" : ISODate("2013-02-13T15:45:45.148Z"),
"time_difference" : 3
}
Any ideas on how to do this efficiently in javascript/mongoDB? Thanks.
I don't know whether this was true when the question was asked seven years ago, but this can be solved completely within the aggregation framework. Assuming the collection name is AdjacentDocument, the following aggregation will get the results you're looking for:
db.AdjacentDocument.aggregate([
  {$sort: {time: 1}},
  {$group: {_id: 0, document: {$push: '$$ROOT'}}},
  {$project: {documentAndPrevTime: {$zip: {inputs: ['$document', {$concatArrays: [[null], '$document.time']}]}}}},
  {$unwind: {path: '$documentAndPrevTime'}},
  {$replaceWith: {$mergeObjects: [{$arrayElemAt: ['$documentAndPrevTime', 0]}, {prevTime: {$arrayElemAt: ['$documentAndPrevTime', 1]}}]}},
  {$set: {time_difference: {$trunc: [{$divide: [{$subtract: ['$time', '$prevTime']}, 1000]}]}}},
  {$unset: 'prevTime'}
]);
Aggregation pipeline walkthrough
First, the documents are sorted from oldest to newest. They are grouped into a single document with the documents stored in an ordered array field:
{$sort: {time: 1}},
{$group: {_id: 0, document: {$push: '$$ROOT'}}}
/*
{
"_id" : 0,
"document" : [
{
"_id" : ObjectId("50ed90a55502684f440001ac"),
"time" : ISODate("2013-02-13T15:45:41.148Z")
},
{
"_id" : ObjectId("50ed90a55502684f440001ad"),
"time" : ISODate("2013-02-13T15:45:42.148Z")
},
{
"_id" : ObjectId("50ed90a55502684f440001ae"),
"time" : ISODate("2013-02-13T15:45:45.148Z")
}
]
}
*/
Next, the previous times are zipped into the document array, creating an array of [document, previousTime]:
{$project: {documentAndPrevTime: {$zip: {inputs: ['$document', {$concatArrays: [[null], '$document.time']}]}}}}
/*
{
"_id" : 0,
"documentAndPrevTime" : [
[
{
"_id" : ObjectId("50ed90a55502684f440001ac"),
"time" : ISODate("2013-02-13T15:45:41.148Z")
},
null
],
[
{
"_id" : ObjectId("50ed90a55502684f440001ad"),
"time" : ISODate("2013-02-13T15:45:42.148Z")
},
ISODate("2013-02-13T15:45:41.148Z")
],
[
{
"_id" : ObjectId("50ed90a55502684f440001ae"),
"time" : ISODate("2013-02-13T15:45:45.148Z")
},
ISODate("2013-02-13T15:45:42.148Z")
]
]
}
*/
Next, the document & time array is unwound, creating a document for each of the initial documents:
{$unwind: {path: '$documentAndPrevTime'}}
/*
{
"_id" : 0,
"documentAndPrevTime" : [
{
"_id" : ObjectId("50ed90a55502684f440001ac"),
"time" : ISODate("2013-02-13T15:45:41.148Z")
},
null
]
}
{
"_id" : 0,
"documentAndPrevTime" : [
{
"_id" : ObjectId("50ed90a55502684f440001ad"),
"time" : ISODate("2013-02-13T15:45:42.148Z")
},
ISODate("2013-02-13T15:45:41.148Z")
]
}
{
"_id" : 0,
"documentAndPrevTime" : [
{
"_id" : ObjectId("50ed90a55502684f440001ae"),
"time" : ISODate("2013-02-13T15:45:45.148Z")
},
ISODate("2013-02-13T15:45:42.148Z")
]
}
*/
Next, we replace the document with the value of the document array element, merged with previous time element (using null if it's the initial index):
{$replaceWith: {$mergeObjects: [{$arrayElemAt: ['$documentAndPrevTime', 0]}, {prevTime: {$arrayElemAt: ['$documentAndPrevTime', 1]}}]}}
/*
{
"_id" : ObjectId("50ed90a55502684f440001ac"),
"time" : ISODate("2013-02-13T15:45:41.148Z"),
"prevTime" : null
}
{
"_id" : ObjectId("50ed90a55502684f440001ad"),
"time" : ISODate("2013-02-13T15:45:42.148Z"),
"prevTime" : ISODate("2013-02-13T15:45:41.148Z")
}
{
"_id" : ObjectId("50ed90a55502684f440001ae"),
"time" : ISODate("2013-02-13T15:45:45.148Z"),
"prevTime" : ISODate("2013-02-13T15:45:42.148Z")
}
*/
Finally, we update the document by setting the time_difference to the difference of the two time fields, and removing the temporary prevTime field. Since the difference between two dates is in milliseconds and your example uses seconds, we calculate the seconds by dividing by 1000 and truncating.
{$set: {time_difference: {$trunc: [{$divide: [{$subtract: ['$time', '$prevTime']}, 1000]}]}}},
{$unset: 'prevTime'}
/*
{
"_id" : ObjectId("50ed90a55502684f440001ac"),
"time" : ISODate("2013-02-13T15:45:41.148Z"),
"time_difference" : null
}
{
"_id" : ObjectId("50ed90a55502684f440001ad"),
"time" : ISODate("2013-02-13T15:45:42.148Z"),
"time_difference" : 1
}
{
"_id" : ObjectId("50ed90a55502684f440001ae"),
"time" : ISODate("2013-02-13T15:45:45.148Z"),
"time_difference" : 3
}
*/
The one thing you will want to make sure of here is that you have a sort on the query you use to fetch your records. If no sort is used, the results come back in find order, which is not $natural order.
Find order can differ between queries, so if you run the query twice within a period of 2 minutes you might find that the results don't come back in the same order. It does seem, however, that your query would logically be sorted on time_difference.
It should also be noted that this is not possible through normal querying. I also do not see an easy way of doing this through the aggregation framework.
So the next plausible method is either using multiple queries or client-side processing. Client-side processing is probably the better option here, using a function like the one defined by @Marlon.
One thing I want to make clear: unlike MySQL, MongoDB does not guarantee the position of documents. That is, MongoDB may return documents in a different order at different times, so comparing adjacent documents may give a different result on every read.
If you are fine with that and still want to compare them, try MongoDB's MapReduce: http://docs.mongodb.org/manual/applications/map-reduce/
Assuming those 3 objects are coming through in an array, you could do something like the below:
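var prevTime = null; // there is no previous record before the first one
var currentTime;
for (var i = 0; i < records.length; i++)
{
  currentTime = new Date(records[i].time).getTime();
  // Difference in whole seconds; null for the first record
  records[i].time_difference = prevTime === null
    ? null
    : Math.floor((currentTime - prevTime) / 1000);
  prevTime = currentTime;
}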
Of course you'll need to swap bits out to make it use the records from mongo.
If you need to do any more complex date calculations, I highly suggest checking out datejs (which you can get a node wrapper for if you want).
