Difference between MERGE and CREATE in Cypher Neo4j - javascript

I'm importing some data from a .csv file into Neo4j. I've always used MERGE to create a node, but now, when importing from .csv, some of the data is null, e.g. the column address. When I use MERGE instead of CREATE it gives an error, but when I use CREATE it works fine. The only difference I know between MERGE and CREATE is that if the node already exists, MERGE doesn't make a new one.
My query:
LOAD CSV WITH HEADERS FROM '<path>' as line
CREATE (a: Address
{
address: line.address,
postalCode: toInteger(line.postalCode),
town: line.town,
municipalityNr: toInteger(line.municipalityNr),
municipality: line.municipality,
countryCode: line.countryCode,
country: line.country
})
RETURN a.address

When doing a MERGE, Neo4j expects a value that it can merge on. A MERGE on a property whose value is null will always result in an error.
In general, this is the approach:
Only MERGE on properties that are relevant to finding a unique node. So for instance, if you want to MERGE cars, the licence plate would be the property to use.
Make sure you have a CONSTRAINT for the property you MERGE on. This will help speed up the import.
To avoid nulls in the MERGE, you can use COALESCE().
After the MERGE, you can SET the other properties, which may have nulls.
MERGE (c:Car {licensePlate: COALESCE(line.licensePlate,'Unknown') })
SET c.color = line.color,
c.someproperty = line.someproperty
At the end of the run, you will find a single :Car node with licensePlate:'Unknown', shared by every row whose licensePlate was null.
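The same split between the merge key and the nullable properties can be prepared on the JavaScript side before the rows are sent to the database. This is a minimal sketch in plain JavaScript, not Neo4j driver code; the helper name splitForMerge and its parameters keyName and fallback are hypothetical and only mirror what COALESCE() and SET do in the query above.

```javascript
// Hypothetical helper: split one parsed CSV row into the MERGE key and the
// remaining properties. `keyName` and `fallback` are illustrative names.
function splitForMerge(row, keyName, fallback) {
  // Mirror COALESCE(): use the fallback only when the key is null or missing.
  const raw = row[keyName];
  const key = raw === null || raw === undefined ? fallback : raw;

  // Everything else is set after the MERGE and may legitimately be null.
  const props = {};
  for (const name of Object.keys(row)) {
    if (name !== keyName) {
      props[name] = row[name] === undefined ? null : row[name];
    }
  }
  return { key, props };
}

const result = splitForMerge({ licensePlate: null, color: 'red' }, 'licensePlate', 'Unknown');
console.log(result); // { key: 'Unknown', props: { color: 'red' } }
```

Keeping the key separate makes it harder to MERGE on a nullable property by accident.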

Agree with Graphileon.
There is a trap you should avoid: properties whose values may vary within a data set. For instance, if the same person has a birth date in different formats (e.g., 5/16/85 and 05/16/1985), you would add duplicate nodes if the birth date were part of the merge criteria. To avoid this, you can do the merge without the date and then, in a subsequent step, add the property. Of course, only the last value set would endure.
You can check that there are no duplicate nodes. In the example given, if you're sure the name (or the license number of the car) is unique:
match (p:Person) return p.name, count(*) as ct order by ct desc


Chai - Expect an object to have deep property of an array ignore order

I need to check if the payment has a property named transactions with expected values:
expect(payment).to.have.deep.property('transactions', [
TRANSACTION_ID_1,
TRANSACTION_ID_2,
]);
As the order of transactions is not specified, the test doesn't pass all the time.
How can I solve the problem without changing the test structure?
Note: I've found deep-equal-in-any-order plugin, but it seems it doesn't help.
Iterate through array and check if it includes each item.
[TRANSACTION_ID_1, TRANSACTION_ID_2].forEach(id => {
expect(payment).to.have.deep.property('transactions').that.includes(id);
});
If you need to check that transactions is an unordered array of exactly the expected IDs, then check the length as well.
expect(payment).to.have.deep.property('transactions').that.has.lengthOf(2);
When transactions contains all of the expected IDs and has the same length as the expected list, it equals the expected list up to ordering (provided the IDs are distinct).
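The includes-plus-length reasoning above can also be written as one plain function that compares two arrays while ignoring order. This is a sketch without Chai; the names countValues and sameMembers are made up for illustration, and it assumes the IDs are primitives (strings or numbers). Unlike the includes check, it also handles repeated IDs.

```javascript
// Count occurrences of each value so duplicates are compared correctly.
function countValues(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  return counts;
}

// True when both arrays hold the same values the same number of times,
// regardless of order. Intended for primitive IDs (strings or numbers).
function sameMembers(actual, expected) {
  if (actual.length !== expected.length) return false;
  const actualCounts = countValues(actual);
  for (const [value, n] of countValues(expected)) {
    if (actualCounts.get(value) !== n) return false;
  }
  return true;
}

console.log(sameMembers(['t2', 't1'], ['t1', 't2'])); // true
console.log(sameMembers(['t1', 't1'], ['t1', 't2'])); // false
```

Chai also ships a members assertion, e.g. expect(payment.transactions).to.have.members([TRANSACTION_ID_1, TRANSACTION_ID_2]), which ignores order by default, though it asserts on the array itself rather than through deep.property.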

'default' option in pgp.as.format()

I need to format SQL query with default option for missing object fields. I can do it with an external call to pgp.as.format:
let formattedQuery = pgp.as.format('INSERT INTO some_table (a,b,c) VALUES ($(a), $(b), $(c))', object, {default: null});
db.none(formattedQuery);
Is it possible to pass the default option directly, without pre-formatting the query? Basically, I would like to do something like this:
db.none('INSERT INTO some_table (a,b,c) VALUES ($(a), $(b), $(c))', object, {default: null})
I'm the author of pg-promise.
All query methods in pg-promise rely on the default query formatting, for better reliability, i.e. when a query template refers to a property, the property must exist, or else an error is thrown. It is logical to keep it that way, because a query cannot execute correctly while having properties in it that haven't been replaced with values.
Internally, the query engine does support advanced query formatting options, via method as.format, such as partial and default. And there are several objects in the library that make use of those options.
One in particular that you should use for generating inserts is helpers.insert, which can generate both single-insert and multi-insert queries. That method, along with the even more useful helpers.update, makes use of type ColumnSet, which is highly configurable, supporting default values for missing properties (among other things), via type Column.
Using ColumnSet, you can specify a default value either for selective columns or for all of them.
For example, let's assume that column c may be missing, in which case we want to set it to null:
var pgp = require('pg-promise')({
capSQL: true // to capitalize all generated SQL
});
// declaring a reusable ColumnSet object:
var csInsert = new pgp.helpers.ColumnSet(['a', 'b',
{
name: 'c',
def: null
}
], {table: 'some_table'});
var data = {a:1, b:'text'};
// generating our insert query:
var insert = pgp.helpers.insert(data, csInsert);
//=> INSERT INTO "some_table"("a","b","c") VALUES(1,'text',null)
This makes it possible to generate multi-insert queries automatically:
var data = [{a:1, b:'text'}, {a:2, b:'hello'}];
// generating a multi-insert query:
var insert = pgp.helpers.insert(data, csInsert);
//=> INSERT INTO "some_table"("a","b","c") VALUES(1,'text',null),(2,'hello',null)
The same approach works nicely for single-update and multi-update queries.
In all, to your original question:
Is it possible to pass default option directly without pre-formatting the query?
No, and neither should it. Instead, you should use the aforementioned methods within the helpers namespace to generate correct queries. They are way more powerful and flexible ;)
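To make concrete what a default for a missing property amounts to, here is a plain JavaScript sketch. It is not part of pg-promise and not how ColumnSet is implemented; the helper name withDefaults and its parameters are hypothetical.

```javascript
// Hypothetical helper, not part of pg-promise: return a copy of `source`
// in which every listed column exists, using `def` for the missing ones.
function withDefaults(source, columns, def) {
  const out = {};
  for (const name of columns) {
    const present = Object.prototype.hasOwnProperty.call(source, name);
    out[name] = present ? source[name] : def;
  }
  return out;
}

const row = withDefaults({ a: 1, b: 'text' }, ['a', 'b', 'c'], null);
console.log(row); // { a: 1, b: 'text', c: null }
```

Because every property named in the query template now exists on the object, the default formatting no longer has a missing property to complain about. The helpers approach remains the more complete option, since it also generates the query text.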

MongoDB - Query conundrum - Document refs or subdocument

I've run into a bit of an issue with some data that I'm storing in my MongoDB (Note: I'm using mongoose as an ODM). I have two schemas:
mongoose.model('Buyer',{
credit: Number,
})
and
mongoose.model('Item',{
bid: Number,
location: { type: [Number], index: '2d' }
})
Buyer/Item will have a parent/child association, with a one-to-many relationship. I know that I can set up Items to be embedded subdocs to the Buyer document or I can create two separate documents with object id references to each other.
The problem I am facing is that I need to query Items whose bid is lower than the Buyer's credit and whose location is near a certain geo coordinate.
To satisfy the first criteria, it seems I should embed Items as a subdoc so that I can compare the two numbers. But, in order to compare locations with a geoNear query, it seems it would be better to separate the documents, otherwise, I can't perform geoNear on each subdocument.
Is there any way that I can perform both tasks on this data? If so, how should I structure my data? If not, is there a way that I can perform one query and then a second query on the result from the first query?
Thanks for your help!
There is another option (besides embedding and normalizing) for storing hierarchies in mongodb, that is storing them as tree structures. In this case you would store Buyers and Items in separate documents but in the same collection. Each Item document would need a field pointing to its Buyer (parent) document, and each Buyer document's parent field would be set to null. The docs I linked to explain several implementations you could choose from.
If your items are stored in two separate collections, then the best option is to write your own function and call it using mongoose.connection.db.eval('some code...');. That way you can execute your advanced logic on the server side.
You can write something like this:
var allNearItems = db.Items.find(
{ location: {
$near: {
$geometry: {
type: "Point" ,
coordinates: [ <longitude> , <latitude> ]
},
$maxDistance: 100
}
}
});
var res = [];
allNearItems.forEach(function(item){
var buyer = db.Buyers.findOne({ id: item.buyerId });
if (!buyer) return; // continue is not valid inside a forEach callback; return skips this item
if (item.bid < buyer.credit) {
res.push(item.id);
}
});
return res;
After evaluation (place it in a mongoose.connection.db.eval("...") call) you will get the array of item ids.
Use it with caution. If the allNearItems result is too large, or you run the query very often, you may face performance problems. The MongoDB team has deprecated direct JavaScript execution, but it is still available in the current stable release.
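If the near items and the buyers have already been fetched into the application, the credit check can also be done in plain JavaScript instead of on the server. This is a sketch under assumed document shapes (buyers as { id, credit }, items as { id, buyerId, bid }, matching the field names used in the snippet above); the function name affordableItemIds is hypothetical.

```javascript
// Assumed shapes: buyers are { id, credit }, items are { id, buyerId, bid }.
function affordableItemIds(nearItems, buyers) {
  // Index buyers by id once, so each item needs a single lookup
  // instead of one query per item.
  const creditById = new Map(buyers.map(b => [b.id, b.credit]));
  return nearItems
    .filter(item => creditById.has(item.buyerId) &&
                    item.bid < creditById.get(item.buyerId))
    .map(item => item.id);
}

const buyers = [{ id: 'b1', credit: 100 }, { id: 'b2', credit: 10 }];
const nearItems = [
  { id: 'i1', buyerId: 'b1', bid: 50 },
  { id: 'i2', buyerId: 'b2', bid: 50 },
  { id: 'i3', buyerId: 'missing', bid: 1 }
];
console.log(affordableItemIds(nearItems, buyers)); // [ 'i1' ]
```

Building the lookup map once avoids the per-item find inside the loop, which is the part most likely to become slow as the number of near items grows.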

Efficient Sorted Data Structure in JavaScript

I'm looking for a way to take a bunch of JSON objects and store them in a data structure that allows both fast lookup and also fast manipulation which might change the position in the structure for a particular object.
An example object:
{
name: 'Bill',
dob: '2014-05-17T15:31:00Z'
}
Given a sort by name ascending and dob descending, how would you go about storing the objects so that if I have a new object to insert, I know very quickly where in the data structure to place it so that the object's position is sorted against the other objects?
In terms of lookup, I need to be able to say, "Give me the object at index 12" and it pulls it quickly.
I can modify the objects to include data that would be helpful such as storing current index position etc in a property e.g. {_indexData: {someNumber: 23, someNeighbour: Object}} although I would prefer not to.
I have looked at b-trees and think this is likely to be the answer but was unsure how to implement using multiple sort arguments (name: ascending, dob: descending) unless I implemented two trees?
Does anyone have a good way to solve this?
First thing you need to do is store all the objects in an array. That'll be your best bet in terms of lookup considering you want "Give me the object at index 12", you can easily access that object like data[11]
Now coming towards storing and sorting them, consider you have the following array of those objects:
var data = [{
name: 'Bill',
dob: '2014-05-17T15:31:00Z'
},
{
name: 'John',
dob: '2013-06-17T15:31:00Z'
},
{
name: 'Alex',
dob: '2010-06-17T15:31:00Z'
}];
The following simple function (taken from here) will help you in sorting them based on their properties:
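function sortResults(prop, asc) {
// The comparator must return a negative, zero, or positive number, not a boolean.
data.sort(function(a, b) {
if (a[prop] === b[prop]) return 0;
var order = a[prop] > b[prop] ? 1 : -1;
return asc ? order : -order;
});
}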
First parameter is the property name on which you want to sort e.g. 'name' and second one is a boolean of ascending sort, if false, it will sort descendingly.
Next step, you need to call this function and give the desired values:
sortResults('name', true);
And voilà! Your array is now sorted in ascending order by name. Now you can access the objects like data[11], just like you wished to access them, and they are sorted as well.
You can play around with the example HERE. If I missed anything or couldn't understand your problem properly, feel free to explain and I'll tweak my solution.
EDIT: Going through your question again, I think I missed the part about dynamically adding objects. With my solution, you'll have to call the sortResults function every time you add an object, which might get expensive.
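One way to avoid re-sorting on every insert is to keep the array sorted and find the insertion point with a binary search, using a single comparator that covers both sort keys (name ascending, then dob descending). The sketch below is one possible approach, not the only one; the function names are made up, and it assumes dob values are ISO 8601 strings in the same format, so they can be compared as plain strings.

```javascript
// Compare by name ascending, then by dob descending.
// ISO 8601 UTC strings in the same format compare correctly as strings.
function compareRecords(a, b) {
  if (a.name !== b.name) return a.name < b.name ? -1 : 1;
  if (a.dob !== b.dob) return a.dob > b.dob ? -1 : 1;
  return 0;
}

// Binary search for the first index whose element sorts after `item`.
function insertionIndex(sorted, item, compare) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compare(sorted[mid], item) <= 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Insert `item` in place and return the index it ended up at.
function insertSorted(sorted, item, compare) {
  const index = insertionIndex(sorted, item, compare);
  sorted.splice(index, 0, item);
  return index;
}

const people = [];
insertSorted(people, { name: 'Bill', dob: '2014-05-17T15:31:00Z' }, compareRecords);
insertSorted(people, { name: 'Alex', dob: '2010-06-17T15:31:00Z' }, compareRecords);
insertSorted(people, { name: 'Bill', dob: '2015-01-01T00:00:00Z' }, compareRecords);
console.log(people.map(p => p.name + ' ' + p.dob));
// [ 'Alex 2010-06-17T15:31:00Z', 'Bill 2015-01-01T00:00:00Z', 'Bill 2014-05-17T15:31:00Z' ]
```

Finding the position takes O(log n) comparisons and lookup by index stays O(1), but splice still shifts elements, so each insert is O(n) in the worst case. For very large collections a balanced tree that tracks subtree sizes would be the next step, and a single tree with this combined comparator is enough; two trees are not needed.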

suggest me the different options to store this data in javascript

I am getting the data in the following format
[Sl.No, Amount, Transaction Type, Account Number]
[01, $10000, Deposit, 473882829]
[02, $10202, Deposit, 348844844]
[02, $10202, Withdrawal, 348844844]
What is the best way to store this data in JavaScript for faster retrieval?
var data = ["02", "$10202", "Withdrawal", 348844844]
//empty object
var list = {};
//Constructing the index by concatenating the last two elements of the data.
//Probably this will give the primary key to the data.
var index = data[2] + data[3].toString();
//Store the data using the index
list[index] = data;
You can retrieve the data using the index constructed above.
Determine how the data needs to be accessed. I am guessing it needs to be accessed linearly, so as to not accidentally overdraw (withdraw before deposit), etc -- in this case an Array is generally a suitable data structure. A simple for-loop should be able to find most simple "queries".
Define how the object is represented in Javascript -- is each "transaction item" an array of [x, amount, type, account] or is it an object with the signature {x: ..., amount: ..., type: ..., account: ...} (I often opt for the latter as it adds some self-documentation). It could also be an object created with new TransactionItem (or whatnot) that includes methods.
Determine how to convert the raw data into the chosen object representation. For a well-defined format a simple regular expression or 'split' may work.
Use data. Remember that even a shoddy O(n^2) algorithm is generally "fast enough" for small n. Best to get it working first.
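The steps above can be sketched as follows: parse each raw row into an object, keep the objects in an array to preserve order, and add a lookup by account number for faster retrieval. The input format (one bracketed, comma-separated string per row) and the names parseTransaction and buildStore are assumptions for illustration.

```javascript
// Assumed input: one string per row, e.g. "[02, $10202, Withdrawal, 348844844]".
function parseTransaction(line) {
  const [slNo, amount, type, account] = line
    .trim()
    .replace(/^\[|\]$/g, '')   // drop the surrounding brackets
    .split(',')
    .map(part => part.trim());
  return { slNo, amount: Number(amount.replace('$', '')), type, account };
}

// Keep arrival order in an array and index the same objects by account number.
function buildStore(lines) {
  const all = lines.map(parseTransaction);
  const byAccount = new Map();
  for (const tx of all) {
    if (!byAccount.has(tx.account)) byAccount.set(tx.account, []);
    byAccount.get(tx.account).push(tx);
  }
  return { all, byAccount };
}

const store = buildStore([
  '[01, $10000, Deposit, 473882829]',
  '[02, $10202, Deposit, 348844844]',
  '[02, $10202, Withdrawal, 348844844]'
]);
console.log(store.byAccount.get('348844844').map(tx => tx.type));
// [ 'Deposit', 'Withdrawal' ]
```

The array answers order-dependent questions (such as detecting a withdrawal before a deposit), while the Map answers per-account lookups without scanning every row. Account numbers and serial numbers are kept as strings here so that leading zeros such as 01 survive.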
Happy coding.
