JSON Schema: array where non-null elements are unique

I am trying to construct a JSON schema that meets the following:
1. Declares a top-level object with at least one property
2. The value of each property will be an array, each of which must contain exactly N items
3. Array items must be integers taken from the closed interval [J, K], or null
4. Integer items in each array must be unique within that array
5. There is no uniqueness constraint applied to null (so no implied relationship between N and the interval size K-J)
The problem I am running into is #4 and #5. It is easy enough to meet the first 3 requirements, plus part of the 4th, using this schema:
{
  "$schema": "http://json-schema.org/draft/2019-09/schema#",
  "type": "object",
  "minProperties": 1,
  "additionalProperties": {
    "type": "array",
    "minItems": N,
    "maxItems": N,
    "items": {
      "anyOf": [
        {
          "type": "integer",
          "minimum": J,
          "maximum": K
        },
        {
          "type": "null"
        }
      ]
    },
    "uniqueItems": true
  }
}
I am not sure how (or if it's even possible) to specify an array that applies the uniqueItems constraint to only a subset of the allowable items. I tried moving uniqueItems to lower levels of the schema with the hope that it might operate with restricted scope, but that doesn't work.
This might be possible using conditionals, but I haven't gone down that road yet since I'm not sure it will actually work, and I am hoping there is an easier approach that I have overlooked.
So, my question is: Is there a way to specify a JSON schema array that selectively enforces a uniqueness constraint only on the items that are not null?

This is beyond the capabilities of uniqueItems and is not a constraint JSON Schema is able to express. You will need to check this requirement elsewhere, in your application's business logic.
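One way that application-level check might look is sketched below. It assumes the document has already passed the schema from the question with uniqueItems removed, so every property value is an array of integers and nulls. The function names hasUniqueNonNullItems and validateDocument are hypothetical, not part of any JSON Schema library.

```javascript
// Hypothetical helper: true when no integer repeats within the array.
// null entries are skipped, so they may appear any number of times.
function hasUniqueNonNullItems(items) {
  const seen = new Set();
  for (const item of items) {
    if (item === null) continue; // nulls are exempt from the uniqueness rule
    if (seen.has(item)) return false; // duplicate non-null value found
    seen.add(item);
  }
  return true;
}

// Hypothetical helper: applies the check to every property of the top-level object.
function validateDocument(doc) {
  return Object.values(doc).every(hasUniqueNonNullItems);
}
```

For example, validateDocument({ a: [1, null, null, 2] }) passes, while validateDocument({ a: [1, null, 1] }) fails because 1 appears twice.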

Related

Failed to parse field of type integer

When importing a document, I get an error that is attached below.
I guess the problem arose when the data provider (esMapping.js) was changed, to use the integer sub-field to sort documents.
Is it possible to use some pattern to sort the document so that this error does not occur again? Does anyone have an idea?
The question refers to the one already asked - Enable ascending and descending sorting of numbers that are of the keyword type (Elasticsearch)
Error:
022-05-18 11:33:32.5830 [ERROR] ESIndexerLogger Failed to commit bulk. Errors:
index returned 400 _index: adama_gen_ro_importdocument _type: _doc _id: 4c616067-4beb-4484-83cc-7eb9d36eb175 _version: 0 error: Type: mapper_parsing_exception Reason: "failed to parse field [number.sequenceNumber] of type [integer] in document with id '4c616067-4beb-4484-83cc-7eb9d36eb175'. Preview of field's value: 'BS-000011/2022'" CausedBy: "Type: number_format_exception Reason: "For input string: "BS-000011/2022"""
Mapping (sequenceNumber used for sorting):
"number": {
  "type": "keyword",
  "copy_to": [
    "_summary"
  ],
  "fields": {
    "sequenceNumber": {
      "type": "integer"
    }
  }
}

In the returned error message, the value being indexed into the number field is a string with alphabetical characters, 'BS-000011/2022'. This is no problem for the number field that has a keyword type. However, it is an issue for the sequenceNumber sub-field which has an integer type. The text value passed into number is also passed into sequenceNumber sub-field, hence the error.
Unfortunately, the text analyzer used in the previous question won't help either, as sorting can't be performed on a text field. However, the tokenizer used by the custom analyzer document_number_analyzer can be repurposed into an ingest pipeline.
For context, this is the custom tokenizer provided by the author in the previous question:
"tokenizer": {
  "document_number_tokenizer": {
    "type": "pattern",
    "pattern": "-0*([1-9][0-9]*)\/",
    "group": 1
  }
}
If the custom analyzer is run through the Elasticsearch _analyze API on the value above, like so (stack_index being a temporary index that defines the analyzer):
POST stack_index/_analyze
{
  "analyzer": "document_number_analyzer",
  "text": ["BS-000011/2022"]
}
The analyzer returns one token of 11, but tokens are for search analysis, not sorting.
An Elasticsearch ingest pipeline using the grok processor can be applied to the index to extract the desired number from the value and index it as an integer. The processor needs to be configured for the expected format of the value, which would be similar to 'BS-000011/2022'. An example is provided below:
PUT _ingest/pipeline/numberSort
{
  "processors": [
    {
      "grok": {
        "field": "number",
        "patterns": ["%{WORD}%{ZEROS}%{SORTVALUES:sequenceNumber:int}%{SEPARATE}%{NUMBER}"],
        "pattern_definitions": {
          "SEPARATE": "[/]",
          "ZEROS" : "[-0]*",
          "SORTVALUES": "[1-9][0-9]*"
        }
      }
    }
  ]
}
Grok takes an input text value and extracts structured fields from it. The pattern where the sortable number will be extracted is the SORTVALUES pattern, %{SORTVALUES:sequenceNumber:int}. A new field, called sequenceNumber, will be created in the document. When 'BS-000011/2022' is indexed in the number field, 11 is indexed into the sequenceNumber field as an integer.
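To see which number ends up in sequenceNumber, the extraction can be mimicked outside Elasticsearch. The sketch below is plain JavaScript, not Elasticsearch code: it reuses the tokenizer pattern -0*([1-9][0-9]*)/ from the previous question, and the function name extractSequenceNumber and the two extra sample values are assumptions made for illustration.

```javascript
// Hypothetical helper mirroring the pattern -0*([1-9][0-9]*)/ :
// skip the dash and leading zeros, capture the digits before the slash.
function extractSequenceNumber(documentNumber) {
  const match = /-0*([1-9][0-9]*)\//.exec(documentNumber);
  return match ? Number.parseInt(match[1], 10) : null; // null when the format does not match
}

// Sample values in the same format as the one in the error message (the last two are made up).
const documentNumbers = ["BS-000011/2022", "BS-000002/2022", "BS-000100/2021"];

// Sorting by the extracted integer gives numeric order rather than string order.
const sortedNumbers = [...documentNumbers].sort(
  (a, b) => extractSequenceNumber(a) - extractSequenceNumber(b)
);
```

Here extractSequenceNumber("BS-000011/2022") yields 11, the same value the pipeline is meant to store as an integer.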
You can then create an index template to apply the ingest pipeline. The sequenceNumber field will need to be explicitly mapped as an integer type. The ingest pipeline will automatically populate it as long as a value matching the format of the input above is indexed into the number field. The sequenceNumber field will then be available to sort on.

Deleting an object from a nested array in DynamoDB - AWS JavaScript SDK

I'm building an app where I need to delete items stored in the database. Here's a (shortened) example of user data I have in my DynamoDB table called 'registeredUsers':
{
"userId": "f3a0f858-57b4-4420-81fa-1f0acdec979d",
"aboutMe": "My name is Mary, and I just love jigsaw puzzles! My favourite jigsaw category is Architecture, but I also like ones with plants in them.",
"age": 27,
"email": "mary_smith@gmail.com",
"favourites": {
"imageLibrary": [
{
"id": "71ff8060-fcf2-4523-98e5-f48127d7d88b",
"name": "bird.jpg",
"rating": 5,
"url": "https://s3.eu-west-2.amazonaws.com/jigsaw-image-library/image-library/images/bird.jpg"
},
{
"id": "fea4fd2a-851b-411f-8dc2-1ae0e144188a",
"name": "porsche.jpg",
"rating": 3,
"url": "https://s3.eu-west-2.amazonaws.com/jigsaw-image-library/image-library/images/porsche.jpg"
},
{
"id": "328b913f-b364-47df-929d-925676156e97",
"name": "rose.jpg",
"rating": 0,
"url": "https://s3.eu-west-2.amazonaws.com/jigsaw-image-library/image-library/images/rose.jpg"
}
]
}
}
I want to be able to delete the item 'rose.jpg' in the user.favourites.imageLibrary array. To select the correct user, I can provide the userId as the primary key. Then, to select the correct image in the array, I can pass the AWS.DocumentClient the 'id' of the item to delete. However, I'm having trouble understanding the AWS API Reference docs. The examples given in the developer guide do not describe how to delete an item by looking at one of its attributes. I know I have to provide an UpdateExpression and an ExpressionAttributeValues object. When I wanted to change a user setting, I found it pretty easy to do:
const params = {
  TableName: REGISTERED_USERS_TABLE,
  Key: { userId },
  UpdateExpression: "set userPreferences.difficulty.showGridOverlay = :d",
  ExpressionAttributeValues: {
    ":d": !showGridOverlay
  },
  ReturnValues: "UPDATED_NEW"
};
To conclude, I need a suitable Key, UpdateExpression and ExpressionAttributeValues object to access the rose.jpg item in the favourites array.

Unfortunately, the UpdateExpression syntax is not as powerful as you would have liked. It supports entire nested documents inside the item, but not sophisticated expressions to search in them or to modify them. The only ability it gives you inside a list is to access or modify its Nth element. For example:
REMOVE #favorites.#imagelibrary[2]
This removes the third element of imagelibrary, because list indexes start at 0 (note that "#favorites" and "#imagelibrary" need to be defined in ExpressionAttributeNames). You can also put a condition on #favorites.#imagelibrary[2].#id, for example, in ConditionExpression. But unfortunately, there is no way to specify more complex combinations of conditions and updates, such as "find me the i where #favorites.#imagelibrary[i].#id is equal to something, and then REMOVE this specific element".
Your remaining option is to read the full value of the item (or, with ProjectionExpression, just the #favorites.#imagelibrary array), find in your own code which element you want to remove (e.g., discover that it is the third element, at index 2), and then remove that element in a separate update.
Note that if there's a possibility that some other parallel operation also changes the item, you must use a conditional update (both UpdateExpression and ConditionExpression) for the element removal, to ensure the element that you are removing still has the id you expected. If the condition fails, you need to repeat the whole operation again - read the modified item again, find the element again, and try to remove it again. This is an example of the so-called "optimistic locking" technique which is often used with DynamoDB.
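The find-then-remove step described above could be sketched as follows. This only builds the parameter object and does not call AWS. The helper name buildRemoveFavouriteParams is hypothetical, and the attribute names favourites, imageLibrary and id are taken from the sample item in the question.

```javascript
// Hypothetical helper: given the imageLibrary array that was just read from the table,
// build parameters for a conditional REMOVE of the element with the given id.
function buildRemoveFavouriteParams(tableName, userId, imageLibrary, imageId) {
  const index = imageLibrary.findIndex((image) => image.id === imageId);
  if (index === -1) return null; // the image is not in the list, nothing to remove

  return {
    TableName: tableName,
    Key: { userId },
    // Remove the element at the position found above (zero-based index).
    UpdateExpression: `REMOVE #fav.#lib[${index}]`,
    // Only succeed if that position still holds the expected id (optimistic locking).
    ConditionExpression: `#fav.#lib[${index}].#id = :id`,
    ExpressionAttributeNames: { "#fav": "favourites", "#lib": "imageLibrary", "#id": "id" },
    ExpressionAttributeValues: { ":id": imageId },
    ReturnValues: "UPDATED_NEW"
  };
}
```

The returned object would then be passed to the DocumentClient's update call. If the update is rejected with a ConditionalCheckFailedException, the list changed in the meantime, so the item has to be read again and the parameters rebuilt.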

How does MongoDB handle arrays when passed into sort

This code is sorting by different fields within the document and some of those fields are arrays of objects with the key I want to sort by. I don't understand the behavior that I am seeing when I run the following queries. I don't have a lot of experience with mongodb and didn't write these queries.
const cursor = connection.collection('container').find({});
cursor.sort({ "titles.title": 1 });
// These don't happen one after the other as shown here. It's one or the other.
cursor.sort({ "dates.start": 1 });
Titles and dates are both arrays of objects containing the key passed. Dates appears to be sorted by start date, even though there may be multiple objects with the key start in the array. Titles is not sorted alphabetically, and the order appears to be random. I don't understand what is actually happening when this type of sort is performed in MongoDB.
How is Mongodb handling the array?
Is it only checking the first element in the array?
Is it checking all the elements in the array?
Is there a better way to perform this type of sorting when dealing with arrays?
// one
{ "titles": [{ "title": "Zippy Mississippi Race" }, { "title": "Wacky Races" }] }
// two
{ "titles": [{ "title": "New Looney Tunes" }, { "title": "Your Bunny or Your Life/Misjudgment Day" }] }
// three
{ "titles": [{ "title": "Why Oh Why Wyoming" }, { "title": "Wacky Races" }] }
Returns in this order
Update:
So I have discovered that it takes all the elements in the array into account when sorting. I just don't understand how it determines which element wins. Can anyone explain why this order is correct based on sorting all the elements in the array?

Why do only some JSON subcategories use "[" and others "{"?

In JSON, subcategories are sometimes defined using "{" and at other times using "[".
In this example (games -> box -> template), why is "[" used only after games?
How should the following XML be defined in JSON? How and when should I use "[" and "{"?
<games>
<game id="21934">
<name>Star Wars: The Old Republic</name>
<popularity>30</popularity>
</game>
</games>
Can you give me a good comparison with XML?
"games": [
{
"name": "Star Wars: The Old Republic",
"popularity": 30,
"id": 21934,
"giantbomb_id": 24205,
"box": {
"template": "http://static-cdn.jtvnw.net/ttv-boxart/Star%20Wars%3A%20The%20Old%20Republic-{width}x{height}.jpg",
"small": "http://static-cdn.jtvnw.net/ttv-boxart/Star%20Wars%3A%20The%20Old%20Republic-52x72.jpg",
"medium": "http://static-cdn.jtvnw.net/ttv-boxart/Star%20Wars%3A%20The%20Old%20Republic-136x190.jpg",
"large": "http://static-cdn.jtvnw.net/ttv-boxart/Star%20Wars%3A%20The%20Old%20Republic-272x380.jpg"
},

You can best answer this question by reading the documentation at json.org.
[ ] are used to define arrays, whereas { } are used to declare objects. Objects are really a form of associative array (mapping name indices to values instead of number indices to values). In JSON arrays, however, the number indices are implicit.
The main advantages of JSON are that it is a subset of JavaScript and that it is a compact data interchange format when compared to XML, which is more verbose. JSON data only needs minimal validation, whereas XML requires complex parsing. JSON also sacrifices the so-called readability element of XML, although personally speaking I find it easier to scan JSON to find mistakes than I do wading through XML elements and attributes.
To take your games example, in XML a list of games would be something like this:
<games>
<game id="21934">
<name>Star Wars: The Old Republic</name>
<type>MMORG</type>
</game>
<!-- more game blocks here -->
<game id="12345">...</game>
</games>
In the above example I have skipped niceties such as declaring that it is an XML document, linking the file to a Document Type Definition (DTD), etc.
In JSON the file would probably just be something like this:
{
"games": [
{ "id": 21934, "name" : "Star Wars: The Old Republic", "type": "MMORG" },
{ "id": 12345, .... }
]
}
You could read the above object directly into a JavaScript variable and it would be accepted as valid JavaScript without further processing. It's much faster and easier to get along with. One thing to note is that although "games" is an array of objects, it has been wrapped in {} so that the whole document is read as a single object.
So in summary, XML is a formal way of exchanging information, whereas JSON sacrifices the formality for ease and speed of use. Be warned however that JSON does have rules and very minor infractions can cause failure to read some or all of the data, depending on browser implementation.

The [] syntax is for arrays where you locate members by number.
The {} syntax is for objects where you locate members by name.
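A minimal sketch of the difference, using a shortened version of the games data from the question (the value "small.jpg" is a placeholder standing in for the full box-art URL):

```javascript
// "games" is an array ([]), each game is an object ({}), and "box" is a nested object.
const text = '{"games":[{"id":21934,"name":"Star Wars: The Old Republic","box":{"small":"small.jpg"}}]}';
const data = JSON.parse(text);

const firstGame = data.games[0];        // array member located by number
const gameName = firstGame.name;        // object member located by name
const smallBox = data.games[0].box.small; // name, then number, then two more names

const gamesIsArray = Array.isArray(data.games); // true: there can be many games
const boxIsArray = Array.isArray(firstGame.box); // false: a game has one box with named parts
```

"games" uses [] because there can be any number of games in a list, while "box" uses {} because it is a single thing with a fixed set of named properties.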

JSON with empty numeral value? (not 0)

Given JSON such as:
[
{ "id":"A", "status": 1, "rank":1, "score": },
{ "id":"B", "status": 1, "rank":1, "score": }
]
My script fails due to the empty score.
Given JS such as:
if (json[i].score) { do something } else { calculate it }
I want to keep the field empty and not use 0. I could use "score": "", but that would imply it's a string (empty at start), while I want score to be a number (empty at start), so that the number I later push into it stays a number and not a string.
How do I state an empty/undefined number?
Note: I suspect this is why I sometimes run into undefined.
EDIT: the question was edited to clarify the context, the need to check the existence of obj.score, where 0 would be misleading.

TL;DR: Use 0 (zero) if everyone starts with a zero score. Use null if you specifically want to say that someone's score is not set yet. Leave the property out entirely if it does not apply in a given case (for example, an object that should not have a "score" property at all).
Omitting the value in JSON object
In JSON you are not allowed to simply omit the value. It must be set to a number (integer or float), object, array, boolean, null, or string. To read more about the syntax, see http://json.org/.
Most popular approaches: null, 0 (zero), undefined
The usual approach is to set null. In other cases it can be better to use 0 (zero), if applicable (eg. the field means "sum").
Another approach is just to not set the property. After deserialization you are then able to perform tests checking the existence of specific property like that:
JavaScript:
if (typeof my_object.score !== 'undefined') {
    // "score" key exists in the object
}
Python:
if 'score' in my_object:
    pass  # "score" key exists in the object
PHP:
if (array_key_exists('score', $my_object)) {
    // "score" key exists in the object
}
Less consistent approaches: false, ""
Some people also accept the value to be false in such case, but it is rather inconsistent. Similarly when it comes to empty string (""). However, both cases should be properly supported in most programming languages during deserialization.
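The difference between these approaches shows up once the JSON is parsed. The sketch below uses made-up sample data and a hypothetical helper named hasScore. It illustrates why the truthiness test from the question cannot tell a real score of 0 apart from a missing one, while a type check can.

```javascript
// Sample data (assumed for illustration): A has no score yet (null),
// B has a real score of 0, and C omits the property entirely.
const players = JSON.parse('[{"id":"A","score":null},{"id":"B","score":0},{"id":"C"}]');

// Hypothetical helper: a score counts as "set" only when it is an actual number.
function hasScore(player) {
  return typeof player.score === "number";
}

const scoreStates = players.map(hasScore); // one boolean per player, in order

// The check from the question, if (json[i].score), treats 0 the same as null or missing.
const truthyStates = players.map((player) => Boolean(player.score));
```

With this data, hasScore is true only for B, whereas the truthiness check is false for all three players, including B's legitimate score of 0.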

Why not just start score at 0? Everyone will start with a score of 0 in just about anything involving score.
[
{ "id":"A", "status": 1, "rank":1, "score": 0 },
{ "id":"B", "status": 1, "rank":1, "score": 0 }
]

By definition, a numeral value is a value type, which cannot be null. Only reference types (like strings, arrays, etc.) should be initialized to null.
For the semantics of your situation, I would suggest using a boolean to indicate whether or not there is a score to be read.
[
{ "id":"A", "status": 1, "rank":1, "empty":true },
{ "id":"B", "status": 1, "rank":1, "empty":false, "score":100}
]
Then,
if (!foo.empty) {
    var score = foo.score;
}
While a null could be tested as well, it is a wrong representation of a number.
