Getting an out-of-memory error when importing 100,000 rows (a 20 MB file) into the database.
I am using a "db.t4g.medium" Aurora MySQL instance, which has 4 GB of RAM.
Error details:
Error: Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space
at PromiseConnection.query (/var/task/node_modules/mysql2/promise.js:93:22)
at Function.getQueryResult (/var/task/wpcn-connect-to-rds/rdsProxyDBManager.js:10:49)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async Runtime.exports.handler (/var/task/wpcn-connect-to-rds/index.js:48:21) {
code: 'ER_OUT_OF_RESOURCES',
errno: 1041
Invoke Error {"errorType":"Error","errorMessage":"Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space","code":"ER_OUT_OF_RESOURCES","message":"Out of memory; check if mysqld or some other process uses all available memory; if not, you may have to use 'ulimit' to allow mysqld to use more memory or you can add more swap space","errno":1041
I am using JavaScript/Node.js to connect to MySQL:
// assumes: const mysql2 = require('mysql2/promise');
static async getQueryResult(secret, query) {
    const connection = await mysql2.createConnection(secret);
    const [rows, fields] = await connection.query(query);
    await connection.end(); // wait for the connection to close cleanly
    return {
        'statusCode': 200,
        'body': rows
    };
}
secret = {
    host,
    user: username,
    password: password,
    port: port,
    // waitForConnections, connectionLimit and queueLimit are pool options; createConnection ignores them
    waitForConnections: true,
    multipleStatements: true,
    connectionLimit: 10,
    queueLimit: 0
}
The file that we are importing has 100,000 INSERT statements like the ones shown below:
INSERT INTO table1 (id,subject,description,progress,hours,startdt,duedt,private) VALUES(null,'test','1','0','0','2022-08-02 00:00:00.000','2022-08-02 00:00:00.000','1');
INSERT INTO table1 (id,subject,description,progress,hours,startdt,duedt,private) VALUES(null,'XSS','','1','1','2022-08-02 00:00:00.000','2022-08-02 00:00:00.000','0');
Is there any way to solve this issue without increasing the RAM or instance type?
You can modify the instance to a different instance type with more RAM just for the duration of the data load. The way to do that with the least disruption would be to (a) add a new, bigger instance to the cluster, (b) fail over to make that new instance the writer, (c) do the data load with no worries about out-of-memory errors or excessive load time, (d) fail over back to the original T instance, and (e) delete the bigger instance.
T instances are memory-constrained, more so than on plain RDS, because Aurora runs additional management software alongside the database. For that reason, they are recommended for dev/test but not for production. See here for best practices when working with such small instances on Aurora:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.BestPractices.html#AuroraMySQL.BestPractices.T2Medium
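If changing the instance size really is off the table, one mitigation worth trying (offered here as a hedged sketch, not something the answer above covers) is to stop executing the whole 20 MB file as a single multi-statement query and instead run the INSERT statements in smaller batches, so the server never has to buffer all 100,000 statements at once. The importInBatches helper, the batch size of 500, and the assumption of one statement per line are all mine to tune:

const fs = require('fs');
const mysql2 = require('mysql2/promise');

// Sketch: split the dump into individual statements and execute them in
// modest batches rather than one giant multi-statement query.
async function importInBatches(secret, filePath, batchSize = 500) {
    const connection = await mysql2.createConnection(secret);
    const statements = fs.readFileSync(filePath, 'utf8')
        .split(';\n')                          // assumes one statement per line
        .filter(s => s.trim().length > 0);

    for (let i = 0; i < statements.length; i += batchSize) {
        const batch = statements.slice(i, i + batchSize).join(';\n') + ';';
        await connection.query(batch);         // still needs multipleStatements: true
    }
    await connection.end();
}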
I have a BLE device that I wish to connect to through Web Bluetooth.
The BLE device's characteristic is set up to only return values when requested over a secure connection; the Bluetooth pairing type has to be 'Just Works'.
I am using Blazor WebAssembly and a NuGet package called Blazm.Bluetooth, but any pure JavaScript solution to this is also appreciated.
https://github.com/EngstromJimmy/Blazm.Bluetooth
I am able to get the requestDevice modal to show up in the browser, successfully pair with the device, and set up a notification handler that listens for changes in the characteristic's value.
When I try to read a value from the device, I get a response code telling me that I am not authorized to access the data, since the device is set up to only allow access over a secure connection.
This leads me to the conclusion that a secure connection has not been established.
I am able to communicate with the device (in a different project) using the Windows.Devices.Bluetooth library, so I am certain that the device is working as expected once a secure connection has been established.
code for request:
export async function requestDevice(query)
{
    var objquery = JSON.parse(query);
    console.log(query);
    var device = await navigator.bluetooth.requestDevice(objquery);
    await device.gatt.connect();
    device.addEventListener('gattserverdisconnected', onDisconnected);
    PairedBluetoothDevices.push(device);
    return { "Name": device.name, "Id": device.id };
}
function connect(bluetoothDevice) {
    exponentialBackoff(3 /* max retries */, 1 /* seconds delay */,
        function toTry() {
            time('Connecting to Bluetooth Device... ');
            return bluetoothDevice.gatt.connect();
        },
        function success() {
            console.log('> Bluetooth Device connected. Try disconnect it now.');
        },
        function fail() {
            time('Failed to reconnect.');
        });
}
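For completeness: exponentialBackoff and time are not defined in the snippet above; they are helper functions from Google's Web Bluetooth samples, roughly as follows (reproduced from the samples from memory, so treat this as a sketch):

// Helpers from Google's Web Bluetooth samples (approximate reproduction).
function exponentialBackoff(max, delay, toTry, success, fail) {
    toTry().then(result => success(result))
        .catch(_ => {
            if (max === 0) {
                return fail();
            }
            time('Retrying in ' + delay + 's... (' + max + ' tries left)');
            setTimeout(function() {
                exponentialBackoff(--max, delay * 2, toTry, success, fail);
            }, delay * 1000);
        });
}

function time(text) {
    console.log('[' + new Date().toJSON().substr(11, 8) + '] ' + text);
}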
code for writing to the characteristic:
export async function writeValue(deviceId, serviceId, characteristicId, value)
{
    var device = getDevice(deviceId);
    console.log(device);
    if (device.gatt.connected) {
        var service = await device.gatt.getPrimaryService(serviceId);
        var characteristic = await service.getCharacteristic(characteristicId);
        var b = Uint8Array.from(value);
        await characteristic.writeValueWithoutResponse(b);
    }
    else
    {
        // Not connected yet: wait a moment and retry (note: this recurses without a retry limit).
        await sleep(1000);
        await writeValue(deviceId, serviceId, characteristicId, value);
    }
}
I have used BTVS (from Microsoft's Bluetooth Test Platform, BTP) with the Wireshark Bluetooth sniffer to confirm that all request/response packets are identical, byte for byte, to the working Windows-library-based project, from the initial connection up to the point where the write response gives me the 'no access' error code.
My biggest question is whether it is actually possible to establish a secure connection over the Web BLE API.
I am also interested in code suggestions I could try.
Update:
I have tried using the 'Add a device' functionality in Windows Devices and Printers.
Adding the device in Windows before connecting through Web BLE results in:
Microsoft.AspNetCore.Components.WebAssembly.Rendering.WebAssemblyRenderer[100]
Unhandled exception rendering component: GATT operation failed for unknown reason.
undefined
Microsoft.JSInterop.JSException: GATT operation failed for unknown reason.
undefined
at Microsoft.JSInterop.JSRuntime.<InvokeAsync>d__16`1[[Microsoft.JSInterop.Infrastructure.IJSVoidResult, Microsoft.JSInterop, Version=6.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60]].MoveNext()
at Microsoft.JSInterop.JSObjectReferenceExtensions.InvokeVoidAsync(IJSObjectReference jsObjectReference, String identifier, Object[] args)
at Blazm.Bluetooth.BluetoothNavigator.SetupNotifyAsync(Device device, String serviceId, String characteristicId)
at BlazorWebAssemblySample.Pages.Counter.Connect() in C:\project.page.razor:line 82
at Microsoft.AspNetCore.Components.ComponentBase.CallStateHasChangedOnAsyncCompletion(Task task)
at Microsoft.AspNetCore.Components.RenderTree.Renderer.GetErrorHandledTask(Task taskToHandle, ComponentState owningComponentState)
Doing the same thing, but without running the notify setup code, results in not being able to pair through Web BLE at all.
Pairing through Web BLE first and then trying to 'Add a device' in Windows results in Windows not being able to find the device.
A "funny" observation is that if I pair through Web BLE first, meaning that the characteristics value change event handler is set up and listening, and I then connect through my Windows.Devices.Bluetooth based project, then all the values read by this project will also be sent to/caught by the Web BLE listener and seen in the browser.
The issue seems a bit similar to what we have seen on https://bugs.chromium.org/p/chromium/issues/detail?id=1271239. As a trial, could you try to pair the device ahead of time using Windows system pairing? Then write to the characteristic and see if it works.
Google Cloud SQL advertises that it's only $0.0150 per hour for the smallest machine type, and I'm being charged for every hour, not just the hours that I'm connected. Is this because I'm using a pool? How do I set up my backend so that it queries the cloud DB only when needed, so I don't get charged for every hour of the day?
const mysql = require('mysql');

const pool = mysql.createPool({
    host     : process.env.SQL_IP,
    user     : 'root',
    password : process.env.SQL_PASS,
    database : 'mydb',
    ssl      : {
        [redacted]
    }
});
function query(queryStatement, cB) {
    pool.getConnection(function(err, connection) {
        // Use the connection
        connection.query(queryStatement, function (error, results, fields) {
            // And done with the connection.
            connection.destroy();
            // Callback
            cB(error, results, fields);
        });
    });
}
This is not so much about the pool as it is about the nature of Cloud SQL. Unlike App Engine, Cloud SQL instances are always up. I learned this the hard way one Saturday morning when I'd been away from the project for a week. :)
There's no way to spin them down when they're not being used, unless you explicitly go stop the service.
There's no way to schedule a service stop, at least within the GCP SDK. You could always write a cron job, or something like that, that runs a little gcloud sql instances patch [INSTANCE_NAME] --activation-policy NEVER command at, for example, 6pm local time, M-F. I was too lazy to do that, so I just set a calendar reminder for myself to shut down my instance at the end of my workday.
Here's the MySQL Instance start/stop/restart page for the current SDK's docs:
https://cloud.google.com/sql/docs/mysql/start-stop-restart-instance
On an additional note, there is an ongoing feature request for the GCP platform to start/stop Cloud SQL (2nd Gen) according to traffic as well. You can find that feature request and add your suggestions/comments there.
I took the idea from #ingernet and created a cloud function which starts/stops the CloudSQL instance when needed. It can be triggered via a scheduled job so you can define when the instance goes up or down.
The details are in this GitHub Gist (inspiration taken from here). Disclaimer: I'm not a Python developer, so there might be issues in the code, but in the end it works.
Basically you need to follow these steps:
Create a pub/sub topic which will be used to trigger the cloud function.
Create the cloud function and copy in the code below.
Make sure to set the correct project ID in the project variable.
Set the trigger to Pub/Sub and choose the topic created in step 1.
Create a cloud scheduler job to trigger the cloud function on a regular basis.
Choose the frequency when you want the cloud function to be triggered.
Set the target to Pub/Sub and define the topic created in step 1.
The payload should be set to start [CloudSQL instance name] or stop [CloudSQL instance name] to start or stop the specified instance (e.g. start my_cloudsql_instance will start the CloudSQL instance with the name my_cloudsql_instance)
Main.py:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import base64
from pprint import pprint

credentials = GoogleCredentials.get_application_default()
service = discovery.build('sqladmin', 'v1beta4', credentials=credentials, cache_discovery=False)
project = 'INSERT PROJECT_ID HERE'


def start_stop(event, context):
    print(event)
    pubsub_message = base64.b64decode(event['data']).decode('utf-8')
    print(pubsub_message)
    command, instance_name = pubsub_message.split(' ', 1)
    if command == 'start':
        start(instance_name)
    elif command == 'stop':
        stop(instance_name)
    else:
        print("unknown command " + command)


def start(instance_name):
    print("starting " + instance_name)
    patch(instance_name, "ALWAYS")


def stop(instance_name):
    print("stopping " + instance_name)
    patch(instance_name, "NEVER")


def patch(instance, activation_policy):
    request = service.instances().get(project=project, instance=instance)
    response = request.execute()

    dbinstancebody = {
        "settings": {
            "settingsVersion": response["settings"]["settingsVersion"],
            "activationPolicy": activation_policy
        }
    }

    request = service.instances().patch(
        project=project,
        instance=instance,
        body=dbinstancebody)

    response = request.execute()
    pprint(response)
requirements.txt:
google-api-python-client==1.10.0
google-auth-httplib2==0.0.4
google-auth==1.19.2
oauth2client==4.1.3
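To test the wiring without waiting for the scheduler, you can publish the payload to the topic manually. Here is a hedged sketch using the Node.js Pub/Sub client; the topic name cloudsql-start-stop and the instance name are assumptions, and older client versions use topic(...).publish(buffer) instead of publishMessage:

const { PubSub } = require('@google-cloud/pubsub');

// Publish a "stop" command to the topic that triggers the cloud function.
async function main() {
    const pubsub = new PubSub();
    const messageId = await pubsub
        .topic('cloudsql-start-stop')
        .publishMessage({ data: Buffer.from('stop my_cloudsql_instance') });
    console.log('Published message ' + messageId);
}

main().catch(console.error);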
I've been working on a server and a push notification daemon that will both run simultaneously and interact with the same database. The idea behind this is that if one goes down, the other will still function.
I normally use Swift, but for this project I'm writing it in Node, using MongoDB (via Mongoose) as my database. I've created a helper class that I import in both my server.js file and my notifier.js file.
const Mongoose = require('mongoose');
const Device = require('./device'); // This is a Schema

var uri = 'mongodb://localhost/devices';

function Database() {
    Mongoose.connect(uri, { useMongoClient: true }, function(err) {
        console.log('connected: ' + err);
    });
}

Database.prototype.findDevice = function(params, callback) {
    Device.findOne(params, function(err, device) {
        // etc...
    });
};

module.exports = Database;
Then, separately in both server.js and notifier.js, I create objects and query the database:
const Database = require('./db');
const db = new Database();

db.findDevice(params, function(err, device) {
    // Simplified, but I edit and save things back to the database via db
    device.token = 'blah';
    device.save();
});
Is this safe to do? When working with Swift (and Objective-C) I'm always concerned about making things thread safe. Is this a concern? Should I be worried about race conditions and modifying the same files at the same time?
Also, bonus question: How does Mongoose share a connection between files (or processes?). For example Mongoose.connection.readyState returns the same thing from different files.
The short answer is "safe enough."
The long answer has to do with understanding what sort of consistency guarantees your system needs, how you've configured MongoDB, and whether there's any sharding or replication going on.
For the latter, you'll want to read about atomicity and consistency and perhaps also peek at write concern.
A good way to answer these questions, even when you think you've figured them out, is to test scenarios: hammer a duplicate of your system with fake data and events and see whether what happens is OK or not.
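To make the atomicity point concrete, here is a hedged sketch (my addition, not part of the original answer): replacing the findOne + mutate + save pattern from the question with a single atomic update removes the window in which the other process can overwrite your change. The setToken name is hypothetical:

// One atomic server-side operation instead of read-modify-write,
// so concurrent writers cannot interleave between the read and the save.
Database.prototype.setToken = function(params, token, callback) {
    Device.findOneAndUpdate(
        params,
        { $set: { token: token } }, // applied atomically by MongoDB
        { new: true },              // hand the updated document to the callback
        callback
    );
};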
I use the following API in my program to determine a free port and provide it to an application to run on:
portscanner.findAPortNotInUse(3000, 65000, '127.0.0.1', function(error, port) {
    console.log('AVAILABLE PORT AT: ' + port);
})
https://github.com/baalexander/node-portscanner
This free port is given to the application for use, and it works OK.
The problem is that if I provide a free port to application A, and application A hasn't occupied it yet (sometimes that takes some time...), another application B may come along and request a free port, and be given the same port as application A.
This causes a problem...
Is there any elegant way to solve it?
My application doesn't have state, so it cannot record which app got which port...
One solution is to randomize the range, but this is not robust...
In my application I am given the URL of the app that I should provide the free port to run.
Update:
I cannot use a broker or anything else external to control this. I need to find some algorithm (maybe with some smart randomization) that can help me do it internally; i.e., my program is like a singleton, and I need some trick for handing out ports between 50000 and 65000 that reduces the number of collisions among the ports provided to the apps.
Update 2:
I've decided to try something like the following; what do you think?
Using lodash (https://lodash.com/docs/4.17.2#random) to determine ports, in a loop that draws 3 (or more, if that makes sense) numbers for ranges like the following:
portscanner.findAPortNotInUse([50001, 60000, 600010], '127.0.0.1', function(err, port) {
    if (err) {
        console.log("error!!!-> " + err);
    } else {
        console.log('Port Not in Use ' + port);
    }
    // using that in a loop
    var aa = _.random(50000, 65000);
});

Then, if I get false for the port, i.e. all 3 ports are occupied, I run this process again with 3 other random numbers. Comments and suggestions are welcome!
I am trying to find some way to avoid collisions as much as possible...
I would simply accept the fact that things can go wrong in a distributed system and retry the operation (i.e., getting a free port) if it failed for whatever reason on the first attempt.
Luckily, there are lots of npm modules out there that do that already for you, e.g. retry.
Using this module you can retry an asynchronous operation until it succeeds, configure waiting strategies, set how many times it should be retried at most, and so on…
To provide a code example, it basically comes down to something such as:
const retry = require('retry');

const operation = retry.operation();

operation.attempt(currentAttempt => {
    findAnUnusedPortAndUseIt(err => {
        if (operation.retry(err)) {
            return;
        }
        callback(err ? operation.mainError() : null);
    });
});
The benefits of this solution are:
Works without locking, i.e. it is efficient and uses few resources when everything is fine.
Works without a central broker or something like that.
Works for distributed systems of any size.
Uses a pattern that you can re-use in distributed systems for all kinds of problems.
Uses a battle-tested and solid npm module instead of handwriting all these things.
Does not require you to change your code in a major way, instead it is just adding a few lines.
Hope this helps :-)
If your applications can open ports with options like SO_REUSEADDR, and the operating system keeps ports in its list in the TIME_WAIT state, you can bind/open the port you want to return with SO_REUSEADDR, instantly close it, and then give it to the application. For the TIME_WAIT period (depending on the operating system this can be 30 seconds; the actual time should be set up or found by experiment/administration), the port list will show this port as occupied.
If your port finder does not return port numbers for ports in the TIME_WAIT state, the problem is solved by the relatively expensive open/close socket operation.
I'd advise you to look for a way to retain state. Even temporary state, in memory, is better than nothing at all. That way you could at least avoid giving out ports you've already given out, because those are very likely not free anymore. (This could be as simple as saving them and regenerating a random port when you notice you've drawn one you've already given out.) If you don't want collisions, build your module to have state so it can avoid them. If you don't want to do that, you'll have to accept that there will sometimes be collisions when there didn't need to be.
If the URLs you get are random, the best you can do is guess randomly. If you can derive some property in which the URLs uniquely and consistently differ, you could design something around that.
Code example:
function getUnusedPort(url) {
    // range is [50000, 65001), i.e. 50000 up to and including 65000
    const guessPort = () => Math.floor(Math.random() * 15001) + 50000;
    let randomPort = guessPort();
    while (checkPortInUse(randomPort)) {
        randomPort = guessPort();
    }
    return randomPort;
}
Notes:
checkPortInUse will probably be asynchronous, so you'll have to accommodate for that.
You said 'between 50000 and 65000'. This is from 50000 up to and including 65000.
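Building on the idea of deriving something from the URL, here is a hedged sketch (my addition; portForUrl is a hypothetical helper) that maps each URL deterministically into [50000, 65000] by hashing it, so the same URL always tries the same port first and distinct URLs rarely collide:

const crypto = require('crypto');

// Map a URL deterministically into [50000, 65000] by hashing it.
// Collisions are still possible, so keep the in-use check as a fallback.
function portForUrl(url) {
    const hash = crypto.createHash('md5').update(url).digest();
    return 50000 + (hash.readUInt16BE(0) % 15001);
}

console.log(portForUrl('http://app-a.example.com')); // same URL -> same port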
When managing multiple applications or multiple servers, where one must be right the first time (without retrying), you need a single source of truth. Applications on the same machine can talk to a database, a broker server, or even a file, so long as the resource is "lockable". (Servers work in similar ways, though not with local files.)
So your flow would be something like:
App A sends request to service to request lock.
When lock is confirmed, start port scanner
When port is used, release lock.
Again, this could be a "PortService" you write that hands out unused ports, or a simple lock on some shared resource, so that no two things get the same port at the same time.
Hopefully you can find something suitable to work for your apps.
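To illustrate, here is a minimal sketch of such a "PortService" (the HTTP design, port 3999, and the allocate helper are all assumptions): a tiny server that is the only process allowed to hand out ports, so every allocation is serialized through one place:

const http = require('http');
const portscanner = require('portscanner');

// Ports we have promised to clients but that may not be bound yet.
const handedOut = new Set();

// Find a free port that we have not already promised to someone else.
function allocate(start, cb) {
    portscanner.findAPortNotInUse(start, 65000, '127.0.0.1', (err, port) => {
        if (err || port === false) return cb(err || new Error('no free port'));
        if (handedOut.has(port)) return allocate(port + 1, cb); // skip promised ports
        handedOut.add(port);
        cb(null, port);
    });
}

http.createServer((req, res) => {
    allocate(50000, (err, port) => {
        if (err) { res.statusCode = 500; return res.end(err.message); }
        res.end(String(port));
    });
}).listen(3999, () => console.log('PortService listening on 3999'));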
As you want to find a port that is not in use, you could run the following command:
netstat -tupln | awk '{print $4}' | cut -d ':' -f2
So in your application you would use it like this:
const exec = require('child_process').exec;
const _ = require('lodash');

// Note the double quotes around the command: the awk/cut arguments contain single quotes.
exec("netstat -tupln | awk '{print $4}' | cut -d ':' -f2", (error, stdout, stderr) => {
    if (error) {
        console.error(`exec error: ${error}`);
        return;
    }
    var listPorts = stdout.split(/\n/);
    console.log(listPorts); // list of all ports already in use
    var aa = _.random(50000, 65000); // generate random port
    var isFree = listPorts.indexOf(String(aa)) === -1; // stdout yields strings, so compare strings
    if (isFree) {
        // start your application
    } else {
        // restart the search: put this in a function and search again
    }
});
This should give you a list of all the ports that are in use, so you can use any port except the ones in listPorts.
Hi guys, I have a problem that I don't really have any idea how to solve. It's also a bit strange. :/
Basically I have created this Lambda function to connect to a MySQL DB using the node package 'mysql'.
If I run the function from the command line on my PC using the command 'sls function run function1' and make different queries, everything is fine.
But when I call the function from a web browser using the link, I have to refresh the page 2 times to get the right result, because on the first refresh the server responds with the old result.
I have noticed that from the command line I always get different thread IDs, while from the web browser it is always the same.
Also, I don't close the connection in the Lambda function code, because everything is fine if I run the function from the command line; but from the browser I can only make 2 queries, and then I get a message saying that I cannot use a closed connection.
So it seems like Lambda stores the old query result when I call it from a web browser.
Obviously I'm making some stupid mistake, but I don't know how to solve it.
Does anyone have an idea?
Thanks :)
'use strict';
// npm packages
var mysql = require('mysql');
var deasync = require('deasync');

// variables
var goNext = false;   // used to synchronize deasync
var error = false;    // becomes TRUE if an error occurred while talking to the DB
var dataColumnTable;  // the data that you extract from the query to the DB
var errorMessage;

//----------------------------------------------------------------------------------------------------------------
// always the same credentials
var connection = mysql.createConnection({
    host     : 'hostAddress',
    user     : 'Puser',
    password : 'password',
    port     : '3306',
    database : 'database1',
});

//----------------------------------------------------------------------------------------------------------------
// note: callback added to the signature; the original code used it without declaring it
module.exports.handler = function(event, context, callback) {
    var Email = event.email;
    connection.query('SELECT City, Address FROM Person WHERE E_Mail=?', Email, function(err, rows) {
        if (err) {
            console.log("Cannot connect to DB");
            console.log(err);
            error = true;
            errorMessage = err;
        }
        else {
            console.log("data from column acquired!");
            dataColumnTable = rows;
        }
        //connection.end(function(err) {
        //    connection.destroy();
        //});
        //console.log("Connection closed!");
        goNext = true;
    });
    deasync.loopWhile(function() { return goNext != true; });
    //----------------------------------------------------------------------------------------------------------------
    if (error == true)
        return callback('Error ' + errorMessage);
    else
        return callback(null, dataColumnTable); // return a JSON file
    // end handler
};
Disclaimer: I'm not very familiar with AWS and/or AWS Lambda.
http://docs.aws.amazon.com/lambda/latest/dg/programming-model-v2.html states (emphasis mine):
Your Lambda function code must be written in a stateless style, and have no affinity with the underlying compute infrastructure. Your code should expect local file system access, child processes, and similar artifacts to be limited to the lifetime of the request. Persistent state should be stored in Amazon S3, Amazon DynamoDB, or another cloud storage service. Requiring functions to be stateless enables AWS Lambda to launch as many copies of a function as needed to scale to the incoming rate of events and requests. These functions may not always run on the same compute instance from request to request, and a given instance of your Lambda function may be used more than once by AWS Lambda.
Opening a connection and storing it in a variable outside your handler function is state. The connection will likely be closed between requests, or even before your first request. Your Lambda function instance may also be reused (hence the identical thread IDs).
My assumption would be (and an attempt to solve this problem) that you need to create the connection on every request (i.e., inside your handler), and that you must not expect any variable to still be initialized or to hold its value from the last request (except for constants, probably).
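A minimal sketch of that suggestion, reusing the credentials and query from the question: the connection is created and closed inside the handler, and the result is returned through the callback instead of being parked in module-level variables (which also removes the need for deasync):

'use strict';
var mysql = require('mysql');

module.exports.handler = function(event, context, callback) {
    // Create the connection per request; no state survives between invocations.
    var connection = mysql.createConnection({
        host     : 'hostAddress',
        user     : 'Puser',
        password : 'password',
        port     : '3306',
        database : 'database1',
    });

    connection.query('SELECT City, Address FROM Person WHERE E_Mail=?', event.email, function(err, rows) {
        // Always close the connection so nothing leaks into the next request.
        connection.end(function() {
            if (err) {
                return callback('Error ' + err);
            }
            return callback(null, rows);
        });
    });
};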