13/04/2018

Batch processing in Hibernate – Java’s ORM

Developers are often facing the problem of inserting large data, so this article could help them to
see the ways how to do it in proper way in order to avoid common-mistakes.

 

This is the way how batch processing shouldn’t be done:

public void insertBatch(List entities){
EntityManager em = DatabaseProvider.getEntityManager();
EntityTransaction transaction = em.getTransaction();

try {
transaction.begin();
for (int i = 0; i < entities.size(); i++){
em.persist(entities.get(i));
}
transaction.commit();
} catch (Exception ex){
ex.printStackTrace();
transaction.rollback();
}
}

 

There are several reasons why we should not use this approach:

  1. In case of large collection of entities – e.g. 1 million? This can cause the OutOfMemory error in the persistence context and we have not persisted anything in the database.
  2. Long-running operation which reserves one connection.
  3. In case that we need at least some of the information to be stored, this will not fit our expectations. In case of a failure, none of the records will be inserted.

 

So, the better approach would be:

public void batchInsertOneTransaction(List entities){
EntityManager entityManager = DatabaseProvider.getEntityManager();
EntityTransaction transaction = entityManager.getTransaction();
try {
transaction.begin();
for (int i = 0; i < entities.size(); i++){ entityManager.persist(entities.get(i)); if (i > 0 && i % 50 == 0){
entityManager.flush();
entityManager.clear();
}
}
transaction.commit();
} catch (Exception ex){
transaction.rollback();
ex.printStackTrace();
}
}

 

The reasons why this is a better approach are:

  1. Out of memory cannot happen because we clear our persistence context after we flush the data into database.
  2. This approach will save all the data, but in case of a failure, it will not save any data.

 

Another solution would contain more transactions:

public void batchInsertMultiTransaction(List entities){
EntityManager entityManager = DatabaseProvider.getEntityManager();
EntityTransaction transaction = entityManager.getTransaction();
try {
transaction.begin();
for (int i = 0; i < entities.size(); i++){ try { entityManager.persist(entities.get(i)); if (i > 0 && i % 50 == 0){
entityManager.flush();
entityManager.clear();
transaction.commit();
transaction.begin();
}
} catch (Exception ex){
transaction.rollback();
ex.printStackTrace();
}
}
} finally {
//if the number of entities is not divisable by 50
try {
entityManager.flush();
entityManager.clear();
transaction.commit();
} catch (Exception ex){
transaction.rollback();
ex.printStackTrace();
}
}
}

 

Advantage:

  1. In case of a failure, this approach will store some of the data which should be taken with awareness. In case that we need the exact order of the data, or if all data needs to be stored, or the application depends on the accumulation field, this has to be done with the previous algorithm.
  2. Beside that, with this approach you can mark the failed data and after cleaning them you can insert it again

 

Further:

When it comes to performance, if it’s low, writing native SQL is always a good option, but there is one requirement we need to fulfill.
You have to add the parameter “?rewriteBatchedStatements=true” to the connection url. The connection url will look like
jdbc:mysql://localhost:3306/simple_task_batch?rewriteBatchedStatements=true

Let’s assume that we have a specific entity Person.
This will produce the batch insert to be like

INSERT INTO person (first_name,last_name) VALUES ("FIRST NAME 1", "LAST NAME 1"), ("FIRST NAME 2", "LAST NAME 2"), ("FIRST NAME 3", "LAST NAME 3")...

Instead of:

INSERT INTO person (first_name,last_name) VALUES ("FIRST NAME 1", "LAST NAME 1");
INSERT INTO person (first_name,last_name) VALUES ("FIRST NAME 2", "LAST NAME 2");
INSERT INTO person (first_name,last_name) VALUES ("FIRST NAME 3", "LAST NAME 3");

 

And the method will look like this (with batch size e.g. 100000):

public void batchInsertNative(List entities){
Connection connection = DatabaseProvider.getConnection();

try {
connection.setAutoCommit(false);
PreparedStatement preparedStatement = connection.prepareStatement("INSERT INTO person (first_name,last_name) VALUES (?,?)");;
for (int i = 0; i < entities.size(); i++){
try {
preparedStatement.setString(1,entities.get(i).getFirstName());
preparedStatement.setString(2,entities.get(i).getLastName());
preparedStatement.addBatch();
if (i % 100000 == 0){
preparedStatement.executeBatch();
connection.commit();
}
} catch (Exception ex){
ex.printStackTrace();
connection.rollback();
}
}

//if the number of entities is not divisable by 100000
preparedStatement.executeBatch();
connection.commit();
connection.close();
} catch (Exception e){
e.printStackTrace();
}
}

 

To conclude, we will repeat one more time, this approach is the same as the example batchInsertMultiTransaction, but this time it’s written in native SQL,
so it will save some of the data in case of failure. But also, it depends on what do you want to achieve – either save all the data, just the part of them or none.

For starters, some of the skills you should have:

  • excellent knowledge of Java (OOP)
  • good knowledge of Spring Boot 2 
  • good understanding of source control tools such as Git (Github, Bitbucket, or Gitlab)
  • experience with relational databases 
  • apply good technical practices and contribute to the application architecture with a focus on scalability, maintainability, security, and performance.
  • a college degree that corresponds to the required vacancy and at least 2 years of working experience in Java
  • excellent verbal and written communication skills in English

We can give you a fair amount of great benefits such as:

  • remote work
  • flexible working hours, no overtime policy
  • challenging international projects
  • Private health insurance
  • we are offering an honest, supportive, and friendly working environment
  • knowledge-sharing, transparent and proactive communication among the team are some of our greatest values

 Feel free to apply for the job if you think we are a perfect match!

    For starters, some of the skills you should have:

    • understanding of OOP concepts
    • Knowledge of REST/JSON Web Services and back-office applications using the newest Microsoft stack of technologies including C#, ASP.NET/ASP.NET Core, WebAPI, T-SQL, SQL Server, Entity Framework
    • Good understanding of Javascript, HTML, CSS
    • Good knowledge of GIT
    • Apply good technical practices and contribute to the application architecture with a focus on scalability, maintainability, security, and performance.
    • a college degree that corresponds to the required vacancy and at least 3 years of working experience for Medior/ at least 5 years for Senior
    • excellent verbal and written communication skills in English

    We can give you a fair amount of great benefits such as:

    • remote work
    • flexible working hours, no overtime policy
    • challenging international projects
    • Private health insurance
    • we are offering an honest, supportive, and friendly working environment
    • knowledge-sharing, transparent and proactive communication among the team are some of our greatest values

     Feel free to apply for the job if you think we are a perfect match!

      For starters, some of the skills you should have:

      • A background or qualification in computer science or digital design
      • 5+ years of coding and design experience
      • 2+ years of experience in React
      • understanding of SOLID principles
      • good knowledge of GIT
      • good knowledge of design patterns
      • Good knowledge of UI/UX common practices and standards
      • Webpack
      • SCSS
      • Unit and E2E tests 
      • Experience or motivation to work in Framer Motion

      We can give you a fair amount of great benefits such as:

      • remote work
      • flexible working hours, no overtime policy
      • challenging international projects
      • Private health insurance
      • we are offering an honest, supportive, and friendly working environment
      • knowledge-sharing, transparent and proactive communication among the team are some of our greatest values

       Feel free to apply for the job if you think we are a perfect match!

        Your skills are:

        • Angular 2+
        • Typescript
        • RxJS 
        • HTML5 
        • CSS3

        ...and you may have experience with:

        • Webpack 
        • SCSS, SASS 
        • Unit and E2E tests

        We can give you a fair amount of great benefits such as:

        • remote work
        • flexible working hours, no overtime policy
        • challenging international projects
        • Private health insurance
        • we are offering an honest, supportive, and friendly working environment
        • knowledge-sharing, transparent and proactive communication among the team are some of our greatest values

         Feel free to apply for the job if you think we are a perfect match!

          For starters, some of the skills you should have:

          • excellent knowledge of Java
          • knowledge of Spring Boot 2
          • know how to create automated workflows with Webpack, Gulp
          • good understanding of HTML5 and CSS3
          • good understanding of source control tools such as Git (Github, Bitbucket, or Gitlab)
          • experience with relational databases
          • experience working with microservices
          • AWS or another cloud provider (GCP, Azure, etc.)
          • experience working with Linux/Unix
          • proficiency in standards-compliant and cross-browser compatible code
          • apply good technical practices and contribute to the application architecture with a focus on scalability, maintainability, security, and performance.
          • a college degree that corresponds to the required vacancy and at least 5 years of working experience in Java
          • excellent verbal and written communication skills in English

          We can give you a fair amount of great benefits such as:

          • remote work
          • flexible working hours, no overtime policy
          • challenging international projects
          • Private health insurance
          • we are offering an honest, supportive, and friendly working environment
          • knowledge-sharing, transparent and proactive communication among the team are some of our greatest values

           Feel free to apply for the job if you think we are a perfect match!

            For starters, some of the skills you should have:

            • good programming skills
            • familiarity with API Development (REST, Web API)
            • understanding of OOP concepts
            • a bachelor degree that corresponds to the required vacancy
            • strong proficiency in JavaScript, including DOM manipulation and the JavaScript object model
            • thorough understanding of React.js and its core principles
            • experience with popular React.js workflows (such as Redux)
            • familiarity with specifications of Typescript
            • experience with React Saga
            • good knowledge of HTML5 and CSS3

            We can give you a fair amount of great benefits such as:

            • remote work
            • flexible working hours, no overtime policy
            • challenging international projects
            • Private health insurance
            • we are offering an honest, supportive, and friendly working environment
            • knowledge-sharing, transparent and proactive communication among the team are some of our greatest values

             Feel free to apply for the job if you think we are a perfect match!