Neo4j : Graphs for everyone - Renix Informatics

Graph DB

Posted on October 3, 2023

To manage a growing volume of interconnected data, you can use Neo4j, a non-relational graph database optimized for managing relationships. Neo4j can help you build high-performance and scalable applications that use large volumes of connected data.

Many software developers have limited knowledge about the capabilities of graph databases, including Neo4j. This article aims to provide an overview of graph databases and explain when Neo4j can be useful.

Before getting into Neo4j, let's look at what a graph database is.

What is a Graph Database?

A graph database stores nodes and relationships instead of tables or documents. Data is stored just like you might sketch ideas on a whiteboard. Your data is stored without restricting it to a predefined model, allowing a very flexible way of thinking about and using it.

Large enterprises often run complex queries to pull precise, in-depth information from customer and user data, product tracking data, and similar sources. When those queries depend on how records are connected, a graph database can extract the information more effectively.

Neo4j as a Graph Database

Graph databases are based on graph theory from mathematics. Graphs are structures containing vertices (representing entities, such as people or things) and edges (representing connections between vertices). Edges can have numerical values called weights.

Neo4j provides its own implementation of graph theory concepts. Let’s take an in-depth look at the Labeled Property Graph Model in the Neo4j database. It has the following components:

Nodes (vertices): These are the main data elements that are interconnected through relationships. A node can have one or more labels (that describe its role) and properties (i.e. attributes).
Relationships (edges): A relationship connects two nodes, has a type and a direction, and can carry properties of its own. A node can take part in any number of relationships.
Labels: These are used to group nodes, and each node can be assigned multiple labels. Labels are indexed to speed up finding nodes in a graph.
Properties: These are attributes of both nodes and relationships. Neo4j stores them as key-value pairs, where the value can be a string, a number, a boolean, or a list of such values, among other types.
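To make the four components concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the model, not how Neo4j stores data internally; the class names, the `since` property, and the helper `nodes_with_label` are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    # A node carries a set of labels and a map of properties.
    labels: set[str]
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship has a start node, a type, an end node, and properties.
    start: Node
    type: str
    end: Node
    properties: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: one person who likes one technology.
person = Node({"Person", "Customer"}, {"name": "Joe"})
tech = Node({"Technology"}, {"title": "Neo4j"})
likes = Relationship(person, "LIKES", tech, {"since": 2023})


def nodes_with_label(nodes, label):
    """Return the nodes that carry the given label (labels group nodes)."""
    return [n for n in nodes if label in n.labels]


people = nodes_with_label([person, tech], "Person")
print([n.properties["name"] for n in people])
```

Note that a node can carry several labels at once (`Person` and `Customer` here), while a relationship has exactly one type.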

We use the Cypher Query Language (CQL) to create, update, delete, and read data in a Neo4j database. It resembles the SQL used for relational databases in many ways, though not in all.

A snippet of CQL code that creates nodes and a relationship in Neo4j and then queries them:

// Data is stored with this direction
CREATE (p:Person)-[:LIKES]->(t:Technology);

// Querying the relationship in the opposite direction returns no results
MATCH (p:Person)<-[:LIKES]-(t:Technology)
RETURN p, t;

// Omit the direction in the query unless you are sure of it
MATCH (p:Person)-[:LIKES]-(t:Technology)
RETURN p, t;
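The effect of direction in the three patterns above can be mimicked with a small Python sketch. It only imitates the matching rules on an in-memory edge list and does not talk to Neo4j; the node names `alice` and `neo4j` and the function `match` are hypothetical.

```python
# Each node has one label here to keep the sketch short.
labels = {"alice": "Person", "neo4j": "Technology"}

# Relationships are stored with a direction: (start, type, end).
edges = [("alice", "LIKES", "neo4j")]


def match(left_label, right_label, rel_type, direction):
    """Return (left, right) pairs for a pattern between two labelled nodes.

    direction: "out" is (left)-[:T]->(right), "in" is (left)<-[:T]-(right),
    and "any" is (left)-[:T]-(right).
    """
    results = []
    for start, etype, end in edges:
        if etype != rel_type:
            continue
        candidates = []
        if direction in ("out", "any"):
            candidates.append((start, end))  # left is the start node
        if direction in ("in", "any"):
            candidates.append((end, start))  # left is the end node
        for left, right in candidates:
            if labels[left] == left_label and labels[right] == right_label:
                results.append((left, right))
    return results


forward = match("Person", "Technology", "LIKES", "out")
backward = match("Person", "Technology", "LIKES", "in")
undirected = match("Person", "Technology", "LIKES", "any")
print(forward, backward, undirected)
```

The backward pattern finds nothing because no `Technology` node points at a `Person`, while the undirected pattern finds the stored relationship regardless of how it was written.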

For a deeper understanding, refer to the courses at graphacademy.neo4j.com

Where and how is Neo4j used?

Neo4j is used today by thousands of startups, educational institutions, and large enterprises in all sectors, including financial services, government, energy, technology, retail, and manufacturing. From building innovative new technology to running established businesses, users are generating insights with graphs, creating new revenue, and improving their overall efficiency.

Implementation :

Building an Email Targeting system with Neo4j

Now that you know what the Neo4j database is and what opportunities it provides to businesses, you’re ready to take a look at a real-life example of how you can apply this data storage technology. We’ve decided to build a simple email targeting system with a Neo4j database, as an email targeting system is an important feature for lots of online businesses, namely online stores, and marketplaces. Our email targeting system will help analyze customer behavior and decide which offers to target audiences with.

Step 1: Installing Neo4j

For our sample email targeting system, we only need to download and install Neo4j Server. We could use Neo4j Desktop, but it contains extra functions, most of which we don’t need.

Installing Neo4j Server is quite simple. Download the Neo4j 5.1.0 Community Edition server package that matches your operating system and requirements.

Step 2: Launching the Neo4j Browser

After installing the Neo4j Server, it’s time to run it using the command <NEO4J_HOME>/bin/neo4j start (the top-level directory is referred to as NEO4J_HOME). After that, you can launch your web browser and start an interactive console called Neo4j Browser (it’s installed by default with Neo4j Server).

To access the Neo4j Browser, go to http://localhost:7474/browser/ in your web browser and sign in with the default login and password (neo4j for both).

Once you’ve signed in, change the password. Then sign in with your new password to establish a connection with the database server.

The Neo4j Browser has an interactive console with a number of commands (:play start, :play concepts, and :play cypher). You can take a training tour and learn more about how to use the Neo4j database, check out sample graphs (such as the Movie Graph), and examine the state of the active database.

Now you have the toolkit for building an email targeting system.

Step 3: Data Modelling

Before you start modelling your data, spend some time analyzing the business purpose of your email targeting system. Such systems are used to offer the most relevant products or services to customers, so marketers and analysts need to monitor customer behavior to launch efficient email marketing campaigns.

Our email targeting system is going to have the following entities (with attributes in parentheses):

Category (title)
Product (title, description, price, availability, shippability)
Customer (name, email, registration date)
Promotional Offer (type, content)

In our graph database, each of these entities is represented by nodes with the corresponding label. Entities are connected via relationships (for the sake of simplicity, we only consider relationships between two entities). Note that we use singular names for all entities and relationships, even though a single node can take part in many relationships, much like the one-to-many connections commonly used in relational databases. Also, just like entities, some relationships in our model have properties, which we write in parentheses.

So here’s what we’ve got:

Product is_in Category
Customer added_to_wish_list Product
Customer bought Product
Customer viewed (views_count) Product
Promotional Offer used_to_promote Product

It doesn’t matter where the information about views comes from; let’s just assume we have this data.

Note that in Neo4j, there’s no need to model bidirectional relationships (such as Product is_in Category and Category has_many Product). Graph databases allow us to follow edges in both directions.

And… that’s it.

Modeling entities and relationships in a graph database is that simple and intuitive, as we don’t need to switch from a logical model (how entities are connected from the perspective of a task we need to solve) to a physical model (how we store data in our database). It’s also easy to add, modify, or delete entities and relationships in a graph database without bothering with foreign keys (as in relational databases) or links between documents (as in some NoSQL databases).

That’s an amazing advantage of graph databases.

Step 4: Working with the database

Now it’s time to fill our Neo4j database according to the model we defined in the previous step.

One way to do it is with Cypher, a declarative language that, like SQL, describes what data you want rather than how to fetch it, and that lets you write flexible and easy-to-read queries. Cypher syntax emphasizes directions in relationships between entities. Through the openCypher project, the language specification is open and is maintained with input from a community of contributors.

For the sake of convenience, we’re going to add nodes and relationships step by step. First, let’s introduce Categories and Products :

CREATE (smartphones:Category {title: 'Smartphones'}), 
(notebooks:Category {title: 'Notebooks'}), 
(cameras:Category {title: 'Cameras'})

// Smartphones
CREATE (sony_xperia_z22:Product {title: 'Sony Xperia Z22', price: 765.00, shippability: true, availability: true})
CREATE (samsung_galaxy_s8:Product {title: 'Samsung Galaxy S8', price: 784.00, shippability: true, availability: true})
CREATE (sony_xperia_xa1:Product {title: 'Sony Xperia XA1 Dual G3112', price: 229.50, shippability: true, availability: false})
CREATE (iphone_8:Product {title: 'Apple iPhone 8 Plus 64GB', price: 874.20, shippability: true, availability: false})
CREATE (xiaomi_mi_mix_2:Product {title: 'Xiaomi Mi Mix 2', price: 420.87, shippability: true, availability: true})
CREATE (huawei_p8:Product {title: 'Huawei P8 Lite', price: 191.00, shippability: true, availability: true})

MERGE (sony_xperia_z22)-[:IS_IN]->(smartphones)
MERGE (samsung_galaxy_s8)-[:IS_IN]->(smartphones)
MERGE (sony_xperia_xa1)-[:IS_IN]->(smartphones)
MERGE (iphone_8)-[:IS_IN]->(smartphones)
MERGE (xiaomi_mi_mix_2)-[:IS_IN]->(smartphones)
MERGE (huawei_p8)-[:IS_IN]->(smartphones)

// Notebooks
CREATE (acer_swift_3:Product {title: 'Acer Swift 3 SF314-51-34TX', price: 595.00, shippability: true, availability: false})
CREATE (hp_pro_book:Product {title: 'HP ProBook 440 G4', price: 771.30, shippability: true, availability: true})
CREATE (dell_inspiron_15:Product {title: 'Dell Inspiron 15 7577', price: 1477.50, shippability: true, availability: true})
CREATE (apple_macbook:Product {title: "Apple MacBook A1534 12' Rose Gold", price: 1293.00, shippability: false, availability: true})

MERGE (acer_swift_3)-[:IS_IN]->(notebooks)
MERGE (hp_pro_book)-[:IS_IN]->(notebooks)
MERGE (dell_inspiron_15)-[:IS_IN]->(notebooks)
MERGE (apple_macbook)-[:IS_IN]->(notebooks)

// Cameras
CREATE (canon_eos_6d:Product {title: 'Canon EOS 6D Mark II Body', price: 1794.00, shippability: true, availability: false})
CREATE (nikon_d7500:Product {title: 'Nikon D7500 Kit 18-105mm VR', price: 1612.35, shippability: true, availability: true})

MERGE (canon_eos_6d)-[:IS_IN]->(cameras)
MERGE (nikon_d7500)-[:IS_IN]->(cameras)

Now we should add customers and establish relationships between them and the products in our database (this part is a continuation of the previous query):

// Customers
CREATE (joe:Customer {name: 'Joe Baxton', email: 'joeee_baxton@example.com', age: 25})
CREATE (daniel:Customer {name: 'Daniel Johnston', email: 'dan_j@example.com', age: 31})
CREATE (alex:Customer {name: 'Alex McGyver', email: 'mcgalex@example.com', age: 22})
CREATE (allison:Customer {name: 'Allison York', email: 'ally_york1@example.com', age: 24})

MERGE (joe)-[:VIEWED {views_count: 15}]->(nikon_d7500)
MERGE (joe)-[:ADDED_TO_WISH_LIST]->(iphone_8)
MERGE (joe)-[:BOUGHT]->(apple_macbook)

MERGE (daniel)-[:VIEWED {views_count: 10}]->(sony_xperia_z22)
MERGE (daniel)-[:VIEWED {views_count: 20}]->(dell_inspiron_15)
MERGE (daniel)-[:ADDED_TO_WISH_LIST]->(dell_inspiron_15)

MERGE (alex)-[:VIEWED {views_count: 20}]->(canon_eos_6d)
MERGE (alex)-[:ADDED_TO_WISH_LIST]->(sony_xperia_xa1)
MERGE (alex)-[:ADDED_TO_WISH_LIST]->(nikon_d7500)
MERGE (alex)-[:BOUGHT]->(xiaomi_mi_mix_2)

MERGE (allison)-[:ADDED_TO_WISH_LIST]->(acer_swift_3)
MERGE (allison)-[:ADDED_TO_WISH_LIST]->(hp_pro_book)
MERGE (allison)-[:BOUGHT]->(huawei_p8)
MERGE (allison)-[:BOUGHT]->(sony_xperia_xa1);

Now the database contains all necessary entities and relationships. As you can see, Cypher is so declarative that you can guess exactly what every piece of code does.

💡 To visualize the graph, execute the MATCH (n) RETURN n query, which returns all nodes in our graph. If everything is correct, the Neo4j Browser displays the categories, products, and customers as nodes connected by their relationships.

We can use this graph for the different use cases an email targeting system needs. Here are a couple of examples of how to retrieve specific data for a given use case.

Example 1: To determine customer preferences

Suppose we need to learn the preferences of our customers to create a promotional offer for a specific product category, such as notebooks. First, Neo4j allows us to quickly obtain a list of notebooks that customers have viewed or added to their wish lists. We can use this code to select all such notebooks:

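// List several relationship types with a single leading colon: [:A|B]
MATCH (:Customer)-[:ADDED_TO_WISH_LIST|VIEWED]->(notebook:Product)-[:IS_IN]->(:Category {title: 'Notebooks'})
// DISTINCT avoids returning a notebook twice when it was both viewed and wish-listed
RETURN DISTINCT notebook;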

Now that we have a list of notebooks, we can easily include them in a promotional offer. Let’s make a few modifications to the code above:

CREATE (offer:PromotionalOffer {type: 'discount_offer', content: 'Notebooks discount offer...'})
WITH offer
MATCH (:Customer)-[:ADDED_TO_WISH_LIST|VIEWED]->(notebook:Product)-[:IS_IN]->(:Category {title: 'Notebooks'})
MERGE (offer)-[:USED_TO_PROMOTE]->(notebook);

We can track the changes in the graph with the following query:

MATCH (offer:PromotionalOffer)-[:USED_TO_PROMOTE]->(product:Product) 
RETURN offer, product;

Example 2: Building a recommendation system

Imagine we want to recommend products to Alex McGyver according to his interests. Neo4j allows us to easily track the products Alex is interested in and find other customers who also have expressed interest in these products. Afterward, we can check out these customers’ preferences and suggest new products to Alex.

First, let’s take a look at all customers and the products they’ve viewed, added to their wish lists, and bought:

MATCH (customer:Customer)-->(product:Product) 
RETURN customer, product;

As you can see, Alex has two touch points with other customers: the Sony Xperia XA1 Dual G3112 (purchased by Allison York) and the Nikon D7500 Kit 18–105mm VR (viewed by Joe Baxton). Therefore, in this particular case, our product recommendation system should offer Alex those products that Allison and Joe are interested in (but not the products Alex is also interested in).

We can implement this simple recommendation system with the help of the following query:

MATCH (:Customer {name: 'Alex McGyver'})-->(product:Product)<--(customer:Customer)
MATCH (customer)-->(customer_product:Product)
WHERE (customer_product <> product)
RETURN customer, customer_product;
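To check the reasoning behind this query without a running database, the same idea can be sketched in plain Python. This is an illustration under stated assumptions, not the query engine's behaviour: the dictionary below flattens viewed, wish-listed, and bought products from the sample data into one set per customer, product titles are shortened, and the function `recommend` is a made-up helper. Unlike the query above, it also filters out everything Alex has already interacted with.

```python
# Customer -> products they viewed, wish-listed, or bought (sample data above).
interactions = {
    "Joe Baxton": {"Nikon D7500", "Apple iPhone 8 Plus", "Apple MacBook"},
    "Daniel Johnston": {"Sony Xperia Z22", "Dell Inspiron 15"},
    "Alex McGyver": {"Canon EOS 6D", "Sony Xperia XA1", "Nikon D7500", "Xiaomi Mi Mix 2"},
    "Allison York": {"Acer Swift 3", "HP ProBook 440", "Huawei P8 Lite", "Sony Xperia XA1"},
}


def recommend(target, interactions):
    """Map each recommended product to the customers it was derived from."""
    own = interactions[target]
    recommendations = {}
    for customer, products in interactions.items():
        # Skip the target and any customer who shares no product with them.
        if customer == target or not (products & own):
            continue
        # Suggest what the similar customer has that the target does not.
        for product in products - own:
            recommendations.setdefault(product, set()).add(customer)
    return recommendations


recs = recommend("Alex McGyver", interactions)
for product in sorted(recs):
    print(product, "via", sorted(recs[product]))
```

Daniel contributes nothing because he shares no product with Alex, which matches the two touch points described above.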

We can further improve this recommendation system by adding new conditions, but the takeaway is that Neo4j helps you build such systems quickly and easily.

Challenges

Here are some of the challenges we faced, and that you may face too, when using Neo4j as a central part of your application or project.

Connecting with MongoDB

Existing applications that use MongoDB as their database may want to improve some part of the system with the help of Neo4j, which means connecting the data in MongoDB with Neo4j. Neo4j provides two or three ways to access MongoDB data and import it, but in our experience they were challenging to get working.
Some methods have been deprecated, and others failed with runtime errors or query errors.
We can connect the two through the Python drivers or by using Docker, which is a more advanced route for someone just starting out with Neo4j.
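As a rough sketch of the Python route, the step that tends to need care is reshaping MongoDB documents into flat rows that can be sent to Neo4j as a query parameter. The code below shows only that reshaping and runs without either database. Everything in it is an assumption for illustration: the `Customer` label, the `mongo_id` property, the sample documents, and the helper `to_rows`. Actually reading from MongoDB and executing the query would additionally need the MongoDB and Neo4j Python drivers.

```python
# Cypher that a driver could run with the rows below passed as the $rows parameter.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (c:Customer {mongo_id: row.mongo_id})
SET c += row.props
"""


def to_rows(documents):
    """Turn MongoDB-style documents into flat rows usable as node properties.

    Nested documents cannot be stored as a Neo4j property, so this sketch
    simply drops any value that is not a plain string, number, or boolean.
    """
    rows = []
    for doc in documents:
        props = {
            key: value
            for key, value in doc.items()
            if key != "_id" and isinstance(value, (str, int, float, bool))
        }
        # Real MongoDB ids are ObjectId values; str() keeps them portable.
        rows.append({"mongo_id": str(doc["_id"]), "props": props})
    return rows


# Hypothetical documents as they might come back from a MongoDB collection.
sample_documents = [
    {"_id": 1, "name": "Joe Baxton", "email": "joeee_baxton@example.com",
     "address": {"city": "Springfield"}},
    {"_id": 2, "name": "Allison York", "email": "ally_york1@example.com"},
]

rows = to_rows(sample_documents)
print(rows)
```

Using MERGE on a stable identifier means the import can be re-run without creating duplicate nodes.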

APOC Procedures

We also ran into APOC procedures that did not work. APOC stands for Awesome Procedures On Cypher. Before APOC’s release, developers needed to write their own procedures and functions for common functionality that Cypher or the Neo4j database did not yet support. Each developer might write their own version of these functions, causing a lot of duplication.

Many procedures are available, but in our experience a number of them did not work properly or caused issues when used in Neo4j. Deprecated APOC procedures still showed up in the browser, and a clear alternative was not always provided for them. Many Neo4j developers have run into this, and it takes hands-on experience to learn which procedures can be relied on in a system.

Strengths

Here are some advantages of using Neo4j:

Performance

Graph databases, Neo4j included, provide much better performance than relational databases when querying deeply connected data whose relationships would otherwise be expressed with complex joins. In relational databases, the performance of join-intensive queries deteriorates as the dataset grows. In a graph database, performance stays relatively constant even with very large datasets, because a query only visits the part of the graph it traverses, not the whole graph.
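The claim that a traversal touches only its own neighbourhood can be illustrated with a small breadth-first search in Python. This is a toy model over an adjacency dictionary, not a benchmark of Neo4j; the node names and the function `neighbours_within` are made up for the example.

```python
from collections import deque


def neighbours_within(adjacency, start, max_depth):
    """Collect every node reachable from start in at most max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


# A small neighbourhood around "alex" ...
adjacency = {
    "alex": ["nikon", "xiaomi"],
    "nikon": ["alex", "joe"],
    "joe": ["nikon", "macbook"],
    "macbook": ["joe"],
    "xiaomi": ["alex"],
}
# ... plus 10,000 unrelated nodes that the traversal never needs to look at.
for i in range(10_000):
    adjacency[f"other_{i}"] = [f"other_{(i + 1) % 10_000}"]

reached = neighbours_within(adjacency, "alex", 2)
print(len(reached), "of", len(adjacency), "nodes visited")
```

Adding more unrelated nodes grows the graph but leaves the work for this two-hop query unchanged, which is the intuition behind the paragraph above.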

Flexibility

The graph data model is more natural. It has no impedance mismatch and is whiteboard friendly. This means you can use the language of nodes, relationships, and properties to describe the application domain instead of using complex models such as UML. This graph model is then directly mapped and implemented in the database. Such a friendly data model allows developers to be more productive and reduces project risk. Since the graph model is flexible, you can start with a small model and extend it later by adding more nodes and relationships, with less migration and maintenance overhead.

Powerful Query Model

The graph query model is intuitive, which makes it well suited to applications with object-oriented, semi-structured, and network-like data. The graph model is also very natural for expressing graph-related problems such as path-finding problems. Using this query model, you can write complex high-performance traversals that can be beneficial in many use cases.

In addition to these general advantages of graph databases, Neo4j also provides the following advantages.

Easy to Learn Query Language

Neo4j provides a powerful traversal framework using an easy-to-learn query language called Cypher. Cypher is a declarative query language designed to be an efficient and human-readable language.

ACID Compliant

Neo4j is compliant with the ACID properties (Atomicity, Consistency, Isolation, Durability) and provides full transaction support.

Weaknesses

Neo4j has the below main weaknesses:

Scalability

Neo4j's HA master-slave clusters can scale reads linearly, because the slave instances share the read load. Writes are a different story: only the master instance in the cluster can process them. Slave instances can still accept write requests from clients, but they forward these requests to the master, so writing directly to the master is faster than writing through a slave. As a result, Neo4j doesn’t scale writes very well, and under exceptionally high write loads the only option is to scale the master instance vertically. It is possible to implement sharding logic in the client application to distribute the data across a number of servers, but this setup does not shard a graph natively. The reason is that splitting a graph across machines while cutting as few relationships as possible is a graph partitioning problem, which is NP-hard in general. In practice, how well client-side sharding works depends on the graph structure: if the graph has clear boundaries, it can be sharded easily; a densely connected graph is much harder to shard.
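Why graph structure matters for sharding can be seen by counting the relationships that end up crossing shard boundaries, since each of those turns a local hop into a network call. The Python sketch below compares a graph with clear boundaries against a densely connected one. Both graphs and the helper `cut_edges` are hypothetical examples, and the shard assignments are chosen by hand rather than by a partitioning algorithm.

```python
from itertools import combinations


def cut_edges(edges, shard_of):
    """Count relationships whose two end nodes live on different shards."""
    return sum(1 for a, b in edges if shard_of[a] != shard_of[b])


# Graph with clear boundaries: two triangles joined by a single bridge.
clustered_edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("a3", "b1"),
]
# Put each triangle on its own shard ("a" or "b").
clustered_shards = {n: n[0] for n in ("a1", "a2", "a3", "b1", "b2", "b3")}

# Densely connected graph: each of 6 nodes is linked to every other node.
dense_edges = list(combinations(range(6), 2))
# Any split into two shards of three nodes gives the same cut for this graph.
dense_shards = {n: n % 2 for n in range(6)}

clustered_cut = cut_edges(clustered_edges, clustered_shards)
dense_cut = cut_edges(dense_edges, dense_shards)
print(f"clustered: {clustered_cut} of {len(clustered_edges)} relationships cross shards")
print(f"dense: {dense_cut} of {len(dense_edges)} relationships cross shards")
```

In the clustered graph only the bridge crosses shards, while in the dense graph more than half of the relationships do, however the six nodes are split evenly.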

Storage

Neo4j has an upper bound on graph size, but a single graph can still hold tens of billions of nodes, relationships, and properties. At the time of writing, the limits are around 34 billion nodes and relationships and around 274 billion properties. That is enough for large social-network graphs of a size similar to Facebook's. These storage constraints rarely pose a limitation in practice, since only the very largest businesses, such as Google, could push against them. The limits were set for storage optimization and can be increased in future versions.

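No date data type support in older versions

Older versions of Neo4j have no built-in date data type. The usual workaround is to store dates as Unix epoch timestamps in long integer properties. Since Neo4j 3.4, Cypher has native temporal types such as Date and DateTime, so the workaround is only needed on older versions.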
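For the epoch approach, the application has to convert between dates and long integers on its way in and out of the database. A minimal Python sketch of that conversion is shown below; the function names are made up for this example, and millisecond precision in UTC is an assumption, not a Neo4j requirement.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


registered = datetime(2023, 10, 3, tzinfo=timezone.utc)
stored_value = to_epoch_millis(registered)   # the long value to store as a property
restored = from_epoch_millis(stored_value)
print(stored_value, restored.isoformat())
```

Because the stored values are plain integers, range filters such as "registered after a given day" become simple numeric comparisons in a query.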


Author: Rajesh, AI Research Intern