Armed with this decision I set out to learn and build the MVP.
To start, I had good knowledge of analytics with R and Python, and I had built a couple of analytical dashboards. Since the core functionality was to provide predictive analytics, my default choice was Python.
After googling, I found that Plotly Dash would be a good place to start. It makes it really easy to get dashboards ready, but I soon realized that it would not give me much control over the frontend interface, and features like authentication and user-based data restriction were available only in the enterprise edition. This led to googling for the best framework to build web apps. After a few days of research, I went ahead with the MERN (MongoDB, ExpressJS, ReactJS, and NodeJS) stack for the application, with the plan to integrate it with Python.
Things were initially very promising:
- Focus on modularity
- Ease of getting started and a plethora of community support
- Various deployment options, from serverless to virtual servers
- Flexible schema with MongoDB: the ability to nest documents meant the same JSON could be used for database queries, for applying business logic and validation in the backend, and for sending and retrieving data in the frontend.
However, things slowly started to get hairy once I began working on the app. These are the things you also need to look out for.
- CORS: This is the first thing you need to fix to get the development setup to work. The backend is served from one origin URL and the frontend, through webpack, from another. Most modern browsers block such cross-origin requests by default for security reasons. Solution: set up CORS headers in the backend and also add a proxy to the React development server configuration.
- Time: The modularity offered by ReactJS is wonderful if you work in a team. For a lone developer, the time taken to build even a simple form or a table is high. What you would otherwise do with a simple form tag and a few input tags now requires you to first understand state management for input changes and request submission.
- Third-Party Libraries: To get a minimal app working with different pages, you will soon be forced to use other excellent libraries like React Router and Redux. React Router is used for navigating between pages, and Redux centralizes state management so that data from one component can be easily used by another. Their documentation, again, runs to several pages.
- Realtime: This is when things got messy. I needed to give realtime insights to the user. This can be done in 3 ways:
- 1. Long polling (the React app repeatedly queries the backend for any updates)
- 2. HTTP/2 Server Push (a one-way server-to-client connection)
- 3. WebSockets (a bi-directional connection). I went ahead with WebSockets for the following reasons. Long polling creates unnecessary overhead for both the frontend and the backend. HTTP/2 push is one-way, and when the connection drops you need to write additional code to manage reconnection and handle dropped packets; it also requires additional setup on the server and lacks wider support. WebSockets have been in use for several years, are a standard for realtime communication, and are widely supported. Being bidirectional, they can also be used for a chat feature when required.
- Redux-Saga: I was already using Redux for state management, so the WebSocket needed to be integrated with Redux. State in Redux cannot simply be changed without dispatching the proper actions. Unlike a normal REST API, which only gets a response after sending a request, a WebSocket can receive inbound data at any time. This data needs to be stored, and the dependent components need to be updated as well. The Redux-Saga library seemed to be the best way to do this, but there are very few good tutorials that cover it.
- Themes: As with charts, there are a few really good ones. Some, like Material UI, integrate very easily with the Create React App (CRA) setup, whereas others, like AntD, force you to eject from CRA. Even when they are set up correctly, making them work well with third-party form libraries like react-hook-form is even more difficult. Even integration with a simple date picker component like reactdatepicker can be challenging.
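For the CORS point at the top of this list, the logic behind "set up CORS headers in the backend" can be sketched in a few lines. This is a minimal illustration in Python (the language of my analytics code), not the Express configuration I actually used; the function name `cors_headers`, the allowed origin `http://localhost:3000`, and the lists of methods and headers are assumptions made for the example.

```python
from typing import Dict, Iterable

# Hypothetical dev-time origin: assumed address of the React development server.
ALLOWED_ORIGINS = {"http://localhost:3000"}


def cors_headers(request_origin: str,
                 allowed: Iterable[str] = ALLOWED_ORIGINS) -> Dict[str, str]:
    """Return CORS response headers for a request, or {} if the origin is not allowed."""
    if request_origin not in set(allowed):
        # No CORS headers: the browser will refuse to expose the response to the page.
        return {}
    return {
        # Echo back the specific origin rather than using a wildcard.
        "Access-Control-Allow-Origin": request_origin,
        # Illustrative method and header lists; adjust to what the API really accepts.
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type, Authorization",
        # Tell caches that the response depends on the Origin request header.
        "Vary": "Origin",
    }
```

The backend would attach these headers to every response, including the preflight `OPTIONS` request the browser sends before non-simple requests. In practice a middleware package usually does this for you.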
NodeJS with ExpressJS
The NodeJS with ExpressJS setup gives a high level of modularity and control through custom middleware.
- WebSocket: As mentioned earlier, I decided to go ahead with WebSockets after completing the React setup. This meant that I could not use the HTTP methods and validations provided by ExpressJS. On top of that, to use a single connection per client and to maintain session affinity, I needed to put all the code inside the WebSocket connect function. Compared to the vanilla WS library in NodeJS, the SocketIO library seemed to be the best option here, since it can even connect with clients behind firewalls. One problem, however, is that you can test it only through a real frontend and not with a plain client that speaks the WS/WSS protocol. This required me to keep both the React frontend and the backend server running at the same time and to switch screens repeatedly during development. This led me to GraphQL.
GraphQL is a query language for APIs, developed and released by Facebook, and it has quickly garnered a lot of support. I was drawn to it mainly for 2 things:
- Testing: The GraphiQL interface makes it easy to test your queries directly, without the need for additional software like Postman or Insomnia. It also provides ready-made documentation to help write those queries.
- WebSocket Subscriptions: The Redux, Redux-Saga, and SocketIO connection was getting messy to manage and maintain. GraphQL libraries like Apollo GraphQL not only manage state in the React app but also add real-time goodness, leaving me with a single dependency rather than three.
However, GraphQL brought problems of its own:
- Schema Stitching: There are 2 critical components in GraphQL: the schema and the resolvers. As your app grows, you will be left with a lot of GraphQL code in a single file. Trying to split it into multiple manageable files was a lot harder than expected and didn't work as I hoped. There were very few tutorials on this.
- React Crashes: Sometimes you might end up with nested components that each have their own state and each make requests to the server. One such example I had was a Material UI modal that displays a form. The modal is displayed when the user clicks the add button on a Material UI data table. In this form, I also had a custom date picker that wraps the reactdatepicker library within Material UI. Since this form also needed to be accessible directly, without the modal, it was created as a separate component that can make GraphQL requests. Unfortunately, this setup repeatedly crashed. It happened because the table components still had pending state changes after the modal was opened in the UI. The modal also had state updates after the submit button was clicked, by which time the UI had already returned to the table. Updates to state after the component has left the view cause React to crash. These are the kinds of issues you have to handle when state is managed by multiple libraries and not coordinated properly.
MongoDB is an excellent choice when developing a web app. Being a schemaless database, it helps with quick prototyping and pivoting. You can easily add or remove fields and also store all the data as a single document with nested data. The documents do not even all need to have the same fields.
- Cost of Schemaless: This is where the boon becomes a bane. Working without a proper schema and iterating on it will force you to rewrite either the business logic or the frontend to handle cases where earlier documents vary significantly from later ones. This leads to unnecessary confusion just because the schema is not standardized.
- Mongoose to the Rescue: The above problem can be mitigated to an extent with a library called Mongoose, which comes with schema definition, validation, and middleware hooks. However, as with any ORM/ODM, not all the features of the database are available through it. One such example is creating a materialized view with the $merge aggregation stage. Such a view is necessary when you need to speed up read queries. One way to circumvent this problem is to take the connection object from Mongoose and use it directly to build an aggregation pipeline.
- Missing SQL: When it comes to querying, SQL is the most widely used syntax. Exploring further down the line, I soon realized that for transactional data an SQL database fares much better. In such cases, MongoDB is mostly used to dump data, only for it to be pulled out again and converted into some SQL form for analysis.
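To make the materialized-view point concrete, here is a rough sketch of what such an aggregation pipeline looks like. The pipeline is plain JSON-like data, so I am writing it as a Python structure; the function name `daily_totals_pipeline`, the target collection `daily_totals`, and the fields `created_at` and `amount` are all hypothetical and only there for illustration.

```python
from typing import Any, Dict, List


def daily_totals_pipeline(target: str = "daily_totals") -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that writes per-day totals into `target`."""
    return [
        # Group the raw documents by calendar day (hypothetical `created_at` field)
        # and sum a hypothetical numeric `amount` field.
        {"$group": {
            "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$created_at"}},
            "total": {"$sum": "$amount"},
            "count": {"$sum": 1},
        }},
        # $merge has to be the last stage. It writes the grouped results into the
        # target collection, replacing days that already exist and inserting new ones.
        {"$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]
```

Running this list of stages through the driver's aggregate call on the source collection refreshes the view, and read queries can then hit the much smaller target collection. The `$merge` stage needs MongoDB 4.2 or later.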
After all the struggle to get everything to work together, I switched my focus to the analytics part of the application. However, connecting NodeJS and Python was not as easy as I had thought. It can be done in the following ways:
- Child process: The Python script can be run as a child process within NodeJS. This ties the Python code very closely to NodeJS, and testing the two separately becomes challenging.
- REST API: Run a Python Flask server and call its URL from NodeJS. Either the Flask server uses its own database connection to fetch the data and returns only the analytics output, or the NodeJS request has to send all the required data along each time. I chose the former, as it is easy to test and the database was on the same server.
- Messaging: Use Redis, RabbitMQ, AWS SQS, Google Pub/Sub, etc. The first 2 are open source and free. The latter 2 are proprietary and come with vendor lock-in. Redis seemed to be a good option, as it can also be used as an efficient cache for read queries and for session management. However, integrating Python and NodeJS with a publish/subscribe pattern was difficult and was getting unnecessarily complex.
Option 2, the REST API, was the better way to go. The problem, however, was the overall communication style: React ← (GraphQL) → NodeJS ← (HTTP) → Flask. NodeJS is asynchronous whereas Python is synchronous. Converting the data to JSON, parsing it, and handling exceptions and errors at all these different points was even more challenging.
As of now, I was left with 4 different services, plus a notebook environment, to work on and maintain as a single person:
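One way to keep the JSON parsing and error handling manageable on the Python side is to funnel every request through a single function that always returns a status code and a JSON body. The sketch below uses only the standard library; the function name `handle_analytics_request`, the `values` field, and the mean calculation are placeholders I made up for illustration, not the real analytics.

```python
import json
from statistics import mean
from typing import Tuple


def handle_analytics_request(raw_body: bytes) -> Tuple[int, str]:
    """Parse a JSON request, run a placeholder calculation, return (status, JSON body)."""
    try:
        payload = json.loads(raw_body)
        values = payload["values"]  # hypothetical input field
        if not isinstance(values, list) or not values:
            raise ValueError("'values' must be a non-empty list")
        # Placeholder for the real analytics: just the arithmetic mean.
        result = {"mean": mean(float(v) for v in values)}
        return 200, json.dumps({"ok": True, "result": result})
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, a missing field, and bad values all produce the same
        # error envelope, so the NodeJS side has one shape to check for.
        return 400, json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})
```

A Flask route would then be a thin wrapper that passes the raw request body to this function and returns the body with the given status. That also makes the analytics logic testable without starting any server.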
- React (Frontend)
- ExpressJS (Backend)
- Python Flask (Analytics Server)
- MongoDB (Database)
- Jupyter Notebook (for trying out snippets of code before copying them to Flask)