Data Engineering

In all AI and Analytics applications, a layer of Data Engineering is needed so that the Data Science gives meaningful answers and avoids the “garbage in, garbage out” problem.

Whether we are using a data-lake or a data-warehouse model, data feeds need to be configured, and data needs to be collected and stored. Below are some of the areas we work in:

  • Data Verification: this uses statistical, semantic and other techniques to ensure that onboarded data is wrangled and cleansed. Our mathematicians and computer scientists use a variety of tools and approaches depending on the level of structure in the data.
  • Big Data Engineering: this focuses on scaling and on the issues that applications encounter at large data volumes. We have experience of working with large datasets, including sensor-derived data and large text corpora.
  • Infrastructure: we work flexibly with computing infrastructure according to the use case. Cloud Infrastructure is very useful for starting projects where we will use Open Source tools and data. It is also flexible, so we can scale and test tools rapidly and cheaply. We have a number of experts in house and also work with partners to cover all domains.
  • Cloud Computing: this incorporates flexible sets of tools for rapidly and effectively experimenting with scaled computing environments. Depending on the use case, our experts will work either entirely in the Cloud or instantiate an on-premises workflow.
  • On-Premises Hardware/Cluster computing: this is very useful for running known high-intensity types of base-load computing, or when we need to use sensitive data. We have a long track record of developing and using high-performance clusters, and of working with sensitive data for a range of military, government and commercial customers.
Optasense Data Warehouse