In conversation with Ujjwal Kumar, lead data scientist, Akaike Technologies
1. What drew you towards data science out of so many other technologies?
I am an experienced professional with over five years of expertise in data science. During my final year, I became interested in exploring a variety of new technologies. Data science was particularly appealing to me because of its ability to solve complex challenges that were otherwise deemed impossible.
Since then, it has become a buzzword in the tech industry, and I’ve grown increasingly passionate about automation and artificial intelligence. At my previous internship, I was fortunate enough to work under Akaike’s founder, Rahul Thota.
When I completed my internship, Rahul recommended that I join his new company, Akaike Technologies, which was a three-member team at that time. The chance to work with a startup and gain exposure to different projects for multiple clients across various industries enticed me to take the leap.
My philosophy has remained the same since day one: keep learning. In addition to executing my own tasks, I’m now managing a talented data science team. This gives me the opportunity to stay aware of company goals and objectives while guiding my team in finding the best solutions for each situation.
2. What are the common misconceptions about Big Data?
Many people think Big Data is just four or five years old, but it isn’t a new concept; it has been around for over 20 years. What has changed recently is that powerful tools and technology have enabled a shift towards open-source solutions, which makes handling and analyzing data more accessible to smaller companies.
Many also believe that larger datasets produce better results, but the quality of the data matters more than its quantity.
Businesses need to keep in mind that Big Data involves much more than technology. It’s about leveraging data-driven insights to inform business decisions and foster innovation. Furthermore, the real value of Big Data comes from utilizing it for analysis purposes, as opposed to just storage and collection.
Ultimately, Big Data can be very effective, but businesses need a sound strategy, an appropriate tech stack and skilled professionals in order to use it successfully.
3. How do you draw insights from unstructured data?
The first step in a data science project is to gather all available sources of data. This includes both structured data such as text files and databases, and unstructured data such as images. The data should be processed, transforming it into standard IDs that are associated with relevant descriptions.
We then clean up the raw data, creating a new version of the dataset which will be more useful for analysis. Data science is an iterative process: we continually refine our approach, revisiting earlier steps whenever we find a better one.
In exploring the dataset, we examine what kind of data we have, how many visual sets we can identify from it and what measures can be taken to improve its quality. We also use hypotheses to validate our datasets and look out for anomalies. These insights enable us to further explore the dataset’s potential.
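As a rough illustration of "looking out for anomalies" in a numeric column, here is a minimal sketch using the interquartile range (IQR) rule. The interview does not say which method the team uses, so the IQR rule, the function name `iqr_outliers` and the multiplier `k` are assumptions chosen for illustration only.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences.

    Illustrative helper (not from the interview): a value is flagged
    when it lies more than k * IQR below the first quartile or above
    the third quartile.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: one reading is far away from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(readings)
print(flagged)
```

Flagged values are only candidates for review; whether a point is a genuine error or a rare but valid observation still needs a human or domain check.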
4. Among millions of datasets, how can you identify the right data quickly?
Finding high-quality data from millions of datasets can be a challenging task, but at Akaike Technologies, we follow well-defined steps to speed up the process. First, it’s important to define your data requirements. Having a clear understanding of the data you need, such as the type of data, the format, and the source, will make it easier to filter out irrelevant datasets.
Next, you can use data catalogs and repositories like Data.gov, Kaggle, or Google’s Dataset Search as a starting point for finding high-quality data. These platforms usually have a search function and filters that allow you to quickly find datasets that meet your criteria.
You can then turn to data APIs, which allow you to programmatically access data from a variety of sources, including social media, websites, and databases.
Using APIs can save time compared to manually downloading data. Remember, not all data is created equal. When evaluating potential datasets, consider factors such as completeness, accuracy, timeliness, and consistency to ensure that the data is of high quality.
Finally, consult with experts in the field, such as data scientists or domain specialists, for recommendations on where to find high-quality data. These experts may also be able to help with the evaluation process. Keep in mind that finding high-quality data can be a time-consuming process, and it may require some trial and error. The key is to be patient and persistent in your search for the right data.
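Some of the quality factors mentioned above can be turned into quick numeric checks when comparing candidate datasets. The sketch below is an assumption-laden illustration, not Akaike's actual process: the function `quality_report`, its parameters and the sample records are all hypothetical, duplicate rows are used as a simple stand-in for consistency, and accuracy is left out because it usually needs a trusted reference to compare against.

```python
from datetime import date


def quality_report(rows, required_fields, date_field, max_age_days, today):
    """Score a list of dict records on three simple quality measures.

    Illustrative helper: each score is a fraction between 0 and 1.
    """
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "timeliness": 0.0, "consistency": 0.0}

    # Completeness: rows where every required field is present and non-empty.
    complete = sum(
        all(row.get(f) not in (None, "") for f in required_fields)
        for row in rows
    )

    # Timeliness: rows whose date is no older than max_age_days.
    fresh = sum(
        isinstance(row.get(date_field), date)
        and (today - row[date_field]).days <= max_age_days
        for row in rows
    )

    # Consistency (proxy): rows that are not exact duplicates of an earlier row.
    seen = set()
    for row in rows:
        seen.add(tuple(sorted((k, str(v)) for k, v in row.items())))

    return {
        "completeness": complete / total,
        "timeliness": fresh / total,
        "consistency": len(seen) / total,
    }


# Hypothetical records: one empty label, one stale row, one duplicate.
sample = [
    {"id": 1, "label": "cat", "updated": date(2023, 1, 10)},
    {"id": 2, "label": "", "updated": date(2023, 1, 5)},
    {"id": 3, "label": "dog", "updated": date(2021, 1, 1)},
    {"id": 1, "label": "cat", "updated": date(2023, 1, 10)},
]
report = quality_report(
    sample, ["id", "label"], "updated", max_age_days=90, today=date(2023, 1, 31)
)
print(report)
```

Scores like these are only useful for ranking or shortlisting datasets; the thresholds that count as "good enough" depend on the project.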
5. In 2023, what trends will dominate VisionAI?
Well, there are so many, but to sum it up: 2023 will improve on the innovations that were conceived from 2020 to 2022. The demand for real-time analysis and the growth of the Internet of Things have driven a trend toward edge AI, which deploys models directly on devices instead of in the cloud.
Computer vision is also playing a larger role in healthcare, and 3D computer vision is becoming more widely used in robotics, augmented and virtual reality, and industrial inspection.