Though open-source software for managing data has been the standard within technology industries for years, it is increasingly used in diversified market sectors, from consulting to healthcare and retail. Organizational size doesn’t seem to matter when choosing to use open source. Open-source data technologies can be successfully applied whether a company has fewer than 100 or more than 1,000 employees.
2022 State of Open Source Report
The 2022 State of Open Source Report, created in partnership between OpenLogic and the Open Source Initiative, analyzed open source usage and market trends. The objective was to find out which open-source data technologies organizations were using, why they were chosen, and what challenges each technology presented.
Though the largest share of respondents worked in tech, a sector whose data-driven focus on machine learning is said to be compounding by more than 38% annually, a variety of industries were represented in the answers.
The following is an overview of the study's results, compiled from 2,660 respondents. To be included, each respondent had to use open-source data technology within their organization. The study also gathered data from a near-even distribution of small, medium, and large companies, in addition to the range of industries already mentioned.
One final important note about the demographics of the study is that nearly 53% of the respondents operated in North America. The other 47% were scattered around the globe.
Top Open-Source Data Technologies
In the survey, respondents were asked which open-source data technologies they used in their organization. Some companies used more than one, which is why the percentages add up to more than 100%. Here are the top ten choices, ranked in order of usage:
- MySQL (35%)
- PostgreSQL (33%)
- MongoDB (28%)
- MariaDB (21%)
- Cassandra (19%)
- CouchDB (18%)
- Apache Kafka (18%)
- Redis (16%)
- ElasticSearch (15%)
- Apache Flink (14%)
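As a quick check on why the figures exceed 100%, the shares in the list above can be summed. This is only an illustrative sketch: the variable names are made up for the example, and the numbers are the rounded percentages from the list, not the report's raw data.

```python
# Usage shares (percent of respondents) copied from the top-ten list above.
# The dictionary name and structure are illustrative, not from the report.
usage_percent = {
    "MySQL": 35,
    "PostgreSQL": 33,
    "MongoDB": 28,
    "MariaDB": 21,
    "Cassandra": 19,
    "CouchDB": 18,
    "Apache Kafka": 18,
    "Redis": 16,
    "ElasticSearch": 15,
    "Apache Flink": 14,
}

# Respondents could select several technologies, so the shares overlap
# and their sum is not bounded by 100.
total_percent = sum(usage_percent.values())

# Dividing by 100 gives the average number of top-ten technologies
# selected per respondent (technologies outside the top ten are ignored).
avg_top_ten_per_respondent = total_percent / 100

print(f"Sum of shares: {total_percent}%")
print(f"Average top-ten technologies per respondent: {avg_top_ten_per_respondent:.2f}")
```

With the rounded figures above, the shares sum to 217%, which works out to a little over two top-ten technologies per respondent on average.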
It is important to note that while this is a list of the most used open-source data technologies, two of the entries above do not meet the strictest definition of "open source." MongoDB and ElasticSearch have moved to licenses that are incompatible with the open-source definition, although MongoDB continues to offer a freemium open-source option. The State of Open Source Report nevertheless continues to track their usage, as it does for CockroachDB, which is also no longer open source but still came in at #12, used by 13% of the respondents.
Important Factors in Picking a Technology
How are these organizations choosing which open-source data technology to use? The top reason for a particular selection was security and patches (26%), followed closely by features and functionality (25%).
Rounding out the top five reasons for choosing a specific data technology were:
- Level of proficiency and experience (22%)
- Licensing cost (18%)
- Enterprise technical support (17%)
Interestingly, the top reason varies when drilling into industry-specific responses. For example, licensing costs weighed heavily on organizations' decisions in education and research, while technology companies were more concerned about security and patches. Businesses in manufacturing saw enterprise technical support as most important.
Challenges with Open-Source Technology
We've covered which open-source data technologies organizations choose and why. But, more importantly, what challenges follow from those choices?
The top challenges faced by these users are:
- Installation, upgrades, and configuration issues (52%)
- Keeping up with updates and patches (47%)
- Personnel experience and proficiency (46%)
- Scalability issues (44%)
These were followed by distributed computing (36%) and backups and recovery (34%).
As with the reasons for selecting a technology, the top challenges also vary at the sector level. Healthcare and pharmaceutical organizations struggle with not having enough personnel (30%), while retail organizations struggle with both keeping up with updates and patches (62%) and scalability issues (also 62%).
Other anomalies also stood out in the data. Among large companies with over 1,000 employees, keeping up with updates and patches (51%) outranks installation, upgrades, and configuration (49%) as a challenge.
Even more interesting is that library dependency management is less of a challenge in North America, with only 37% of respondents listing it as a challenge, ranking it fifth on their list of significant support challenges. However, Middle East respondents listed library dependency management as their second-highest challenge at 44%.
Conclusion
The 2022 State of Open Source Report is useful for organizational IT managers and open source developers alike. However, it reveals only a portion of the story in hindsight rather than providing a forward-looking perspective.
As data becomes the prime currency of the future, surpassing commodities and even the dollar itself, the need for diverse, adaptive, and accessible open-source data technologies will be more critical than ever. And, despite a few favorites, the top ten list is far from set in stone. New entrants will challenge incumbents, but even for existing open-source infrastructure, maintaining innovation and adaptability to customer needs will make the race to the top a close one.