So you want to build a database?

Occasionally, researchers who are familiar with Babase may express interest in building a "Babase-like" database for their own study population. If this is true for you, thank you! It's very flattering to have our work be so appreciated.

A Few Words Of Advice

To assist you in finding a solution that works best for you and your study population, we have a few words of advice:

If you’ve read the above and still feel that a custom database like Babase is right for you, we are happy to share our technical specifications (including the pocket reference), best practices, and our source code. Good luck!

Notes From a Database Developer

Choosing Software

There is nothing wrong with choosing some software and building something yourself but if you're going to hire a developer, or take advantage of in house talent -- whatever that means, here are some things to think about:

Developing

Open Source Databases -- Can I use one without hiring someone and other questions answered

Using Open Source can keep costs down but it is not a panacea. Critics are quick to point out that total cost of ownership (TCO) can be high, because it costs to hire people for development and maintenance. But what really matters is not TCO but return on investment (ROI). If it costs twice as much but delivers 10 times a much you come out ahead, assuming that you actually need or want some of the extra benefit.

Open Source software tends to follow the Unix philosophy; do only one thing and do it well. Consequently, you can't just install a database engine and be done with it. Most Open Source database products come with only the most rudimentary tools for interacting with the database. You will want to also install a front end. These can be web-based, allowing interaction via a web browser, or programs that install on your desktop. There are many choices for both database engine and front end, which can be intimidating. The good news is twofold. There are some recommendations below and, especially when it comes to a front end, you don't have to limit yourself to a single choice. Install multiple front-ends and see which you like.

I mentioned web-based database access, which raises the question "If I use an Open Source database must I have a server and must I make my database accessible on the web?" The answer is no. Not unless you want to. Everything can always be installed on a single machine. Even if the front-end you wish to use is designed to be used from within a web browser, and must access your database content via a network connection, all of that networking can take place within the network that is internal to every Operating System. (In fact, this tends to be the default after software install since it ensures your data is secure and not accessible anywhere other than from the machine you're sitting in front of.) And most Open Source database products are Operating System agnostic. Meaning that no matter your choice of operating system you can install and run everything on the machine you're using to read this. If you later decide you want to make your database web-accessible, or talk over the Internet directly to the database engine, you can get a Internet server or cloud provider and move your system.

Developers who develop for Open Source databases will likely use other Open Source tools, and want to develop on an Open Source operating system with full access to the entire Open Source ecosystem. While it's certainly possible to develop on one OS and deploy for use on another there can be advantages to developing and deploying on an Open Source Operating System even when you and your institution have little Open Source experience. Before we get to that, the downside of having a single machine used for all purposes, development, testing, production use, etc., is that it can be easier for a developer to break something used in production. The upsides are that there's much less to maintain on an ongoing basis when there's only a single machine, and this approach ensures that there's always a working development environment available to new developers who may not be familiar with the system. This lets a new developer "start small" since all the development tools should be ready for them to use.

Your Open Source database has a community which provides support! Usually, minimally, via written documentation, email, and chat. Very often there's also a wiki. Look for support channels on your database's website. Support is important! It probably pays to try out the support channels, at least regards the database itself, before making the final software choice.

Caution: If you do decide to go the Open Source route you will probably have to read some instructions. Pressing an "Install" button and poking the result with your mouse pointer isn't going to cut it.

The Components of a Complete Open Source Database System

Aside from a database engine, many people want to interface with their database using a GUI. Here's a few choices, and the databases they can access. Don't be afraid to choose more than one.

DBBeaver (PostgreSQL, MariaDB, SQLite, and more) (Installs on the desktop)

PgAdmin4 (PostgreSQL) (Installs on a web server or the desktop)

OmniDB (PostgreSQL and more, I forget) (Installs on a web server or installs on your desktop and will serve pages only to your desktop)

Adminer (Postgresql and more, I forget) (Requires a web server, which in turn could be installed on your desktop but then you have to choose a web server and install it)

Wikipedia has a list, if you want more choices.

Database Assessments

Choosing a database involves multiple considerations, not the least of which is the presence of a support network -- whether provided from within your institution or by the outside world. These are the more-or-less full-featured database products that come to mind. Others (I'm looking at you MS Access) tend to be products that people tend to "grow out of", meaning they are sold to "get you in the door" and tend to require a complete do-over when it comes time to up-sell you the full-featured product.

There are many more products, not mentioned here. And many other types of databases and related data-stores. These are relational databases, which is, more or less, the "traditional" sort of database. A relational database is probably the kind of database you want; it allows for both in-database and out-of-database data analysis.

What database product is best for you is something you have to decide. "Best" is relative and always involves tradeoffs.

Open Source Databases

PostgreSQL

This tends to be my database of choice. Full-featured, well documented, with a supportive community. New features are well thought-out and reviewed for compatibility with the (numerous) SQL standards. Extensive 3rd party extensions are available, including PostGIS which supports Geo-spacial data and analysis.

Postgres' SQL dialect is similar to that of Oracle, and there are many similarities in their feature sets.

MySQL

Originally a thin "SQL skin" over existing database "back-end" engines, it developed into a full-featured database. Although still widely used, it was abandoned by the larger Open Source Operating System distributors after purchase by Oracle. It was then re-born as MariaDB.

MariaDB

Forked (copied) from MySQL, it is the other notable, widely-used, full-featured relational Open Source database. Not as well documented as PostgreSQL. Like MySQL, the documentation is not Open Source so you are reliant on the parent company for access to the documentation. Full featured and widely used, but tends to have a lot of odd "corner-cases" in both use and administration. The syntax of it's SQL dialect employs less common parts of the SQL standards.

SQLite

Widely used but less full-featured than the other Open Source databases. Very simple to use and administer. Limited in-database data-validation features.

Proprietary Databases

Oracle

Very full-featured. Expensive. While there are tools and companies that assist migration to PostgreSQL, institutions that use Oracle are usually "locked-in" due to dependencies on business products with licenses that require Oracle.

Oracle-the-company has a reputation for its inflexible contracts and aggressive pursuit of revenue.

Microsoft SQL Server

Full featured, with a SQL syntax similar to most other products. Microsoft tends to use its own vocabulary and integrate best with other Microsoft products.

Other Babase-like Databases

Disclaimer

We take no responsibility for any decisions you make based on the information we present. We don't claim to be right, or accurate, or anything else. All we're offering is free advice! :)

DIY_Database (last edited 2023-09-19 23:29:37 by KarlPinc)

Wiki content based upon work supported by the National Science Foundation under Grant Nos. 0323553 and 0323596. Any opinions, findings, conclusions or recommendations expressed in this material are those of the wiki contributor(s) and do not necessarily reflect the views of the National Science Foundation.