Differences between revisions 7 and 8
Revision 7 as of 2023-09-19 23:21:10
Size: 17282
Editor: KarlPinc
Comment: More about getting support
Revision 8 as of 2023-09-19 23:29:37
Size: 17642
Editor: KarlPinc
Comment: Backups
Deletions are marked like this. Additions are marked like this.
Line 89: Line 89:

 * Backups. You haven't forgotten about backups, have you? Backups means not just one copy. Multiple copies, taken at different times, duplicated and sent to multiple locations to protect in the event of complete, but local, disaster. And backups are not backups if you don't periodically test and make sure you can completely restore from each backup.

So you want to build a database?

Occasionally, researchers who are familiar with Babase may express interest in building a "Babase-like" database for their own study population. If this is true for you, thank you! It's very flattering to have our work be so appreciated.

A Few Words Of Advice

To assist you in finding a solution that works best for you and your study population, we have a few words of advice:

  • First, consider your most immediate and essential needs, not the larger vision of a database. Most likely, the most immediate needs would focus around a modest number of tables to start with: a table of study subjects, which would include all the basic biographical information for each one, and then several tables that would join with that one, which would contain multi-dimensional information for each subject, including biological sample info.

  • In other words, keep it as small as possible to begin with. Keeping it small, and adding to it only as you need, will help ensure that you end up in the long run with a database that suits your needs well.
  • Second, begin by investing in Microsoft Access, Filemaker Pro, or a similar off-the-shelf database program and start with that, rather than a custom database. This may sound counter-intuitive, but hiring a database programmer at the start of your database endeavor is likely to greatly increase the time investment you make in building the database, with no added benefit in terms of database design or usability.

  • This is because a database programmer won’t know how to build a database for you until you yourself have a very intimate and detailed understanding of what you want. The best way to get that is to build the first version of your database, the first essential tables, yourself. This will allow you to learn what works and what doesn’t, and exactly what you need.
  • In fact, you may find that you never need a custom database or a programmer — you may be able to meet all your needs this way, adding tables one at a time in an Access or Filemaker database. That would be by far the best outcome.
  • Third, for pre-tenure, tenure-track researchers, it is best not to invest too much time in building a custom database until you have tenure. Building a custom database is very time intensive. This is why so many researchers don’t have them. It’s a huge investment in something that will not move you towards the tenure line at all. Better to save the bulk of it for after tenure.

If you’ve read the above and still feel that a custom database like Babase is right for you, we are happy to share our technical specifications (including the pocket reference), best practices, and our source code. Good luck!

Notes From a Database Developer

Choosing Software

There is nothing wrong with choosing some software and building something yourself but if you're going to hire a developer, or take advantage of in house talent -- whatever that means, here are some things to think about:

  • Be prepared to accept some changes to the way you work and the software you use. (If you liked what you have already and your processes you wouldn't be hiring someone.)
  • You have hired an expert. Listen to them.
  • Don't buy something expensive and expect your developer to use it. Your developer will listen to your requirements, choose the software that best meets your needs, and work with you to develop the processes for interacting with your database.
  • Coordinate with your IT department early, perhaps even before searching for a developer. Your developer and your IT department, really, all your human IT resources, must all work together.
  • Your developer will not be there forever:
    • Documentation is key. If it's not documented you won't know how to use it and it's useless. Document:
      • What the system can do; features.
      • How the system is actually used; procedures.
      • How to maintain the system itself over time, not just the data. This should include how to maintain the entire software stack required to run the system, possibly down to the Operating System.
      It is unlikely that you will want to pay the developer to know everything about the way you work and write down the day-to-day procedures.
  • Most of your long-term cost (you are in it for the long-term, right?) is going to be in maintenance.
    • Of course it all depends on how long your new system is used, but typical estimates are 30% development, 70% maintenance.
    • Regression test suites and documentation cost up-front but lower maintenance costs.
    • The Operating System and software stack that supports your database will eventually need to be upgraded, and will need regular security updates.
    • Keep the human-facing interfaces simple. Writing programs to interact with humans is expensive; 90% of it is detecting errors, most human-caused, and reporting them in a way a human can understand. If you can do it without a GUI, do it without a GUI.
  • 50% of development costs can go towards design.
    • Without developing requirements you can't design.
    • Without a design you don't know if your requirements are met.
    • You don't see anything "real" until you implement something.
    • You don't know what it's going to cost to implement until you have a design!
  • Plan ahead for data sharing.
    • If the software you plan to use to access and manipulate the database runs only on the desktop then everyone who needs to use it must install it on their desktop (or laptop, etc.).
    • Any software tools your system requires that are not installed on a single Internet-accessible server may need to work on multiple versions/releases of possibly different Operating Systems and co-exist with various versions/releases of whatever other software must also be installed, or happens to be installed.

  • Development and deployment can involve a lot of different technologies and skill sets. Limiting the number of these is desirable but difficult.
    • Producing a "product" requires at least some, and probably most, of the following:
      • database design skills
      • user interface design skills
      • knowledge of your particular problem domain (animal research, data collection and entry, etc.)
      • database application development skills (knowledge of SQL, etc.)
      • programming skills, likely in multiple languages/technologies (database access libraries, web or application development frameworks, in-database validation languages, out-of-database programming languages for both application development and data transformation to assist with moving data into your new system, languages that run in the browser, etc.)
      • skills with programming tools (editors; debuggers; revision control systems; regression testing tools; build automation utilities, servers, and continuous integration systems)
      • good writing and communication skills
      • user interface development tool kits (HTML + CSS, or GUI libraries)
      • knowledge of best security practices for application development
    • Deploying a "product" requires at least some, and probably most, of the following:
      • database administration, tuning, and configuration
      • operating system administration, including security awareness
      • networking and network security
      • web server configuration
  • Involving more people slows things down. (See Fred Brooks: "The Mythical Man Month")
  • Wikis make information easily available to everyone who needs it. (We like ours.)
  • Think hard about ensuring data integrity, including domain-specific rules like "You must be female to give birth".
  • Backups. You haven't forgotten about backups, have you? Backups means not just one copy. Multiple copies, taken at different times, duplicated and sent to multiple locations to protect in the event of complete, but local, disaster. And backups are not backups if you don't periodically test and make sure you can completely restore from each backup.

Developing

  • Work in as small a universe as possible. Break up large projects.
    • Don't try to design everything before implementing anything.
    • Do try to build something that's complete enough to be useful.
    • Getting something working tells you what you need to do next, what needs fixing or extending.
  • Regression test suites keep bugs down.
  • Leverage 3rd party software as much as possible.
    • Why write a menu in software when a wiki will do the same thing?
    • You don't have to pay the maintenance costs of Open Source software.
    • Consider in-database data integrity validation. It lets you use 3rd-party software for database updates and other data maintenance.
  • Revision control is essential. It lets you:
    • Determine when bugs were introduced, including bugs that may result in corrupted data.
    • Reap multiple significant benefits.

Open Source Databases -- Can I use one without hiring someone and other questions answered

Using Open Source can keep costs down but it is not a panacea. Critics are quick to point out that total cost of ownership (TCO) can be high, because it costs to hire people for development and maintenance. But what really matters is not TCO but return on investment (ROI). If it costs twice as much but delivers 10 times a much you come out ahead, assuming that you actually need or want some of the extra benefit.

Open Source software tends to follow the Unix philosophy; do only one thing and do it well. Consequently, you can't just install a database engine and be done with it. Most Open Source database products come with only the most rudimentary tools for interacting with the database. You will want to also install a front end. These can be web-based, allowing interaction via a web browser, or programs that install on your desktop. There are many choices for both database engine and front end, which can be intimidating. The good news is twofold. There are some recommendations below and, especially when it comes to a front end, you don't have to limit yourself to a single choice. Install multiple front-ends and see which you like.

I mentioned web-based database access, which raises the question "If I use an Open Source database must I have a server and must I make my database accessible on the web?" The answer is no. Not unless you want to. Everything can always be installed on a single machine. Even if the front-end you wish to use is designed to be used from within a web browser, and must access your database content via a network connection, all of that networking can take place within the network that is internal to every Operating System. (In fact, this tends to be the default after software install since it ensures your data is secure and not accessible anywhere other than from the machine you're sitting in front of.) And most Open Source database products are Operating System agnostic. Meaning that no matter your choice of operating system you can install and run everything on the machine you're using to read this. If you later decide you want to make your database web-accessible, or talk over the Internet directly to the database engine, you can get a Internet server or cloud provider and move your system.

Developers who develop for Open Source databases will likely use other Open Source tools, and want to develop on an Open Source operating system with full access to the entire Open Source ecosystem. While it's certainly possible to develop on one OS and deploy for use on another there can be advantages to developing and deploying on an Open Source Operating System even when you and your institution have little Open Source experience. Before we get to that, the downside of having a single machine used for all purposes, development, testing, production use, etc., is that it can be easier for a developer to break something used in production. The upsides are that there's much less to maintain on an ongoing basis when there's only a single machine, and this approach ensures that there's always a working development environment available to new developers who may not be familiar with the system. This lets a new developer "start small" since all the development tools should be ready for them to use.

Your Open Source database has a community which provides support! Usually, minimally, via written documentation, email, and chat. Very often there's also a wiki. Look for support channels on your database's website. Support is important! It probably pays to try out the support channels, at least regards the database itself, before making the final software choice.

Caution: If you do decide to go the Open Source route you will probably have to read some instructions. Pressing an "Install" button and poking the result with your mouse pointer isn't going to cut it.

The Components of a Complete Open Source Database System

Aside from a database engine, many people want to interface with their database using a GUI. Here's a few choices, and the databases they can access. Don't be afraid to choose more than one.

DBBeaver (PostgreSQL, MariaDB, SQLite, and more) (Installs on the desktop)

PgAdmin4 (PostgreSQL) (Installs on a web server or the desktop)

OmniDB (PostgreSQL and more, I forget) (Installs on a web server or installs on your desktop and will serve pages only to your desktop)

Adminer (Postgresql and more, I forget) (Requires a web server, which in turn could be installed on your desktop but then you have to choose a web server and install it)

Wikipedia has a list, if you want more choices.

Database Assessments

Choosing a database involves multiple considerations, not the least of which is the presence of a support network -- whether provided from within your institution or by the outside world. These are the more-or-less full-featured database products that come to mind. Others (I'm looking at you MS Access) tend to be products that people tend to "grow out of", meaning they are sold to "get you in the door" and tend to require a complete do-over when it comes time to up-sell you the full-featured product.

There are many more products, not mentioned here. And many other types of databases and related data-stores. These are relational databases, which is, more or less, the "traditional" sort of database. A relational database is probably the kind of database you want; it allows for both in-database and out-of-database data analysis.

What database product is best for you is something you have to decide. "Best" is relative and always involves tradeoffs.

Open Source Databases

PostgreSQL

This tends to be my database of choice. Full-featured, well documented, with a supportive community. New features are well thought-out and reviewed for compatibility with the (numerous) SQL standards. Extensive 3rd party extensions are available, including PostGIS which supports Geo-spacial data and analysis.

Postgres' SQL dialect is similar to that of Oracle, and there are many similarities in their feature sets.

MySQL

Originally a thin "SQL skin" over existing database "back-end" engines, it developed into a full-featured database. Although still widely used, it was abandoned by the larger Open Source Operating System distributors after purchase by Oracle. It was then re-born as MariaDB.

MariaDB

Forked (copied) from MySQL, it is the other notable, widely-used, full-featured relational Open Source database. Not as well documented as PostgreSQL. Like MySQL, the documentation is not Open Source so you are reliant on the parent company for access to the documentation. Full featured and widely used, but tends to have a lot of odd "corner-cases" in both use and administration. The syntax of it's SQL dialect employs less common parts of the SQL standards.

SQLite

Widely used but less full-featured than the other Open Source databases. Very simple to use and administer. Limited in-database data-validation features.

Proprietary Databases

Oracle

Very full-featured. Expensive. While there are tools and companies that assist migration to PostgreSQL, institutions that use Oracle are usually "locked-in" due to dependencies on business products with licenses that require Oracle.

Oracle-the-company has a reputation for its inflexible contracts and aggressive pursuit of revenue.

Microsoft SQL Server

Full featured, with a SQL syntax similar to most other products. Microsoft tends to use its own vocabulary and integrate best with other Microsoft products.

Other Babase-like Databases

Disclaimer

We take no responsibility for any decisions you make based on the information we present. We don't claim to be right, or accurate, or anything else. All we're offering is free advice! :)

DIY_Database (last edited 2023-09-19 23:29:37 by KarlPinc)

Wiki content based upon work supported by the National Science Foundation under Grant Nos. 0323553 and 0323596. Any opinions, findings, conclusions or recommendations expressed in this material are those of the wiki contributor(s) and do not necessarily reflect the views of the National Science Foundation.