--- title: "Pre Release database Cleanup" output: html_vignette vignette: > %\VignetteIndexEntry{Pre Release database Cleanup} %\VignetteEngine{knitr::rmarkdown} --- This is an quick script for cleaning up the database. For further documentation see the README.rmd file in the main `qaqc` folder. **Step 1: set up an outdir and a connection to bety** The outdir is where temporary editing files will be written, and where a backup of bety will be stored. ```{r, eval=FALSE} con <- PEcAn.DB::db.open( params = list( driver = "PostgreSQL", dbname = "bety", password = "bety", host = "psql-pecan.bu.edu", user = "bety") ) outdir<-as.character(getwd()) #set as prefered out directory bety_backup_directory<-as.character(getwd()) #set as prefered directory for bety backup. options(scipen=999) #To view full id values ``` **Step 2: Back up bety** Before any major deletion processes, it makes sense to make a back-up version of the database. Don't skip this step. ```{r, eval=FALSE} system('TODAY=$( date +"%d" )') backup_dir<-paste('pg_dump -U bety -d bety | gzip -9 > ', bety_backup_directory,'/bety-pre-culling${TODAY}.sql.gz', sep="") system(backup_dir) ``` **Step 3: find all of the entries that should be deleted *** ```{r, eval=FALSE} formats<-find_formats_without_inputs(con=con, created_before = "2016-01-01", user_id = NULL, updated_after = "2016-01-01") #Ideally, date should be set to the date of the last release inputs<-find_inputs_without_formats(con=con, created_before = "2014-01-01",updated_after = NULL, user_id = NULL) ``` Since just a dump of every column can be hard to read, just choose the columns that are important. ```{r,eval=FALSE} column_names<-get_table_column_names(table = formats, con = con) column_names$formats column_names<-column_names$formats[column_names$formats %in% c("id","created_at", "name", "notes", "table_name")] column_names ``` ** Option 1: Edit an R object *** This is the most important step! Navigate to the written out table and *delete entries that should remain in the database*. ```{r, eval=FALSE} formats<-formats[colnames(formats) %in% column_names] #subset for easy viewing View(formats) formats<-formats[grep("test",formats$name),] View(formats) ``` ** Option 2: Edit a file ** This is also the most important step! Navigate to the written out table and *delete entries that should remain in the database*. If the tables are difficult to read, change what columns are retained by editing the "relevant_table_columns" parameter. ```{r, eval=FALSE} write_out_table(table = formats, outdir = outdir, relevant_table_columns = column_names, table_name = "formats") write_out_table(table = inputs,outdir = outdir, relevant_table_columns =c("id", "created_at", "name"), table_name = "inputs") ``` **Step 5: Cull all remaining entries in the files** This will delete all entries remaining in the file, and place a delete log in the outdir directory. *please do not run before editing either the table object or the "query_of" file* ```{r} ### Note: Do not run before the editing step ### ## From an R object ## #cull_database_entries(table=formats, outdir = outdir, con=con, table_name = "formats") ## From a file ## #cull_database_entries(outdir = outdir,file_name ='query_of_formats', con=con, table_name = "formats") #cull_database_entries(outdir = outdir,file_name ='query_of_inputs', con=con, table_name = "inputs") ```