Issue
I regularly have to reduce huge db sql dumps (more than 100gb) down to more manageable file sizes by removing unnecessary INSERT statements. I do that with the following script. I'm concerned that my script involves iterating multiple times through the source file, which is obviously computationally expensive. Is there a way to combine all my SED statements into one, so the source file only needs to be processed once, or can be processed in a more efficient way?
sed '/INSERT INTO `attendance_log`/d' input.sql | \
sed '/INSERT INTO `analytics_models_log`/d' | \
sed '/INSERT INTO `backup_logs`/d' | \
sed '/INSERT INTO `config_log`/d' | \
sed '/INSERT INTO `course_completion_log`/d' | \
sed '/INSERT INTO `errorlog`/d' | \
sed '/INSERT INTO `log`/d' | \
sed '/INSERT INTO `logstore_standard_log`/d' | \
sed '/INSERT INTO `mnet_log`/d' | \
sed '/INSERT INTO `portfolio_log`/d' | \
sed '/INSERT INTO `portfolio_log`/d' | \
sed '/INSERT INTO `prog_completion_log`/d' | \
sed '/INSERT INTO `local_amosdatasend_log_entry`/d' | \
sed '/INSERT INTO `totara_sync_log`/d' | \
sed '/INSERT INTO `prog_messagelog`/d' | \
sed '/INSERT INTO `stats_daily`/d' | \
sed '/INSERT INTO `course_modules_completion`/d' | \
sed '/INSERT INTO `question_attempt_step_data`/d' | \
sed '/INSERT INTO `scorm_scoes_track`/d' | \
sed '/INSERT INTO `question_attempts`/d' | \
sed '/INSERT INTO `grade_grades_history`/d' | \
sed '/INSERT INTO `task_log`/d' > reduced.sql
Is this idea going in the right direction?
cat input.sql | sed '/INSERT INTO `analytics_models_log`/d' | sed '/INSERT INTO `backup_logs`/d' | sed '/INSERT INTO `config_log`/d' | sed '/INSERT INTO `course_completion_log`/d' | sed '/INSERT INTO `errorlog`/d' | sed '/INSERT INTO `log`/d' | sed '/INSERT INTO `logstore_standard_log`/d' | sed '/INSERT INTO `mnet_log`/d' | sed '/INSERT INTO `portfolio_log`/d' | sed '/INSERT INTO `portfolio_log`/d' | sed '/INSERT INTO `prog_completion_log`/d' | sed '/INSERT INTO `local_amosdatasend_log_entry`/d' | sed '/INSERT INTO `totara_sync_log`/d' | sed '/INSERT INTO `prog_messagelog`/d' | sed '/INSERT INTO `stats_daily`/d' | sed '/INSERT INTO `course_modules_completion`/d' | sed '/INSERT INTO `question_attempt_step_data`/d' | sed '/INSERT INTO `scorm_scoes_track`/d' | sed '/INSERT INTO `question_attempts`/d' | sed '/INSERT INTO `grade_grades_history`/d' | sed '/INSERT INTO `task_log`/d' > reduced.sql
Solution
If you have multiple sed ... | sed ...
you can combine them by writing sed -e ... -e ...
or sed ...;...
. But in this case there is an even more efficient method:
sed -E '/INSERT INTO `(attendance_log|analytics_models_log|...)`/d'
Alternatively, switch to grep
which could be even faster:
grep -vE 'INSERT INTO `(attendance_log|analytics_models_log|...)`'
or
grep -vFf <(printf 'INSERT INTO `%s`\n' attendance_log analytics_models_log ...)
You could even try to replace all ..._log
and logs
with a regex, if this is what you want. With this, you only have to explicitly list the non-log files:
INSERT INTO `([^`]*logs?|local_amosdatasend_log_entry|stats_daily|...)`
Answered By - Socowi Answer Checked By - Dawn Plyler (WPSolving Volunteer)