What I want to do is a mass update of the following metadata, by appending a defined category and tags to the file for each article. I can run the ls command on the cloned repository, but what filters/regex would extract the Docs metadata with one command?
In the cloned directory “modules/ROOT/pages”, run the following command to count the AsciiDoc files.
$ find . -type f -name "*.adoc" | wc -l
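Alongside the total count, it may help to see which files still lack a given attribute. The following is only a sketch: `files_missing_attr` is a hypothetical helper name, and it assumes the attributes are written at the start of a line as `:category:` or `:tags:`.

```bash
#!/usr/bin/env bash
# Sketch: list .adoc files under a directory that have no line starting
# with the given attribute, e.g. ":category:".
# files_missing_attr is a hypothetical helper name.
files_missing_attr() {
    attr="$1"
    dir="${2:-.}"
    # grep -L prints the names of files that contain no matching line.
    find "$dir" -type f -name '*.adoc' -exec grep -Li "^:${attr}:" {} + | sort
}
```

For example, `files_missing_attr category modules/ROOT/pages | wc -l` would count the files that have no category yet.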
Rough process steps
1. List the files and metadata and save them to a CSV file
2. Update the metadata in the CSV file
3. List the files and metadata, loop over them, append the text with echo "text" >> $file
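The last step could be sketched roughly as below. It assumes a CSV without a header whose rows look like `path,category,tag1,tag2,...`, and file paths that contain no commas. `append_metadata` and `append_line` are hypothetical names, and whether attributes appended at the end of a file are picked up by the docs tooling is something to check before running this on the whole repository.

```bash
#!/usr/bin/env bash
# Sketch: append :category: and :tags: lines to each file listed in a CSV.
# Assumed row layout (no header): path,category,tag1,tag2,...

append_line() {
    # Add a newline first if the file does not already end with one.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        echo >> "$1"
    fi
    printf '%s\n' "$2" >> "$1"
}

append_metadata() {
    csv="$1"
    # The last variable (tags) receives the rest of the row, commas included.
    while IFS=, read -r file category tags || [ -n "$file" ]; do
        # Skip blank rows and rows whose file does not exist.
        if [ -z "$file" ] || [ ! -f "$file" ]; then
            continue
        fi
        # Only add an attribute that the file does not already define.
        if [ -n "$category" ] && ! grep -qi '^:category:' "$file"; then
            append_line "$file" ":category: $category"
        fi
        if [ -n "$tags" ] && ! grep -qi '^:tags:' "$file"; then
            # Tags are comma-separated in the CSV, space-separated in the file.
            append_line "$file" ":tags: $(printf '%s' "$tags" | tr ',' ' ')"
        fi
    done < "$csv"
}
```

Because existing attributes are left alone, running `append_metadata myfile.csv` a second time should not add duplicate lines.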
Question on point 1 above
How would you list the file names and metadata from Pagure, and download the metadata from multiple files (more than 250), with a single command? The metadata is mostly empty.
One may be able to get the file list using the Pagure API, but to get the metadata, I expect one has to parse each file (grep it, etc.)?
I don’t know if one command would be enough, but a script that goes something like this would work?
# clone all repos using a for loop in a dir somewhere
git clone "$REPO_URL"
# use rg/grep to extract metadata to a file
category_result="$(rg -i '^:category:' -g '*.adoc' --no-heading -H | sed 's/:category://')"
category="$(echo $category_result | cut -d ':' -f2)"
category_fn="$(echo $category_result | cut -d ':' -f1)
tags="$(rg -i "^:tags:' -g $category_fn --no-heading -I| sed -e 's/:tags: //' | tr ' ' ',')"
echo "$category_fn,$category,$tags" >> myfile.csv
..
# process csv, add to files as required
This only gets tags from each file that has a category, so the assumption is that each file must have a category if it has tags. The logic can also be reversed. If there’s no guarantee that files will have both, we would probably produce two CSV files, one for category and one for tags, and then merge them column-wise using paste.
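One caveat with paste is that it lines rows up by position, so it only works if both CSV files list the same files in the same order. A sketch of the merge using join instead, which matches rows on the file name, might look like this. `merge_metadata` is a hypothetical name, and the sketch assumes each file has exactly two columns (`file,category` and `file,tags`), with the tags kept space-separated so they stay in one column.

```bash
#!/usr/bin/env bash
# Sketch: merge "file,category" and "file,tags" CSV files on the file name.
# Assumes two columns per file and no commas inside the tags column.
# merge_metadata is a hypothetical helper name.
merge_metadata() {
    cat_sorted="$(mktemp)"
    tag_sorted="$(mktemp)"
    # join needs both inputs sorted on the join field with the same collation.
    LC_ALL=C sort -t, -k1,1 "$1" > "$cat_sorted"
    LC_ALL=C sort -t, -k1,1 "$2" > "$tag_sorted"
    # -a1 -a2 keeps files that appear in only one input;
    # -o 0,1.2,2.2 prints: file name, category, tags (empty when missing).
    LC_ALL=C join -t, -a1 -a2 -o 0,1.2,2.2 "$cat_sorted" "$tag_sorted"
    rm -f "$cat_sorted" "$tag_sorted"
}
```

A call such as `merge_metadata category.csv tags.csv > myfile.csv` (both input names are placeholders) would then give one row per file, with an empty column where a file has no category or no tags.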
Thanks for your generous help. Your script has left me better informed. As a beginner at bash shell scripting, I get the following error when running the script. Could you advise how to correct it? Thanks in advance!
> line 15: unexpected EOF while looking for matching `)'
I ran the shellcheck utility to get a sneak peek at suggestions on the bash syntax. Please see below.
line 9:
category_fn="$(echo $category_result | cut -d ':' -f1)
^-- SC1009 (info): The mentioned syntax error was in this variable assignment.
^-- SC1078 (warning): Did you forget to close this double quoted string?
line 10:
tags="$(rg -i "^:tags:' -g $category_fn --no-heading -I| sed -e 's/:tags: //' | tr ' ' ',')"
^-- SC1079 (info): This is actually an end quote, but due to next char it looks suspect.
^-- SC1073 (error): Couldn’t parse this command expansion. Fix to allow more checks.
line 15:
^-- SC1072 (error): Expected end of $(…) expression. Fix any mentioned problems and try again.
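Both shellcheck findings come down to quoting: the assignment on line 9 never closes its double quote, and on line 10 the pattern opens with a double quote but closes with a single one. Bash only notices the unterminated string when it reaches the end of the file, which is why the error is reported for line 15. A minimal corrected sketch follows. It swaps rg for plain grep and handles the files one at a time, so each file yields exactly one CSV row; both choices are assumptions rather than part of the original script, and `extract_metadata` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: print "file,category,tags" for every .adoc file under a directory.
# Uses grep instead of rg; extract_metadata is a hypothetical helper name.
extract_metadata() {
    dir="${1:-.}"
    find "$dir" -type f -name '*.adoc' | sort | while IFS= read -r file; do
        # Take the first matching attribute line and strip the attribute name.
        category="$(grep -i '^:category:' "$file" | head -n 1 \
            | sed 's/^:[^:]*:[[:space:]]*//')"
        # Same for tags, then turn the spaces between tags into commas.
        tags="$(grep -i '^:tags:' "$file" | head -n 1 \
            | sed 's/^:[^:]*:[[:space:]]*//' | tr -s ' ' ',')"
        printf '%s,%s,%s\n' "$file" "$category" "$tags"
    done
}
```

Running `extract_metadata modules/ROOT/pages > myfile.csv` from the clone would write one row per file, with empty columns for files that have no category or tags.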
I think maybe a bash script is not the right tool for this. Things like quoting and multiline strings are tricky to get right. I’ll tinker with it a little more, but may end up resorting to Python or something if a bash script is too complex.
The CSV file shows a duplicate entry for the using-yubikeys.adoc file: there are two lines of data for it. I’ll also try different regex parameters to get the right result for the target files.
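To check whether other files are affected too, the duplicated names can be listed straight from the CSV. This is only a sketch, assuming the file name is the first comma-separated column; `duplicate_files` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: print each file name that appears on more than one row of the CSV.
# Assumes the file name is the first comma-separated column.
# duplicate_files is a hypothetical helper name.
duplicate_files() {
    cut -d, -f1 "$1" | sort | uniq -d
}
```

One possible cause (a guess, not confirmed here) is that the file contains more than one matching attribute line; `grep -ci '^:category:' using-yubikeys.adoc` would show how many there are.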