Mass update of Docs metadata

Continuing the discussion from How to pull activity data from Pagure:

Problem description

I want to mass-update the following metadata by appending a defined category and tags to each article file. I can run ls on the cloned repository, but what filters or regex would extract the Docs metadata with a single command?

:category: CATEGORY
:tags: TAG_01 TAG_02 TAG_03 ... TAG_n

Example of Docs metadata to be updated in bulk

filename category tag1 tag2 tag3
performing-administration-tasks-using-sudo.adoc Administration tutorial
postgresql.adoc Installation How-to server
proc_setting-key-shortcut.adoc Managing software How-to Troubleshooting
publish-rpm-on-copr.adoc Upgrading tutorial
qemu.adoc server
raspberry-pi.adoc
reset-root-password.adoc
root-account-locked.adoc
samba.adoc

Check the number of target files

In the cloned repository, change to the directory “modules/ROOT/pages” and run the following command.

$ find -type f -name "*.adoc" | wc -l

Rough process steps

  1. List the files and their metadata and save them to a CSV file
  2. Update the metadata in the CSV file
  3. Read the files and metadata from the CSV, loop over them, and append the text with echo "text" >> $file
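Step 3 could look roughly like the sketch below. It assumes a CSV whose rows are laid out as filename,category,tag1,tag2,... (the same shape the script later in this thread writes); the function name append_metadata and its two arguments are made up for illustration.

```shell
# Append :category: and :tags: lines to every file listed in a CSV.
# Assumed row layout: filename,category,tag1,tag2,...
# Usage (hypothetical): append_metadata metadata.csv modules/ROOT/pages
append_metadata() {
    local csv="$1" dir="$2"
    local file category tags
    # "|| [ -n "$file" ]" also handles a last row without a trailing newline.
    while IFS=, read -r file category tags || [ -n "$file" ]; do
        # Skip blank rows and rows that point at a file that does not exist.
        if [ -z "$file" ] || [ ! -f "$dir/$file" ]; then
            continue
        fi
        if [ -n "$category" ]; then
            printf ':category: %s\n' "$category" >> "$dir/$file"
        fi
        if [ -n "$tags" ]; then
            # Everything after the second comma is the tag list; store it space-separated.
            printf ':tags: %s\n' "$(printf '%s' "$tags" | tr ',' ' ')" >> "$dir/$file"
        fi
    done < "$csv"
}
```

Rows with an empty category or empty tag list simply skip that line, so files such as qemu.adoc in the example above would only get a :tags: line.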

Question on point 1 above

How would you list the file names and metadata from Pagure, across multiple files (more than 250), with a single command? The metadata is mostly empty.

One may be able to get the file list using the Pagure API, but to get the metadata, I expect one has to parse each file, for example with grep?

I don’t know if one command would be enough, but a script that goes something like this would work?

# clone all repos using a for loop in a dir somewhere
git clone "$REPO_URL"

# use rg/grep to extract metadata to a file
category_result="$(rg -i '^:category:' -g '*.adoc' --no-heading -H | sed 's/:category://')"
category="$(echo $category_result | cut -d ':' -f2)"
category_fn="$(echo $category_result | cut -d ':' -f1)
tags="$(rg -i "^:tags:' -g $category_fn --no-heading  -I| sed -e 's/:tags: //' | tr ' ' ',')"
echo "$category_fn,$category,$tags" >> myfile.csv
..

# process csv, add to files as required

This only gets tags from each file that has a category, so the assumption is that each file must have a category if it has tags. The logic can also be reversed. If there’s no guarantee that files will have both, we’re probably going to write two CSV files, one for categories and one for tags, and then merge them column-wise using paste.
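If a file may have only one of the two fields, another option is to read each file once and print a row either way. The sketch below uses find and awk instead of rg; the function name extract_metadata is made up, and it assumes the metadata lines start at the beginning of a line exactly as :category: and :tags: (case-sensitive).

```shell
# Print one CSV row per .adoc file: filename,category,tag1,tag2,...
# A missing field is left empty, so every file gets exactly one row.
# Usage (hypothetical): extract_metadata modules/ROOT/pages > metadata.csv
extract_metadata() {
    local dir="$1"
    find "$dir" -type f -name '*.adoc' | sort | while IFS= read -r f; do
        awk -v name="$(basename "$f")" '
            # Drop trailing whitespace (and a stray carriage return) first.
            { sub(/[ \t\r]+$/, "") }
            /^:category:/ { sub(/^:category:[ \t]*/, ""); category = $0 }
            /^:tags:/     { sub(/^:tags:[ \t]*/, ""); gsub(/[ \t]+/, ","); tags = $0 }
            END           { print name "," category "," tags }
        ' "$f"
    done
}
```

If a key occurs more than once in a file, the last occurrence wins, which keeps the output at one row per file.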

What do you think?

Thanks for your generous help. I’m better informed by your script. As a beginner in bash shell scripting, I get the following error when running the script. Could you advise how to correct it? Thanks in advance!

> line 15: unexpected EOF while looking for matching `)'

I ran the shellcheck utility to get a sneak peek at its bash syntax suggestions. Please see below.

line 9:
category_fn="$(echo $category_result | cut -d ‘:’ -f1)
^-- SC1009 (info): The mentioned syntax error was in this variable assignment.
^-- SC1078 (warning): Did you forget to close this double quoted string?

line 10:
tags="$(rg -i “^:tags:’ -g $category_fn --no-heading -I| sed -e ‘s/:tags: //’ | tr ’ ’ ‘,’)”
^-- SC1079 (info): This is actually an end quote, but due to next char it looks suspect.
^-- SC1073 (error): Couldn’t parse this command expansion. Fix to allow more checks.

line 15:

^-- SC1072 (error): Expected end of $(…) expression. Fix any mentioned problems and try again.

Without having looked over the script completely: the number of " characters in this line doesn’t match. The "^: looks suspicious to me. It should probably be '^:tags:'
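For reference, the two assignments that shellcheck flags can be written with balanced, plain ASCII quotes as in the sketch below. The sample file and the sample value of category_result are made up so the snippet runs on its own, and grep -h stands in for rg -I so it works without ripgrep installed.

```shell
# Made-up sample data so the snippet is self-contained.
workdir="$(mktemp -d)"
printf ':category: Installation\n:tags: How-to server\n' > "$workdir/postgresql.adoc"
# Shape of one result line after the rg | sed step: "<file>: <category>"
category_result="postgresql.adoc: Installation"

# The closing double quote after the ")" was missing in the original line.
category_fn="$(echo "$category_result" | cut -d ':' -f1)"

# The pattern now opens and closes with the same quote character: '^:tags:'
# grep -h (no file name prefix) is used here in place of rg -I.
tags="$(grep -h '^:tags:' "$workdir/$category_fn" | sed -e 's/:tags: //' | tr ' ' ',')"

echo "$category_fn,$tags"
# prints: postgresql.adoc,How-to,server
```

Quoting "$category_result" and "$category_fn" is not required to fix the syntax error, but it protects against file names that contain spaces.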

Uploaded the script (before I tinker with it) to my personal project repo.

[1]


  1. My reference and holiday reading list:
    Bash shell scripting for beginners
    The Linux Command Line by William Shotts

Yeah, that should do it.

I think maybe a bash script is not the right tool for this. Things like quoting and multiline strings are tricky to get right. I’ll tinker with it a little more, but may end up resorting to Python or something if a bash script is too complex :thinking:

I opened a PR with a much better solution:

It writes the category and tag lists to separate files, and then uses join to combine them column-wise, keyed on the file name field.
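The join step can be sketched as follows. This is not the script from the PR, just an illustration of the technique; it assumes two comma-separated input files (filename,category and filename,tags with the tags kept space-separated in one field), and the function name merge_metadata is made up.

```shell
# Merge "filename,category" and "filename,tags" files on the file name.
# Files that appear in only one input are kept, with the other field empty.
# Usage (hypothetical): merge_metadata categories.csv tags.csv > metadata.csv
merge_metadata() {
    local categories="$1" tags="$2"
    local tmp
    tmp="$(mktemp -d)"
    # join needs both inputs sorted on the join field with the same collation.
    LC_ALL=C sort -t, -k1,1 "$categories" > "$tmp/categories.sorted"
    LC_ALL=C sort -t, -k1,1 "$tags" > "$tmp/tags.sorted"
    # -a 1 -a 2 keeps unpaired rows; -o fixes the output to name,category,tags.
    LC_ALL=C join -t, -a 1 -a 2 -o 0,1.2,2.2 \
        "$tmp/categories.sorted" "$tmp/tags.sorted"
    rm -r "$tmp"
}
```

Unlike paste, join does not rely on both files listing the same files in the same order, which matters when some articles have only a category or only tags.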

I retyped the line that triggered the shellcheck warning SC1112: "This is a unicode quote. Delete and retype it (or ignore/doublequote for literal)."

A CSV file is created successfully. Thanks.

The CSV file shows a duplicated entry for the using-yubikeys.adoc file: there are two lines of data for it. I’ll also try different regex parameters to get the right result for the target files.
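A quick way to track down such duplicates is sketched below. The first helper lists file names that occur on more than one CSV row; the second lists files that contain the same metadata key more than once, which is one possible cause (an assumption, not something confirmed in this thread). Both function names are made up.

```shell
# List file names that appear on more than one row of the CSV.
# Usage (hypothetical): duplicate_rows myfile.csv
duplicate_rows() {
    cut -d, -f1 "$1" | sort | uniq -d
}

# List .adoc files under a directory that contain ":<key>:" at the start
# of more than one line.
# Usage (hypothetical): repeated_key modules/ROOT/pages tags
repeated_key() {
    local dir="$1" key="$2"
    # /dev/null forces grep to prefix every count with the file name.
    find "$dir" -type f -name '*.adoc' -exec grep -c "^:$key:" /dev/null {} + |
        awk -F: '$NF > 1 { sub(/:[0-9]+$/, ""); print }'
}
```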

Your new bash script works! Six files are created. A big thank you.