stepbystep:archi1

Differences

This shows you the differences between two versions of the page.

stepbystep:archi1 [2021/03/10 20:44]
giancarlo [Metadata Display]
stepbystep:archi1 [2021/05/24 18:25] (current)
giancarlo
Line 7: Line 7:
   * [[stepbystep:drupal|Drupal 9]]   * [[stepbystep:drupal|Drupal 9]]
   * [[stepbystep:archi1|Archipelago]]   * [[stepbystep:archi1|Archipelago]]
 +  * [[stepbystep:maintenance|Maintenance]]
 </nav> </nav>
 ====== Archipelago ====== ====== Archipelago ======
Line 31: Line 32:
 Successfully enabled: ctools_views Successfully enabled: ctools_views
 Successfully enabled: bamboo_twig, bamboo_twig_config, bamboo_twig_file, bamboo_twig_loader, bamboo_twig_path, bamboo_twig_security, bamboo_twig_token Successfully enabled: bamboo_twig, bamboo_twig_config, bamboo_twig_file, bamboo_twig_loader, bamboo_twig_path, bamboo_twig_security, bamboo_twig_token
 +Successfully enabled: jquery_ui_datepicker, jquery_ui
 </code> </code>
 Browse to admin/config/services/jsonapi in the UI and select "Accept all JSON:API create, read, update, and delete operations".
Line 127: Line 129:
 <code bash> <code bash>
 $ vendor/bin/drush config:export --destination=~/bckconfig $ vendor/bin/drush config:export --destination=~/bckconfig
 +</code>
 +Give the site admin user the administrator role
 +<code bash>
 +$ vendor/bin/drush urol administrator "MysiteAdministrator"
 </code> </code>
 Then sync configurations Then sync configurations
Line 978: Line 984:
 |            | codemirror_editor.settings | Update    | |            | codemirror_editor.settings | Update    |
 +------------+----------------------------+-----------+ +------------+----------------------------+-----------+
 +$ mv ~/uploadconfig/* ~/uploaded/
 +</code>
 +<code bash>
 +$ mv ~/archipelago-deployment-1.0.0-RC2D9/config/sync/pathauto.pattern.digital_object_uuid.yml ~/uploadconfig/
 +$ vendor/bin/drush config:import --partial --source=~/uploadconfig
 ++------------+--------------------------------------+-----------+
 +| Collection | Config                               | Operation |
 ++------------+--------------------------------------+-----------+
 +|            | pathauto.pattern.digital_object_uuid | Create    |
 ++------------+--------------------------------------+-----------+
 +$ mv ~/uploadconfig/* ~/uploaded/
 +
 +$ mv ~/archipelago-deployment-1.0.0-RC2D9/config/sync/pathauto.settings.yml ~/uploadconfig/
 +$ vendor/bin/drush config:import --partial --source=~/uploadconfig
 ++------------+-------------------+-----------+
 +| Collection | Config            | Operation |
 ++------------+-------------------+-----------+
 +|            | pathauto.settings | Update    |
 ++------------+-------------------+-----------+
 $ mv ~/uploadconfig/* ~/uploaded/ $ mv ~/uploadconfig/* ~/uploaded/
 </code> </code>
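The three steps repeated above for each file (stage it in ~/uploadconfig, run a partial import, move it to ~/uploaded) could be wrapped in a small helper. This is only a sketch under assumptions: the function name `import_one_config` and the `DRUSH`, `STAGING` and `DONE_DIR` variables are hypothetical, and `$HOME` is used in place of `~` so that the path is expanded by the shell inside the `--source=` argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stage one config file, import it, then archive it.
# Paths mirror the commands above; adjust them to your own layout.
DRUSH="${DRUSH:-vendor/bin/drush}"
STAGING="${STAGING:-$HOME/uploadconfig}"
DONE_DIR="${DONE_DIR:-$HOME/uploaded}"

import_one_config() {
  local yml="$1"
  mkdir -p "$STAGING" "$DONE_DIR" || return 1
  # Stage exactly one file so the partial import touches only that config.
  mv -- "$yml" "$STAGING"/ || return 1
  # Same drush command as above; drush may still ask for confirmation.
  "$DRUSH" config:import --partial --source="$STAGING" || return 1
  # Empty the staging directory so the next import starts clean.
  mv -- "$STAGING"/* "$DONE_DIR"/
}

# Usage (run from the Drupal project root):
# import_one_config ~/archipelago-deployment-1.0.0-RC2D9/config/sync/pathauto.settings.yml
```

Importing one file at a time keeps each drush summary table short, which makes it easier to spot an unexpected Create or Update operation.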
Line 1034: Line 1059:
 $ which exiftool $ which exiftool
 /usr/bin/exiftool /usr/bin/exiftool
-
-sudo -s apt install  poppler-utils (pdfinfo)
 +</code>
 +<code bash>
 +sudo -s apt install  poppler-utils
 $ pdfinfo -v $ pdfinfo -v
 pdfinfo version 0.86.1 pdfinfo version 0.86.1
Line 1042: Line 1068:
 $ which pdfinfo $ which pdfinfo
 /usr/bin/pdfinfo /usr/bin/pdfinfo
 +</code> 
 +<code bash>
 wget https://github.com/openpreserve/fido/archive/v1.4.1.zip wget https://github.com/openpreserve/fido/archive/v1.4.1.zip
 unzip v1.4.1.zip unzip v1.4.1.zip
Line 1051: Line 1078:
 $ which fido $ which fido
 /usr/local/bin/fido /usr/local/bin/fido
 +</code>
 +OCR tools
 +<code bash>
 +$ sudo apt install tesseract-ocr
 +$ sudo apt install tesseract-ocr-ita
 +$ tesseract -v
 +tesseract 4.1.1
 + leptonica-1.79.0
 +  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 + Found AVX
 + Found SSE
 + Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4
 +$ which tesseract
 +/usr/bin/tesseract
 +</code>
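After installing a language pack such as tesseract-ocr-ita, it can be useful to confirm that tesseract actually sees it. The sketch below relies on `tesseract --list-langs`; the function name `has_tess_lang` and the `TESSERACT` variable are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical check: is a given tesseract language pack installed?
TESSERACT="${TESSERACT:-tesseract}"

has_tess_lang() {
  local lang="$1" langs
  # --list-langs prints a header line followed by one language code per line.
  langs="$("$TESSERACT" --list-langs 2>&1)" || return 1
  # -x matches whole lines only, so "ita" does not match "ita_old".
  printf '%s\n' "$langs" | grep -qx -- "$lang"
}

# Usage:
# has_tess_lang ita && echo "Italian OCR data is available"
```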
 +<code bash>
 +$ sudo apt install pdf2djvu
 +$ pdf2djvu --version
 +pdf2djvu 0.9.17
 ++ DjVuLibre 3.5.27
 ++ Poppler 0.86.1
 ++ GraphicsMagick++ 1.3.35 (Q16)
 ++ Exiv2 0.27.2
 +$ which pdf2djvu
 +/usr/bin/pdf2djvu
 +</code>
 +<code bash>
 +$ sudo apt install python python-lxml python3-djvu
 +$ wget http://nl.archive.ubuntu.com/ubuntu/pool/universe/p/python-djvulibre/python-djvu_0.8-3_amd64.deb
 +$ sudo dpkg -i python-djvu_0.8-3_amd64.deb
 +$ sudo apt install python-subprocess32 libdjvulibre-dev libdjvulibre21
 +$ sudo apt install make
 +$ wget https://codeload.github.com/jwilk/ocrodjvu/zip/0.12
 +$ unzip 0.12
 +$ cd ocrodjvu-0.12/
 +$ sudo make install
 +python - < lib/__init__.py  # Python version check
 +sed -e "1 s@^#!.*@#!/usr/bin/python@" -e "s#^basedir = .*#basedir = '/usr/local/share/ocrodjvu/'#" ocrodjvu > ocrodjvu.tmp
 +install -d /usr/local/bin
 +install ocrodjvu.tmp /usr/local/bin/ocrodjvu
 +rm ocrodjvu.tmp
 +sed -e "1 s@^#!.*@#!/usr/bin/python@" -e "s#^basedir = .*#basedir = '/usr/local/share/ocrodjvu/'#" hocr2djvused > hocr2djvused.tmp
 +install -d /usr/local/bin
 +install hocr2djvused.tmp /usr/local/bin/hocr2djvused
 +rm hocr2djvused.tmp
 +sed -e "1 s@^#!.*@#!/usr/bin/python@" -e "s#^basedir = .*#basedir = '/usr/local/share/ocrodjvu/'#" djvu2hocr > djvu2hocr.tmp
 +install -d /usr/local/bin
 +install djvu2hocr.tmp /usr/local/bin/djvu2hocr
 +rm djvu2hocr.tmp
 +install -d /usr/local/share/ocrodjvu/lib/
 +install -p -m644 lib//*.py /usr/local/share/ocrodjvu/lib/
 +install -d /usr/local/share/ocrodjvu/lib/cli
 +install -p -m644 lib/cli/*.py /usr/local/share/ocrodjvu/lib/cli
 +install -d /usr/local/share/ocrodjvu/lib/engines
 +install -p -m644 lib/engines/*.py /usr/local/share/ocrodjvu/lib/engines
 +umask 022 && python -m compileall -q /usr/local/share/ocrodjvu/lib/
 +# run "make -C doc" to build the manpages
 +
 +$ djvu2hocr --version
 +djvu2hocr 0.12
 ++ Python 2.7.18
 ++ subprocess32
 ++ python-djvulibre 0.8
 ++ lxml 4.5.0
 ++ html5lib-python 1.0.1
 +$ which djvu2hocr
 +/usr/local/bin/djvu2hocr
 +</code>
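With pdf2djvu and djvu2hocr both installed, one possible way to exercise them together is to convert a PDF to DjVu and then dump its text layer as hOCR. This is a sketch, not necessarily how Archipelago itself calls these tools: the function name `pdf_to_hocr` and the `PDF2DJVU` and `DJVU2HOCR` variables are hypothetical, and it assumes the PDF already carries a text layer.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test: PDF -> DjVu -> hOCR.
PDF2DJVU="${PDF2DJVU:-pdf2djvu}"
DJVU2HOCR="${DJVU2HOCR:-djvu2hocr}"

pdf_to_hocr() {
  local pdf="$1" out="$2" workdir status
  workdir="$(mktemp -d)" || return 1
  # pdf2djvu writes the DjVu file given with -o;
  # djvu2hocr prints hOCR for that file on standard output.
  "$PDF2DJVU" -o "$workdir/doc.djvu" "$pdf" &&
    "$DJVU2HOCR" "$workdir/doc.djvu" > "$out"
  status=$?
  # Remove the intermediate DjVu file whether or not the conversion worked.
  rm -rf "$workdir"
  return "$status"
}

# Usage:
# pdf_to_hocr sample.pdf sample.hocr
```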
 +Compile and install pdfalto
 +<code bash>
 +$ sudo apt-get install cmake pkg-config build-essential
 +
 +$ wget https://github.com/kermitt2/pdfalto/archive/refs/tags/0.4.zip
 +$ unzip 0.4.zip
 +$ cd pdfalto-0.4/
 +$ ./install_deps.sh
 +$ git clone https://github.com/kermitt2/xpdf-4.03
 +$ cmake .
 +$ make
 +$ cd ..
 +$ sudo mv pdfalto-0.4 /usr/local/src/
 +$ sudo ln -s /usr/local/src/pdfalto-0.4/pdfalto /usr/local/bin/pdfalto
 +
 +$ pdfalto
 +pdfalto version 0.4
 +Usage: pdfalto [options] <PDF-file> [<xml-file>]
 +  -f <int>                      : first page to convert
 +  -l <int>                      : last page to convert
 +  -verbose                      : display pdf attributes
 +  -noImage                      : do not extract Images (Bitmap and Vectorial)
 +  -noImageInline                : do not include images inline in the stream
 +  -outline                      : create an outline file xml
 +  -annotation                   : create an annotations file xml
 +  -noLineNumbers                : do not output line numbers added in manuscript-style textual documents
 +  -readingOrder                 : blocks follow the reading order
 +  -noText                       : do not extract textual objects (might be useful, but non-valid ALTO)
 +  -charReadingOrderAttr         : include TYPE attribute to String elements to indicate right-to-left reading order (might be useful, but non-valid ALTO)
 +  -fullFontName                 : fonts names are not normalized
 +  -nsURI <string>               : add the specified namespace URI
 +  -opw <string>                 : owner password (for encrypted files)
 +  -upw <string>                 : user password (for encrypted files)
 +  -filesLimit <int>             : limit of asset files be extracted
 +  -q                            : don't print any messages or errors
 +  -v                            : print version info
 +  -h                            : print usage information
 +  -help                         : print usage information
 +  --help                        : print usage information
 +  -?                            : print usage information
 +</code>
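A quick way to try the new binary is to convert only the first page of a PDF, using options from the usage text above. The wrapper name `first_page_alto` and the `PDFALTO` variable are hypothetical; the flags themselves are the ones pdfalto lists.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: write ALTO XML for page 1 of a PDF.
PDFALTO="${PDFALTO:-pdfalto}"

first_page_alto() {
  local pdf="$1" xml="$2"
  # -f/-l select the page range, -noImage skips image extraction,
  # -readingOrder makes blocks follow the reading order.
  "$PDFALTO" -f 1 -l 1 -noImage -readingOrder "$pdf" "$xml"
}

# Usage:
# first_page_alto sample.pdf sample.xml
```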
 +Update pdfalto to the master branch to work around a namespace issue
 +<code bash>
 +$ git clone https://github.com/kermitt2/pdfalto.git
 +$ cd pdfalto/
 +$ ./install_deps.sh
 +$ git clone https://github.com/kermitt2/xpdf-4.03
 +$ cmake .
 +$ make
 +$ cd ..
 +$ sudo mv pdfalto /usr/local/src/pdfalto-0.5-SNAPSHOT
 +$ sudo rm /usr/local/bin/pdfalto
 +$ sudo ln -s /usr/local/src/pdfalto-0.5-SNAPSHOT/pdfalto /usr/local/bin/pdfalto
 +
 +$ pdfalto
 +pdfalto version 0.5
 +Usage: pdfalto [options] <PDF-file> [<xml-file>]
 +  -f <int>                      : first page to convert
 +  -l <int>                      : last page to convert
 +  -verbose                      : display pdf attributes
 +  -noImage                      : do not extract Images (Bitmap and Vectorial)
 +  -noImageInline                : do not include images inline in the stream
 +  -outline                      : create an outline file xml
 +  -annotation                   : create an annotations file xml
 +  -noLineNumbers                : do not output line numbers added in manuscript-style textual documents
 +  -readingOrder                 : blocks follow the reading order
 +  -noText                       : do not extract textual objects (might be useful, but non-valid ALTO)
 +  -charReadingOrderAttr         : include TYPE attribute to String elements to indicate right-to-left reading order (might be useful, but non-valid ALTO)
 +  -fullFontName                 : fonts names are not normalized
 +  -nsURI <string>               : add the specified namespace URI
 +  -opw <string>                 : owner password (for encrypted files)
 +  -upw <string>                 : user password (for encrypted files)
 +  -filesLimit <int>             : limit of asset files be extracted
 +  -q                            : don't print any messages or errors
 +  -v                            : print version info
 +  -h                            : print usage information
 +  -help                         : print usage information
 +  --help                        : print usage information
 +  -?                            : print usage information
 </code> </code>
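The individual `which` checks above can be collected into one pass over all the tools this page installs. A minimal sketch, assuming a POSIX-style shell; the function name `missing_tools` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every named command that is not on the PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: no output means every tool was found.
# missing_tools exiftool pdfinfo fido tesseract pdf2djvu djvu2hocr pdfalto
```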
  • stepbystep/archi1.1615405441.txt.gz
  • Last modified: 2021/03/10 20:44
  • by giancarlo