allow globs in config file paths

The configuration parser will have to be changed again because YAML does not
support asterisks in its key names.
Johann150 2021-02-12 14:50:27 +01:00
parent 49813d0c68
commit fdca530591
GPG key ID: 9EE6577A2A06F8F1
5 changed files with 86 additions and 22 deletions


@ -14,6 +14,7 @@ Thank you to @gegeweb for contributing to this release.
* Disabling support for TLSv1.2 can now be done using the `--only-tls13` flag, but this is *NOT RECOMMENDED* (#12).
* The tools now also contain a startup script for FreeBSD (#13).
* Using central config mode (flag `-C`), all configuration can be done in one `.meta` file (see README.md for details).
* The `.meta` configuration file now allows for globs to be used.
### Changed
* The configuration files are now parsed as YAML. The only syntax change is that a space is now required after the colon.

7
Cargo.lock generated

@ -6,6 +6,7 @@ version = "2.4.1"
dependencies = [
"env_logger",
"getopts",
"glob",
"log",
"mime_guess",
"once_cell",
@ -95,6 +96,12 @@ dependencies = [
"unicode-width",
]
[[package]]
name = "glob"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9b919933a397b79c37e33b77bb2aa3dc8eb6e165ad809e58ff75bc7db2e34574"
[[package]]
name = "hermit-abi"
version = "0.1.18"


@ -23,6 +23,7 @@ percent-encoding = "2.1"
rustls = "0.19.0"
url = "2.2"
yaml-rust = "0.4"
glob = "0.3"
[profile.release]
lto = true


@ -65,9 +65,20 @@ A file called `index.gmi` will always take precedence over a directory listing.
### Meta-Presets
You can put a file called `.meta` in a directory. This file stores some metadata about these files which Agate will use when serving these files. The file should be UTF-8 encoded. Like the `.directory-listing-ok` file, this file does not have an effect on sub-directories. (*1)
This file is parsed as a YAML file and should contain a "hash" datatype with file names as the keys. This means:
Lines starting with a `#` are comments and will be ignored like empty lines. All other lines must start with a file name, followed by a colon and a space and then the metadata.
You can put a file called `.meta` in any content directory. This file stores some metadata about the adjacent files which Agate will use when serving these files. The `.meta` file must be UTF-8 encoded.
You can also enable a central configuration file with the `-C` flag (or the long version `--central-conf`). In this case Agate will always look for the `.meta` configuration file in the content root directory and will ignore `.meta` files in other directories.
The `.meta` file is parsed as a YAML file and should contain a "hash" datatype with file names as the keys. This means:
* Lines starting with a `#` are comments and will be ignored, as will empty lines.
* All other lines must have the form `<path>: <metadata>`, i.e. start with a file path, followed by a colon and a space and then the metadata.
`<path>` is a case-sensitive file path, which may or may not exist on disk. If `<path>` leads to a directory, it is ignored.
If central configuration file mode is not used, using a path that is not a file in the current directory is undefined behaviour (for example: `../index.gmi` would be undefined behaviour).
You can use Unix style patterns in existing paths. For example `content/*` will match any file within `content`, and `content/**` will additionally match any files in subdirectories of `content`.
However, the `*` and `**` globs on their own will by default not match files or directories that start with a dot because of their special meaning (see Directory listing).
This behaviour can be disabled with `--serve-secret` or by explicitly matching files starting with a dot with e.g. `content/.*` or `content/**/.*` respectively.
For more information on the patterns you can use, please see the [documentation of `glob::Pattern`](https://docs.rs/glob/0.3.0/glob/struct.Pattern.html).
Rules can overwrite other rules, so if a file is matched by multiple rules, the last one applies.
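To illustrate "the last one applies", here is a minimal sketch. It assumes the rules are applied top to bottom into a map keyed by path, as the `file_meta.insert` calls in this commit suggest. The helper `apply_rules` and the shape of its argument (paths already expanded from the glob, plus a metadata string) are hypothetical and not part of Agate.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical helper, not Agate's API: applies rules in file order.
/// Each rule is the list of paths its pattern expanded to, plus the
/// metadata string that should be attached to those paths.
fn apply_rules(rules: &[(Vec<&str>, &str)]) -> BTreeMap<PathBuf, String> {
    let mut file_meta = BTreeMap::new();
    for (paths, meta) in rules {
        for path in paths {
            // `insert` replaces whatever an earlier rule stored for this
            // path, so the rule that appears last in the file wins.
            file_meta.insert(PathBuf::from(*path), meta.to_string());
        }
    }
    file_meta
}
```

With a first rule covering `a.de.gmi` and `index.gmi` and a later rule covering only `index.gmi`, the map ends up holding the later rule's metadata for `index.gmi` and the first rule's metadata for `a.de.gmi`.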
The metadata can take one of four possible forms:
1. empty
@ -85,14 +96,21 @@ If a line violates the format or looks like case 3, but is incorrect, it might b
Such a configuration file might look like this:
```text
# This line will be ignored.
**.de.gmi: ;lang=de
nl/**.gmi: ;lang=nl
index.gmi: ;lang=en-UK
LICENSE: text/plain;charset=UTF-8
gone.gmi: 52 This file is no longer here, sorry.
```
You can enable a central configuration file with the `-C` flag (or the long version `--central-conf`). In this case Agate will always look for the `.meta` configuration file in the content root directory and will ignore `.meta` files in other directories.
(*1) It is *theoretically* possible to specify information on files which are in sub-directories. The problem would only be to make sure that this file is loaded before the respective path/file is requested. This is because Agate does not actively check that the "no sub-directories" regulation is met. In fact this might be dropped in a change of configuration format in the foreseeable future.
If this is the `.meta` file in the content root directory and the `-C` flag is used, this will result in the following:
requested filename|response header
---|---
`/` or `/index.gmi`|`20 text/gemini;lang=en-UK`
`/LICENSE`|`20 text/plain;charset=UTF-8`
`/gone.gmi`|`52 This file is no longer here, sorry.`
any non-hidden file ending in `.de.gmi` (including in non-hidden subdirectories)|`20 text/gemini;lang=de`
any non-hidden file in the `nl` directory ending in `.gmi` (including in non-hidden subdirectories)|`20 text/gemini;lang=nl`
### Logging Verbosity
@ -109,7 +127,6 @@ If you want to serve the same content for multiple domains, you can instead disa
[Gemini]: https://gemini.circumlunar.space/
[Rust]: https://www.rust-lang.org/
[home]: gemini://gem.limpet.net/agate/
[rustup]: https://www.rust-lang.org/tools/install
[source]: https://github.com/mbrubeck/agate
[crates.io]: https://crates.io/crates/agate
[documentation of `env_logger`]: https://docs.rs/env_logger/0.8


@ -1,3 +1,4 @@
use glob::{glob_with, MatchOptions};
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;
@ -69,20 +70,20 @@ impl FileOptions {
/// Checks whether the database for the directory of the specified file is
/// still up to date and re-reads it if outdated or not yet read.
fn update(&mut self, dir: &Path) {
let mut dir = if super::ARGS.central_config {
fn update(&mut self, file: &Path) {
let mut db = if super::ARGS.central_config {
super::ARGS.content_dir.clone()
} else {
dir.parent().expect("no parent directory").to_path_buf()
file.parent().expect("no parent directory").to_path_buf()
};
dir.push(SIDECAR_FILENAME);
db.push(SIDECAR_FILENAME);
let should_read = if let Ok(metadata) = dir.as_path().metadata() {
let should_read = if let Ok(metadata) = db.as_path().metadata() {
if !metadata.is_file() {
// it exists, but it is a directory
false
} else if let (Ok(modified), Some(last_read)) =
(metadata.modified(), self.databases_read.get(&dir))
(metadata.modified(), self.databases_read.get(&db))
{
// check that it was last modified before the read
// if the times are the same, we might have read the old file
@ -99,7 +100,7 @@ impl FileOptions {
};
if should_read {
self.read_database(&dir);
self.read_database(&db);
}
}
@ -109,6 +110,9 @@ impl FileOptions {
log::trace!("reading database {:?}", db);
if let Ok(contents) = std::fs::read_to_string(db) {
self.databases_read
.insert(db.to_path_buf(), SystemTime::now());
let docs = match YamlLoader::load_from_str(&contents) {
Ok(docs) => docs,
Err(e) => {
@ -136,7 +140,7 @@ impl FileOptions {
continue;
};
// generate workspace-unique path
// generate workspace-relative path
let mut path = db.clone();
path.pop();
path.push(rel_path);
@ -175,16 +179,50 @@ impl FileOptions {
PresetMeta::FullMime(header.to_string())
};
self.file_meta.insert(path, preset);
let glob_options = MatchOptions {
case_sensitive: true,
// so there is a difference between "*" and "**".
require_literal_separator: true,
// security measure because entries for .hidden files
// would result in them being exposed.
require_literal_leading_dot: !crate::ARGS.serve_secret,
};
// process filename as glob
let paths = if let Some(path) = path.to_str() {
match glob_with(path, glob_options) {
Ok(paths) => paths.collect::<Vec<_>>(),
Err(err) => {
log::error!("incorrect glob pattern: {}", err);
continue;
}
}
} else {
log::error!("path is not UTF-8: {:?}", path);
continue;
};
if paths.is_empty() {
// probably an entry for a nonexistent file, glob only works for existing files
self.file_meta.insert(path, preset);
} else {
for glob_result in paths {
match glob_result {
Ok(path) if path.is_dir() => { /* ignore */ }
Ok(path) => {
self.file_meta.insert(path, preset.clone());
}
Err(err) => {
log::warn!("could not process glob path: {}", err);
continue;
}
};
}
}
}
} else {
log::error!("no YAML document {:?}", db);
return;
};
self.databases_read.insert(
db.as_path().parent().unwrap().to_path_buf(),
SystemTime::now(),
);
}
} else {
log::error!("could not read configuration file {:?}", db);
}
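
The staleness check in `update` can be read in isolation as the sketch below. The part of the hunk that compares the two timestamps is not shown in this diff, so the comparison used here (`>=`, i.e. equal timestamps count as stale) is an assumption based on the comment "if the times are the same, we might have read the old file". The free function `should_read` and its signature are hypothetical.

```rust
use std::time::SystemTime;

/// Hypothetical stand-alone version of the staleness check.
/// `modified` is the mtime of the `.meta` file; `last_read` is the time
/// recorded in `databases_read`, or `None` if the file was never read.
fn should_read(modified: SystemTime, last_read: Option<SystemTime>) -> bool {
    match last_read {
        // Assumed comparison: equal timestamps are treated as stale,
        // because the file may have changed within the same clock tick
        // in which it was last read.
        Some(read_at) => modified >= read_at,
        // Never read before, so it has to be read now.
        None => true,
    }
}
```

This also suggests why recording the read time before parsing, as this commit now does, errs on the side of re-reading: a write that lands while the file is being parsed has a modification time at or after the recorded time and triggers another read on the next request.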