Design - Hashtags
Sunday, September 10, 2023
7:51 AM
OneNote has a rudimentary tagging system that lets users apply one or more "tags" to a paragraph. Each tag is indicated by a unique icon. Users can customize the name, which shows up as a tooltip over the icon. Users can then list tags within a specified scope. However, the Find Tags function is merely a list of tags within the scope; there is no way to search for a specific tag icon or name. Also, the tag itself doesn't provide any context although the Find Tags feature pulls about the first 50 characters from the related paragraph. Many users have complained about the usefulness of this feature and its implementation.
OneMore Page Tags adds a page-level tagging capability where the user can add one or more tag keywords to a special content box (one:Outline) that appears below the title line, next to the date line. The advantage of this feature is that users can quickly search for one or more tags and navigate through a tree of pages annotated with those tags. There are two disadvantages. First, page-level tags do not provide useful context related to the content of the page. Second, a specially positioned container at the top of the page is fragile to unintended user manipulation, and OneMore commands intended to process or manage containers in general must treat it as a special case.
OneMore Hashtags is a new feature that provides a more traditional tagging feature using hashtag-keywords within the text of a page. Typing a hashtag within the flow of content is a more natural and intuitive approach. This also lets users search for hashtags within a page, across pages in a section, or even across notebooks, while also providing textual context for each hashtag. Future enhancements could include the ability to highlight specific hashtags.
Design
A user can add one or more hashtags to their content, embedded within or in place of any text on a page.
A hashtag may start with either one or two number signs, such as #hashtag or ##hashtag. Tags may contain letters, digits, hyphens, and underscores. Any other character terminates the hashtag.
Single-hash hashtags must not begin with a digit: #a123 is valid but #1abc is not. This ensures that OneMore differentiates between a hashtag and a numbered sequence like #1, #2, #3, etc. If you want a tag such as #123, use two number signs, as in ##123.
Enhancements may include the ability to exclude patterns important to programmers, such as HTML hex colors, like #FFCC00, or C# or C++ preprocessor directives, like #include or #define.
Valid hashtag examples include:
- #hashtag
- #hash-tag
- #hash_tag_12
- ##12345
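The rules above can be expressed as a regular expression. The following is a minimal sketch in Python, used here only for illustration; it is not OneMore's actual scanner. The name find_hashtags is hypothetical, and the look-behind that rejects a hashtag glued to a preceding word character is an assumption not stated in the design.

```python
import re

# Assumption: a hashtag must not directly follow a word character or another '#'.
# The design text does not state this; it is added here to skip text like "abc#def".
HASHTAG = re.compile(
    r"(?<![\w#])"               # not preceded by a word character or '#'
    r"(?:##[A-Za-z0-9_-]+"      # double hash: may begin with any allowed character
    r"|#(?!\d)[A-Za-z0-9_-]+)"  # single hash: must not begin with a digit
)


def find_hashtags(text):
    """Return every hashtag found in text, in order of appearance."""
    return HASHTAG.findall(text)


if __name__ == "__main__":
    print(find_hashtags("Ask #a123 about #1abc and ##123, then see #hash-tag."))
```

This sketch makes no attempt to exclude hex colors or preprocessor directives, so a string such as #FFCC00 would still be reported as a hashtag.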
Out of Scope
Nested tags such as those in Obsidian. These ostensibly emulate a categorization hierarchy. OneNote implements this hierarchy as a structure of notebooks, sections, and section groups. So nested tags would be redundant in OneNote.
Possible Functionality
- Right-click a tag and find related pages
- Tag navigator window
- Search for tags, show pages per each tag
- Click page - navigate to page
- Click page - find "related" pages, other pages that have the same tags (some, all)
- Build map of related tags - those that are mentioned on the same page, along with occurrence counts, similar to a relative tag cloud
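As a rough illustration of the related-tags map, the sketch below counts how often other tags appear on the same page as a given tag. It is written in Python for brevity; the function name related_tags and the shape of the pages argument are hypothetical and are not part of the design.

```python
from collections import Counter


def related_tags(pages, tag):
    """Count how often other tags share a page with the given tag.

    pages maps a page name to the set of hashtags found on that page
    (a hypothetical structure used only for this example).
    """
    counts = Counter()
    for tags in pages.values():
        if tag in tags:
            counts.update(t for t in tags if t != tag)
    return counts


if __name__ == "__main__":
    pages = {
        "Page A": {"#design", "#sqlite"},
        "Page B": {"#design", "#sqlite", "#ui"},
        "Page C": {"#ui"},
    }
    print(related_tags(pages, "#design").most_common())
```

The resulting counts could drive the relative sizing of tags in a tag cloud.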
Scanning
Hashtags are discovered using a scanner class that enumerates notebooks, sections, and pages. Each page has a lastModifiedTime attribute that we can compare against the time of the last scan to optimize each successive scan by skipping pages that haven't changed.
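The skip-unchanged-pages optimization can be sketched as a simple timestamp filter. This Python example is illustrative only; the function name pages_to_scan and the tuple shape of its input are assumptions, and timestamps are assumed to be ISO 8601 strings without a time zone suffix.

```python
from datetime import datetime


def pages_to_scan(pages, last_scan):
    """Return IDs of pages modified after the last completed scan.

    pages: list of (page_id, last_modified_iso) tuples (hypothetical shape).
    last_scan: ISO 8601 timestamp of the last scan, or None to scan everything.
    """
    if last_scan is None:
        return [page_id for page_id, _ in pages]
    cutoff = datetime.fromisoformat(last_scan)
    return [
        page_id
        for page_id, modified in pages
        if datetime.fromisoformat(modified) > cutoff
    ]


if __name__ == "__main__":
    pages = [("p1", "2023-09-10T07:00:00"), ("p2", "2023-09-10T08:30:00")]
    print(pages_to_scan(pages, "2023-09-10T07:51:00"))
```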
[Figure: Hashtag Scanning]
HashtagService is created upon OneNote startup as a low-priority background thread. It scans all (unlocked) pages in all notebooks and repeats this every two minutes.
The service uses HashtagScanner as the primary business logic, fabricating a HashtagPageScanner for each page. The page scanner discovers hashtags on the page and returns them as a collection to the scanner, which uses HashtagProvider to resolve (save, delete) hashtags for the page.
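One way to picture the resolve step is as a set difference between the tags already stored for a page and the tags just discovered on it. The Python sketch below is an assumption about how such reconciliation could work, not a description of HashtagProvider itself; the class InMemoryProvider and its resolve method are hypothetical.

```python
class InMemoryProvider:
    """Hypothetical stand-in for HashtagProvider, keeping tags per page in a dict."""

    def __init__(self):
        self.tags_by_page = {}

    def resolve(self, page_id, discovered):
        """Reconcile stored tags with those just discovered on the page.

        Returns (saved, deleted): tags newly added and tags no longer present.
        """
        stored = self.tags_by_page.get(page_id, set())
        found = set(discovered)
        saved = found - stored      # on the page now, not yet stored
        deleted = stored - found    # stored earlier, no longer on the page
        if found:
            self.tags_by_page[page_id] = found
        else:
            self.tags_by_page.pop(page_id, None)
        return saved, deleted


if __name__ == "__main__":
    provider = InMemoryProvider()
    print(provider.resolve("page1", ["#a", "#b"]))
    print(provider.resolve("page1", ["#b", "#c"]))
```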
Hashtag Data Store
A number of alternatives were considered.
✗ - Alternative 0 - Scan JIT, In-Memory
Scan multiple pages on demand. This could be scoped to the current page, current section, current notebook, or all notebooks.
Advantages
- Simplicity
Disadvantages
- Time consuming. Scanning about 1500 pages takes anywhere from 21 seconds to over a minute based on system load.
- Not a realistic interactive experience
✗ - Alternative 1 - Save to one:Meta
Create a top-level one:Meta entry on the page (name="omHashtags"), making it discoverable using the onenote.FindMeta function. The one:Meta max length is 262144 chars. Even if each hashtag is 25 characters, this leaves room for well over 10,000 hashtags on a single page (262144 / 25 = 10485.76).
The meta content could be of the form "##tag1,##tag2, … ,##tagn,"
- Each tag is fully specified, including its double-pound prefix
- Tags are delimited by a comma
- The last tag is also followed by a comma, making it easy to substring search for a complete hashtag name of the form "##name,"
There is no way to associate a tag with its paragraph. We could expand the scheme to include the paragraph object ID.
"ID1=##tag1,ID2=##tag2,ID2=##tag3, … ,IDn=##tagn,"
This still leaves room for well over 3,000 hashtags on a single page (262144 / 75 = 3495.2533)
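The encoding and the substring search described above can be sketched as follows. This Python example is illustrative only; the helper names encode_tags, has_tag, and capacity are hypothetical, and the paragraph IDs are placeholders rather than real OneNote object IDs.

```python
MAX_META_CHARS = 262144  # one:Meta length limit quoted above


def encode_tags(tags_by_paragraph):
    """Encode (paragraph_id, tag_name) pairs as 'ID=##tag,' entries."""
    return "".join(f"{pid}=##{name}," for pid, name in tags_by_paragraph)


def has_tag(content, name):
    """Substring search for a complete hashtag name.

    The '##' prefix anchors the start of the name and the trailing comma
    anchors its end, so 'tag1' does not match 'tag10' or 'mytag1'.
    """
    return f"##{name}," in content


def capacity(chars_per_entry):
    """How many whole entries of the given size fit in one Meta value."""
    return MAX_META_CHARS // chars_per_entry


if __name__ == "__main__":
    content = encode_tags([("ID1", "tag1"), ("ID2", "tag10")])
    print(content)
    print(has_tag(content, "tag1"), has_tag(content, "tag"))
    print(capacity(25), capacity(75))
```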
Advantages
- Takes advantage of well-established built-in features of the OneNote XML schema and meta searching capabilities.
- No third-party packages are required.
Disadvantages
- The FindMeta function does not search the content of each Meta; it only searches for the name. This means that searching must be done in two steps: discover all pages that have the omHashtags Meta element and then filter on the ones with the target hashtag in the content attribute. This may be slow and inefficient.
- Increases base size of a page, making all OneMore features slower to save through the Interop onenote.UpdatePageContent API.
- Must store last-scan-time someplace, perhaps in the OneMore settings file or a new file in the app data folder.
✗ - Alternative 2 - Save to File.json
Serialize a collection of Hashtag models to JSON, either distributed by scope or including a scope property.
Advantages
- Simple and cheap
- Could work well for relatively smaller data sets
Disadvantages
- Could become a performance bottleneck when the store grows over a certain size.
- Entire store needs to be read for each query and rewritten for every modification.
- No built-in searching capabilities, other than LINQ Where() or similar filtering; not indexed, not performant.
- Although a user may have fewer than a couple of hundred tags, those could be duplicated across dozens of pages, quickly multiplying contextual references and making the stored model quite large and cumbersome. One solution may be to normalize data into referenced models, but then we're starting to reinvent a DBMS.
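To make the read-all and rewrite-all cost concrete, here is a minimal sketch of a JSON-backed store in Python. The record fields and the function names load_store, add_hashtag, and find_pages are hypothetical; the point is only that every query loads the whole file and every change rewrites it.

```python
import json
from pathlib import Path


def load_store(path):
    """Read the entire store; every query pays this cost."""
    path = Path(path)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def add_hashtag(path, tag, page_id, snippet):
    """Append one record by loading and rewriting the whole file."""
    records = load_store(path)
    records.append({"tag": tag, "pageID": page_id, "snippet": snippet})
    Path(path).write_text(json.dumps(records), encoding="utf-8")


def find_pages(path, tag):
    """Linear, unindexed filter over every record."""
    return [r["pageID"] for r in load_store(path) if r["tag"] == tag]
```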
✓ - Alternative 3 - Save to SQLite
Record last scan date/time in a separate control table. Use this timestamp to compare against the lastModified timestamp of each page to know whether we need to rescan an updated page.
Record each hashtag on every page individually. We can capture the hashtag, its location on the page, and when it was recorded. This provides the contextual location of each hashtag when searching and displaying results to the user.
[Figure: Hashtag ER Model]
hashtag_scanner Table
- Contains exactly one row
- scannerID is 0.
- version is used to know when to upgrade the schema, currently set at 1.
- scanTime indicates the timestamp of the most recently completed scan. Used to compare against the lastModifiedTime of each page to know whether to scan its contents.
hashtag Table
- Each row indicates the existence of at least one occurrence of a named hashtag in a specified paragraph
- Paragraph is uniquely identified by its objectID (found on pageID)
- snippet captures the context of the tag, including surrounding text
- lastModified could be used to show the age of the hashtag - when it was first discovered
hashtag_page Table
- Normalizes page references for multiple tags
- notebookID and sectionID provide filtering capabilities
- titleID is used to navigate to the top of a page when already on that page
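The three tables can be sketched with Python's built-in sqlite3 module. Column names follow the Hashtag ER Model in this page; the column constraints, the composite primary keys, and the helper names open_store and find_tag are assumptions made for this example and may differ from the actual OneMore schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS hashtag_scanner (
    scannerID INTEGER PRIMARY KEY,
    version   INTEGER NOT NULL,
    scanTime  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS hashtag_page (
    moreID     TEXT NOT NULL,
    pageID     TEXT NOT NULL,
    titleID    TEXT,
    notebookID TEXT NOT NULL,
    sectionID  TEXT NOT NULL,
    path       TEXT,
    name       TEXT,
    PRIMARY KEY (moreID, pageID)
);
CREATE TABLE IF NOT EXISTS hashtag (
    tag          TEXT NOT NULL,
    objectID     TEXT NOT NULL,
    moreID       TEXT NOT NULL,
    snippet      TEXT,
    lastModified TEXT NOT NULL,
    PRIMARY KEY (tag, objectID)
);
"""


def open_store(path=":memory:"):
    """Open the database, create the tables, and seed the single scanner row."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    # scannerID 0, schema version 1, no completed scan yet
    db.execute("INSERT OR IGNORE INTO hashtag_scanner VALUES (0, 1, '')")
    db.commit()
    return db


def find_tag(db, tag, notebook_id=None):
    """Return (page name, snippet) rows for a tag, optionally within one notebook."""
    sql = (
        "SELECT p.name, h.snippet FROM hashtag h "
        "JOIN hashtag_page p ON p.moreID = h.moreID "
        "WHERE h.tag = ?"
    )
    args = [tag]
    if notebook_id is not None:
        sql += " AND p.notebookID = ?"
        args.append(notebook_id)
    return db.execute(sql + " ORDER BY p.name", args).fetchall()
```

Joining hashtag to hashtag_page on moreID is what lets a search return the page name and path alongside each snippet without repeating page details in every hashtag row.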
Advantages
- Solved problem; "quick to market!"
- Easier to enhance and evolve the schema than a formatted string stored in a Meta element.
Disadvantage
- Another NuGet package to maintain
Hashtag User Interface
✓ - Alternative 0 - Build a new window
Adds complexity and noise, cognitive dissonance.
✗ - Alternative 1 - Integrate with Navigator window
Introduce a tabbed interface to the window. The primary tab will display history navigation and the secondary tab will display hashtag searching. However, users may not intuitively look here.
✗ - Alternative 2 - Integrate with Search and Find windows
Both the Search and Copy/Move dialog and the Find Tagged Pages dialog are relevant to finding hashtags. Although they are buried beneath the Search menu, there is a more intuitive correlation. An additional "Find Hashtags" command could be added that opens/prefers one of these dialogs.
══════════════════════════════════════════════════════════════════════════════════════════════════
SQLite
- sqlite.org
- SQLiteStudio
- choco install sqlitestudio
- ADO.NET Provider (nuget, all you need!)
- .NET Framework
- System.Data.SQLite
- My experience with System.Data.SQLite in C# (techcoil.com)
- .NET Core
- Microsoft.Data.Sqlite
- Overview - Microsoft.Data.Sqlite | Microsoft Learn
──────────────────────────────────────────────────────────────────────────────────────────────────
Hashtag Scanning PlantUML
@startuml Hashtag Scanning
skin rose
skinparam defaultFontSize 9
skinparam ParticipantPadding 20
skinparam BoxPadding 80
scale max 500 width
class HashtagScanner
together {
class HashtagPageScanner
class HashtagPageScannerFactory
}
class Hashtag
class HashtagProvider
HashtagService -[hidden] HashtagScanner
HashtagScanner - HashtagProvider : Uses >
HashtagScanner -- HashtagPageScanner : Uses >
HashtagPageScannerFactory - HashtagPageScanner : Creates >
HashtagPageScanner - Hashtag : Discovers >
@enduml
Hashtag ER Model PlantUML
@startuml Hashtag ER Model
skin rose
skinparam ParticipantPadding 20
skinparam BoxPadding 40
scale max 450 width
left to right direction
entity hashtag_scanner {
* scannerID : number
--
* version : number
* scanTime : text
}
entity hashtag {
* tag : text
* objectID : text
--
* moreID : text
* snippet : text
* lastModified : text
}
entity hashtag_page {
* moreID : text
* pageID : text
--
* titleID : text
* notebookID : text
* sectionID : text
* path : text
* name : text
}
hashtag_scanner -[hidden]- hashtag
hashtag ||--|{ hashtag_page : moreID
@enduml
#omwiki #omdeveloper #omdesign
© 2020 Steven M Cohn. All rights reserved.