
Grammar checking process and API

Involved "objects":

- one or more documents to be checked
- one or more grammar checker implementations, each possibly supporting multiple languages. A grammar checker may choose to run in its own thread.
- one or more grammar check dialogs (at most one per document)
- one context menu when clicking on text marked as incorrect
- a global grammar checking iterator (common to all documents and applications), implemented as a singleton running in a thread of its own and checking one sentence (of an arbitrary document) at a time.

Required tasks:

- Automatic grammar checking
- Interactive grammar checking via dialog
- Interactive grammar checking via context menu

Overview of the basic interfaces required:

(1) XFlatParagraph
(To be implemented by a new class SwXFlatParagraph which holds a simple pointer to a SwTxtNode, Writer's implementation of a text paragraph. In particular, the implementation object must not be the SwXParagraph, since that uses SwUnoCursors, and deleting a paragraph would merely leave the cursor pointing to the next paragraph instead.)

Gives access to the "flat" text of a paragraph (that is, the content of fields will be included) by providing it as a simple string. All operations that need to specify sub-strings will use position and length parameters. Besides giving access to the string and allowing some simple manipulations of the text or its language attributes, this object specifically has two functions: isValid, which tells whether the respective text node is still valid (it yields false once the node has become invalid, e.g. was deleted), and isModified, which indicates that the content has been edited in the meantime (these two flags need to be taken care of by the SwTxtNode implementation). In both cases (paragraph invalid or modified) the results of grammar checking in this specific paragraph have to be discarded and the paragraph needs to be processed again later. Finally, this interface will allow placing (and removing) visual markings for incorrect text parts.

Please note that when we talk about paragraphs in the following text, unless otherwise stated, it will always be about the text represented/accessed by an XFlatParagraph interface. This may not necessarily be a paragraph in the document's sense. For example, it would be possible to think of a complete enumeration (which actually contains several paragraphs) as a single paragraph to be handed to the grammar checker, thus allowing a single sentence to be recognized as a whole even if it spans several 'real' paragraphs. But don't expect the implementation to cover this problem any time soon.
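The bookkeeping behind isValid and isModified can be illustrated with a minimal sketch. It is not the Writer implementation: `TextNode`, `FlatParagraph`, the revision counter and `results_usable` are hypothetical names chosen for illustration, and the sketch assumes that `is_valid` returns true for as long as the node exists.

```python
from dataclasses import dataclass, field


@dataclass
class TextNode:
    """Hypothetical stand-in for Writer's SwTxtNode."""
    text: str
    deleted: bool = False
    revision: int = 0  # bumped on every edit

    def edit(self, new_text: str) -> None:
        self.text = new_text
        self.revision += 1


@dataclass
class FlatParagraph:
    """Sketch of an XFlatParagraph-like wrapper around one text node."""
    node: TextNode
    seen_revision: int = field(init=False)
    markings: list = field(default_factory=list)  # (start, length) pairs

    def __post_init__(self) -> None:
        # Remember the node's state at the time the wrapper is handed out.
        self.seen_revision = self.node.revision

    def get_text(self) -> str:
        return self.node.text

    def is_valid(self) -> bool:
        # False once the underlying node has been deleted.
        return not self.node.deleted

    def is_modified(self) -> bool:
        # True if the node was edited after this wrapper was handed out.
        return self.node.revision != self.seen_revision

    def results_usable(self) -> bool:
        # Checking results are kept only for a valid, unmodified paragraph.
        return self.is_valid() and not self.is_modified()

    def mark_incorrect(self, start: int, length: int) -> None:
        self.markings.append((start, length))

    def clear_markings(self, start: int, length: int) -> None:
        # Drop markings that lie entirely inside [start, start + length).
        end = start + length
        self.markings = [
            (s, n) for (s, n) in self.markings if not (s >= start and s + n <= end)
        ]
```

A caller would test `results_usable()` right before applying markings and silently drop the results otherwise.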

(2) XFlatParagraphIterator
(Probably to be implemented by SwDoc or an object created by SwDoc for that purpose. It would even be possible to have it implemented by SwXFlatParagraph.)

The only absolutely necessary function would be getNextParagraph (which does not even need to know about the current paragraph!). It returns an XFlatParagraph interface to the next paragraph to be checked. An empty reference means there is nothing left to be checked for now. The order of the iteration should probably be reading order but is left entirely to the implementation. Thus, in particular, the following will be allowed:

- The iteration may skip paragraphs that have already been checked and have not been modified since.
- The iteration may end prematurely, for example if automatic grammar checking was disabled in the meantime.
- A full iteration will automatically wrap around at the end of the document if the iteration was not started at the beginning.
- Theoretically, for automatic grammar checking, it would also be OK to iterate more than once over the same paragraph, e.g. if it was modified again before the whole document was processed.
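The freedoms listed above can be sketched in a few lines. This is an illustration only: `ParagraphIterator`, `mark_modified` and the `enabled` flag are hypothetical, paragraphs are represented by their index, and a paragraph is treated as checked as soon as it has been handed out (a simplification).

```python
from typing import Optional


class ParagraphIterator:
    """Sketch of an XFlatParagraphIterator-like object (all names hypothetical)."""

    def __init__(self, paragraphs: list, start: int = 0) -> None:
        self.paragraphs = paragraphs              # paragraph texts in reading order
        self.checked = [False] * len(paragraphs)
        self.position = start                     # where the next search begins
        self.enabled = True                       # e.g. automatic checking switched on

    def mark_modified(self, index: int) -> None:
        # An edited paragraph has to be checked again.
        self.checked[index] = False

    def get_next_paragraph(self) -> Optional[int]:
        """Return the index of the next paragraph to check, or None."""
        if not self.enabled:
            return None                           # iteration may end prematurely
        count = len(self.paragraphs)
        for offset in range(count):
            index = (self.position + offset) % count   # wrap around at the end
            if not self.checked[index]:
                self.checked[index] = True
                self.position = (index + 1) % count
                return index
        return None                               # nothing left to check for now
```

Starting at paragraph 1 of a three-paragraph document, the sketch hands out paragraphs 1, 2 and 0 and then reports that nothing is left until a paragraph is modified again.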

There might be need for another function though, one that explicitly sets all paragraphs to not-yet-checked and has the iteration start at the very beginning of the document.

(3) XGrammarChecker
The grammar checker is always presented with the text of the whole paragraph. If it needs to, it may examine all the (previous) text in the paragraph, but it must only report errors within the bounds of the current sentence. It is required to return all errors in that sentence at once (since this is considered to be best for the user).

(4) XGrammarCheckingIterator
The object implementing this interface is the mediator between the grammar checkers and the document (which should not know about each other). In particular, it provides the grammar checking dialog and the context menu with the required data and with the interfaces to change the text.

(5) XGrammarCheckingResultListener
This interface provides a call-back function that is used by the GrammarCheckingIterator to provide the specific client with the result of the grammar checking and have it act accordingly (e.g. fill the context menu or have the dialog show the new sentence with its errors and corrections).
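The contract described for XGrammarChecker in (3) can be illustrated with a toy checker. Everything here is an assumption made for the example: `GrammarError`, `check_sentence` and the "repeated word" rule are hypothetical and say nothing about the real interface. The point is only that the whole paragraph is visible while errors are reported for the current sentence alone, all in one result.

```python
import re
from typing import NamedTuple


class GrammarError(NamedTuple):
    start: int      # offset into the paragraph text
    length: int
    message: str


def check_sentence(paragraph: str, sentence_start: int, sentence_end: int) -> list:
    """Toy checker: flags a word that immediately repeats the previous word.

    The whole paragraph is available, but only errors located inside
    [sentence_start, sentence_end) are reported, all in one list.
    """
    errors = []
    previous = None
    # Scan from the paragraph start so earlier sentences provide context.
    for match in re.finditer(r"\w+", paragraph[:sentence_end]):
        word = match.group().lower()
        inside = match.start() >= sentence_start
        if inside and previous == word:
            errors.append(
                GrammarError(match.start(), len(word), f"repeated word '{word}'")
            )
        previous = word
    return errors
```

Calling it for the second sentence of "It is is fine. This is the the end end." reports only the two repetitions in that sentence, not the one in the first.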

Sample process of automatic grammar checking

The document will get access to the GrammarCheckingIterator and request checking the document by providing:

- a unique interface to the document (to be used to identify this document),
- the XFlatParagraph interface to the first paragraph to check (e.g. one in the visual area of the document, or the very first),
- the starting position of the first sentence (for automatic checking this should always be 0),
- and a flag indicating that this is for automatic checking only and thus no suggestions are required and no dialog must be displayed.

The GrammarCheckingIterator (which runs in its own thread and implements a main loop in which it is going to check the grammar) maintains a queue of sentences to be processed. When called with the above arguments it creates an entry consisting of those four values and adds it at the end of the queue. It then returns from the function call.

For the sake of simplicity, let's assume for now that there is only one document to be processed. Thus (since there are no further API calls) the GrammarCheckingIterator will now enter its main loop and dequeue the first element from the queue (which is the one we just added). If the XFlatParagraph reports that it is still valid (and unmodified?), the iterator retrieves the text of the paragraph, asks the Breakiterator for a suggested end-of-sentence position (for the sentence identified by its starting position) and, after identifying the language to use, calls all the respective grammar checkers synchronously(!) one by one to check that single sentence.

Please note that all the asynchronicity we require for grammar checking is thus implemented in the GrammarCheckingIterator only, and each grammar checker implementation should run in the same thread since there is no advantage in not doing so. This way grammar checking is only virtually parallel, but since the GrammarCheckingIterator is the bottleneck for everything, it also prevents grammar checking from taking too much CPU time even if there is a large number of requests at the same time.

For the results returned by each grammar checker we first check if the XFlatParagraph is still valid and not modified. If so, we remove all previous, outdated markings for this sentence and then mark all the incorrect text parts; otherwise we discard the results silently. When the last grammar checker result for this sentence has been processed and there is still unprocessed text left in the paragraph, the GrammarCheckingIterator will create a new entry for the queue consisting of

- the same reference to the document,
- the same XFlatParagraph interface,
- the starting position for the next sentence (which is the end-of-sentence position returned by the grammar checkers for the current sentence),
- and the flag indicating automatic grammar checking

and add it at the end of the queue. If the paragraph has been checked completely this way, the getNextParagraph function from the XFlatParagraphIterator interface is called to retrieve the next paragraph to be checked. If one is found, we start anew as described above by adding this new paragraph to the end of the queue. If not, the document is considered to be completely checked, and if it is to be checked again in the future it is the document's task to start grammar checking once more.
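The loop described in this section can be sketched as a single-threaded simulation. It is a sketch under stated assumptions, not the iterator itself: `Entry`, `sentence_end`, `run_automatic_check` and the toy checker `flag_teh` are hypothetical, the sentence boundary is found by a naive search for the next full stop instead of a real break iterator, paragraphs are visited in plain reading order, and the valid/modified checks are left out for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    doc_id: str
    paragraph_index: int
    start: int
    automatic: bool = True


def sentence_end(text: str, start: int) -> int:
    """Naive stand-in for the break iterator: end after the next full stop."""
    dot = text.find(".", start)
    return len(text) if dot == -1 else dot + 1


def run_automatic_check(doc_id: str, paragraphs: list, checkers: list) -> dict:
    """Process one document sentence by sentence and collect the markings."""
    markings = {}                                   # paragraph index -> [(start, length)]
    queue = deque([Entry(doc_id, 0, 0)]) if paragraphs else deque()
    while queue:
        entry = queue.popleft()
        text = paragraphs[entry.paragraph_index]
        end = sentence_end(text, entry.start)
        for checker in checkers:                    # called synchronously, one by one
            for start, length in checker(text, entry.start, end):
                markings.setdefault(entry.paragraph_index, []).append((start, length))
        # Skip whitespace so the next entry starts at the next sentence.
        while end < len(text) and text[end].isspace():
            end += 1
        if end < len(text):                         # text left in this paragraph
            queue.append(Entry(doc_id, entry.paragraph_index, end))
        elif entry.paragraph_index + 1 < len(paragraphs):   # stands in for getNextParagraph
            queue.append(Entry(doc_id, entry.paragraph_index + 1, 0))
    return markings


def flag_teh(text: str, start: int, end: int) -> list:
    """Toy checker: reports every 'teh' inside the current sentence."""
    found, pos = [], text.find("teh", start, end)
    while pos != -1:
        found.append((pos, 3))
        pos = text.find("teh", pos + 1, end)
    return found
```

Because every follow-up entry goes to the end of the queue, sentences of several documents would interleave naturally once more than one document has entries queued.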

Sample process of interactive grammar checking

There are three basic differences when comparing interactive grammar checking with automatic checking:

- the results of grammar checking a sentence need to be interactively post-processed by the user.

- each grammar checker is allowed to make use of its own implementation of a grammar checking dialog, and of another dialog to view and modify implementation-specific options as well. The 'options dialog' should have two entry points: one accessible from a tool-bar, and the other a button in the grammar checking dialog. If the grammar checker features only an options dialog but not a grammar checking dialog, the office internal dialog must be able to start that options dialog. (See the questions and problems section as well!)

- and because some grammar checkers require the text of previous sentences in the paragraph to be known in order to determine if the current one is correct, one cannot simply check one sentence after another once a change has been applied.

  If for example the first two sentences are without error and the third sentence got corrected by the user, we can't simply proceed to the fourth sentence. Because it can't be figured out what the specific grammar checker implementation keeps track of, it can't be helped but to throw everything away and tell that grammar checker that a new paragraph is to be started. Thus we need to have the grammar checker check the first three sentences (without reporting any error for them) in order to build up the internal data needed to check the fourth sentence. Only then can we pass the fourth sentence on to the grammar checker and expect the results to be correct. And for all the following sentences of that paragraph we have to do it all over again.

  One slightly different approach would be that the iterator does not have to pass all the previous sentences on to the checker again; instead the grammar checker does so itself, implicitly, if it needs to (preferred way). After all, the grammar checker is always given the whole text along with the sentence-start position. But the grammar checker implementation needs to be aware that by doing so it may encounter sentences in languages it does not know about and that would usually not have been passed to this specific checker.

Going with the preferred way of having the grammar checker scan previous text implicitly if need be, interactive checking looks like this: The document determines the first paragraph to be checked (for example the one where the cursor is displayed). To make it a little less complicated to determine whether the whole document was processed, we would probably like to start checking at the beginning of the paragraph and not at a specific sentence within it, even if the cursor is placed e.g. in the last sentence (this can be discussed though). That is, once the starting paragraph is determined, the document accesses the GrammarCheckingIterator and provides similar data as for automatic checking:

- the unique reference to the document,
- the XFlatParagraph interface to the first paragraph to be checked,
- the start-of-sentence position of the first sentence (here 0),
- the flag, now indicating interactive checking,
- and now also a reference to an XGrammarCheckingResultListener interface, implemented by the dialog, that is used by the GrammarCheckingIterator as call-back to provide the dialog with the text, data and results to be displayed.

Again the GrammarCheckingIterator creates an entry for the queue from this, but now it places that entry at the start of the queue instead of at the end. This way interactive checking will take precedence over automatic checking, and the latest UI-triggered request will be at the top of the queue and get processed next. As long as no error is found by the grammar checkers, the iteration and the tasks to be done in each iteration are the same as for automatic checking, aside from the flag for new queue entries indicating interactive checking and those entries being added at the start of the queue.

For the sake of simplicity we stick to only one single grammar checking dialog used by all checkers here in this text!

If one or more of the grammar checkers report an error in the current sentence, the error reports from all the checkers are collected and the grammar checking dialog is started (if not already open, see below) and filled with the necessary data by the GrammarCheckingIterator (the text and the complete list of errors). The iterator will not wait for the dialog to be finished or to advance to the next sentence; it will continue with its own tasks (e.g. entering its main loop and starting to check a sentence from another document). The dialog will only show the very sentence the error was found in and has to allow for at least

- showing all the error positions (preferably all at once),
- reviewing each error (displaying the detailed information about that error) and the suggestions for corrections,
- modifying the sentence's text freely,
- changing the language of text parts or of all the text,
- ignoring the errors and continuing with the next sentence,
- committing the changes made and continuing with checking (as long as the paragraph was not modified or invalidated meanwhile),
- if that very paragraph was modified meanwhile, there will be a button that allows the dialog to discard the changes (that are not yet applied) and restart checking with the sentence the cursor currently is in (which may be in a completely different paragraph) by adding that to the top of the queue (if anything is left),
- if the paragraph was invalidated (deleted), the changes in the dialog are to be discarded as well and getNextParagraph should be called to continue checking, thus adding the next sentence to be checked (if anything is left) to the top of the queue,
- or canceling the interactive checking and closing the dialog.

If the changes are committed, they are applied to the paragraph by using the XFlatParagraph interface. Then, if there is still text left in the paragraph, the next sentence is added at the start of the queue (as described above). If the paragraph was processed completely, the getNextParagraph function is called to get the next paragraph to be checked; if no such paragraph is found, the iteration is finished and the dialog can be closed. Otherwise we continue by putting an entry for interactively checking the first sentence of the newly found paragraph at the start of the queue. (Either way the entry needs to have the XGrammarCheckingResultListener reference set in order to provide the dialog with new data to be displayed when the next sentence with errors is found.) Then the dialog is left open and the GrammarCheckingIterator takes control again and can proceed with the next entry from the start of the queue. This way the process continues until the next error is found or the iteration over the document is finished.

If the dialog is closed (either because the iteration has finished or because the cancel button was pressed), the interactive checking is stopped simply by not adding another entry to the queue.

Please note that because the starting point for grammar checking the whole document may vary (be it automatic or interactive), this may result in different errors! For example: In German it is correct to write dolphin either as "Delfin" or as "Delphin". But still one would probably want to enforce consistent use of only one of the two spellings. Thus, if a grammar checker wants to enforce this, it has to keep track internally of which spelling was encountered first and reject the other spelling from then on.

Side note: The dialog needs to implement the XComponent interface and the GrammarCheckingIterator needs to be its listener.
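The queue discipline used throughout this section (automatic entries appended at the end, interactive entries placed at the start) comes down to a double-ended queue. The following is a minimal sketch; `CheckQueue`, `add` and `next_entry` are hypothetical names, and an entry is reduced to a tuple of document, start position and flag.

```python
from collections import deque


class CheckQueue:
    """Sketch of the iterator's queue: interactive entries jump the line."""

    def __init__(self) -> None:
        self._entries = deque()

    def add(self, doc_id: str, start: int, interactive: bool = False) -> None:
        entry = (doc_id, start, interactive)
        if interactive:
            self._entries.appendleft(entry)   # start of the queue: processed next
        else:
            self._entries.append(entry)       # end of the queue: background work

    def next_entry(self):
        """Return the next entry to process, or None if the queue is empty."""
        return self._entries.popleft() if self._entries else None
```

Adding two automatic entries and then two interactive ones yields the latest interactive request first, the earlier interactive one second, and the automatic entries in their original order after that.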

Using the context menu with grammar checking

Opening the context menu by right-clicking on a text part that is marked as being incorrect requires yet another approach. The differences here are: Only a single sentence should be checked (but to do this correctly the grammar checker may still need to scan all the previous text in the paragraph), and only those errors/corrections (or part of them if the list gets too long) should be displayed that belong to the respective marked text part. That is, the corrections are needed only for a subset of all the errors in a sentence, which may leave some room for optimization.

Thus, when the right-click takes place, the document (when creating the menu, which is to be done in the main thread) calls the respective function of the GrammarCheckingIterator, and an entry similar to the one for interactive checking of that very sentence is added to the start of the queue. The only difference is that there are some additional values in that entry:

- one for the starting position of the marked text part, and one for its length. These indicate that the grammar checkers only need to find errors in that text range, and that the return value (which usually should hold all errors/corrections for that sentence) only needs to cover that range as well. (On the other hand, it would be possible to retrieve all errors, thus behaving exactly as interactive checking, and just ignore the results that are outside the indicated range.)
- a flag needs to indicate that this is for the context menu only (and thus there is no need for an iteration to be started, i.e. no further queue entry will be added implicitly when processing this entry).
- also, a reference to the XGrammarCheckingResultListener interface that is used by the GrammarCheckingIterator to provide the context menu with the results is needed. (Naturally this implementation of the interface is a different one than the one used in the dialog for interactive checking.)

Since the call to the GrammarCheckingIterator is asynchronous, we need to wait a reasonably limited amount of time (e.g. 3 seconds) to receive the results via the call-back. If we do get them in time, we can show the context menu as planned. If not, since we can't wait forever, we have to display a fallback menu (either the regular one or one showing an entry like "grammar checking timed out").

Since the context menu may already be closed (either before the 3 seconds are over or after) when the GrammarCheckingIterator is finally ready to use the call-back function to provide the results, the context menu needs to implement the XComponent interface and the GrammarCheckingIterator must be its listener. It is required to register as such already when the context menu calls the function to trigger grammar checking for the sentence. Right before the context menu gets displayed it should already dispose. This would be necessary later anyway, and doing it now should prevent the call-back function from being executed belatedly if grammar checking was too slow (or did not return at all) and the fallback menu is displayed.

When everything went fine and the user was able to select a specific correction, the XFlatParagraph interface provided as part of the XGrammarCheckingResult will be used to make the changes in the text.
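The bounded wait and the "dispose before showing the menu" rule can be sketched with a thread and an event. This is an illustration under assumptions, not the office implementation: `ContextMenuRequest`, `on_result`, `build_menu` and `request_menu` are hypothetical names, the grammar check is simulated by a sleeping worker thread, and the suggestions are made up.

```python
import threading
import time


class ContextMenuRequest:
    """Sketch of the menu side: wait a bounded time for the call-back."""

    def __init__(self) -> None:
        self._ready = threading.Event()
        self._lock = threading.Lock()
        self._disposed = False
        self.suggestions = None

    def on_result(self, suggestions: list) -> bool:
        """Call-back used by the checking thread; ignored after dispose."""
        with self._lock:
            if self._disposed:
                return False
            self.suggestions = suggestions
        self._ready.set()
        return True

    def build_menu(self, timeout: float) -> list:
        """Return the menu entries, falling back if no result arrived in time."""
        self._ready.wait(timeout)
        with self._lock:
            self._disposed = True        # dispose right before the menu is shown
            if self.suggestions is None:
                return ["grammar checking timed out"]
            return list(self.suggestions)


def request_menu(check_delay: float, timeout: float) -> list:
    """Simulate one right-click: start a check, then build the menu."""
    request = ContextMenuRequest()

    def worker() -> None:
        time.sleep(check_delay)          # pretend the grammar check takes a while
        request.on_result(["their", "there"])

    thread = threading.Thread(target=worker)
    thread.start()
    menu = request.build_menu(timeout)
    thread.join()
    return menu
```

Setting the disposed flag under the same lock that guards the call-back is what keeps a late result from touching a menu that is already on screen.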

Checking several documents at the same time and mixing all the above tasks

Other applications of the iterator concept:

The idea of having a global iterator that iterates over the document's text using the XFlatParagraphIterator interface, gives access to a paragraph through the XFlatParagraph interface, and thereby does "some task" should be applicable to the following tasks as well:

- word count
- smart tags
- spell checking(?)

The different needs for the iteration order (or even for skipping some paragraphs) might be implemented by using specific iterators, or else by giving the iteration function a specific context for the iteration. For example:

getNext( eActionContext )

where eActionContext might be one of CONTEXT_WORD_COUNT, CONTEXT_SMART_TAGS, CONTEXT_GRAMMAR_CHECKING
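A context-aware iteration function could look roughly like the sketch below. The names `ActionContext`, `ContextIterator`, `get_next` and `count_words` are hypothetical, and the policy that word counting skips empty paragraphs is an assumption made purely to show how a context might change what the iteration returns.

```python
from enum import Enum, auto
from typing import Optional


class ActionContext(Enum):
    WORD_COUNT = auto()
    SMART_TAGS = auto()
    GRAMMAR_CHECKING = auto()


class ContextIterator:
    """Sketch: one paragraph list, one independent cursor per context."""

    def __init__(self, paragraphs: list) -> None:
        self.paragraphs = paragraphs
        self._cursor = {context: 0 for context in ActionContext}

    def get_next(self, context: ActionContext) -> Optional[str]:
        index = self._cursor[context]
        # Assumed policy: word counting skips empty paragraphs.
        if context is ActionContext.WORD_COUNT:
            while index < len(self.paragraphs) and not self.paragraphs[index].strip():
                index += 1
        if index >= len(self.paragraphs):
            self._cursor[context] = index
            return None
        self._cursor[context] = index + 1
        return self.paragraphs[index]


def count_words(iterator: ContextIterator) -> int:
    """Example task: consume the word-count iteration and sum up the words."""
    total = 0
    while (text := iterator.get_next(ActionContext.WORD_COUNT)) is not None:
        total += len(text.split())
    return total
```

Keeping one cursor per context lets a word count run to completion without disturbing the position of a grammar checking pass over the same document.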

Problems and questions currently left open:


Grammar checking of mixed language text

It is believed that even for sentences that use several languages there is only a single language the whole sentence is in. (How that language is identified is a completely different matter and probably a complex task though!) And thus that sentence should only be grammar checked in that single language. For example:

The German word for television is Fernseher.

This sentence should be grammar checked in English and not German. If possible though (for example if language attributes are set correctly), it should be noted that Fernseher is not English, and thus at the very least no spelling error for English should be reported for that word. And it is probably also impossible to report any grammar error that involves embedded foreign words. Thus the best to hope for is probably that the foreign word is recognized as correct by the respective spell checker.

Even with a completely embedded sentence like

In Gallica Caesar said 'Alea iacta est.' and continued his battle.

the above text is in a single language, English and not Latin. Whether an existing grammar checker is smart enough to cope with embedded sentences in a different language, I don't know. To keep it simple for the time being, the whole text should be grammar checked as one sentence in English and in only that language.
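How the single language of a sentence is determined is left open in this document. Purely as an illustration, one candidate heuristic (picking the language that most words are attributed to) could be sketched as follows. `sentence_language` and `words_to_skip` are hypothetical names, and the sketch assumes that every word already carries a language attribute.

```python
from collections import Counter


def sentence_language(words: list) -> str:
    """Pick the language most words of the sentence are attributed to.

    `words` holds (word, language_tag) pairs; ties go to the language
    that appears first in the sentence.
    """
    if not words:
        raise ValueError("empty sentence")
    counts = Counter(language for _, language in words)
    best = max(counts.values())
    for _, language in words:
        if counts[language] == best:
            return language


def words_to_skip(words: list) -> list:
    """Words in another language: no spelling error should be reported for them."""
    main = sentence_language(words)
    return [word for word, language in words if language != main]
```

For the television example above, with "Fernseher" attributed to German and all other words to English, the sketch picks English for the sentence and lists "Fernseher" as the word to leave alone.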


Grammar checking and spell checking at the same time

Should spell checking have an iterator of its own with a thread of its own? Or should spell checking be handled by the GrammarCheckingIterator as well?

Other Questions / problems:

- checking is limited to paragraphs (unless the implementation of XFlatParagraph chooses to hide something more behind it, which is unlikely). Though one could think of enumerations as a possible application for this behavior.
- in the case of several grammar checkers for one language, what do we do if they report different end-of-sentence positions? We really can't handle each checker individually here.
- does a grammar checker that requires knowledge of the previous text in this paragraph need to have that text presented even if it is in a language it does not know?
- How to achieve consistency of usage (e.g. spelling) when having grammar checkers in multiple languages? E.g. e-mail vs. email? Or does it need to be consistent on a per-language basis only?
- How to determine the language of a sentence? Use the language of the first word, or language guessing, or the language with the most words,... ?
- Problems related to a specific UI, namely the grammar checking dialog, are still to be defined and not yet covered. The troublesome case of having, for example, three grammar checkers for one language, with two of them wanting to use their own dialog while the third goes with the office internal one, is left out. Because if all of them report errors in the same sentence and want to use their own dialog as well, we will have to cope with switching between three dialogs just to edit a single sentence. That's just plain awful to even think about. And I doubt there will be even one user who appreciates such a scenario.
- Should the document (e.g. XFlatParagraph) be in charge of determining the language for checking, or should it be the GrammarCheckingIterator? Probably the latter...
- We may not need the isValid function in the XFlatParagraph interface (and thus go with dispose from the XComponent interface instead) if we keep track of the

