With the vast amounts of unstructured data accrued through surveillance, body cameras, mobile phones, and other sources, there is a need for automated methods that synthesize data into natural language. Recent advances in machine learning have enabled the compression of data sequences into short, compact, informal summaries such as keyframes and video thumbnails. Additionally, the capability to generate text that describes an overall image or full-motion video has rapidly improved. However, generating text in more formal structures, such as reports, remains a relatively unsolved area of Natural Language Generation (NLG). This work is an initial attempt to understand the gap in the data summarization and document generation problem, specifically for the generation of situation reports, and to study the data annotations necessary to implement an end-to-end pipeline that would ingest data, summarize it, and generate a situation report for easy consumption by a user.