The data was found by the Cyber Risk Research team at security provider UpGuard, which has made several similar discoveries in the past.
The data in question was collected by the US Central Command and the US Pacific Command and was stored in three publicly downloadable cloud-based storage servers.
UpGuard said one of the buckets contained about 1.8 billion posts collected over the past eight years. The content, in multiple languages and from countries around the world, came from news sites, comment sections, web forums, and social media sites such as Facebook.
In the past, UpGuard has found misconfigured Amazon Web Services S3 buckets leaking data from global corporate consulting and management firm Accenture, publisher Dow Jones, a Chicago voter database, a North Carolina security firm, and a contractor for the US Republican National Committee.
The discovery was made by Chris Vickery, the director of the company's Cyber Risk Research unit. The three S3 buckets were configured to allow any AWS global authenticated user to view or download the data. Anyone can obtain an AWS account of this type through a free sign-up.
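For readers who administer S3 buckets, the kind of setting described above can be spotted in a bucket's access control list (ACL). The following is a minimal sketch, assuming the access was granted through an ACL (UpGuard's description does not confirm the exact mechanism) and assuming the ACL is available as a dictionary in the shape AWS returns for a bucket ACL. The function name and the sample ACL are hypothetical and invented for illustration; only the group URI is AWS's own identifier for "any authenticated AWS user".

```python
# URI that S3 ACLs use for the predefined "any authenticated AWS user" group.
AUTHENTICATED_USERS_URI = (
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
)


def authenticated_users_permissions(acl):
    """Return the sorted permissions an ACL grants to any authenticated AWS user.

    `acl` is assumed to be a dict with a "Grants" list, each grant holding a
    "Grantee" dict and a "Permission" string. (Hypothetical helper.)
    """
    permissions = set()
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        is_group = grantee.get("Type") == "Group"
        is_authenticated_users = grantee.get("URI") == AUTHENTICATED_USERS_URI
        if is_group and is_authenticated_users and "Permission" in grant:
            permissions.add(grant["Permission"])
    return sorted(permissions)


# Hypothetical ACL, invented for illustration; not taken from the buckets
# described in this article.
sample_acl = {
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "example-owner-id"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS_URI},
            "Permission": "READ",
        },
    ]
}

exposed = authenticated_users_permissions(sample_acl)
if exposed:
    print("Readable by any AWS account holder:", ", ".join(exposed))
```

A non-empty result means anyone with an AWS account, including a free one, holds those permissions on the bucket.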
The subdomain names of the buckets were "centcom-backup", "centcom-archive", and "pacom-archive", which gave an indication of where the data within came from.
UpGuard wrote: "CENTCOM refers to the US Central Command, based in Tampa, Florida, and responsible for US military operations from East Africa to Central Asia, including the Iraq and Afghan Wars. PACOM is the US Pacific Command, headquartered in Aiea, Hawaii, and covering East, South, and Southeast Asia, as well as Australia and Pacific Oceania."
The "centcom-backup" bucket contained indications that the software involved was used by a vendor known as VendorX. The application, known as Outpost, is a Pentagon social engineering effort.
UpGuard has published a detailed blog post about the find.