I have a customer who is processing e-mails. These e-mails can have large attachments (more than 1MB). The goal is to index both the e-mail and its attachments. I understand that there is a stage that can break an e-mail apart into multiple documents for processing. However, the customer needs one e-mail (attachments included) to equal one Solr document, so that the number of Solr documents in the e-mail index matches the number of e-mails actually received.
What they attempted was to have their custom plugin put each attachment into a multi-valued string field on the e-mail document. That way they can search for all e-mails (and their attachments) that contain a phrase like "dog park" without getting false positives when one attachment contains "dog" and another contains "park". The problem with this approach is that an attachment can easily be larger than 1MB, in which case it does not fit into a string value.
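To make the false-positive concern concrete, here is a minimal sketch in plain Python. It is not Solr code and not the customer's plugin; the helper names (contains_phrase, email_matches) and the sample attachment text are made up for illustration. It only shows why matching a phrase within each value separately behaves differently from matching against all attachments concatenated into one value.

```python
def contains_phrase(text, phrase):
    """Return True if the words of `phrase` appear consecutively in `text`."""
    tokens = text.lower().split()
    words = phrase.lower().split()
    n = len(words)
    # Slide a window of n tokens across the text and compare it to the phrase.
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def email_matches(values, phrase):
    """Match the phrase against each value (attachment) on its own."""
    return any(contains_phrase(value, phrase) for value in values)


# Hypothetical attachment contents for one e-mail.
attachments = ["photo of my dog", "park opening hours"]

# One value per attachment: the phrase cannot span two attachments.
per_value = email_matches(attachments, "dog park")

# Everything concatenated into one value: "dog" and "park" become adjacent.
concatenated = contains_phrase(" ".join(attachments), "dog park")

print(per_value, concatenated)
```

With the sample data, the per-value check reports no match, while the concatenated check reports a match even though no single attachment contains "dog park". That is the false positive the one-value-per-attachment layout is meant to avoid.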
The next thought was to use a multi-valued "stream" field instead. Is this possible? Are there any other suggestions for how to accomplish this?